Release v0.12.0 · aphp/edsnlp

Changelog

Added

The eds.transformer component now accepts prompts (passed to its preprocess method, see breaking change below) to add before each window of text to embed.
LazyCollection.map / map_batches now support generator functions as arguments.
Window stride can now be disabled (i.e., stride = window) during training in the eds.transformer component by setting training_stride = False
Added a new eds.ner_overlap_scorer to evaluate matches between two lists of entities, counting a match as a true positive when the Dice overlap is above a given threshold
edsnlp.load now accepts EDS-NLP models from the huggingface hub 🤗 !
New python -m edsnlp.package command to package a model for the huggingface hub or pypi-like registries
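
The Dice-overlap criterion mentioned for eds.ner_overlap_scorer can be illustrated with a minimal sketch. This is not the component's implementation: the (start, end, label) tuple representation, the dice_overlap and overlap_scores helper names, the greedy one-to-one matching, the 0.5 default threshold and the non-strict comparison are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical span representation: (start, end, label), end exclusive.
Span = Tuple[int, int, str]


def dice_overlap(a: Span, b: Span) -> float:
    """Dice coefficient between two half-open ranges: 2 * |A ∩ B| / (|A| + |B|)."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * intersection / total if total else 0.0


def overlap_scores(
    pred: List[Span], gold: List[Span], threshold: float = 0.5
) -> Dict[str, float]:
    """Count a prediction as a true positive when it shares its label with a
    not-yet-matched gold span and their Dice overlap reaches the threshold.
    Whether the real scorer uses a strict comparison is an assumption here."""
    unmatched = list(gold)
    tp = 0
    for p in pred:
        for g in unmatched:
            if p[2] == g[2] and dice_overlap(p, g) >= threshold:
                unmatched.remove(g)  # each gold span can be matched only once
                tp += 1
                break
    return {
        "tp": tp,
        "precision": tp / len(pred) if pred else 0.0,
        "recall": tp / len(gold) if gold else 0.0,
    }


gold = [(0, 10, "drug"), (20, 30, "date")]
pred = [(2, 10, "drug"), (28, 40, "date"), (50, 55, "drug")]
scores = overlap_scores(pred, gold)
print(scores["tp"])  # 1: only the first "drug" prediction overlaps enough
```

Here the first prediction has a Dice overlap of 16/18 with its gold span and is counted, while the "date" prediction only reaches 4/22 and is rejected at the 0.5 threshold.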

Changed

Trainable embedding components now all use foldedtensor to return embeddings, instead of returning a tensor of floats and a mask tensor.
💥 TorchComponent __call__ no longer applies the end to end method, and instead calls the forward method directly, like all torch modules.
The trainable eds.span_qualifier component has been renamed to eds.span_classifier to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
omop converter now takes the note_datetime field into account by default when building a document
span._.date.to_datetime() and span._.date.to_duration() now automatically take the note_datetime into account
nlp.vocab is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway
💥 Major breaking change in trainable components, moving towards a more "task-centric" design:
- the eds.transformer component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the preprocess method, which now accepts more arguments than just the docs to process.
- similarly, the eds.span_pooler is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the preprocess method.

Consequently, the eds.transformer and eds.span_pooler no longer accept their span_getter argument, and the eds.ner_crf, eds.span_classifier, eds.span_linker and eds.span_qualifier components now accept a context_getter argument instead, as well as a span_getter argument for the latter two. This refactoring can be summarized as follows:

- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter

- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter

and as an example for the eds.span_linker component:

nlp.add_pipe(
    eds.span_linker(
        metric="cosine",
        probability_mode="sigmoid",
+       span_getter="ents",
+       # context_getter="ents",  -> by default, same as span_getter
        embedding=eds.span_pooler(
            hidden_size=128,
-           span_getter="ents",
            embedding=eds.transformer(
-               span_getter="ents",
                model="prajjwal1/bert-tiny",
                window=128,
                stride=96,
            ),
        ),
    ),
    name="linker",
)
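
To make the window=128 and stride=96 settings in the example above more concrete, the following sketch computes the window boundaries they would produce, and compares them with the case where the stride equals the window (which is what disabling the stride during training amounts to). It is an illustration of the arithmetic only, not the eds.transformer implementation: window_bounds is a hypothetical helper, and treating the unit as a flat token index is an assumption.

```python
from typing import List, Tuple


def window_bounds(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of successive windows, end exclusive.

    Hypothetical helper: each window starts `stride` tokens after the previous
    one, so consecutive windows share `window - stride` tokens.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_tokens == 0:
        return []
    bounds = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        bounds.append((start, end))
        if end >= n_tokens:  # the last window reaches the end of the text
            break
        start += stride
    return bounds


# Overlapping windows: each pair of neighbours shares 128 - 96 = 32 tokens.
print(window_bounds(300, window=128, stride=96))
# [(0, 128), (96, 224), (192, 300)]

# Stride equal to the window: no token is embedded twice.
print(window_bounds(300, window=128, stride=128))
# [(0, 128), (128, 256), (256, 300)]
```

Overlapping windows give tokens near a window edge more surrounding context, at the cost of embedding some tokens twice; setting the stride equal to the window removes that redundancy.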

Fixed

edsnlp.data.read_json now correctly reads the files from the directory passed as an argument, and not from the parent directory.
Overrode spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, span._.date.doc)
Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors

Pull Requests

Drop codecov by @percevalw in #292
Fix dates by @percevalw in #288
Loading models from the hf hub by @percevalw in #293
Fix: only reinstall hf model when cache files are changed by @percevalw in #295
feat: expose package script to cli by @percevalw in #294
chore: bump version to 0.12.0 by @percevalw in #296

Full Changelog: v0.11.2...v0.12.0
