v0.12.0
Changelog
Added
- The eds.transformer component now accepts prompts (passed to its preprocess method, see breaking change below) to add before each window of text to embed.
- LazyCollection.map / map_batches now support generator functions as arguments.
- Window stride can now be disabled (i.e., stride = window) during training in the eds.transformer component by training_stride = False
- Added a new eds.ner_overlap_scorer to evaluate matches between two lists of entities, counting true when the dice overlap is above a given threshold
- edsnlp.load now accepts EDS-NLP models from the huggingface hub 🤗 !
- New python -m edsnlp.package command to package a model for the huggingface hub or pypi-like registries
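
To make the overlap criterion mentioned for eds.ner_overlap_scorer concrete, here is a minimal sketch in plain Python. It is not the library's implementation: it assumes entities are given as (start, end) character offsets with an exclusive end, and the names dice_overlap and count_matches, as well as the default threshold of 0.5, are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of an entity: (start, end), end exclusive.
Span = Tuple[int, int]


def dice_overlap(a: Span, b: Span) -> float:
    """Dice coefficient between two spans: 2 * |a ∩ b| / (|a| + |b|)."""
    intersection = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    total = (a[1] - a[0]) + (b[1] - b[0])
    return 2 * intersection / total if total > 0 else 0.0


def count_matches(pred: List[Span], gold: List[Span], threshold: float = 0.5) -> int:
    """Count predicted spans whose Dice overlap with some gold span is above the threshold."""
    return sum(
        1 for p in pred if any(dice_overlap(p, g) > threshold for g in gold)
    )
```

With this definition, two identical spans score 1.0, two disjoint spans score 0.0, and a prediction that is off by a character or two on a long entity still counts as a match, which is the point of using an overlap threshold rather than exact boundaries.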
Changed
- Trainable embedding components now all use foldedtensor to return embeddings, instead of returning a tensor of floats and a mask tensor.
- 💥 TorchComponent __call__ no longer applies the end-to-end method, and instead calls the forward method directly, like all torch modules.
- The trainable eds.span_qualifier component has been renamed to eds.span_classifier to reflect its general purpose (it doesn't only predict qualifiers, but any attribute of a span using its context or not).
- omop converter now takes the note_datetime field into account by default when building a document
- span._.date.to_datetime() and span._.date.to_duration() now automatically take the note_datetime into account
- nlp.vocab is no longer serialized when saving a model, as it may contain sensitive information and can be recomputed during inference anyway
- 💥 Major breaking change in trainable components, moving towards a more "task-centric" design:
  - the eds.transformer component is no longer responsible for deciding which spans of text ("contexts") should be embedded. These contexts are now passed via the preprocess method, which now accepts more arguments than just the docs to process.
  - similarly the eds.span_pooler is no longer responsible for deciding which spans to pool, and instead pools all spans passed to it in the preprocess method.
Consequently, the eds.transformer and eds.span_pooler no longer accept their span_getter argument, and the eds.ner_crf, eds.span_classifier, eds.span_linker and eds.span_qualifier components now accept a context_getter argument instead, as well as a span_getter argument for the latter two. This refactoring can be summarized as follows:
- eds.transformer.span_getter
+ eds.ner_crf.context_getter
+ eds.span_classifier.context_getter
+ eds.span_linker.context_getter
- eds.span_pooler.span_getter
+ eds.span_qualifier.span_getter
+ eds.span_linker.span_getter
and as an example for the eds.span_linker component:
  nlp.add_pipe(
      eds.span_linker(
          metric="cosine",
          probability_mode="sigmoid",
+         span_getter="ents",
+         # context_getter="ents",  -> by default, same as span_getter
          embedding=eds.span_pooler(
              hidden_size=128,
-             span_getter="ents",
              embedding=eds.transformer(
-                 span_getter="ents",
                  model="prajjwal1/bert-tiny",
                  window=128,
                  stride=96,
              ),
          ),
      ),
      name="linker",
  )
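
The window and stride values in the example above control how a long text is cut into pieces before being embedded. As a rough illustration of that idea only (this is not the code used by eds.transformer), the sketch below computes window boundaries over a sequence of tokens; the function name sliding_windows and the 300-token document length are hypothetical. Setting stride equal to window, which is what training_stride = False is described as doing during training, yields non-overlapping windows.

```python
from typing import List, Tuple


def sliding_windows(n_tokens: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) token offsets of successive windows, end exclusive."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    spans = []
    start = 0
    while True:
        # The last window is truncated at the end of the sequence.
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return spans


# Overlapping windows, as with window=128 and stride=96.
overlapping = sliding_windows(300, window=128, stride=96)
# Non-overlapping windows, i.e. stride = window.
disjoint = sliding_windows(300, window=128, stride=128)
```

In the overlapping case, consecutive windows share 32 tokens, so tokens near a window boundary are seen with context on both sides; in the disjoint case each token is embedded exactly once, which is cheaper.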
Fixed
- edsnlp.data.read_json now correctly reads the files from the directory passed as an argument, and not from the parent directory.
- Overwrite spaCy's Doc, Span and Token pickling utils to allow recursively storing Doc, Span and Token objects in the extension values (in particular, span._.date.doc)
- Removed pendulum dependency, solving various pickling, multiprocessing and missing attributes errors
Pull Requests
- Drop codecov by @percevalw in #292
- Fix dates by @percevalw in #288
- Loading models from the hf hub by @percevalw in #293
- Fix: only reinstall hf model when cache files are changed by @percevalw in #295
- feat: expose package script to cli by @percevalw in #294
- chore: bump version to 0.12.0 by @percevalw in #296
Full Changelog: v0.11.2...v0.12.0