Releases · aphp/edsnlp

22 Sep 09:06

percevalw

v0.9.1

d6d4c92

v0.9.1

Changelog

Changed

Improve negation patterns
Absent disorders now set the negation to True when matched as ABSENT
Default qualifier is now None instead of False (empty string)

Fixed

span_getter is no longer incompatible with on_ents_only
ContextualMatcher now supports empty matches (e.g. lookahead/lookbehind) in assign patterns
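
As an illustration of what an "empty match" is, the sketch below uses only Python's standard re module, not the ContextualMatcher API. A pattern made solely of a lookahead succeeds but spans zero characters. The pattern and the sample text are assumptions chosen for the example.

```python
import re

text = "pas de fracture"

# A pattern made only of a lookahead: it succeeds at the position where
# "fracture" begins, but it consumes no characters.
match = re.search(r"(?=fracture)", text)

start, end = match.span()
print(start == end)          # True: the match is zero-width
print(repr(match.group()))   # '' (empty string)
```

A matcher that discards zero-length results would silently drop such a match, which is the situation this fix addresses for assign patterns.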

Pull Requests

Fix negations by @percevalw in #216
Chore: bump version to 0.9.1 by @percevalw in #218

Full Changelog: v0.9.0...v0.9.1

Contributors

percevalw


15 Sep 16:31

percevalw

v0.9.0

9157b96

v0.9.0

Changelog

Added

New to_duration method to convert an absolute date into a date relative to the note_datetime (or None)
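
A minimal sketch of the semantics described above, written with the standard datetime module only. The helper name to_duration_sketch and its signature are hypothetical and do not reproduce the actual EDS-NLP method; the sketch only shows the "relative to note_datetime, or None" behaviour.

```python
from datetime import datetime, timedelta
from typing import Optional


def to_duration_sketch(
    date: datetime, note_datetime: Optional[datetime]
) -> Optional[timedelta]:
    """Hypothetical helper: offset of `date` from the note date, or None."""
    if note_datetime is None:
        # Without a reference date, no relative date can be computed.
        return None
    return date - note_datetime


note = datetime(2023, 9, 15)
event = datetime(2023, 9, 5)

print(to_duration_sketch(event, note))  # -10 days, 0:00:00
print(to_duration_sketch(event, None))  # None
```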

Changed

Input and output of components are now specified by span_getter and span_setter arguments.
💥 Score / disorders / behaviors entities now have a fixed label (passed as an argument), instead of being dynamically set from the component name. The following scores may have a different name than the current one in your pipelines:
- eds.emergency.gemsa → emergency_gemsa
- eds.emergency.ccmu → emergency_ccmu
- eds.emergency.priority → emergency_priority
- eds.charlson → charlson
- eds.elston_ellis → elston_ellis
- eds.SOFA → sofa
- eds.adicap → adicap
- eds.measurements → size, weight, ... instead of eds.size, eds.weight, ...
eds.dates now separates dates from durations. Each entity has its own label:
- spans["dates"] → entities labelled as date with a span._.date parsed object
- spans["durations"] → entities labelled as duration with a span._.duration parsed object
The "relative" / "absolute" / "duration" mode of the time entity is now stored in the mode attribute of span._.date or span._.duration
The "from" / "until" period bound, if any, is now stored in the span._.date.bound attribute
to_datetime now only returns absolute dates: relative dates are converted into absolute ones if doc._.note_datetime is given, and None is returned otherwise
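
The to_datetime behaviour described above can be sketched with the standard library as follows. This is an illustration under stated assumptions, not the EDS-NLP implementation: the function name to_datetime_sketch is hypothetical, an absolute date is modelled as a datetime, and a relative date as a timedelta offset from the note date.

```python
from datetime import datetime, timedelta
from typing import Optional, Union


def to_datetime_sketch(
    date: Union[datetime, timedelta], note_datetime: Optional[datetime]
) -> Optional[datetime]:
    """Hypothetical helper mirroring the three cases of the changelog entry."""
    if isinstance(date, datetime):
        # Absolute dates are returned unchanged.
        return date
    if note_datetime is None:
        # Relative date without a reference date: nothing can be resolved.
        return None
    # Relative date resolved against the note date.
    return note_datetime + date


note = datetime(2023, 9, 15)
ten_days_ago = timedelta(days=-10)

print(to_datetime_sketch(datetime(2023, 1, 1), None))  # 2023-01-01 00:00:00
print(to_datetime_sketch(ten_days_ago, note))          # 2023-09-05 00:00:00
print(to_datetime_sketch(ten_days_ago, None))          # None
```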

Fixed

export_to_brat issue with spans of entities on multiple lines.

Pull Requests

Fix export_to_brat when there are spaces before new lines by @TheooJ in #211
Refacto of the extensions by @percevalw in #213
chore: bump version to 0.9.0 by @percevalw in #215

New Contributors

@TheooJ made their first contribution in #211

Full Changelog: v0.8.1...v0.9.0

Contributors

percevalw and TheooJ


20 Jul 13:24

percevalw

v0.8.1.post

b3c7ddd

v0.8.1

Post-release to synchronize Zenodo


31 May 11:42

percevalw

v0.8.1

b3c7ddd

v0.8.1

What's changed

Fix release to allow installation from source.

Pull Requests

Ship cython files in sdist by @percevalw in #210

Full Changelog: v0.8.0...v0.8.1

Contributors

percevalw


24 May 16:00

percevalw

v0.8.0

406fdc1

v0.8.0

Changelog

Added

New trainable component for multi-label, multi-class span qualification (any attribute/extension)
Add range measurements (like la tumeur fait entre 1 et 2 cm) to eds.measurements matcher
Add eds.CKD component
Add eds.COPD component
Add eds.alcohol component
Add eds.cerebrovascular_accident component
Add eds.congestive_heart_failure component
Add eds.connective_tissue_disease component
Add eds.dementia component
Add eds.diabetes component
Add eds.hemiplegia component
Add eds.leukemia component
Add eds.liver_disease component
Add eds.lymphoma component
Add eds.myocardial_infarction component
Add eds.peptic_ulcer_disease component
Add eds.peripheral_vascular_disease component
Add eds.solid_tumor component
Add eds.tobacco component
Add eds.spaces (or eds.normalizer with spaces=True) to detect space tokens, and add ignore_space_tokens to EDSPhraseMatcher and SimstringMatcher to skip them
Add ignore_space_tokens option in most components
eds.tables: new pipeline to identify formatted tables
New merge_mode parameter in eds.measurements to normalize existing entities or detect measures only inside existing entities
Tokenization exceptions (Mr., Dr., Mrs.) and non end-of-sentence periods are now tokenized with the next letter in the eds tokenizer

Changed

Disable EDSMatcher preprocessing auto progress tracking by default
Moved dependencies to a single pyproject.toml: support for pip install -e '.[dev,docs,setup]'
ADICAP matcher now allows dot separators (e.g. B.H.HP.A7A0)

Fixed

Abbreviation and number tokenization issues in the eds tokenizer
eds.adicap: reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
Fix build for python 3.9 on Mac M1/M2 machines.

What's changed

Pull Requests

docs: mention INRIA in the acknowledgment by @percevalw in #170
Umls fixes by @percevalw in #183
fix typo by @gammaeva in #179
add link and definiton for sofa in documentation by @strayMat in #182
CI fail exploration by @Thomzoy in #189
Repare parsing errors of the ADICAP dict by @etienneguevel in #187
Move dependencies to pyproject.toml by @percevalw in #190
Add tokenization exceptions and detect some false positive EOS by @percevalw in #192
Bump version to 0.8.0 by @percevalw in #194
Update docs by @percevalw in #196
Ignore space tokens by @percevalw in #198
pipe tables by @aricohen93 in #180
Range measurements by @percevalw in #195
SpanQualifier trainable component by @percevalw in #193
18 pipes from the Charlson Comorbidity Index by @Thomzoy in #205
Bump version to v0.8.0 by @percevalw in #209

New Contributors

@gammaeva made their first contribution in #179
@strayMat made their first contribution in #182

Full Changelog: v0.7.4...v0.8.0

Contributors

percevalw, strayMat, and 4 other contributors


12 Dec 14:36

percevalw

v0.7.4

acbd55a

v0.7.4

Changelog

Added

eds.history: add an option to consider only the closest dates in the sentence (dates inside the boundaries are used first; if there are none, the closest date in the entire sentence is used).
eds.negation: now takes into account following past participles and preceding infinitives.
eds.hypothesis: now takes into account following past participle hypothesis verbs.
eds.negation & eds.hypothesis: introduce new patterns and remove unnecessary patterns.
eds.dates: add a pattern for preceding relative dates (e.g. l'embolie qui est survenue à 10 jours).
Improve patterns in the eds.pollution component to account for multiline footers
Add QuickExample object to quickly try a pipeline.
Add UMLS terminology matcher eds.umls
New RegexMatcher method to create spans from groupdicts
New eds.dates option to disable time detection

Changed

Improve date detection by removing false positives

Fixed

eds.hypothesis: remove overly generic patterns.
EDSTokenizer: it now tokenizes "recherche d'" as ["recherche", "d'"], instead of ["recherche", "d", "'"].
Fix small typos in the documentation and in the docstrings.
Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
Fix BratConnector file loading issues with complex file hierarchies

Pull Requests

👓 Feedbacks from EDS-TeVa study by @Aremaki in #157
feat: 🩺 Update negation and hypothesis pipelines by @Aremaki in #162
Harmonize processing utils by @aricohen93 in #160
Update pattern footer (pollution) by @aricohen93 in #159
feat: add UMLS terminology (#147) by @percevalw in #165
Relax pydantic version constraints by @percevalw in #167
Allow back spacy dot components for backward compatibility by @percevalw in #152
Update docs by @percevalw in #168
Bump version to 0.7.3 by @percevalw in #169
Quick example by @Thomzoy in #166
Update index.md by @Thomzoy in #171
Fix brat file path search for complex file hierarchies by @percevalw in #172
Improve dates by @percevalw in #149
Bump version to 0.7.4 by @percevalw in #173

Full Changelog: v0.7.2...v0.7.4

Contributors

percevalw, Thomzoy, and 2 other contributors


26 Oct 20:54

percevalw

v0.7.2

66f2cce

v0.7.2

Changelog

Added

Improve the eds.history component by taking into account the dates extracted by the eds.dates component.
New pop up when you click on the copy icon in the termynal widget (docs).
Add NER eds.elston-ellis pipeline to identify Elston Ellis scores
Add flags=re.MULTILINE to eds.pollution and change pattern of footer
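
To show why re.MULTILINE matters for footer detection, here is a small sketch using the standard re module. The footer pattern and the sample note are hypothetical and are not the patterns shipped with eds.pollution.

```python
import re

text = "Compte rendu de consultation\nPage 1/3\nLe patient va bien."

# Hypothetical footer pattern: a line made only of a page counter.
footer = r"^Page \d+/\d+$"

# Without the flag, ^ and $ only anchor at the start and end of the whole
# string, so a footer in the middle of the note is missed.
without_flag = re.findall(footer, text)

# With re.MULTILINE, ^ and $ also anchor at every line boundary.
with_flag = re.findall(footer, text, flags=re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['Page 1/3']
```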

Fixed

Remove the warning in eds.sections when eds.normalizer is in the pipe.
Fix filter_spans for strictly nested entities
Fill eds.remove-lowercase "assign" metadata to run the pipeline during EDSPhraseMatcher preprocessing
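
For readers unfamiliar with span filtering, the sketch below shows the strictly nested case on plain (start, end) tuples. It follows the longest-span-first strategy of spaCy's filter_spans utility; that EDS-NLP's filter_spans applies exactly this rule is an assumption, and filter_spans_sketch is a hypothetical name.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def filter_spans_sketch(spans: List[Span]) -> List[Span]:
    """Keep the longest spans and drop any span overlapping a kept one."""
    # Longest spans first; ties are broken by the earliest start.
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept: List[Span] = []
    for start, end in ordered:
        # A span is kept only if it is disjoint from every kept span.
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


# (1, 3) is strictly nested inside (0, 5) and is therefore discarded.
spans = [(0, 5), (1, 3), (6, 8)]
print(filter_spans_sketch(spans))  # [(0, 5), (6, 8)]
```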

Pull Requests

Update patterns pollution by @aricohen93 in #145
feat: ✨ Improve eds.history component with eds.dates by @Aremaki in #144
Small fixes by @percevalw in #146
Elston and Ellis by @etienneguevel in #148
Fix setup.py by @percevalw in #151
Patch patterns norm by @aricohen93 in #150
Bump version to 0.7.2 by @percevalw in #153

Full Changelog: v0.7.1...v0.7.2

Contributors

percevalw, aricohen93, and 2 other contributors


13 Oct 09:34

percevalw

v0.7.1

cafaa4a

v0.7.1

Changelog

Added

Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)

Changed

Improved TNM detection algorithm
Account for more modifiers in ADICAP codes detection

Fixed

Add nephew, niece and daughter to family qualifier patterns
EDSTokenizer (spacy.blank('eds')) now recognizes non-breaking whitespaces as spaces and does not split float numbers
eds.dates pipeline now allows new lines as space separators in dates

Pull Requests

add: new patterns to pollution by @Thomzoy in #132
docs: fix cim10 docs by @percevalw in #130
Remove print statement by @Thomzoy in #133
fix: param sampling AdicapCode by @etienneguevel in #131
Add nephew, niece and daughter to family qualifier patterns by @julienduquesne in #135
Modification of the TNM ner by @etienneguevel in #136
modification of the ADICAP ner by @etienneguevel in #137
EDSTokenizer: split on non-breaking spaces and don't split float numbers by @percevalw in #141
Allow newlines in dates by @percevalw in #142
new pattern norm pollution by @aricohen93 in #139
Bump version to 0.7.1 by @percevalw in #143

New Contributors

@etienneguevel made their first contribution in #131
@julienduquesne made their first contribution in #135

Full Changelog: v0.7.0...v0.7.1

Contributors

percevalw, Thomzoy, and 3 other contributors


06 Sep 16:39

percevalw

v0.7.0

be5c394

v0.7.0

Changelog

Added

New nested NER trainable nested_ner pipeline component
Support for nested entities and attributes in BratDataConnector
Pytorch wrappers and experimental training utils
Add attribute section to entities
Add new cases for separator pattern when components of the TNM score are separated by a forward slash
Add NER eds.adicap pipeline to identify ADICAP codes

Changed

Update of the ContextualMatcher (and all pipelines depending on it), rendering it more flexible to use
Rename R component of score TNM as "resection_completeness"

Fixed

Prevent section titles from capturing surrounding tokens, causing overlaps (#113)
Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
Fix explain mode, which was always triggered, in eds.history factory.
Fix test in eds.sections. Previously, no check was done
Remove SOFA scores spurious span suffixes

Pull requests

Change links to streamlit demo by @percevalw in #111
Restore demo links by @percevalw in #112
Prevent section titles from capturing surrounding tokens by @percevalw in #114
Section upgrade by @paul-bssr in #115
Nested NER trainable pipeline component by @percevalw in #84
Fix history factory parameter type by @clementjumel in #117
Rename R component (TNM) by @aricohen93 in #119
Update separator pattern score TNM by @aricohen93 in #121
add section info to entities by @aricohen93 in #120
Adicap pipeline by @aricohen93 in #123
ContextualMatcher + ADICAP Update by @Thomzoy in #124
fix: handle single entity in contextual matcher by @Thomzoy in #126
Adicap model by @percevalw in #127
chore: bump version to 0.7.0 by @percevalw in #125
v0.7.0 + fixed package_data by @percevalw in #129

New Contributors

@paul-bssr made their first contribution in #115
@clementjumel made their first contribution in #117

Full Changelog: v0.6.2...v0.7.0

Contributors

percevalw, Thomzoy, and 3 other contributors


02 Aug 11:56

percevalw

v0.6.2

cbc585b

v0.6.2

Changelog

Added

New SimstringMatcher matcher to perform fuzzy term matching, and algorithm parameter in terminology components and eds.matcher component
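
As background on fuzzy term matching, simstring-style approaches score candidate terms by the overlap of their character n-grams. The sketch below is a pure-Python illustration of that idea with a Jaccard measure. It does not use the SimstringMatcher API, and the function names, the padding character and the threshold are all assumptions.

```python
from typing import List, Set


def char_ngrams(term: str, n: int = 3) -> Set[str]:
    """Character n-grams of a term, padded so that word edges count too."""
    pad = "#" * (n - 1)
    padded = f"{pad}{term.lower()}{pad}"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: str, b: str, n: int = 3) -> float:
    """Share of n-grams that the two terms have in common."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_lookup(query: str, terms: List[str], threshold: float = 0.4) -> List[str]:
    """Return the terms whose similarity to the query reaches the threshold."""
    return [term for term in terms if jaccard(query, term) >= threshold]


# A missing accent still retrieves the intended term.
print(fuzzy_lookup("diabete", ["diabète", "asthme"]))
```

Raising the threshold makes matching stricter (fewer false positives), while lowering it tolerates more spelling variation.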

Changed

Add consultation date pattern "CS", and False Positive patterns for dates (namely phone numbers and pagination).
Update the eds.TNM score pipeline: it can now return a dictionary whose values are either str or int

Fixed

Add new patterns to the negation qualifier
Numpy header issues with binary distributed packages
Simstring dependency on Windows

Pull Requests

chore: add acknowledgement by @bdura in #102
TNM by @aricohen93 in #103
fix: eds.sentences behaviour with dates by @bdura in #99
Add consultation date pattern and date False Positive by @JCharline in #107
Simstring by @percevalw in #94
Fix numpy header issues with binary packages by @percevalw in #109
fix: add "non" preceding pattern by @bdura in #105
Bump version to v0.6.2 by @percevalw in #110

New Contributors

@JCharline made their first contribution in #107

Full Changelog: v0.6.1...v0.6.2

Contributors

percevalw, bdura, and 2 other contributors
