Releases: aphp/edsnlp
v0.9.1
Changelog
Changed
- Improve negation patterns
- Absent disorders now set the negation to True when matched as `ABSENT`
- Default qualifier is now `None` instead of `False` (empty string)
Fixed
- `span_getter` is not incompatible with on_ents_only anymore
- `ContextualMatcher` now supports empty matches (e.g. lookahead/lookbehind) in `assign` patterns
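To illustrate what an "empty match" is, here is a minimal sketch using only Python's standard `re` module. It does not use `ContextualMatcher` itself, and the pattern and text are made up for the example.

```python
import re

# Hypothetical pattern: a lookahead that matches the empty position just
# before a unit, without consuming any character.
pattern = re.compile(r"(?=mg)")

text = "500mg"
match = pattern.search(text)

# The match succeeds, but it spans zero characters.
span = match.span()
matched_text = match.group()
```

A matcher that assumes every match contains at least one character would reject such a result, which is the situation this fix addresses.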
Pull Requests
- Fix negations by @percevalw in #216
- Chore: bump version to 0.9.1 by @percevalw in #218
Full Changelog: v0.9.0...v0.9.1
v0.9.0
Changelog
Added
- New `to_duration` method to convert an absolute date into a date relative to the note_datetime (or None)
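The idea behind this conversion can be sketched with the standard `datetime` module. This is not the library's implementation: the function name `to_relative` is hypothetical, and the real method may return a different type than `timedelta`.

```python
from datetime import datetime, timedelta
from typing import Optional


def to_relative(absolute: datetime, note_datetime: Optional[datetime]) -> Optional[timedelta]:
    """Offset of `absolute` with respect to `note_datetime`, or None without a reference."""
    if note_datetime is None:
        # No note date is known, so no relative date can be computed.
        return None
    return absolute - note_datetime


# A date ten days before the (assumed) note date gives a negative offset.
offset = to_relative(datetime(2023, 6, 1), datetime(2023, 6, 11))
```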
Changes
- Input and output of components are now specified by `span_getter` and `span_setter` arguments.
- 💥 Score / disorders / behaviors entities now have a fixed label (passed as an argument), instead of being dynamically set from the component name. The following scores may have a different name than the current one in your pipelines:
  - `eds.emergency.gemsa` → `emergency_gemsa`
  - `eds.emergency.ccmu` → `emergency_ccmu`
  - `eds.emergency.priority` → `emergency_priority`
  - `eds.charlson` → `charlson`
  - `eds.elston_ellis` → `elston_ellis`
  - `eds.SOFA` → `sofa`
  - `eds.adicap` → `adicap`
  - `eds.measurements` → `size`, `weight`, ... instead of `eds.size`, `eds.weight`, ...
- `eds.dates` now separates dates from durations. Each entity has its own label:
  - `spans["dates"]` → entities labelled as `date` with a `span._.date` parsed object
  - `spans["durations"]` → entities labelled as `duration` with a `span._.duration` parsed object
- the "relative" / "absolute" / "duration" mode of the time entity is now stored in the `mode` attribute of the `span._.date/duration`
- the "from" / "until" period bound, if any, is now stored in the `span._.date.bound` attribute
- `to_datetime` now only returns absolute dates, converts relative dates into absolute if `doc._.note_datetime` is given, and None otherwise
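When downstream code filters entities by label, the renames listed above can be handled with a small lookup table. The sketch below is plain Python and does not touch EDS-NLP; the helper name `migrate_label` is hypothetical, and the table only covers the labels spelled out in this changelog.

```python
# Old label -> new label, taken from the renames listed in this release.
RENAMED_LABELS = {
    "eds.emergency.gemsa": "emergency_gemsa",
    "eds.emergency.ccmu": "emergency_ccmu",
    "eds.emergency.priority": "emergency_priority",
    "eds.charlson": "charlson",
    "eds.elston_ellis": "elston_ellis",
    "eds.SOFA": "sofa",
    "eds.adicap": "adicap",
    "eds.size": "size",
    "eds.weight": "weight",
}


def migrate_label(label: str) -> str:
    """Return the v0.9.0 label for an older label, leaving unknown labels unchanged."""
    return RENAMED_LABELS.get(label, label)
```

For example, `migrate_label("eds.SOFA")` returns `"sofa"`, while a label that was never renamed is passed through untouched.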
Fixed
- `export_to_brat` issue with spans of entities on multiple lines.
Pull Requests
- Fix export_to_brat when there are spaces before new lines by @TheooJ in #211
- Refacto of the extensions by @percevalw in #213
- chore: bump version to 0.9.0 by @percevalw in #215
Full Changelog: v0.8.1...v0.9.0
v0.8.1
Post-release to synchronize Zenodo
v0.8.1
What's changed
Fix release to allow installation from source.
Pull Requests
- Ship cython files in sdist by @percevalw in #210
Full Changelog: v0.8.0...v0.8.1
v0.8.0
Changelog
Added
- New trainable component for multi-label, multi-class span qualification (any attribute/extension)
- Add range measurements (like `la tumeur fait entre 1 et 2 cm`) to `eds.measurements` matcher
- Add `eds.CKD` component
- Add `eds.COPD` component
- Add `eds.alcohol` component
- Add `eds.cerebrovascular_accident` component
- Add `eds.congestive_heart_failure` component
- Add `eds.connective_tissue_disease` component
- Add `eds.dementia` component
- Add `eds.diabetes` component
- Add `eds.hemiplegia` component
- Add `eds.leukemia` component
- Add `eds.liver_disease` component
- Add `eds.lymphoma` component
- Add `eds.myocardial_infarction` component
- Add `eds.peptic_ulcer_disease` component
- Add `eds.peripheral_vascular_disease` component
- Add `eds.solid_tumor` component
- Add `eds.tobacco` component
- Add `eds.spaces` (or `eds.normalizer` with `spaces=True`) to detect space tokens, and add `ignore_space_tokens` to `EDSPhraseMatcher` and `SimstringMatcher` to skip them
- Add `ignore_space_tokens` option in most components
- `eds.tables`: new pipeline to identify formatted tables
- New `merge_mode` parameter in `eds.measurements` to normalize existing entities or detect measures only inside existing entities
- Tokenization exceptions (`Mr.`, `Dr.`, `Mrs.`) and non end-of-sentence periods are now tokenized with the next letter in the `eds` tokenizer
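As an illustration of what a range measurement looks like, the following sketch parses the example sentence with a hand-written regular expression. It is a simplified stand-in, not the `eds.measurements` implementation: the pattern, the unit list and the `parse_range` helper are all assumptions made for this example.

```python
import re

# Hypothetical, simplified pattern for French ranges such as "entre 1 et 2 cm".
RANGE = re.compile(
    r"entre\s+(?P<low>\d+(?:[.,]\d+)?)\s+et\s+(?P<high>\d+(?:[.,]\d+)?)\s*(?P<unit>mm|cm|m)\b"
)


def parse_range(text):
    """Return (low, high, unit) for the first range found, or None."""
    match = RANGE.search(text)
    if match is None:
        return None
    # French decimals use a comma, so normalize it before converting.
    low = float(match["low"].replace(",", "."))
    high = float(match["high"].replace(",", "."))
    return low, high, match["unit"]


result = parse_range("la tumeur fait entre 1 et 2 cm")
```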
Changed
- Disable `EDSMatcher` preprocessing auto progress tracking by default
- Moved dependencies to a single pyproject.toml: support for `pip install -e '.[dev,docs,setup]'`
- ADICAP matcher now allows dot separators (e.g. `B.H.HP.A7A0`)
Fixed
- Abbreviation and number tokenization issues in the `eds` tokenizer
- `eds.adicap`: reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
- Fix build for python 3.9 on Mac M1/M2 machines.
What's changed
Pull Requests
- docs: mention INRIA in the acknowledgment by @percevalw in #170
- Umls fixes by @percevalw in #183
- fix typo by @gammaeva in #179
- add link and definiton for sofa in documentation by @strayMat in #182
- CI fail exploration by @Thomzoy in #189
- Repare parsing errors of the ADICAP dict by @etienneguevel in #187
- Move dependencies to pyproject.toml by @percevalw in #190
- Add tokenization exceptions and detect some false positive EOS by @percevalw in #192
- Bump version to 0.8.0 by @percevalw in #194
- Update docs by @percevalw in #196
- Ignore space tokens by @percevalw in #198
- pipe tables by @aricohen93 in #180
- Range measurements by @percevalw in #195
- SpanQualifier trainable component by @percevalw in #193
- 18 pipes from the Charlson Comorbidity Index by @Thomzoy in #205
- Bump version to v0.8.0 by @percevalw in #209
Full Changelog: v0.7.4...v0.8.0
v0.7.4
Changelog
Added
- `eds.history`: Add the option to consider only the closest dates in the sentence (dates inside the boundaries and, if there are none, the closest date in the entire sentence).
- `eds.negation`: It takes into account following past participles and preceding infinitives.
- `eds.hypothesis`: It takes into account following past participles of hypothesis verbs.
- `eds.negation` & `eds.hypothesis`: Introduce new patterns and remove unnecessary patterns.
- `eds.dates`: Add a pattern for preceding relative dates (ex: l'embolie qui est survenue à 10 jours).
- Improve patterns in the `eds.pollution` component to account for multiline footers
- Add `QuickExample` object to quickly try a pipeline.
- Add UMLS terminology matcher `eds.umls`
- New `RegexMatcher` method to create spans from groupdicts
- New `eds.dates` option to disable time detection
Changed
- Improve date detection by removing false positives
Fixed
- `eds.hypothesis`: Remove too generic patterns.
- `EDSTokenizer`: It now tokenizes `"recherche d'"` as `["recherche", "d'"]`, instead of `["recherche", "d", "'"]`.
- Fix small typos in the documentation and in the docstring.
- Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
- Fix BratConnector file loading issues with complex file hierarchies
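The elision behaviour described for the tokenizer can be mimicked with a short regular expression. This is only a sketch of the expected output, not the `EDSTokenizer` rules; the pattern handles the straight apostrophe only and is an assumption made for this example.

```python
import re

# A word optionally followed by an apostrophe (elision), or any other
# single non-space, non-word character.
TOKEN = re.compile(r"\w+'?|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping an elided article attached to its apostrophe."""
    return TOKEN.findall(text)


tokens = tokenize("recherche d'")
```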
Pull Requests
- 👓 Feedbacks from EDS-TeVa study by @Aremaki in #157
- feat: 🩺 Update negation and hypothesis pipelines by @Aremaki in #162
- Harmonize processing utils by @aricohen93 in #160
- Update pattern footer (pollution) by @aricohen93 in #159
- feat: add UMLS terminology (#147) by @percevalw in #165
- Relax pydantic version constraints by @percevalw in #167
- Allow back spacy dot components for backward compatibility by @percevalw in #152
- Update docs by @percevalw in #168
- Bump version to 0.7.3 by @percevalw in #169
- Quick example by @Thomzoy in #166
- Update index.md by @Thomzoy in #171
- Fix brat file path search for complex file hierarchies by @percevalw in #172
- Improve dates by @percevalw in #149
- Bump version to 0.7.4 by @percevalw in #173
Full Changelog: v0.7.2...v0.7.4
v0.7.2
Changelog
Added
- Improve the `eds.history` component by taking into account the date extracted from `eds.dates` component.
- New pop up when you click on the copy icon in the termynal widget (docs).
- Add NER `eds.elston-ellis` pipeline to identify Elston Ellis scores
- Add `flags=re.MULTILINE` to `eds.pollution` and change pattern of footer
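The effect of `re.MULTILINE` on a line-anchored pattern can be seen with the standard `re` module alone. The footer pattern and the text below are hypothetical; they are not the patterns shipped with `eds.pollution`.

```python
import re

# Hypothetical footer pattern: a line consisting only of "Page <n>".
FOOTER = r"^Page \d+$"

text = "Compte rendu\nPage 1\nSuite du texte"

# Without the flag, ^ and $ only anchor at the start and end of the whole string.
without_flag = re.findall(FOOTER, text)

# With the flag, ^ and $ also anchor at every line boundary.
with_flag = re.findall(FOOTER, text, flags=re.MULTILINE)
```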
Fixed
- Remove the warning in the `eds.sections` when `eds.normalizer` is in the pipe.
is in the pipe. - Fix filter_spans for strictly nested entities
- Fill eds.remove-lowercase "assign" metadata to run the pipeline during EDSPhraseMatcher preprocessing
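For readers unfamiliar with span filtering, the sketch below shows one common strategy on plain `(start, end)` tuples: keep the longest spans and drop anything overlapping a span already kept, so strictly nested entities are removed. It is an assumption about the general technique, not the library's `filter_spans` code, and the helper name `keep_outermost_spans` is hypothetical.

```python
def keep_outermost_spans(spans):
    """Keep the longest spans first and drop any span overlapping one already kept."""
    # Longest first; ties are broken by the earliest start.
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept = []
    seen = set()  # token indices already covered by a kept span
    for start, end in ordered:
        if not any(i in seen for i in range(start, end)):
            kept.append((start, end))
            seen.update(range(start, end))
    return sorted(kept)


# (1, 3) is strictly nested inside (0, 5) and is therefore dropped.
result = keep_outermost_spans([(0, 5), (1, 3), (6, 8)])
```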
Pull Requests
- Update patterns pollution by @aricohen93 in #145
- feat: ✨ Improve `eds.history` component with `eds.dates` by @Aremaki in #144
- Small fixes by @percevalw in #146
- Elston and Ellis by @etienneguevel in #148
- Fix setup.py by @percevalw in #151
- Patch patterns norm by @aricohen93 in #150
- Bump version to 0.7.2 by @percevalw in #153
Full Changelog: v0.7.1...v0.7.2
v0.7.1
Changelog
Added
- Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)
Changed
- Improved TNM detection algorithm
- Account for more modifiers in ADICAP codes detection
Fixed
- Add nephew, niece and daughter to family qualifier patterns
- EDSTokenizer (`spacy.blank('eds')`) now recognizes non-breaking whitespaces as spaces and does not split float numbers
- `eds.dates` pipeline now allows new lines as space separators in dates
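The two tokenizer behaviours above (non-breaking spaces treated as separators, decimal numbers kept whole) can be sketched with a regular expression. This is not the `EDSTokenizer` implementation; the pattern is an assumption made for this example. It relies on the fact that, for `str` patterns, Python's `\s` also matches the non-breaking space U+00A0.

```python
import re

# Numbers with an optional decimal part stay whole; words and isolated
# punctuation are separate tokens; any whitespace (including U+00A0) is skipped.
TOKEN = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")

tokens = TOKEN.findall("poids\u00a072,5 kg")
```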
Pull Requests
- add: new patterns to pollution by @Thomzoy in #132
- docs: fix cim10 docs by @percevalw in #130
- Remove print statement by @Thomzoy in #133
- fix: param sampling AdicapCode by @etienneguevel in #131
- Add nephew, niece and daughter to family qualifier patterns by @julienduquesne in #135
- Modification of the TNM ner by @etienneguevel in #136
- modification of the ADICAP ner by @etienneguevel in #137
- EDSTokenizer: split on non-breaking spaces and don't split float numbers by @percevalw in #141
- Allow newlines in dates by @percevalw in #142
- new pattern norm pollution by @aricohen93 in #139
- Bump version to 0.7.1 by @percevalw in #143
New Contributors
- @etienneguevel made their first contribution in #131
- @julienduquesne made their first contribution in #135
Full Changelog: v0.7.0...v0.7.1
v0.7.0
Changelog
Added
- New nested NER trainable `nested_ner` pipeline component
- Support for nested entities and attributes in BratDataConnector
- Pytorch wrappers and experimental training utils
- Add attribute `section` to entities
- Add new cases for separator pattern when components of the TNM score are separated by a forward slash
- Add NER `eds.adicap` pipeline to identify ADICAP codes
Changed
- Update of the `ContextualMatcher` (and all pipelines depending on it), rendering it more flexible to use
- Rename R component of score TNM as "resection_completeness"
Fixed
- Prevent section titles from capturing surrounding tokens, causing overlaps (#113)
- Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
- Fix explain mode, which was always triggered, in `eds.history` factory.
- Fix test in `eds.sections`. Previously, no check was done
- Remove SOFA scores spurious span suffixes
Pull Requests
- Change links to streamlit demo by @percevalw in #111
- Restore demo links by @percevalw in #112
- Prevent section titles from capturing surrounding tokens by @percevalw in #114
- Section upgrade by @paul-bssr in #115
- Nested NER trainable pipeline component by @percevalw in #84
- Fix `history` factory parameter type by @clementjumel in #117
- Rename R component (TNM) by @aricohen93 in #119
- Update separator pattern score TNM by @aricohen93 in #121
- add section info to entities by @aricohen93 in #120
- Adicap pipeline by @aricohen93 in #123
- ContextualMatcher + ADICAP Update by @Thomzoy in #124
- fix: handle single entity in contextual matcher by @Thomzoy in #126
- Adicap model by @percevalw in #127
- chore: bump version to 0.7.0 by @percevalw in #125
- v0.7.0 + fixed package_data by @percevalw in #129
New Contributors
- @paul-bssr made their first contribution in #115
- @clementjumel made their first contribution in #117
Full Changelog: v0.6.2...v0.7.0
v0.6.2
Changelog
Added
- New `SimstringMatcher` matcher to perform fuzzy term matching, and `algorithm` parameter in terminology components and `eds.matcher` component
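Fuzzy term matching of this kind is commonly built on character n-gram overlap. The sketch below shows the idea with a Jaccard similarity in plain Python; it is not the Simstring algorithm nor the `SimstringMatcher` code, and the function names, the trigram size and the threshold are assumptions made for this example.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of `text`, padded so short strings still yield features."""
    padded = f"{' ' * (n - 1)}{text.lower()}{' ' * (n - 1)}"
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def fuzzy_lookup(query, terms, threshold=0.5):
    """Return the terms whose similarity to `query` reaches `threshold`."""
    return [term for term in terms if jaccard(query, term) >= threshold]


# A misspelled query still retrieves the close term but not the unrelated one.
matches = fuzzy_lookup("diabete", ["diabetes", "asthme"])
```

A dedicated library avoids comparing the query against every term, which this naive loop does; the sketch only shows the similarity criterion.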
Changed
- Add consultation date pattern "CS", and False Positive patterns for dates (namely phone numbers and pagination).
- Update the pipeline score `eds.TNM`. Now it is possible to return a dictionary where the results are either `str` or `int` values
Fixed
- Add new patterns to the negation qualifier
- Numpy header issues with binary distributed packages
- Simstring dependency on Windows
Pull Requests
- chore: add acknowledgement by @bdura in #102
- TNM by @aricohen93 in #103
- fix: eds.sentences behaviour with dates by @bdura in #99
- Add consultation date pattern and date False Positive by @JCharline in #107
- Simstring by @percevalw in #94
- Fix numpy header issues with binary packages by @percevalw in #109
- fix: add "non" preceding pattern by @bdura in #105
- Bump version to v0.6.2 by @percevalw in #110
New Contributors
- @JCharline made their first contribution in #107
Full Changelog: v0.6.1...v0.6.2