Releases: aphp/edsnlp

v0.9.1

22 Sep 09:06

Changelog

Changed

  • Improve negation patterns
  • Absent disorders now set the negation to True when matched as ABSENT
  • Default qualifier is now None instead of False (empty string)

Fixed

  • span_getter is no longer incompatible with on_ents_only
  • ContextualMatcher now supports empty matches (e.g. lookahead/lookbehind) in assign patterns
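
An empty match is a zero-width match, such as one produced by a lookahead or lookbehind, whose start offset equals its end offset. The plain-`re` sketch below is not ContextualMatcher code; it only shows the kind of match an assign pattern can now return. The helper name `find_assign_matches` is hypothetical.

```python
import re
from typing import List, Tuple


def find_assign_matches(pattern: str, text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every match, keeping zero-width ones.

    Hypothetical helper for illustration only; it is not the edsnlp API.
    """
    return [(m.start(), m.end()) for m in re.finditer(pattern, text)]


# A lookahead consumes no characters: it marks the position just before "cm",
# so the match has start == end and must not be discarded as "no match".
empty = find_assign_matches(r"(?=cm)", "3 cm")
# A lookbehind constrains the context without including it in the match.
non_empty = find_assign_matches(r"(?<=3 )cm", "3 cm")
```

Here `empty` holds a single zero-width match at offset 2, while `non_empty` covers the characters "cm".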

Full Changelog: v0.9.0...v0.9.1

v0.9.0

15 Sep 16:31

Changelog

Added

  • New to_duration method to convert an absolute date into a date relative to the note_datetime (or None)
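
A minimal sketch of the idea behind to_duration, assuming plain `datetime` objects: the signed offset of an absolute date from the note date, or None when no note date is available. The name `to_duration_sketch` and the `timedelta` return type are illustrative assumptions, not edsnlp's actual signature or return type.

```python
from datetime import datetime, timedelta
from typing import Optional


def to_duration_sketch(
    date: datetime, note_datetime: Optional[datetime]
) -> Optional[timedelta]:
    """Express an absolute date relative to the note date (hypothetical helper).

    A negative result means the date lies before the note; None is returned
    when the note date is unknown, mirroring the "(or None)" behaviour.
    """
    if note_datetime is None:
        return None
    return date - note_datetime


note = datetime(2023, 9, 15)
print(to_duration_sketch(datetime(2023, 9, 5), note))  # -10 days, 0:00:00
print(to_duration_sketch(datetime(2023, 9, 5), None))  # None
```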

Changed

  • Input and output of components are now specified by span_getter and span_setter arguments.
  • 💥 Score / disorders / behaviors entities now have a fixed label (passed as an argument), instead of being dynamically set from the component name. The following scores may have a different name than the current one in your pipelines:
    • eds.emergency.gemsa → emergency_gemsa
    • eds.emergency.ccmu → emergency_ccmu
    • eds.emergency.priority → emergency_priority
    • eds.charlson → charlson
    • eds.elston_ellis → elston_ellis
    • eds.SOFA → sofa
    • eds.adicap → adicap
    • eds.measurements → size, weight, ... instead of eds.size, eds.weight, ...
  • eds.dates now separates dates from durations. Each entity has its own label:
    • spans["dates"] → entities labelled as date with a span._.date parsed object
    • spans["durations"] → entities labelled as duration with a span._.duration parsed object
  • the "relative" / "absolute" / "duration" mode of the time entity is now stored in the mode attribute of span._.date or span._.duration
  • the "from" / "until" period bound, if any, is now stored in the span._.date.bound attribute
  • to_datetime now only returns absolute dates: it converts relative dates into absolute dates if doc._.note_datetime is given, and returns None otherwise
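
To make the new to_datetime contract concrete, here is a stdlib sketch, not edsnlp code: an absolute date is returned unchanged, a relative date is resolved against the note date, and None is returned when a relative date has no note date to anchor it. The `ParsedDate` dataclass, its `mode`/`value`/`offset` fields and `to_datetime_sketch` are hypothetical stand-ins for span._.date and its method.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ParsedDate:
    """Hypothetical stand-in for a parsed span._.date value."""

    mode: str  # "absolute" or "relative"
    value: Optional[datetime] = None  # set when mode == "absolute"
    offset: Optional[timedelta] = None  # set when mode == "relative"


def to_datetime_sketch(
    date: ParsedDate, note_datetime: Optional[datetime]
) -> Optional[datetime]:
    if date.mode == "absolute":
        return date.value
    # A relative date can only become absolute given a known note date.
    if note_datetime is None:
        return None
    return note_datetime + date.offset


note = datetime(2023, 9, 15)
three_days_ago = ParsedDate(mode="relative", offset=timedelta(days=-3))
print(to_datetime_sketch(three_days_ago, note))  # 2023-09-12 00:00:00
print(to_datetime_sketch(three_days_ago, None))  # None
```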

Fixed

  • export_to_brat issue with spans of entities on multiple lines.
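
In the brat standoff format, one common way to keep a multi-line entity valid in the .ann file is to write it as discontinuous fragments separated by ";", one fragment per line of text. The sketch below illustrates that splitting; it is not the export_to_brat implementation, and `brat_text_bound` is a hypothetical name.

```python
import re


def brat_text_bound(ann_id: str, label: str, text: str, start: int, end: int) -> str:
    """Format text[start:end] as a brat text-bound line (hypothetical helper).

    The span is cut at newlines so that each fragment stays on one line;
    fragment offsets are joined with ";" and their texts with a space.
    """
    fragments = [
        (start + m.start(), start + m.end())
        for m in re.finditer(r"[^\n]+", text[start:end])
    ]
    offsets = ";".join(f"{s} {e}" for s, e in fragments)
    covered = " ".join(text[s:e] for s, e in fragments)
    return f"{ann_id}\t{label} {offsets}\t{covered}"


line = brat_text_bound("T1", "date", "le 12\nmars", 3, 10)
# line == "T1\tdate 3 5;6 10\t12 mars"
```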

Full Changelog: v0.8.1...v0.9.0

v0.8.1

20 Jul 13:24

Post-release to synchronize Zenodo

v0.8.1

31 May 11:42

What's changed

Fix release to allow installation from source.

Full Changelog: v0.8.0...v0.8.1

v0.8.0

24 May 16:00

Changelog

Added

  • New trainable component for multi-label, multi-class span qualification (any attribute/extension)
  • Add range measurements (e.g. la tumeur fait entre 1 et 2 cm, "the tumor measures between 1 and 2 cm") to the eds.measurements matcher
  • Add eds.CKD component
  • Add eds.COPD component
  • Add eds.alcohol component
  • Add eds.cerebrovascular_accident component
  • Add eds.congestive_heart_failure component
  • Add eds.connective_tissue_disease component
  • Add eds.dementia component
  • Add eds.diabetes component
  • Add eds.hemiplegia component
  • Add eds.leukemia component
  • Add eds.liver_disease component
  • Add eds.lymphoma component
  • Add eds.myocardial_infarction component
  • Add eds.peptic_ulcer_disease component
  • Add eds.peripheral_vascular_disease component
  • Add eds.solid_tumor component
  • Add eds.tobacco component
  • Add eds.spaces (or eds.normalizer with spaces=True) to detect space tokens, and add ignore_space_tokens to EDSPhraseMatcher and SimstringMatcher to skip them
  • Add ignore_space_tokens option in most components
  • eds.tables: new pipeline to identify formatted tables
  • New merge_mode parameter in eds.measurements to normalize existing entities or detect measures only inside existing entities
  • Tokenization exceptions (Mr., Dr., Mrs.) and non end-of-sentence periods are now tokenized with the next letter in the eds tokenizer
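
A rough regex sketch of what a range measurement looks like in French clinical text ("entre 1 et 2 cm"). The pattern, `parse_range` and `_to_float` are assumptions for illustration only; eds.measurements relies on its own, much richer patterns and unit handling.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "entre <min> et <max> <unit>", decimals with "," or "."
RANGE_PATTERN = re.compile(
    r"entre\s+(?P<min>\d+(?:[.,]\d+)?)\s+et\s+(?P<max>\d+(?:[.,]\d+)?)\s*(?P<unit>mm|cm|m)\b"
)


def _to_float(number: str) -> float:
    # French texts usually write decimals with a comma.
    return float(number.replace(",", "."))


def parse_range(text: str) -> Optional[Tuple[float, float, str]]:
    """Return (min, max, unit) for the first range found, or None."""
    m = RANGE_PATTERN.search(text)
    if m is None:
        return None
    return _to_float(m["min"]), _to_float(m["max"]), m["unit"]


print(parse_range("la tumeur fait entre 1 et 2 cm"))  # (1.0, 2.0, 'cm')
```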

Changed

  • Disable EDSMatcher preprocessing auto progress tracking by default
  • Moved dependencies to a single pyproject.toml: support for pip install -e '.[dev,docs,setup]'
  • ADICAP matcher now allows dot separators (e.g. B.H.HP.A7A0)
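
The B.H.HP.A7A0 example suggests an 8-character code grouped as 1 + 1 + 2 + 4 characters. Under that assumption, a hypothetical regex that accepts optional dots between groups and normalizes each match back to its compact form could look like this; it is not the eds.adicap pattern.

```python
import re
from typing import List

# Assumed layout, taken from the changelog example: X.X.XX.XXXX, dots optional.
ADICAP_PATTERN = re.compile(r"\b([A-Z])\.?([A-Z])\.?([A-Z]{2})\.?([A-Z0-9]{4})\b")


def find_adicap_codes(text: str) -> List[str]:
    """Return every candidate code found in text, with dot separators removed."""
    return ["".join(m.groups()) for m in ADICAP_PATTERN.finditer(text)]


print(find_adicap_codes("Codes : B.H.HP.A7A0 et BHHPA7A0"))  # ['BHHPA7A0', 'BHHPA7A0']
```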

Fixed

  • Abbreviation and number tokenization issues in the eds tokenizer
  • eds.adicap : reparsed the dictionary used to decode the ADICAP codes (some of them were wrongly decoded)
  • Fix build for python 3.9 on Mac M1/M2 machines.

Full Changelog: v0.7.4...v0.8.0

v0.7.4

12 Dec 14:36

Changelog

Added

  • eds.history : Add an option to consider only the closest dates in the sentence (dates within the boundaries or, if there are none, the closest date in the entire sentence).
  • eds.negation : Now takes into account following past participles and preceding infinitives.
  • eds.hypothesis: Now takes into account following past-participle hypothesis verbs.
  • eds.negation & eds.hypothesis : Introduce new patterns and remove unnecessary patterns.
  • eds.dates : Add a pattern for preceding relative dates (e.g. l'embolie qui est survenue à 10 jours, "the embolism that occurred at 10 days").
  • Improve patterns in the eds.pollution component to account for multiline footers
  • Add QuickExample object to quickly try a pipeline.
  • Add UMLS terminology matcher eds.umls
  • New RegexMatcher method to create spans from groupdicts
  • New eds.dates option to disable time detection
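
The idea behind creating spans from groupdicts is that each named group of a regex match becomes its own labelled span. The sketch uses only `re` and returns plain (label, start, end) tuples; `spans_from_groupdict` is a hypothetical name and does not reproduce the RegexMatcher method or its signature.

```python
import re
from typing import List, Tuple


def spans_from_groupdict(pattern: str, text: str) -> List[Tuple[str, int, int]]:
    """Turn every named group that took part in a match into a labelled span."""
    spans = []
    for m in re.finditer(pattern, text):
        for label, value in m.groupdict().items():
            if value is not None:  # skip optional groups that did not match
                spans.append((label, m.start(label), m.end(label)))
    return spans


pattern = r"(?P<day>\d{1,2})/(?P<month>\d{1,2})(?:/(?P<year>\d{4}))?"
print(spans_from_groupdict(pattern, "vu le 12/03"))
# [('day', 6, 8), ('month', 9, 11)]
```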

Changed

  • Improve date detection by removing false positives

Fixed

  • eds.hypothesis : Remove too generic patterns.
  • EDSTokenizer : It now tokenizes "recherche d'" as ["recherche", "d'"], instead of ["recherche", "d", "'"].
  • Fix small typos in the documentation and in the docstrings.
  • Harmonize processing utils (distributed custom_pipe) to have the same API for Pandas and Pyspark
  • Fix BratConnector file loading issues with complex file hierarchies

Full Changelog: v0.7.2...v0.7.4

v0.7.2

26 Oct 20:54

Changelog

Added

  • Improve the eds.history component by taking into account the date extracted from eds.dates component.
  • New pop up when you click on the copy icon in the termynal widget (docs).
  • Add NER eds.elston-ellis pipeline to identify Elston Ellis scores
  • Add flags=re.MULTILINE to eds.pollution and change pattern of footer

Fixed

  • Remove the warning in the eds.sections when eds.normalizer is in the pipe.
  • Fix filter_spans for strictly nested entities
  • Fill eds.remove-lowercase "assign" metadata to run the pipeline during EDSPhraseMatcher preprocessing
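
A common way to resolve overlapping entities, sketched here without spaCy: prefer longer spans, break ties by earlier start, and drop any span that overlaps one already kept. A strictly nested span (fully inside another) is therefore discarded. This is a generic sketch under the hypothetical name `filter_spans_sketch`, not edsnlp's filter_spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) offsets, end exclusive


def filter_spans_sketch(spans: List[Span]) -> List[Span]:
    """Keep the longest non-overlapping spans and return them sorted by start."""
    kept: List[Span] = []
    # Longest first; among equal lengths, the one starting earlier wins.
    for start, end in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if all(end <= k_start or start >= k_end for k_start, k_end in kept):
            kept.append((start, end))
    return sorted(kept)


print(filter_spans_sketch([(0, 10), (2, 5), (12, 15), (8, 13)]))
# [(0, 10), (12, 15)]
```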

Full Changelog: v0.7.1...v0.7.2

v0.7.1

13 Oct 09:34

Changelog

Added

  • Add new patterns (footer, web entities, biology tables, coding sections) to pipeline normalisation (pollution)

Changed

  • Improved TNM detection algorithm
  • Account for more modifiers in ADICAP codes detection

Fixed

  • Add nephew, niece and daughter to family qualifier patterns
  • EDSTokenizer (spacy.blank('eds')) now recognizes non-breaking whitespaces as spaces and does not split float numbers
  • eds.dates pipeline now allows new lines as space separators in dates
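
To illustrate the two tokenizer fixes, here is a toy regex tokenizer (not EDSTokenizer, and `toy_tokenize` is a hypothetical name). In Python, `\s` in a `str` pattern already matches the non-breaking space U+00A0, and matching `\d+(?:[.,]\d+)?` as a single alternative keeps decimal numbers in one token.

```python
import re
from typing import List

# Hypothetical token pattern: decimals first, then words, then single symbols.
TOKEN_PATTERN = re.compile(r"\d+(?:[.,]\d+)?|\w+|[^\w\s]")


def toy_tokenize(text: str) -> List[str]:
    # Whitespace, including U+00A0, is never part of a token: it only separates.
    return TOKEN_PATTERN.findall(text)


print(toy_tokenize("poids\u00a03,5 kg."))  # ['poids', '3,5', 'kg', '.']
```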

Full Changelog: v0.7.0...v0.7.1

v0.7.0

06 Sep 16:39

Changelog

Added

  • New trainable nested NER pipeline component, nested_ner
  • Support for nested entities and attributes in BratDataConnector
  • Pytorch wrappers and experimental training utils
  • Add attribute section to entities
  • Add new cases for separator pattern when components of the TNM score are separated by a forward slash
  • Add NER eds.adicap pipeline to identify ADICAP codes

Changed

  • Update of the ContextualMatcher (and all pipelines depending on it), rendering it more flexible to use
  • Rename R component of score TNM as "resection_completeness"

Fixed

  • Prevent section titles from capturing surrounding tokens, causing overlaps (#113)
  • Enhance existing patterns for section detection and add patterns for previously ignored sections (introduction, evolution, modalites de sortie, vaccination).
  • Fix explain mode, which was always triggered, in eds.history factory.
  • Fix test in eds.sections: previously, no check was done.
  • Remove SOFA scores spurious span suffixes

New Contributors

  • @paul-bssr made their first contribution in #115
  • @clementjumel made their first contribution in #117

Full Changelog: v0.6.2...v0.7.0

v0.6.2

02 Aug 11:56

Changelog

Added

  • New SimstringMatcher matcher to perform fuzzy term matching, and algorithm parameter in terminology components and eds.matcher component
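
SimString-style fuzzy matching compares sets of character n-grams. The sketch below is a brute-force version using padded character trigrams and Jaccard similarity; the actual SimString algorithm avoids comparing the query with every term by using an index, so this only illustrates the similarity measure. The function names and the 0.5 threshold are illustrative assumptions, not the SimstringMatcher API.

```python
from typing import List, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of the lower-cased text, padded with a space at both ends."""
    padded = f" {text.lower()} "
    return {padded[i : i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a or b else 1.0


def fuzzy_lookup(
    query: str, terms: List[str], threshold: float = 0.5
) -> List[Tuple[str, float]]:
    """Brute-force search: every term whose trigram Jaccard score reaches the threshold."""
    q = char_ngrams(query)
    scored = [(term, jaccard(q, char_ngrams(term))) for term in terms]
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])


matches = fuzzy_lookup("diabete", ["diabetes", "asthme"])
```

Here "diabete" and "diabetes" share 6 of their 9 distinct trigrams (score 6/9), so only "diabetes" passes the threshold.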

Changed

  • Add consultation date pattern "CS", and False Positive patterns for dates (namely phone numbers and pagination).
  • Update the eds.TNM score pipeline: it can now return a dictionary whose values are either str or int

Fixed

  • Add new patterns to the negation qualifier
  • Numpy header issues with binary distributed packages
  • Simstring dependency on Windows

Full Changelog: v0.6.1...v0.6.2