All notable changes to this project will be documented in this file. Please add new entries at the top. Use one of the following headings: `Added`, `Changed`, `Deprecated`, `Removed`, `Fixed`, `Security`.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
❗ = Breaking change
- Added a visual/interactive demo of the Information Extraction pipeline in the form of a Dash web application (`clinlp app ie_demo`)
- Inclusion of resources in package
- Moved development dependencies according to PEP-735
- Additional package metadata
- Used `uv` as a package manager, replacing `poetry`
- Small formatting fixes due to updated linting rules
- Mantra GSC corpus for evaluation
- Loading and exporting `InfoExtractionDataset` as dictionaries or JSON files
- Metric support for multi-class qualifiers
- In the `RuleBasedEntityMatcher`, option to add terms as a `dict` (in addition to `str`, `list` and `Term`)
- In the `RuleBasedEntityMatcher`, option to add terms from dict (`add_terms_from_dict`), json (`add_terms_from_json`) or csv (`add_terms_from_csv`)
- In the `Term` class, an option to override arguments that were not set
- Moved regression test cases to data directory in more open format, so they are re-usable
- Made the `default` field for `Qualifier` optional
- `InfoExtractionDataset` and `InfoExtractionMetrics` use `Qualifier` objects for qualifiers rather than `dict`
- ❗ `InfoExtractionDataset` and `InfoExtractionMetrics` no longer track or use qualifier defaults
- Made qualifiers optional for metrics in `Annotation`
- Added a `normalize` method to `Normalizer`, so it can be used/tested directly
- The logic for determining whether the `RuleBasedEntityMatcher` should internally use the phrase matcher or the matcher is simplified
- ❗ The `create_concept_dict` method, which is now replaced by `add_terms_from_csv` in `RuleBasedEntityMatcher`
- ❗ In the `RuleBasedEntityMatcher`, the `load_concepts` method, which is now replaced by `add_terms_from_dict` and `add_terms_from_json`
- Docstrings on all modules, classes, methods and functions
- In `InformationExtractionDataset`, renamed `span_counts`, `label_counts` and `qualifier_counts` to `span_freqs`, `label_freqs` and `qualifier_freqs` respectively
- The `clinlp_component` utility now returns the class itself, rather than a helper function for making it
- Changed order of `direction` and `qualifier` arguments of `ContextRule`
- Simplified default settings for `clinlp` components and `Term` class
- `Normalizer` uses `casefold` rather than `lower` for normalizing text
- Parameterized `spans_key` for IE components
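The switch from `lower` to `casefold` for normalization can be illustrated with plain Python strings. This is a minimal sketch using only the built-in `str` methods; the sample words are chosen for illustration and are not taken from the `clinlp` code or tests.

```python
# Compare str.lower() and str.casefold() on text with special casing rules.
samples = ["Straße", "STRASSE", "Koorts"]

for text in samples:
    print(f"{text!r}: lower={text.lower()!r}, casefold={text.casefold()!r}")

# lower() keeps the sharp s, so the two spellings do not compare equal.
lower_match = "Straße".lower() == "STRASSE".lower()

# casefold() folds the sharp s to "ss", so the two spellings compare equal.
casefold_match = "Straße".casefold() == "STRASSE".casefold()
```

For plain ASCII text the two methods give the same result, so the difference only shows up for characters with special case-folding rules.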
- ❗ Renamed the `clinlp_entity_matcher` to `clinlp_rule_based_entity_matcher`
- ❗ `clinlp` now stores entities in `doc.spans['ents']` rather than `doc.ents`, allowing for overlap
  - ❗ Overlap in entities found by the entity matcher is no longer resolved by default (replacing old behavior). To remove overlap, pass `resolve_overlap=True`.
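To make concrete what resolving overlap between entities can mean, here is a generic sketch that keeps the longest spans and drops any span overlapping one already kept. It is not the `clinlp` implementation: the `Span` tuple layout and the function name `drop_overlapping_spans` are assumptions made for illustration only, and the strategy (longest first, earliest start on ties) is just one common choice.

```python
from typing import List, Tuple

# Hypothetical span representation: (start, end, label), with end exclusive.
Span = Tuple[int, int, str]


def drop_overlapping_spans(spans: List[Span]) -> List[Span]:
    """Keep the longest spans, dropping any span that overlaps a kept one."""
    kept: List[Span] = []
    # Prefer longer spans; break ties by earliest start.
    for span in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        start, end, _ = span
        # Two spans are disjoint if one ends before (or where) the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append(span)
    # Return the surviving spans in document order.
    return sorted(kept, key=lambda s: s[0])


spans = [(0, 2, "a"), (1, 4, "b"), (4, 5, "c")]
resolved = drop_overlapping_spans(spans)
```

In this example the span labelled `"a"` is dropped because it overlaps the longer span `"b"`, while `"c"` is kept: it starts exactly where `"b"` ends, so the two are adjacent and not overlapping.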
- Refactored tests to use `pytest` best practices
- Changed `clinlp_autocomponent` to `clinlp_component`, which automatically registers your component with `spaCy`
- Codebase and linting improvements
- Renamed the `other_threshold` config to `family_threshold` in the `clinlp_experiencer_transformer` component
- The `clinlp_rule_based_entity_matcher` no longer overwrites entities detected by other components (but appends them)
- Integrated the clin_nlp_metrics package in this repository, specifically in `clinlp.metrics.ie`
- Support for non-binary qualifier in the Context Algorithm (e.g. 'Change', with values Decreasing, Stable and Increasing)
- Support for bidirectional qualifier patterns
- ❗ Moved all components related to information extraction to `clinlp.ie`. Please update imports accordingly (e.g. `from clinlp.ie import Term`)
- ❗ Updated the framework for qualifiers, to now have three qualifier classes: Presence, Temporality and Experiencer. For more details, see docs
- Support for Python 3.12
- A component for transformer-based detection of Experiencer qualifiers (Patient/Other) (`clinlp_experiencer_transformer`)
- A way to use a csv file as input for a concept list, using `create_concept_dict`
- Fix a bug with termination trigger directly next to context trigger
- Replaced call to `importlib.resources.path`, which is deprecated from Python 3.11 on
- A bug with adjacent entities, which were accidentally marked as overlapping
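For readers facing the same `importlib.resources.path` deprecation noted above, the standard library offers `importlib.resources.files` (available since Python 3.9) as a replacement. The sketch below shows that general pattern; it is not the actual change made in `clinlp`, and the helper name `read_package_resource` and the use of the `json` package as a stand-in are assumptions for illustration.

```python
from importlib.resources import as_file, files


def read_package_resource(package: str, resource: str) -> str:
    """Read a text resource bundled inside an importable package."""
    # files() returns a Traversable rooted at the package directory.
    return files(package).joinpath(resource).read_text(encoding="utf-8")


def resource_exists(package: str, resource: str) -> bool:
    """Check whether a resource exists, via a real filesystem path."""
    # as_file() yields a pathlib.Path, extracting to a temp file if needed.
    with as_file(files(package).joinpath(resource)) as path:
        return path.exists()


# Stand-in example: read a file that ships with the standard library.
text = read_package_resource("json", "__init__.py")
```

Use `as_file` only when a concrete filesystem path is required (for example, to hand to a library that cannot read from a file-like object); otherwise reading directly from the `Traversable` is enough.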
- Qualifier detectors now add all default qualifiers (e.g. 'Affirmed', for `Negation`)
- Use titlecase for qualifier values
- A bug with importlib causing an `AttributeError` on importing `clinlp`
- Removed accidental print statement
- A bug with overlapping entities
- A custom component for entity recognition, with options for proximity, fuzzy and pseudo matching
- Definition for qualifiers (negation, plausibility, temporality, experiencer)
- Updated rules for context algorithm to be consistent with definitions
- Added some rules for context algorithm
- Refactored `Qualifier` class from enum to a separate class, that accommodates other fields (like prob)
- Use `entity._.qualifiers` to obtain `Qualifier` classes, `entity._.qualifier_str` for strings, and `entity._.qualifier_dict` for dicts
- Ambiguity of `dd` for context rules (can mean differential diagnosis or daily dosage)
- Importing `clinlp` caused a bug when extras were missing
- Support for Python 3.9
- Remove a default `spaCy` abbreviation (`ts.`)
- Option for max scope on qualifier rules, limiting the number of tokens it applies to
- A transformer-based pipeline for negation detection (`clinlp_negation_transformer`)
- A base class `QualifierDetector` for qualifier detection
- Issue where entity and context trigger were overlapping (e.g. `geen eetlust`)
- Some tests that were not auto-discovered by pytest due to naming
- Refactored context algorithm to allow adding new qualifier detectors
- The `@clinlp_autocomponent` wrapper as a utility function, which makes creating components with inheritance and arbitrary config a bit easier
- Made default configs a bit simpler and DRY
- Move qualifier adding for context algorithm to base class
- Version info to model meta (warns if installed `clinlp` version does not match model version)
- A component for normalizing
- Bug with resource loading
- Initial release
- Placeholder release