add _add_sentences dataset helper config
ArneBinder committed Jun 12, 2024
1 parent 76b0454 commit ddf4a88
Showing 1 changed file with 10 additions and 0 deletions.
10 changes: 10 additions & 0 deletions configs/dataset/_add_sentences.yaml
@@ -0,0 +1,10 @@
# Note: This requires documents to be of type pytorch_ie.documents.TextDocumentWithLabeledPartitions, so it
# may be necessary to convert the documents first, e.g. by using _convert_documents.yaml with a respective
# document_type (e.g. pytorch_ie.documents.TextDocumentWithLabeledSpansBinaryRelationsAndLabeledPartitions).

add_sentences:
  _processor_: pie_datasets.DatasetDict.map
  function:
    # see this for further information and options:
    # https://github.com/ArneBinder/pie-datasets/blob/main/src/pie_datasets/document/processing/nltk_sentence_splitter.py
    _target_: pie_modules.document.processing.NltkSentenceSplitter
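
For readers unfamiliar with the pattern, the config above maps a sentence-splitting function over every document so that each one gains sentence partitions. The following is only a rough, dependency-free sketch of that idea, not the actual NltkSentenceSplitter implementation: the splitter here is a naive punctuation rule rather than NLTK, documents are plain dicts, and the names `split_into_sentence_spans`, `add_sentences`, the `labeled_partitions` key, and the `sentence` label are assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def split_into_sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) character offsets, one pair per sentence.

    Naive stand-in for a real sentence tokenizer: a sentence ends at
    '.', '!' or '?' when followed by whitespace or the end of the text.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(text):
        if start is None:
            if ch.isspace():
                continue  # skip whitespace between sentences
            start = i
        if ch in ".!?" and (i + 1 == len(text) or text[i + 1].isspace()):
            spans.append((start, i + 1))
            start = None
    if start is not None:
        # trailing text without a terminator still counts as a sentence
        spans.append((start, len(text)))
    return spans


def add_sentences(document: Dict) -> Dict:
    """Attach sentence spans to a document (hypothetical dict layout)."""
    document["labeled_partitions"] = [
        {"start": start, "end": end, "label": "sentence"}
        for start, end in split_into_sentence_spans(document["text"])
    ]
    return document


# Applying the function to every document mirrors what a dataset-level
# map call does with the configured function.
documents = [{"text": "First sentence. Second one! Trailing text"}]
documents = [add_sentences(doc) for doc in documents]
sentences = [
    documents[0]["text"][p["start"]:p["end"]]
    for p in documents[0]["labeled_partitions"]
]
```

Storing character offsets rather than copied strings keeps each partition aligned with the original text, which is why the note at the top of the file asks for a document type that has a partitions layer.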
