OWPreprocess Text: add option to filter on POS tags #679

Merged 3 commits on Jul 20, 2021

ajdapretnar
Collaborator

Issue

Implements #678.

Description of changes

Add an option to filter on POS tags directly in Preprocess Text.
Add a setter and getter to Corpus for easier pos_tag handling.
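
For readers unfamiliar with the idea, the two changes can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: `TaggedDocument` and `filter_by_pos` are hypothetical names used only for illustration, and the real `Corpus` class may store and validate tags differently.

```python
from typing import List, Optional, Sequence, Set, Tuple


def filter_by_pos(
    tokens: Sequence[str], tags: Sequence[str], keep: Set[str]
) -> Tuple[List[str], List[str]]:
    """Return the tokens (and their tags) whose POS tag is in `keep`."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    kept = [(tok, tag) for tok, tag in zip(tokens, tags) if tag in keep]
    return [tok for tok, _ in kept], [tag for _, tag in kept]


class TaggedDocument:
    """Hypothetical stand-in for a corpus entry; NOT Orange's Corpus."""

    def __init__(self, tokens: Sequence[str]) -> None:
        self.tokens: List[str] = list(tokens)
        self._pos_tags: Optional[List[str]] = None

    @property
    def pos_tags(self) -> Optional[List[str]]:
        # Getter: None means the document has not been tagged yet.
        return self._pos_tags

    @pos_tags.setter
    def pos_tags(self, tags: Optional[Sequence[str]]) -> None:
        # Setter: reject tag lists that do not line up with the tokens.
        if tags is not None and len(tags) != len(self.tokens):
            raise ValueError("expected one POS tag per token")
        self._pos_tags = None if tags is None else list(tags)


doc = TaggedDocument(["The", "quick", "fox", "jumps"])
doc.pos_tags = ["DT", "JJ", "NN", "VBZ"]  # Penn Treebank-style tags
tokens, tags = filter_by_pos(doc.tokens, doc.pos_tags, {"NN", "VBZ"})
print(tokens)  # ['fox', 'jumps']
```

Keeping the tags behind a property lets the container check that tokens and tags stay aligned, which is what a filter of this kind relies on.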

Includes
  • Code changes
  • Tests
  • Documentation

@ajdapretnar
Collaborator Author

One possible improvement: map the existing POS tags to a dict and offer them as a dropdown.
Issue: Penn Treebank and CoNLL-U use different tag sets. Penn is more detailed (e.g. JJR, JJS), while CoNLL-U is simpler at the top level (VERB, NOUN, ADJ) and keeps the specific tag information "hidden" (accessible in parsed tokens, which we currently don't support).
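
A rough sketch of what such a mapping could look like, purely as an illustration of the idea and not a proposal for the actual implementation. The dict below is a deliberately partial, assumed mapping from fine-grained Penn Treebank tags to coarse categories, and `PENN_TO_COARSE`, `coarse_tag` and `dropdown_options` are hypothetical names.

```python
from typing import Dict, Iterable, List

# Assumed, partial mapping from Penn Treebank tags to coarse categories.
PENN_TO_COARSE: Dict[str, str] = {
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "NN": "NOUN", "NNS": "NOUN",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB",
    "VBN": "VERB", "VBP": "VERB", "VBZ": "VERB",
}


def coarse_tag(tag: str) -> str:
    """Map a fine-grained tag to its coarse category; pass unknown tags through."""
    return PENN_TO_COARSE.get(tag, tag)


def dropdown_options(tags: Iterable[str]) -> List[str]:
    """Collect the distinct coarse tags seen in a corpus, sorted for display."""
    return sorted({coarse_tag(tag) for tag in tags})


# Mixed input: Penn-style tags plus an already-coarse tag.
print(dropdown_options(["JJR", "NN", "VBD", "NOUN", "JJS"]))
# ['ADJ', 'NOUN', 'VERB']
```

Passing unknown tags through unchanged means already-coarse tags such as NOUN collapse into the same dropdown entry as their Penn equivalents.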

@ajdapretnar ajdapretnar force-pushed the pos-filter-preprocess branch from 07a7fe7 to 8247baa Compare July 15, 2021 09:41
@PrimozGodec
Collaborator

It looks good to me. I would say we keep the implementation as it is for now; later, when we also add more POS taggers, we can implement it as a dropdown of options.

@PrimozGodec PrimozGodec merged commit a43bbef into biolab:master Jul 20, 2021