-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FIX] Corpus: remove text features which not in metas #325
Conversation
Codecov Report
@@ Coverage Diff @@
## master #325 +/- ##
==========================================
+ Coverage 85.37% 85.38% +<.01%
==========================================
Files 34 34
Lines 1860 1861 +1
Branches 335 336 +1
==========================================
+ Hits 1588 1589 +1
Misses 237 237
Partials 35 35 |
orangecontrib/text/corpus.py
Outdated
@@ -434,8 +434,7 @@ def retain_preprocessing(orig, new, key=...): | |||
else: | |||
raise TypeError('Indexing by type {} not supported.'.format(type(key))) | |||
new._dictionary = orig._dictionary | |||
|
|||
new.text_features = orig.text_features | |||
new.text_features = list(set(orig.text_features) & set(new.domain.metas)) or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why or None
? Perhaps it's safer to set text_features
to an empty list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps also fix this in __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.text_features = None
changed to self.text_features = []
text_features=None
as a default argument in __init__
is kept.
domain = Domain(attributes=c.domain.attributes, class_vars=c.domain.class_vars) | ||
d = c.transform(domain) | ||
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas) | ||
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this the same as above line?
c = Corpus.from_file('friends-transcripts') | ||
domain = Domain(attributes=c.domain.attributes, class_vars=c.domain.class_vars) | ||
d = c.transform(domain) | ||
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make thi check without an or
.
d = c.transform(domain) | ||
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas) | ||
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas) | ||
d.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a comment here?
GH-324 | ||
GH-325 | ||
""" | ||
c = Corpus.from_file('friends-transcripts') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps, use deerwester
that loads much faster?
Issue
Fixes #324
Description of changes
Includes