Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Corpus: remove text features which not in metas #325

Merged
merged 1 commit into from
Nov 9, 2017
Merged

[FIX] Corpus: remove text features which not in metas #325

merged 1 commit into from
Nov 9, 2017

Conversation

jerneju
Copy link
Contributor

@jerneju jerneju commented Nov 8, 2017

Issue

Fixes #324

Description of changes
Includes
  • Code changes
  • Tests
  • Documentation

@jerneju jerneju requested a review from nikicc November 8, 2017 12:58
@codecov-io
Copy link

codecov-io commented Nov 8, 2017

Codecov Report

Merging #325 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #325      +/-   ##
==========================================
+ Coverage   85.37%   85.38%   +<.01%     
==========================================
  Files          34       34              
  Lines        1860     1861       +1     
  Branches      335      336       +1     
==========================================
+ Hits         1588     1589       +1     
  Misses        237      237              
  Partials       35       35

@@ -434,8 +434,7 @@ def retain_preprocessing(orig, new, key=...):
else:
raise TypeError('Indexing by type {} not supported.'.format(type(key)))
new._dictionary = orig._dictionary

new.text_features = orig.text_features
new.text_features = list(set(orig.text_features) & set(new.domain.metas)) or None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why or None? Perhaps it's safer to set text_features to an empty list.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also fix this in __init__?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.text_features = None changed to self.text_features = []
text_features=None as a default argument in __init__ is kept.

domain = Domain(attributes=c.domain.attributes, class_vars=c.domain.class_vars)
d = c.transform(domain)
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas)
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this the same as above line?

c = Corpus.from_file('friends-transcripts')
domain = Domain(attributes=c.domain.attributes, class_vars=c.domain.class_vars)
d = c.transform(domain)
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make thi check without an or.

d = c.transform(domain)
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas)
self.assertTrue(d.text_features is None or d.text_features[0] in d.domain.metas)
d.copy()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a comment here?

GH-324
GH-325
"""
c = Corpus.from_file('friends-transcripts')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, use deerwester that loads much faster?

@nikicc nikicc merged commit 1b18b9a into biolab:master Nov 9, 2017
@jerneju jerneju deleted the corpus-text-features branch November 9, 2017 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants