Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] Computation in separate thread for base vectorizer; use base vectorizer for embedding #852

Merged

Conversation

PrimozGodec
Copy link
Collaborator

@PrimozGodec PrimozGodec commented May 25, 2022

Issue

The original issue is that the Embedder widget and embedder method is not based on the base vectorizer.
To enable basing on BaseVectorizer/OWBaseVectorizer several changes need to be made in both classes

Description of changes
  • Implement concurrent mixin in OWBaseVectorizer (consequently it is used in Simhash and Bow)
  • Reimplement commit (do not compute if the commit is unchecked until the button pressed)
  • Use deferred commit
  • Change embedded to base on BaseVectorizer
  • Add some tests
Includes
  • Code changes
  • Tests
  • Documentation

@PrimozGodec PrimozGodec force-pushed the doc-embedder-use-basevectorizer branch from c809707 to aaad632 Compare May 26, 2022 13:15
@codecov-commenter
Copy link

codecov-commenter commented May 26, 2022

Codecov Report

Merging #852 (76f9060) into master (4c38b3f) will increase coverage by 0.22%.
The diff coverage is 93.63%.

❗ Current head 76f9060 differs from pull request most recent head c378f76. Consider uploading reports for the commit c378f76 to get more accurate results

@@            Coverage Diff             @@
##           master     #852      +/-   ##
==========================================
+ Coverage   76.67%   76.90%   +0.22%     
==========================================
  Files          83       83              
  Lines       11761    11772      +11     
  Branches     1867     1866       -1     
==========================================
+ Hits         9018     9053      +35     
+ Misses       2439     2415      -24     
  Partials      304      304              

@PrimozGodec PrimozGodec force-pushed the doc-embedder-use-basevectorizer branch from aaad632 to 76f9060 Compare May 26, 2022 13:31
@PrimozGodec PrimozGodec marked this pull request as ready for review May 26, 2022 13:32
@ajdapretnar
Copy link
Collaborator

One other issue. When saving an old workflow with Document Embedding, the workflow doesn't work well with the new version. The issue is the "Embedding" output is now renamed to "Corpus". Should probably be added to settings migration somehow.

@PrimozGodec PrimozGodec force-pushed the doc-embedder-use-basevectorizer branch from 76f9060 to c378f76 Compare June 1, 2022 13:28
@PrimozGodec
Copy link
Collaborator Author

One other issue. When saving an old workflow with Document Embedding, the workflow doesn't work well with the new version. The issue is the "Embedding" output is now renamed to "Corpus". Should probably be added to settings migration somehow.

It is fixed now through the replaces argument

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants