-
-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Keywords - replace embedding with MBERT #932
Conversation
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #932 +/- ##
==========================================
+ Coverage 77.54% 77.63% +0.08%
==========================================
Files 86 86
Lines 12280 12266 -14
Branches 1609 1602 -7
==========================================
Hits 9523 9523
+ Misses 2458 2446 -12
+ Partials 299 297 -2 |
from .decorators import * | ||
from .widgets import * | ||
from .concurrent import asynchronous | ||
|
||
|
||
def enum2int(enum: Union[Enum, IntEnum]) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this imported from Orange.widgets.utils
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is actually already merged. I forgot to rebase this PR. :) Anyway enum2int from Orange is not released yet. I will add a test that will start to fail after some time and warn about removing it from text after some time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it was written here before it existed in Orange. I propose a time bomb that will start to fail and warn about removing it. #951
return b64encode(zlib.compress(instance.encode("utf-8", "replace"), level=9)) | ||
|
||
|
||
if __name__ == "__main__": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run this, it crashes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Event with the two comments I'm happy to merge. |
3859ad1
to
839444a
Compare
@VesnaT since you approved it I am merging it. |
Issue
Embedding keyword has poor performance and is slow.
Description of changes
Replace Embedding keyword extractor with MBERT with excellent performance. This keyword extractor extracts keywords with higher F1 scores than all existing extractors. The downside of this extractor is that it, on average extracts less than 5 keywords per document and also 0 many times.
Includes