[ENH] Show IDF results in a Word Cloud #401

Closed
benel opened this issue Nov 29, 2018 · 2 comments

Comments

benel commented Nov 29, 2018

Text version

0.5.2

Orange version

3.16

Expected behavior

My aim would be to show TF.IDF in action to my students.
When I connect a corpus to a bag of words, and the bag of words to a data table (or, even better, to a word cloud), I would expect that changing the document frequency parameter in the bag of words from none to IDF would change the result. It should hide words that are common in the language, like "the" (much as stop-word preprocessing does), and also words that are common in the corpus, like "queen" in a corpus of tales.
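
As a rough illustration of the expected effect, here is a minimal sketch in plain Python. It is not Orange's implementation: the toy corpus, the helper names, and the unsmoothed log(N/df) weighting are all assumptions made for the example, and the Bag of Words widget may use a different IDF variant.

```python
import math
from collections import Counter

# Hypothetical toy "tales" corpus (not taken from this issue).
docs = [
    "the queen met the king",
    "the queen saw the dragon",
    "the queen left the castle",
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each word appears.
df = Counter(word for tokens in tokenized for word in set(tokens))

# Assumed plain IDF, log(N / df); a real tool may smooth this.
idf = {word: math.log(n_docs / count) for word, count in df.items()}


def tf(tokens):
    """Raw term counts for one document."""
    return Counter(tokens)


def tf_idf(tokens):
    """Term counts scaled by the inverse document frequency."""
    return {word: count * idf[word] for word, count in tf(tokens).items()}


first_tf = tf(tokenized[0])
first_tfidf = tf_idf(tokenized[0])

for word in first_tf:
    print(f"{word:>6}  tf={first_tf[word]}  tf.idf={first_tfidf[word]:.3f}")
```

In this toy corpus, "the" and "queen" occur in every document, so their IDF is log(3/3) = 0 and their TF.IDF weight vanishes, while "king" keeps a positive weight.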

Actual behavior

Changing the parameter doesn't seem to change anything in the result.
@ajdapretnar explained the following in a related ticket (biolab/orange3#3426):

(...) for the Data Table, you should definitely see the changes when using the IDF transformation.
Word Cloud, however, is currently implemented in a way that it shows frequent tokens, which are a separate property from a table, which is constructed from a bag of words. That said, your idea sounds interesting, since I cannot think of a good way to sort words by IDF frequencies. Could you perhaps open a feature request on our issue tracker?

benel changed the title from "How to visualize the difference between TF and TF.IDF on a bag of words" to "How to visualize the difference between TF and TF.IDF on a bag of words?" on Nov 29, 2018
ajdapretnar (Collaborator) commented

Thank you for opening this. I normally show IDF in a Data Table, as seen below. But you make a good point: having a hidden token attribute is a bit confusing for users, and showing this in a Word Cloud could have nice educational value.

IDF in action, even though in a slightly confusing sparse format:
[Image: screen shot 2018-11-29 at 10 32 17]

ajdapretnar changed the title from "How to visualize the difference between TF and TF.IDF on a bag of words?" to "[ENH] Show IDF results in a Word Cloud" on Nov 29, 2018
PrimozGodec (Collaborator) commented

This is fixed in #486. The Word Cloud now shows the bag-of-words weights.
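
For readers wondering how per-document bag-of-words weights could become one size per word, here is a hedged sketch. It is only one possible aggregation (averaging over documents) and is not necessarily what #486 does; the data and the function name `word_cloud_weights` are hypothetical.

```python
from collections import defaultdict

# Hypothetical bag-of-words rows: one {word: weight} dict per document.
bow_rows = [
    {"king": 1.1, "met": 1.1, "queen": 0.0},
    {"dragon": 1.1, "saw": 1.1, "queen": 0.0},
    {"king": 0.4, "castle": 1.1, "queen": 0.0},
]


def word_cloud_weights(rows):
    """Average each word's weight over all documents.

    A word missing from a document contributes a weight of 0 there.
    Averaging is an assumption; a sum or a max would also be plausible.
    """
    totals = defaultdict(float)
    for row in rows:
        for word, weight in row.items():
            totals[word] += weight
    return {word: total / len(rows) for word, total in totals.items()}


weights = word_cloud_weights(bow_rows)

# Words sorted from largest to smallest weight, as a word cloud would size them.
ranked = sorted(weights, key=weights.get, reverse=True)
print(ranked)
```

With IDF-style weights as input, a word such as "queen" that has zero weight in every document ends up last, which is the behavior the original request asked to see.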
