TopicModeling: Change output for topic vizualization #483

Merged
merged 4 commits into from
Jan 6, 2020

ajdapretnar
Collaborator

@ajdapretnar ajdapretnar commented Dec 11, 2019

Issue

Partially implements #450.
Partially implements #337.

Description of changes

The All Topics output now has topics as rows and words as columns. Marginal topic probability is also added, so topics can be visualized in MDS, similarly to pyLDAvis.
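To illustrate the new layout, here is a minimal NumPy sketch of a topic-by-word table with a marginal probability per topic. It is not the code from this PR: the function name `all_topics_table` and its arguments are hypothetical, and weighting each document's topic proportions by its length is an assumption about how the marginal probability might be computed.

```python
import numpy as np


def all_topics_table(topic_word, doc_topic, doc_lengths):
    """Return a (topics x words) matrix and one marginal probability per topic.

    topic_word:  (n_topics, n_words) word weights for each topic
    doc_topic:   (n_docs, n_topics) topic proportions for each document
    doc_lengths: (n_docs,) number of tokens in each document
    All three inputs are hypothetical stand-ins for the model's internals.
    """
    topic_word = np.asarray(topic_word, dtype=float)
    doc_topic = np.asarray(doc_topic, dtype=float)
    doc_lengths = np.asarray(doc_lengths, dtype=float)

    # Assumption: a topic's marginal probability is the share of all tokens
    # it accounts for, i.e. document lengths weighted by topic proportions.
    topic_freq = doc_lengths @ doc_topic          # shape (n_topics,)
    marginal = topic_freq / topic_freq.sum()

    # Rows are topics, columns are words.
    return topic_word, marginal


topic_word = [[0.5, 0.3, 0.2],
              [0.1, 0.1, 0.8]]
doc_topic = [[0.9, 0.1],
             [0.2, 0.8],
             [0.5, 0.5]]
doc_lengths = [10, 30, 20]

table, marginal = all_topics_table(topic_word, doc_topic, doc_lengths)
```

Each row of `table` is one topic, so it can be passed on as one data instance per topic, with `marginal` attached as an extra column.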

This also required fixing the corpus output, which contained as many topics as the user requested even when gensim found fewer. The output now matches the number of topics gensim returns.
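A minimal sketch of the idea, assuming the model exposes a document-by-topic matrix whose width is the number of topics actually found. The helper `topic_columns` and the "Topic N" naming are hypothetical and only illustrate sizing the output from the result rather than from the user's setting.

```python
import numpy as np


def topic_columns(doc_topic, n_requested):
    """Name only the topics that were actually found.

    doc_topic:   (n_docs, n_found) matrix returned by the model
    n_requested: number of topics the user asked for
    """
    doc_topic = np.asarray(doc_topic, dtype=float)
    n_found = doc_topic.shape[1]
    # Never emit more columns than the model produced.
    n_out = min(n_requested, n_found)
    names = [f"Topic {i + 1}" for i in range(n_out)]
    return names, doc_topic[:, :n_out]


# The user asked for 10 topics, but only 3 were found.
names, columns = topic_columns(np.ones((4, 3)), n_requested=10)
```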

Includes
  • Code changes
  • Tests
  • Documentation

@codecov-io

codecov-io commented Dec 12, 2019

Codecov Report

Merging #483 into master will increase coverage by 0.06%.
The diff coverage is 95.83%.

@@            Coverage Diff             @@
##           master     #483      +/-   ##
==========================================
+ Coverage   61.96%   62.03%   +0.06%     
==========================================
  Files          59       59              
  Lines        6197     6216      +19     
  Branches      808      811       +3     
==========================================
+ Hits         3840     3856      +16     
- Misses       2221     2222       +1     
- Partials      136      138       +2

@ajdapretnar ajdapretnar force-pushed the topic_viewer_output branch 3 times, most recently from e16213d to fa2c80d Compare December 18, 2019 10:29
@PrimozGodec
Collaborator

When I connect the election-tweets corpus to the input, there is a dimension mismatch:

Traceback (most recent call last):
  File "/Users/primoz/PycharmProjects/orange3-text/orangecontrib/text/widgets/owtopicmodeling.py", line 234, in on_result
    self.Outputs.all_topics.send(self.model.get_all_topics_table())
  File "/Users/primoz/PycharmProjects/orange3-text/orangecontrib/text/topics/topics.py", line 141, in get_all_topics_table
    metas=np.hstack((names, topic_proba)))
  File "/Users/primoz/miniconda3/envs/orange/lib/python3.7/site-packages/numpy/core/shape_base.py", line 340, in hstack
    return _nx.concatenate(arrs, 1)
ValueError: all the input array dimensions except for the concatenation axis must match exactly
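For reference, `np.hstack` raises this error whenever the arrays being joined have a different number of rows. The sketch below reproduces it under the assumption that `names` and `topic_proba` ended up with different topic counts; the values are made up, and the actual cause in `get_all_topics_table` is not shown in this thread.

```python
import numpy as np

# Hypothetical shapes: three topic names, but probabilities for only two topics.
names = np.array([["Topic 1"], ["Topic 2"], ["Topic 3"]], dtype=object)
topic_proba = np.array([[0.6], [0.4]])

try:
    np.hstack((names, topic_proba))
    mismatch_raised = False
except ValueError:
    # Row counts differ (3 vs 2), so the column-wise join fails.
    mismatch_raised = True

# Deriving both columns from the same topic count keeps the shapes aligned.
n_topics = topic_proba.shape[0]
names = np.array([[f"Topic {i + 1}"] for i in range(n_topics)], dtype=object)
metas = np.hstack((names, topic_proba.astype(object)))
```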

@PrimozGodec
Collaborator

It looks good to me.

@PrimozGodec PrimozGodec changed the title [WIP] TopicModeling: Change output for topic vizualization TopicModeling: Change output for topic vizualization Dec 20, 2019
@PrimozGodec PrimozGodec merged commit 0a37183 into biolab:master Jan 6, 2020
@ajdapretnar ajdapretnar deleted the topic_viewer_output branch January 21, 2021 12:55
3 participants