From e2486ab4e3720d3ded5b34cc63daae10b4674c86 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?M=C3=A1rton=20Kardos?=
Date: Tue, 30 Jul 2024 14:15:38 +0200
Subject: [PATCH] Updated readme

---
 README.md | 56 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 30 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
index c9a309e..252342e 100644
--- a/README.md
+++ b/README.md
@@ -9,9 +9,10 @@
  - Semantic Signal Separation - S³ 🧭
  - KeyNMF 🔑 (paper in progress ⏳)
  - GMM :gem: (paper soon)
- - Implementations of existing transformer-based topic models
+ - Implementations of other transformer-based topic models
  - Clustering Topic Models: BERTopic and Top2Vec
  - Autoencoding Topic Models: CombinedTM and ZeroShotTM
+ - FASTopic :zap:
  - Streamlined scikit-learn compatible API 🛠️
  - Easy topic interpretation 🔍
  - Dynamic Topic Modeling 📈 (GMM, ClusteringTopicModel and KeyNMF)
@@ -19,43 +20,45 @@
 > This package is still work in progress and scientific papers on some of the novel methods are currently undergoing peer-review. If you use this package and you encounter any problem, let us know by opening relevant issues.
-### New in version 0.4.0
+### New in version 0.5.0
-#### Online KeyNMF
+#### Hierarchical KeyNMF
-You can now online fit and finetune KeyNMF as you wish!
+You can now subdivide topics in KeyNMF at will.
 ```python
-from itertools import batched
 from turftopic import KeyNMF
-model = KeyNMF(10, top_n=5)
-
-corpus = ["some string", "etc", ...]
-for batch in batched(corpus, 200):
-    batch = list(batch)
-    model.partial_fit(batch)
+model = KeyNMF(2, top_n=15, random_state=42).fit(corpus)
+model.hierarchy.divide_children(n_subtopics=3)
+print(model.hierarchy)
 ```
-#### $S^3$ Concept Compasses
+
+
+Root
+├── 0: windows, dos, os, disk, card, drivers, file, pc, files, microsoft
+│ ├── 0.0: dos, file, disk, files, program, windows, disks, shareware, norton, memory
+│ ├── 0.1: os, unix, windows, microsoft, apps, nt, ibm, ms, os2, platform
+│ └── 0.2: card, drivers, monitor, driver, vga, ram, motherboard, cards, graphics, ati
+└── 1: atheism, atheist, atheists, religion, christians, religious, belief, christian, god, beliefs
+. ├── 1.0: atheism, alt, newsgroup, reading, faq, islam, questions, read, newsgroups, readers
+. ├── 1.1: atheists, atheist, belief, theists, beliefs, religious, religion, agnostic, gods, religions
+. └── 1.2: morality, bible, christian, christians, moral, christianity, biblical, immoral, god, religion
+
+
-You can now produce a compass of concepts along two semantic axes using $S^3$.
-
-
-
-
-
-
-
-```python
-model = SemanticSignalSeparation(10).fit(corpus)
-fig = model.concept_compass(topic_x=1, topic_y=4)
-fig.show()
-```
+#### FASTopic *(Experimental)*
-
+You can now use [FASTopic](https://github.com/BobXWu/FASTopic) inside Turftopic.
+```python
+from turftopic import FASTopic
+
+model = FASTopic(10).fit(corpus)
+model.print_topics()
+```
 ## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/) [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb)
@@ -180,6 +183,7 @@ Alternatively you can use the [Figures API](https://x-tabdeveloping.github.io/to
 ## References
 - Kardos, M., Kostkan, J., Vermillet, A., Nielbo, K., Enevoldsen, K., & Rocca, R. (2024, June 13). $S^3$ - Semantic Signal separation. arXiv.org. https://arxiv.org/abs/2406.09556
+- Wu, X., Nguyen, T., Zhang, D. C., Wang, W. Y., & Luu, A. T. (2024). FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm. ArXiv Preprint ArXiv:2405.17978.
 - Grootendorst, M. (2022, March 11). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv.org. https://arxiv.org/abs/2203.05794
 - Angelov, D. (2020, August 19). Top2VEC: Distributed representations of topics. arXiv.org. https://arxiv.org/abs/2008.09470
 - Bianchi, F., Terragni, S., & Hovy, D. (2020, April 8). Pre-training is a Hot Topic: Contextualized Document Embeddings Improve Topic Coherence. arXiv.org. https://arxiv.org/abs/2004.03974