Code excerpts

...

>>> import texthero as hero
>>> import pandas as pd
>>> import pyLDAvis
>>> 
>>> # Load an example dataset.
>>> df = pd.read_csv("https://github.com/jbesomi/texthero/raw/master/dataset/bbcsport.csv")[["text"]]
>>> 
>>> # Clean and tokenize all documents.
>>> df['text_preprocessed'] = df['text'].pipe(hero.clean).pipe(hero.tokenize)
>>> 
>>> # Calculate a document-term matrix (with TF-IDF weighting).
>>> df_document_term = df['text_preprocessed'].pipe(hero.tfidf)
>>> 
>>> # Use LDA to get a matrix relating documents to abstract "topics".
>>> df_document_topic = hero.lda(df_document_term, n_components=5)
>>> 
>>> # Calculate the document-topic and topic-term matrices.
>>> df_document_topic, df_topic_term = hero.topic_matrices(df_document_term, df_document_topic)
>>> 
>>> # L1-normalize the topic matrices so they can be read as probability distributions.
>>> df_document_topic_distribution = hero.normalize(df_document_topic, norm="l1")
>>> df_topic_term_distribution = hero.normalize(df_topic_term, norm="l1")
>>> 
>>> # Get the LDAvis figure with relevant words per topic.
>>> figure = hero.relevant_words_per_topic(
... df_document_term, df_document_topic_distribution, df_topic_term_distribution, return_figure=True
... )
>>> pyLDAvis.display(figure)
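
The `hero.normalize(..., norm="l1")` calls above rescale the weights so that their absolute values sum to 1. This turns raw document-topic and topic-term weights into probability distributions, which is the form LDAvis visualizes. The sketch below shows that rescaling in plain Python. It assumes normalization is applied row by row (per document or per topic) and uses a list-of-lists matrix. `l1_normalize_rows` is a hypothetical helper for illustration, not texthero's implementation.

```python
# Hypothetical helper: a sketch of row-wise L1 normalization, not texthero's code.
def l1_normalize_rows(matrix):
    """Scale each row so that the absolute values of its entries sum to 1."""
    normalized = []
    for row in matrix:
        total = sum(abs(x) for x in row)
        # Leave all-zero rows unchanged instead of dividing by zero.
        normalized.append([x / total for x in row] if total else list(row))
    return normalized


# Toy topic-term weights: 2 topics x 3 terms (made-up numbers).
topic_term = [[2.0, 1.0, 1.0], [0.5, 0.0, 1.5]]
topic_term_distribution = l1_normalize_rows(topic_term)
print(topic_term_distribution)  # [[0.5, 0.25, 0.25], [0.25, 0.0, 0.75]]
```

Because LDA topic weights and TF-IDF values are non-negative, each normalized row in this setting sums to 1.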

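LDAvis ranks the terms it shows for each topic by *relevance*. Relevance blends a term's probability within the topic, phi(k, w), with its lift, which is that probability divided by the term's overall corpus probability p(w):

relevance(w, k | lambda) = lambda * log phi(k, w) + (1 - lambda) * log(phi(k, w) / p(w))

The source does not say whether `hero.relevant_words_per_topic` applies exactly this metric or which lambda it uses. The sketch below only illustrates the metric itself. The function `relevance_ranking`, the parameter `lam`, its illustrative default of 0.6, the toy vocabulary, and the distributions are all hypothetical.

```python
import math


def relevance_ranking(topic_term, term_probs, vocab, lam=0.6):
    """Rank vocabulary terms per topic by LDAvis-style relevance (hypothetical helper).

    topic_term: one L1-normalized distribution over vocab per topic (phi).
    term_probs: marginal probability of each term in the corpus (p_w).
    lam: 1.0 ranks by in-topic probability only, 0.0 ranks by lift only.
    """
    rankings = []
    for phi in topic_term:
        scores = {}
        for word, phi_w, p_w in zip(vocab, phi, term_probs):
            if phi_w == 0:
                continue  # log(0) is undefined, so terms absent from the topic are skipped
            scores[word] = lam * math.log(phi_w) + (1 - lam) * math.log(phi_w / p_w)
        rankings.append(sorted(scores, key=scores.get, reverse=True))
    return rankings


vocab = ["match", "goal", "season"]
topic_term = [[0.5, 0.3, 0.2], [0.25, 0.0, 0.75]]
# Simplifying assumption: both topics are equally prevalent, so p_w is the column mean.
term_probs = [sum(column) / len(topic_term) for column in zip(*topic_term)]
print(relevance_ranking(topic_term, term_probs, vocab, lam=1.0))
# [['match', 'goal', 'season'], ['season', 'match']]
print(relevance_ranking(topic_term, term_probs, vocab, lam=0.0))
# [['goal', 'match', 'season'], ['season', 'match']]
```

With `lam=1.0` the first topic is ranked by raw probability alone. With `lam=0.0` the term "goal", which is rare overall but concentrated in that topic, moves to the top.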
Files (20 commits in history):
README.md
conversations.csv
terorism.csv
terrorism_query_dataset.csv
tweets_liverpool_vs_watford.csv

Repository: SummerOfCode-NoHate/Data (2 contributors)
