Explore the parameter space to try and improve results. #11

Open
bcipolli opened this issue Jun 25, 2017 · 1 comment

@bcipolli (Owner)

To play: run python main.py --csv-file raw_dataframe.csv, then add flags to explore the parameter space (a sample sweep script follows the list below):

  • --source-thresh SOURCE_THRESH Min % of events a news source must cover, to be included.
    Default 0.5; lowering this would include a broader set of news sources.

  • --min-article-length MIN_ARTICLE_LENGTH Min # words in an article (pre-parsing)
    Set to 250. Are longer articles more biased?

  • --min-vocab-length MIN_VOCAB_LENGTH Min # words in an article (post-lemmatizing, vectorizing)
    Set to 100. Are longer articles more biased?

  • --lda-min-appearances LDA_MIN_APPEARANCES Min # appearances of a word to be included in the vocabulary.
    Set to 2. Could raise this to focus on the most common words.

  • --lda-vectorization-type {count,tfidf} Type of vectorization to apply when converting articles to word counts.
    Set to count. Not 100% sure tfidf is working, but if it is, we should use it.

  • --lda-groupby {source,article} Run LDA on text grouped by article, or by news source?
    Set to article right now. This just determines what the "documents" (sets of words) sent into LDA are: each article on its own, or all articles aggregated by source.

  • --lda-topics LDA_TOPICS # of LDA topics
    Set to 10. The clusters suggest that a higher number could be helpful.

  • --lda-iters LDA_ITERS # of LDA iterations
    Set to 1500. Could probably be lowered for larger datasets.

  • --truth-frequency-thresh TRUTH_FREQUENCY_THRESH % of articles in a news event that must mention a word for it to be considered "truth" and removed.
    Set to 0.5. Could be higher (e.g. 1.1 - force no words to be removed) or lower (e.g. 0.1 - remove most words and leave only infrequent words as bias candidates). Could also be implemented as a range, to say: bias words appear often, but not as often as truth words, and not as infrequently as random garbage.
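
One way to explore a few of these at once is a small sweep script that invokes main.py once per parameter combination. This is only an illustrative sketch: the flag names come from the list above, but the value grids and the log-file naming are arbitrary choices, not part of the app.

    import itertools
    import subprocess

    # Sweep two of the flags described above; the values are arbitrary examples.
    lda_topics = [10, 20, 30]
    truth_threshs = [0.1, 0.5, 1.1]

    for topics, thresh in itertools.product(lda_topics, truth_threshs):
        cmd = [
            "python", "main.py",
            "--csv-file", "raw_dataframe.csv",
            "--lda-topics", str(topics),
            "--truth-frequency-thresh", str(thresh),
        ]
        print("Running:", " ".join(cmd))
        # Keep each run's output in its own log file, for later comparison.
        log_name = "run_topics{}_thresh{}.log".format(topics, thresh)
        with open(log_name, "w") as log:
            subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)

Swapping in other flags (e.g. --lda-vectorization-type or --lda-groupby) works the same way.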

@bcipolli (Owner, Author)

Note that the internal app caching may over-cache. If you think that's happening, just run with the --force command-line flag to force the app to re-run all steps.
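
For example, the same invocation as in the first comment, with the cache bypassed:

    python main.py --csv-file raw_dataframe.csv --force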
