Chapter 8: Applying Machine Learning to Sentiment Analysis

Chapter Outline

  • Preparing the IMDb movie review data for text processing
    • Obtaining the IMDb movie review dataset
    • Preprocessing the movie dataset into a more convenient format
  • Introducing the bag-of-words model
    • Transforming words into feature vectors
    • Assessing word relevancy via term frequency-inverse document frequency
    • Cleaning text data
    • Processing documents into tokens
  • Training a logistic regression model for document classification
  • Working with bigger data – online algorithms and out-of-core learning
  • Topic modeling
    • Decomposing text documents with Latent Dirichlet Allocation
    • Latent Dirichlet Allocation with scikit-learn
  • Summary

Please refer to the README.md file in ../ch01 for more information about running the code examples.