This project classifies movies into genres based on their plot summaries, using natural language processing (NLP) techniques and machine learning models.
The dataset includes movie plot summaries and their corresponding genres.
Data Preprocessing:
- Text cleaning and normalization.
- Tokenization and stemming.
- Converting text data into numerical representations using techniques such as TF-IDF.
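
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn is available, the sample plots are made up, and `crude_stem` is a hypothetical toy suffix stripper standing in for a real stemmer such as Porter's.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer


def clean(text):
    """Lowercase the text and replace every run of non-letters with a space."""
    return re.sub(r"[^a-z]+", " ", text.lower()).strip()


def crude_stem(token):
    """Toy suffix stripper (hypothetical); a real stemmer handles far more cases."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the token remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Clean, tokenize on whitespace, and stem each token."""
    return [crude_stem(tok) for tok in clean(text).split()]


# Made-up plot summaries, for illustration only.
plots = [
    "The detectives chased 2 robbers!",
    "A wedding planner falls in love with the groom.",
    "Robbers planned a daring heist.",
]

# Passing a callable as `analyzer` lets the vectorizer use our own pipeline
# instead of its built-in tokenizer.
vectorizer = TfidfVectorizer(analyzer=preprocess)
X = vectorizer.fit_transform(plots)

print(preprocess(plots[0]))
print(X.shape)  # (number of plots, vocabulary size)
```

Each row of `X` is the TF-IDF vector for one plot summary, which is the form the classifiers expect.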
Exploratory Data Analysis (EDA):
- Visualizing the distribution of genres.
- Analyzing common words and phrases in different genres.
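
As a rough sketch of the EDA steps above, the genre distribution and the most frequent words per genre can be computed with the standard library alone. The sample rows and the tiny `STOPWORDS` set are assumptions made for illustration; the notebook may well use plotting libraries instead of the text bars shown here.

```python
from collections import Counter, defaultdict

# Made-up (plot, genre) pairs, for illustration only.
samples = [
    ("A detective hunts a killer through the city", "thriller"),
    ("Two friends plan a wedding and chaos follows", "comedy"),
    ("A killer stalks a small town at night", "thriller"),
]

# A deliberately tiny stop-word list (assumption); real lists are much longer.
STOPWORDS = {"a", "the", "and", "at", "through"}

# How many plots fall into each genre.
genre_counts = Counter(genre for _, genre in samples)

# Word frequencies, kept separately for each genre.
words_by_genre = defaultdict(Counter)
for plot, genre in samples:
    tokens = [w for w in plot.lower().split() if w not in STOPWORDS]
    words_by_genre[genre].update(tokens)

for genre, n in genre_counts.most_common():
    print(f"{genre:<10} {'#' * n} ({n})")
    print("  top words:", words_by_genre[genre].most_common(3))
```

A skewed genre distribution here is worth noting early, since it affects how accuracy should be interpreted later.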
Model Building:
- Training various machine learning models such as Naive Bayes, SVM, and Random Forest.
- Evaluating model performance using metrics such as accuracy, precision, recall, and F1-score.
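
One way the training and evaluation steps above might look, assuming scikit-learn and using Naive Bayes as the example model. The eight plots are invented toy data, so the printed scores say nothing about the real dataset; macro averaging is also an assumption, chosen here because it weights every genre equally.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only.
plots = [
    "a detective hunts a serial killer",
    "a spy uncovers a deadly conspiracy",
    "a killer stalks a quiet town",
    "an agent races to stop a bomb",
    "two friends plan a chaotic wedding",
    "a clumsy waiter inherits a restaurant",
    "roommates throw a disastrous party",
    "a wedding singer falls in love",
]
genres = ["thriller"] * 4 + ["comedy"] * 4

# Hold out a stratified test set so both genres appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    plots, genres, test_size=0.25, stratify=genres, random_state=0
)

# TF-IDF features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Swapping `MultinomialNB()` for another estimator leaves the rest of the sketch unchanged, which keeps the metrics comparable across models.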
Model Evaluation:
- Comparing different models.
- Selecting the best model based on evaluation metrics.
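
The comparison and selection steps above could be sketched as below, assuming scikit-learn. Using 2-fold cross-validation with macro F1 as the selection metric is an assumption made for this example, as are the toy plots and the specific estimator settings; the notebook may compare models differently.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data for illustration only.
plots = [
    "a detective hunts a serial killer",
    "a spy uncovers a deadly conspiracy",
    "a killer stalks a quiet town",
    "an agent races to stop a bomb",
    "two friends plan a chaotic wedding",
    "a clumsy waiter inherits a restaurant",
    "roommates throw a disastrous party",
    "a wedding singer falls in love",
]
genres = ["thriller"] * 4 + ["comedy"] * 4

candidates = {
    "Naive Bayes": MultinomialNB(),
    "SVM": LinearSVC(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Score every candidate with the same features, folds, and metric.
scores = {}
for name, clf in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    fold_scores = cross_val_score(pipeline, plots, genres, cv=2, scoring="f1_macro")
    scores[name] = fold_scores.mean()

# Pick the model with the highest mean macro F1.
best_name = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name:<14} mean macro F1 = {score:.2f}")
print("selected:", best_name)
```

Wrapping the vectorizer and classifier in one pipeline means TF-IDF is refit inside each fold, so no information from the validation fold leaks into training.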
To run this project, ensure you have the required packages installed and execute the notebook.
Refer to the requirements.txt file for a list of dependencies.