This project aims to perform real-time sentiment analysis on Twitter data using Apache Kafka, PySpark, and machine learning models. The project includes modules for data ingestion, preprocessing, model training, real-time prediction, and logging of results in MongoDB.
To run this project, you need to have the following installed:
- Apache Kafka
- Apache Zookeeper
- Python (version >= 3.6)
- Pipenv (for managing Python dependencies)
- pymongo == 1.3.6
- pytz
- djongo
- MongoDB
- Docker
- pyspark
- matplotlib
- numpy
- pandas
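The prerequisites above can be checked with a short script. This is a minimal sketch, not part of the repository: it assumes the tools appear on `PATH` as `docker` and `pipenv`, and that the Python packages are importable under the names listed. Kafka and Zookeeper are left out because their executable names vary by installation.

```python
import importlib.util
import shutil
import sys

# Names taken from the prerequisites list. That `docker` and `pipenv` are the
# executable names on PATH is an assumption.
EXECUTABLES = ["docker", "pipenv"]
MODULES = ["pymongo", "pytz", "djongo", "pyspark", "matplotlib", "numpy", "pandas"]


def missing_prerequisites(executables, modules):
    """Return (missing_executables, missing_modules) for the given names."""
    missing_exe = [name for name in executables if shutil.which(name) is None]
    missing_mod = [name for name in modules if importlib.util.find_spec(name) is None]
    return missing_exe, missing_mod


if __name__ == "__main__":
    if sys.version_info < (3, 6):
        print("Python >= 3.6 is required")
    exe, mod = missing_prerequisites(EXECUTABLES, MODULES)
    print("Missing executables:", exe or "none")
    print("Missing Python modules:", mod or "none")
```

If the Python packages are installed through Pipenv, run the script inside that environment (for example with `pipenv run python`), otherwise they will be reported as missing.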
- Clone this repository:
git clone https://github.com/your_username/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis
- Install Python dependencies using Pipenv:
pipenv install
- Build the Docker containers:
docker compose build
- Start the Docker containers:
docker compose up
Note: Make sure to install MongoDB (e.g., brew install mongodb). Use the following credentials for login: admin and password 1234.
- Navigate to the front-end directory:
cd sentiment_analysis_front
- Run the development server:
python manage.py runserver
- Dockerfile: Defines the environment for the Kafka producer.
- producer.py: Sends tweets from twitter_validation.csv to the Kafka topic.
- twitter_validation.csv: A dataset used by the producer to send sample tweets.
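The producer's job, reading rows from a CSV file and sending one message per tweet, can be sketched as follows. This is an illustration under stated assumptions rather than the contents of producer.py: the function names, the `text_column` parameter, the JSON message shape, the topic name `tweets`, and the sample rows are all hypothetical, and the tweet text is assumed to sit in the last column. The `send` callable stands in for a Kafka client so the sketch runs without a broker.

```python
import csv
import io
import json


def rows_to_messages(csv_file, text_column=-1):
    """Yield one UTF-8 encoded JSON message per non-empty CSV row.

    Assumption: the tweet text is in `text_column` (last column by default).
    """
    for row in csv.reader(csv_file):
        if row:  # csv.reader returns [] for blank lines; skip them
            yield json.dumps({"text": row[text_column]}).encode("utf-8")


def publish(messages, send, topic):
    """Hand each message to `send(topic, value)` and return the number sent."""
    count = 0
    for value in messages:
        send(topic, value)
        count += 1
    return count


if __name__ == "__main__":
    # Made-up sample rows and an in-memory outbox instead of a real topic.
    sample = io.StringIO(
        '1,Brand,Positive,"great game, loved it"\n'
        "2,Brand,Negative,awful update\n"
    )
    outbox = []
    sent = publish(
        rows_to_messages(sample),
        lambda topic, value: outbox.append((topic, value)),
        "tweets",
    )
    print(sent, outbox[0])
```

With the `kafka-python` package, one option is to pass the `send` method of a `KafkaProducer(bootstrap_servers=...)` instance as `send` and call `flush()` on the producer afterwards. Which client library producer.py actually uses is not stated here.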
- pipeline: Contains the pipeline configurations for data processing.
- models: Logistic regression models for sentiment analysis.
- MongoDB: Stores processed tweet data after sentiment analysis.
- The web application: Built with Django to visualize sentiment analysis results.
- Dockerfile: Defines the environment for the Kafka consumer.
- consumer.py: Processes incoming tweets from Kafka, performs sentiment analysis, and stores results in MongoDB.
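The consumer's per-message flow (decode, predict, store) can be sketched as below. This is a hedged illustration, not the code in consumer.py: `handle_message`, `toy_classifier`, the record fields, and the `{"text": ...}` message shape are hypothetical. The keyword lookup is only a placeholder for the trained logistic regression pipeline, and a plain list stands in for the MongoDB collection.

```python
import json
from datetime import datetime, timezone


def toy_classifier(text):
    """Placeholder for the trained model: keyword lookup, NOT logistic regression."""
    lowered = text.lower()
    if any(word in lowered for word in ("great", "love", "good")):
        return "Positive"
    if any(word in lowered for word in ("awful", "hate", "bad")):
        return "Negative"
    return "Neutral"


def handle_message(raw, classify, store):
    """Decode one message, predict its sentiment, and pass the record to `store`."""
    payload = json.loads(raw.decode("utf-8"))
    record = {
        "text": payload["text"],
        "prediction": classify(payload["text"]),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    store(record)
    return record


if __name__ == "__main__":
    saved = []  # stands in for a MongoDB collection
    message = json.dumps({"text": "I love this game"}).encode("utf-8")
    print(handle_message(message, toy_classifier, saved.append))
```

Keeping `classify` and `store` as parameters lets the same function run against the real model and database or against stand-ins. With PyMongo 3.0 or later, `store` could be a collection's `insert_one` method; older PyMongo releases do not provide it.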
- save_pipeline.ipynb: Jupyter notebook for saving the machine learning pipeline.
- Overview of the models: A Jupyter notebook detailing the machine learning models used for sentiment analysis.
- Training dataset: Used for training the machine learning models.
- Abdelmajid Benjelloun
- Ayoub Bakkali
- Salma Nidar
This project is licensed under the MIT License.