Real-Time Twitter Sentiment Analysis with Kafka, PySpark, and Machine Learning

-Introduction

This project performs real-time sentiment analysis on Twitter data using Apache Kafka, PySpark, and machine learning models. It includes modules for data ingestion, preprocessing, model training, and real-time prediction, and it logs the results to MongoDB.

-Requirements

To run this project, you need to have the following installed:

Apache Kafka
Apache Zookeeper
Python (version >= 3.6)
Pipenv (for managing Python dependencies)
pymongo == 1.3.6
pytz
djongo
MongoDB
Docker
pyspark
matplotlib
numpy
pandas

-Installation

Clone this repository:

```
git clone https://github.com/your_username/twitter-sentiment-analysis.git
cd twitter-sentiment-analysis
```

Install Python dependencies using Pipenv:
```
pipenv install
```

-Starting the Project

Build the Docker containers:
```
docker compose build
```
Start the Docker containers:
```
docker compose up
```
Note: MongoDB must be installed before you start (e.g., brew install mongodb). Log in with the username admin and the password 1234.
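
As an illustration of how these credentials might be used from Python, the sketch below builds a standard MongoDB connection URI. The host localhost and port 27017 are assumptions (MongoDB's usual defaults), not values taken from this project, and build_mongo_uri is a hypothetical helper name.

```python
from urllib.parse import quote_plus


def build_mongo_uri(user, password, host="localhost", port=27017):
    """Build a MongoDB connection URI with percent-encoded credentials.

    host and port are assumed defaults, not values confirmed by the project.
    """
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/"


# Credentials from the note above.
uri = build_mongo_uri("admin", "1234")
print(uri)
```

A URI of this form can be passed to pymongo.MongoClient. Percent-encoding matters once a username or password contains characters such as @ or :.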

-Starting the Web App

Navigate to the front-end directory:
```
cd sentiment_analysis_front
```
Run the development server:
```
python manage.py runserver
```

-Files and Purpose

-Kafka_Streaming/producer

Dockerfile: Defines the environment for the Kafka producer.
producer.py: Sends tweets from twitter_validation.csv to the Kafka topic.
twitter_validation.csv: A dataset used by the producer to send sample tweets.
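
The producer's job, reading rows from a CSV file and sending each one to a Kafka topic, could be sketched roughly as follows. This is not the project's producer.py. The topic name tweets, the JSON message shape, and the assumption that the tweet text sits in the last CSV column are all hypothetical. The send callable stands in for a real Kafka client so the sketch runs without a broker.

```python
import csv
import io
import json


def rows_to_messages(csv_text):
    """Turn CSV text into JSON-encoded byte messages, one per non-empty row."""
    messages = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        # Assumption: the tweet text is the last column of each row.
        payload = {"text": row[-1]}
        messages.append(json.dumps(payload).encode("utf-8"))
    return messages


def publish(messages, send, topic="tweets"):
    """Send each message to the topic via the supplied callable; return the count."""
    count = 0
    for message in messages:
        send(topic, message)
        count += 1
    return count


# Two made-up rows; the real file's layout may differ.
sample = 'id1,Brand,Positive,"I love this, really"\nid2,Brand,Negative,Not great\n'
sent = []  # in-memory stand-in for the Kafka topic
publish(rows_to_messages(sample), lambda topic, value: sent.append((topic, value)))
```

With the kafka-python package, the send method of a KafkaProducer instance accepts the same (topic, value) arguments and could be passed in place of the lambda.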

-ML

pipeline: Contains the pipeline configurations for data processing.
models: Logistic regression models for sentiment analysis.
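
To make the idea of a logistic regression sentiment model concrete, here is a minimal sketch in plain Python. It is not the project's pipeline, which presumably relies on PySpark. It swaps in a bag-of-words featurizer and batch gradient descent purely for illustration. The training sentences, labels, and hyperparameters are invented.

```python
import math


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def featurize(text, vocab):
    """Return a bag-of-words count vector over a fixed vocabulary."""
    counts = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:  # tokens outside the vocabulary are ignored
            counts[vocab[token]] += 1.0
    return counts


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, vocab, epochs=200, lr=0.5):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(vocab)
    bias = 0.0
    features = [featurize(s, vocab) for s in samples]
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(text, weights, bias, vocab):
    """Return the estimated probability that the text is positive."""
    x = featurize(text, vocab)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Invented toy data: label 1 = positive, 0 = negative.
train_texts = ["I love this game", "great fun love it",
               "I hate this game", "awful boring hate it"]
train_labels = [1, 1, 0, 0]
vocab = {tok: i for i, tok in
         enumerate(sorted({t for s in train_texts for t in tokenize(s)}))}
weights, bias = train(train_texts, train_labels, vocab)
```

After training, predict_proba("love it", weights, bias, vocab) lands above 0.5 and predict_proba("hate it", weights, bias, vocab) below it, since words shared by both classes receive weights near zero.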

-Mongo

MongoDB: Stores processed tweet data after sentiment analysis.

-Sentiment_analysis_front

The web application: Built with Django to visualize sentiment analysis results.

-Traitement

Dockerfile: Defines the environment for the Kafka consumer.
consumer.py: Processes incoming tweets from Kafka, performs sentiment analysis, and stores results in MongoDB.
save_pipeline.ipynb: Jupyter notebook for saving the machine learning pipeline.
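
The consumer's per-message flow (decode, classify, store) might look something like the sketch below. It is not the project's consumer.py. The JSON message shape, the record fields, and keyword_predict (a hypothetical stand-in for the trained model) are assumptions, and a plain list replaces MongoDB so the example runs on its own.

```python
import json
from datetime import datetime, timezone


def process_message(raw_value, predict, store):
    """Decode one message, classify its text, and hand the record to store."""
    # Assumption: each message is UTF-8 JSON with a "text" field.
    payload = json.loads(raw_value.decode("utf-8"))
    text = payload["text"]
    record = {
        "text": text,
        "sentiment": predict(text),
        "processed_at": datetime.now(timezone.utc).isoformat(),
    }
    store(record)
    return record


def keyword_predict(text):
    """Hypothetical stand-in for the trained sentiment model."""
    return "Positive" if "love" in text.lower() else "Negative"


saved = []  # in-memory stand-in for a MongoDB collection
process_message(b'{"text": "I love streaming"}', keyword_predict, saved.append)
```

With pymongo, the insert_one method of a collection could be passed as store, and the trained pipeline's prediction function as predict.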

-Models.ipynb

Overview of the models: A Jupyter notebook detailing the machine learning models used for sentiment analysis.

-Twitter_training.csv

Training dataset: Used for training the machine learning models.

Authors

Abdelmajid Benjelloun
Ayoub Bakkali
Salma Nidar

License

This project is licensed under the MIT License.
