This repository contains the project for the course Text Mining and Sentiment Analysis, taught by Professor Alfio Ferrara at Università degli Studi di Milano in 2021.
The dataset was not uploaded to the repository due to its large size, but it is downloadable here.
The project consists of a single Jupyter Notebook with all the calculations. A report is also available in PDF format, together with its .tex source file.
If you downloaded the dataset from Kaggle, store all of its files in a directory named "dataset" in the same directory as the notebook. Alternatively, change the path in the data loading cell.
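As a rough illustration of the layout described above, the following sketch checks that the "dataset" folder exists next to the notebook and lists the files in it. The names `DATASET_DIR` and `list_dataset_files` are hypothetical and are not taken from the notebook; the actual data loading cell may use different variables and file names.

```python
from pathlib import Path

# Hypothetical name: the notebook's own path variable may be called something else.
# "dataset" is the folder expected next to the notebook.
DATASET_DIR = Path("dataset")


def list_dataset_files(dataset_dir=DATASET_DIR):
    """Return the files in the dataset folder, sorted, or raise a clear error."""
    dataset_dir = Path(dataset_dir)
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"Expected a dataset folder at {dataset_dir.resolve()}; "
            "create it or change the path in the data loading cell."
        )
    # Only regular files are returned; subdirectories are skipped.
    return sorted(p for p in dataset_dir.iterdir() if p.is_file())
```

If the files live elsewhere, passing that location (for example `list_dataset_files("/path/to/files")`) is equivalent to editing the path in the data loading cell.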
Note that preprocessing takes a long time (2-3 hours) because of the large size of the input dataset. If you would like the already preprocessed files, contact me here and I will send them to you. Unfortunately, they are too large to upload to the repository.
This project is released under the permissive Apache License. For more information, see LICENSE.