This project proposes a way to analyze sentiment toward a new technology, specifically generative language models (GLMs), based on social media data. It uses tweets about ChatGPT (a proxy product for GLMs) and identifies the industries to which the users discussing this technology belong, as well as their sentiment.
The notebooks in this repository cover the following steps:
- Data cleaning and preprocessing (e.g. removal of redundant data, lemmatization)
- Topic modeling based on unsupervised machine learning (LDA)
- Sentiment polarity scoring with VADER
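
As an illustration of the first step above, the following is a minimal sketch of tweet cleaning using only the Python standard library. It is not the code from the notebooks: the function names `clean_tweet` and `preprocess` and the specific cleaning rules (lowercasing, stripping URLs and mentions, dropping exact duplicates) are assumptions made for this example. Lemmatization is left out because it requires a third-party NLP library.

```python
import re

# Hypothetical cleaning rules for illustration; the notebooks may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    # Assumption: only ASCII letters are kept, so digits, punctuation,
    # emoji, and accented characters are removed.
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())  # collapse repeated whitespace


def preprocess(tweets):
    """Clean tweets, dropping empty results and exact duplicates (order kept)."""
    seen = set()
    cleaned = []
    for tweet in tweets:
        text = clean_tweet(tweet)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


if __name__ == "__main__":
    sample = [
        "Loving #ChatGPT! https://t.co/abc @OpenAI",
        "ChatGPT is great!!!",
        "chatgpt is GREAT",
    ]
    for line in preprocess(sample):
        print(line)
```

The cleaned strings could then be tokenized and passed to the topic modeling and sentiment steps. Note that aggressive cleaning like this may not suit VADER, which uses punctuation and capitalization as intensity cues, so the sentiment step might be better served by lightly cleaned text.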
(Note: Once the university publishes the MSc thesis, a link to it will be added.)