- The goal is to classify text messages (SMS) as spam or ham (non-spam).
- SMS messages are limited in length, so the number of features available for classification is small.
- Text messages often include abbreviations, informal language, text-speak, and other languages transliterated into English letters.
- The UCI Machine Learning Repository hosts a collection of SMS messages: the SMS Spam Collection dataset.
- It contains 5574 messages in total: 4827 ham and 747 spam.
- A Naive Bayes classifier is used here.
- 'spam.py' contains the main program. The GUI is built with tkinter.
- Run it from the terminal with: 'python spam.py'
- 'spamDetection.ipynb' contains the analysis of the dataset.
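
To give a rough idea of how a Naive Bayes spam/ham classifier works, here is a minimal pure-Python sketch. It is not the code from 'spam.py': the `tokenize` helper, the `NaiveBayes` class, and the toy training messages are all hypothetical and made up for illustration, and it assumes a multinomial model with add-one (Laplace) smoothing, which may differ from what the project actually uses.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the message and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (illustrative)."""

    def fit(self, messages, labels):
        # Number of training messages per class, used for the class priors.
        self.class_counts = Counter(labels)
        # Per-class word frequencies.
        self.word_counts = defaultdict(Counter)
        for text, label in zip(messages, labels):
            self.word_counts[label].update(tokenize(text))
        # Vocabulary: every word seen in any class during training.
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            # Start from the log prior, then add log likelihoods per word.
            score = math.log(n / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            scores[label] = score
        # Return the class with the highest log score.
        return max(scores, key=scores.get)


# Toy, made-up messages for illustration only (not taken from the dataset).
train_messages = [
    "WIN a FREE prize now, txt WIN to claim",
    "URGENT! You have won free cash, call now",
    "are u coming home for dinner tonight",
    "ok see u at the library later",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_messages, train_labels)
print(model.predict("free prize, call now"))
print(model.predict("see u tonight"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied, and the smoothing keeps a word that appears in only one class from forcing the other class's probability to zero.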