About • Dataset • Similarity Algorithms
This project finds the most similar items within news categories by comparing them with four measures: Cosine Similarity, Jaccard Similarity, Euclidean Distance, and Manhattan Distance.
The project uses the News Category Dataset, available on Kaggle at https://www.kaggle.com/rmisra/news-category-dataset. The dataset contains news items from a range of categories, including business, entertainment, politics, sports, and technology.
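As a rough illustration of how the data might be grouped before comparison, the sketch below reads records in JSON Lines form and collects text by category. The field names `category` and `headline` and the helper name `group_by_category` are assumptions made for this example; check them against the file you download from Kaggle.

```python
import json
from collections import defaultdict


def group_by_category(lines, text_field="headline"):
    """Group record texts by category.

    `lines` is any iterable of JSON strings, one record per line.
    The field names "category" and `text_field` are assumptions
    about the dataset layout, not confirmed by this README.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        groups[record["category"]].append(record[text_field])
    return dict(groups)


# Small in-memory sample standing in for the real file.
sample = [
    '{"category": "SPORTS", "headline": "Team wins final"}',
    '{"category": "BUSINESS", "headline": "Stocks rise today"}',
    '{"category": "SPORTS", "headline": "Coach steps down"}',
]
by_category = group_by_category(sample)
```

With the real file, passing an open file handle instead of `sample` should work the same way, since a file object iterates line by line.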
The project compares items with the four measures below. The first two are similarity scores, where a higher value means more similar; the last two are distances, where a lower value means more similar:
- Cosine Similarity
- Jaccard Similarity
- Euclidean Distance
- Manhattan Distance
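The four measures listed above can be sketched in plain Python as follows. This is a minimal illustration, not necessarily the project's implementation: it assumes whitespace tokenization and raw term-count vectors, and the helper names (`tokenize`, `to_vectors`, and so on) are made up for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def to_vectors(a, b):
    """Build aligned term-count vectors over the combined vocabulary."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    vocab = sorted(set(ca) | set(cb))
    return [ca[t] for t in vocab], [cb[t] for t in vocab]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def euclidean_distance(u, v):
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan_distance(u, v):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(u, v))


first = "stocks rise today"
second = "stocks fall today"
u, v = to_vectors(first, second)

cos = cosine_similarity(u, v)
jac = jaccard_similarity(first, second)
euc = euclidean_distance(u, v)
man = manhattan_distance(u, v)
```

To find the most similar item to a query, rank candidates by descending score for the two similarity measures and by ascending value for the two distances.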