YouTube Trending Data Analysis Machine Learning Project

This project was created to try various machine learning models on YouTube's trending video statistics, obtained from Kaggle, for educational purposes. The main dataset used in this project is the one for the United States, last updated on December 5th, 2021. Datasets for other countries can be downloaded from: YouTube Trending Video Dataset (updated daily)

Image retrieved from: Galaxy Marketing YouTube Stats

This dataset was created with a web scraper built on the YouTube Data API, which is now part of Google Cloud Platform. The scraper itself can be found at: https://github.com/mitchelljy/Trending-YouTube-Scraper. The daily-updated dataset is hosted on Kaggle at YouTube Trending Video Dataset (updated daily).

The scraper produces usable data as .csv files for different countries. Every dataset includes a categoryId column whose ID-to-category mapping differs by region (there are five regions in total), most likely corresponding to:

Americas (North and South America)
Europe
Africa
Asia
Australia

Each CSV file comes with a JSON file from which the corresponding category IDs can be retrieved; music is one example of a category. I'll start by building models with data from the United States only, then potentially test them on data from other countries to see whether the models are consistent.
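A minimal sketch of turning a region's category JSON into an ID-to-title lookup, assuming the file follows the YouTube Data API `videoCategories` response layout (an `items` list whose entries carry an `id` string and a `snippet.title`). The function name `build_category_map` and the inline sample are hypothetical, for illustration only; this is not necessarily how the project itself does it.

```python
from __future__ import annotations

import json


def build_category_map(category_json: str) -> dict[int, str]:
    """Map numeric category IDs to human-readable titles.

    Assumes the layout {"items": [{"id": "10", "snippet": {"title": "Music"}}, ...]}.
    """
    data = json.loads(category_json)
    # IDs are stored as strings in the JSON; convert them so they match integer categoryId values.
    return {int(item["id"]): item["snippet"]["title"] for item in data["items"]}


# Hypothetical two-entry sample standing in for a real region's JSON file.
sample = '{"items": [{"id": "10", "snippet": {"title": "Music"}}, {"id": "20", "snippet": {"title": "Gaming"}}]}'
categories = build_category_map(sample)
print(categories[10])  # Music
```

Because each region ships its own JSON file, building one map per country keeps the lookups from colliding when models are later tested on other countries' data.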

USA Dataset

The CSV file has 95,391 rows and 16 columns. The category ID JSON file creates an additional column. I then created the following columns:

'category': descriptive, qualitative representation of the 'categoryId'
'trending_date_dt': Python datetime version of 'trending_date'
'published_date': Python datetime version of 'publishedAt'
'time_till_trending': time elapsed between 'published_date' and 'trending_date_dt'
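A minimal sketch of deriving these four columns for a single row using only the standard library. It assumes that 'publishedAt' and 'trending_date' are ISO-8601 strings such as "2021-12-01T17:00:00Z", and that 'time_till_trending' is the gap between publication and the trending date, as its name suggests. The function name `add_derived_columns`, the "Unknown" fallback, and the sample row are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta

# Assumed timestamp layout for 'publishedAt' and 'trending_date'.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def add_derived_columns(row: dict, categories: dict[int, str]) -> dict:
    """Return a copy of one dataset row with the four derived fields added."""
    out = dict(row)
    # Human-readable label for the numeric category ID ("Unknown" if the ID is unmapped).
    out["category"] = categories.get(int(row["categoryId"]), "Unknown")
    # Parse both timestamps into (naive) datetime objects.
    out["trending_date_dt"] = datetime.strptime(row["trending_date"], TIMESTAMP_FORMAT)
    out["published_date"] = datetime.strptime(row["publishedAt"], TIMESTAMP_FORMAT)
    # Assumed meaning: elapsed time between publication and this row's trending date.
    out["time_till_trending"] = out["trending_date_dt"] - out["published_date"]
    return out


# Hypothetical sample row with only the fields this sketch needs.
row = {"categoryId": "10", "publishedAt": "2021-12-01T17:00:00Z", "trending_date": "2021-12-03T00:00:00Z"}
enriched = add_derived_columns(row, {10: "Music"})
print(enriched["category"], enriched["time_till_trending"])  # Music 1 day, 7:00:00
```

In a pandas-based notebook, the same columns could instead be built column-wise with `pd.to_datetime` and `Series.map`, which avoids looping over all 95,391 rows in Python.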

GateraGael/Machine-Learning-Project-Youtube-Trend-Analysis

Folders and files

Latest commit

History

Repository files navigation

YouTube Trending Data Analysis Machine Learning Project

Table of contents

Introduction

USA Dataset

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages