In this project, we build and evaluate a simple linear regression model in Python. We use scikit-learn to fit the regression, pandas for data handling, and seaborn for plotting. We work with the popular Advertising dataset to predict sales revenue from advertising spend across three channels: TV, radio, and newspaper.
Linear regression is a useful tool for predicting a quantitative response. It assumes that the response variable is, at least approximately, a linear function of the predictor variables.
Root Mean Squared Error : 12.24744871391589
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import seaborn as sns
The Advertising dataset records sales revenue alongside advertising spend across several channels: TV, radio, and newspaper.
# Exploratory data analysis
...
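As a rough illustration of what a first exploratory pass might look like, the sketch below builds a small synthetic stand-in for the Advertising data and prints the usual summaries. The column names (`TV`, `Radio`, `Newspaper`, `Sales`), the frame name `advert`, and all values are assumptions made for the example; they are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the Advertising data.
# Values and column names are made up for illustration only.
rng = np.random.default_rng(0)
n = 200
advert = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
advert["Sales"] = (
    3 + 0.05 * advert["TV"] + 0.1 * advert["Radio"] + rng.normal(0, 1, n)
)

# First rows, column types, summary statistics, and missing-value counts.
print(advert.head())
advert.info()
summary = advert.describe()
print(summary)
missing = advert.isna().sum()
print(missing)
```

With the real data, the `DataFrame` would typically come from `pd.read_csv` instead of being generated.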
- Sales to Density
- Newspaper to Density
- TV to Density
- Radio to Density
# Visualizing relationships between predictors and response
...
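One possible way to visualize how each predictor relates to the response is a row of scatter plots with fitted lines, using seaborn's `pairplot`. The sketch below again uses synthetic stand-in data with assumed column names; the non-interactive `Agg` backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (made-up values, assumed column names).
rng = np.random.default_rng(0)
n = 200
advert = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
advert["Sales"] = (
    3 + 0.05 * advert["TV"] + 0.1 * advert["Radio"] + rng.normal(0, 1, n)
)

# One scatter plot per predictor against Sales, each with a regression line.
grid = sns.pairplot(
    advert,
    x_vars=["TV", "Radio", "Newspaper"],
    y_vars=["Sales"],
    kind="reg",
    height=3,
)

# Pearson correlation of every column with Sales, as a numeric companion.
corr = advert.corr()["Sales"]
print(corr)
```

In this synthetic setup `Newspaper` has no effect on `Sales` by construction, so its panel shows a nearly flat line; the real data may of course behave differently.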
In our case, the linear regression model has the form:
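A plausible general form, assuming all three advertising channels enter as predictors (with a single predictor such as TV, only the first two terms and the error term remain). The symbols $\beta_j$ and $\varepsilon$ are notation introduced here for illustration:

```latex
\[
\text{Sales} = \beta_0 + \beta_1\,\text{TV} + \beta_2\,\text{Radio}
             + \beta_3\,\text{Newspaper} + \varepsilon
\]
```

Here $\beta_0$ is the intercept, each $\beta_j$ is the expected change in sales per unit of additional spend on that channel with the others held fixed, and $\varepsilon$ is the error term.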
The
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Print model coefficients
...
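A minimal sketch of splitting the data, fitting the model, and printing its coefficients could look like the following. The data is synthetic, and the column names, split ratio, and `random_state` are assumptions chosen for the example rather than the project's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: Sales depends on TV and Radio, not on Newspaper.
rng = np.random.default_rng(1)
n = 200
X = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),
})
y = 3 + 0.05 * X["TV"] + 0.1 * X["Radio"] + rng.normal(0, 1, n)

# Hold out 25% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# Fit ordinary least squares on the training rows only.
model = LinearRegression().fit(X_train, y_train)

print("Intercept:", model.intercept_)
for name, coef in zip(X.columns, model.coef_):
    print(f"{name}: {coef:.4f}")
```

Because the synthetic data was generated with known coefficients, the fitted values should land close to 0.05 for `TV`, 0.1 for `Radio`, and roughly 0 for `Newspaper`.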
# Make predictions
...
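To show what `predict` returns, here is a small sketch on noise-free, made-up data where the true relationship is known exactly (`Sales = 3 + 0.05 * TV`). The single `TV` feature and all values are assumptions for illustration only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Noise-free toy data generated from Sales = 3 + 0.05 * TV (made-up values).
X = pd.DataFrame({"TV": [0.0, 100.0, 200.0, 300.0]})
y = 3 + 0.05 * X["TV"]

model = LinearRegression().fit(X, y)

# Predict sales for a new, unseen TV budget.
# The new input must use the same column names as the training data.
X_new = pd.DataFrame({"TV": [150.0]})
pred = model.predict(X_new)
print(pred)  # approximately [10.5], since 3 + 0.05 * 150 = 10.5
```

In a train/test workflow the same call would be applied to the held-out features, for example `model.predict(X_test)`.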
from sklearn import metrics
- Mean Absolute Error (MAE)
- Mean Squared Error (MSE)
- Root Mean Squared Error (RMSE)
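For reference, the standard definitions of these three metrics are given below, where $y_i$ is the observed value, $\hat{y}_i$ the predicted value, and $n$ the number of observations (notation introduced here):

```latex
\[
\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
\]
```

MSE penalizes large errors more heavily than MAE because the errors are squared, while RMSE brings the result back to the units of the response.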
# Calculate evaluation metrics
...
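The three metrics can be computed with `sklearn.metrics`, as in the sketch below. The four true and predicted values are hypothetical toy numbers picked so that the arithmetic is easy to check by hand; in practice they would be the held-out responses and the model's predictions.

```python
import numpy as np
from sklearn import metrics

# Hypothetical toy values, chosen so the results can be verified by hand.
y_true = [100, 50, 30, 20]
y_pred = [90, 50, 50, 30]

# Absolute errors are 10, 0, 20, 10, so MAE = 40 / 4 = 10.0
mae = metrics.mean_absolute_error(y_true, y_pred)

# Squared errors are 100, 0, 400, 100, so MSE = 600 / 4 = 150.0
mse = metrics.mean_squared_error(y_true, y_pred)

# RMSE is the square root of MSE: sqrt(150) is about 12.247
rmse = np.sqrt(mse)

print("Mean Absolute Error :", mae)
print("Mean Squared Error :", mse)
print("Root Mean Squared Error :", rmse)
```

Taking the square root with NumPy avoids relying on version-specific options of `mean_squared_error`.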
This project demonstrates how to implement a simple linear regression model with scikit-learn. It covers data preprocessing, model training, prediction, and evaluation, and shows how sales revenue can be predicted from advertising spend across different channels.