This project was developed as an individual endeavor to predict the price of used cars based on various car attributes. Using multiple regression models, I explored how factors like brand, model, mileage, and engine type influence pricing. This project allowed me to apply a range of machine learning algorithms and to deepen my understanding of model tuning and feature engineering.
The primary objective of this project is to predict used car prices accurately using several machine learning algorithms. Through data analysis and model tuning, I aimed to build a well-tuned predictive model that also gives insight into what drives used car pricing.
The dataset used in this project contains the following features:
- brand: Manufacturer of the car (e.g., Toyota, Ford).
- model: Specific model of the car (e.g., Corolla, Mustang).
- model_year: Year the car model was manufactured, which influences depreciation.
- mileage: Total miles the car has been driven, which affects value.
- fuel_type: Type of fuel (e.g., petrol, diesel, electric).
- engine: Engine specifications, typically represented by engine capacity, which affects performance.
- transmission: Type of transmission (e.g., automatic, manual).
- ext_col: Exterior color of the car.
- int_col: Interior color of the car.
- accident: Whether the car has a history of accidents (yes or no).
- clean_title: Whether the car has a clean title, meaning it has not been declared a total loss.
- price: The target variable representing the car’s price.
During this project, I implemented a variety of machine learning algorithms to understand their effectiveness in predicting car prices. These include:
- Linear Regression
- Random Forest Regressor
- Decision Tree Regressor
- XGBoost Regressor
- CatBoost Regressor
- AdaBoost Regressor
- LightGBM Regressor
- ElasticNet Regression
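As a rough illustration of how several of these regressors can be compared side by side, the sketch below fits the scikit-learn models from the list on synthetic data and reports the RMSE of each. The data from `make_regression`, the hyperparameter values, and the `rmse_by_model` name are assumptions for this example and are not taken from the project. `XGBRegressor`, `CatBoostRegressor`, and `LGBMRegressor` follow the same `fit`/`predict` interface and could be added to the dictionary if those packages are installed.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the encoded car features and prices (an assumption).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Hyperparameter values here are illustrative, not the project's settings.
models = {
    "Linear Regression": LinearRegression(),
    "ElasticNet": ElasticNet(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
}

# Fit every model on the training split and score it on the held-out split.
rmse_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    rmse_by_model[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Print the models from lowest to highest RMSE.
for name, rmse in sorted(rmse_by_model.items(), key=lambda item: item[1]):
    print(f"{name}: RMSE = {rmse:.2f}")
```

Keeping the models in a dictionary makes it easy to add or remove candidates without touching the evaluation loop.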
To enhance model performance, I used RandomizedSearchCV for hyperparameter tuning. This allowed me to efficiently explore a range of hyperparameters and select the best configurations for each model.
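A minimal sketch of what such a search might look like for one model is shown below. The synthetic data, the parameter ranges, the number of iterations, and the choice of `RandomForestRegressor` are all assumptions made for illustration; the search spaces actually used in the project are not stated here.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the prepared training data (an assumption).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Hypothetical search space: each entry is a distribution to sample from.
param_distributions = {
    "n_estimators": randint(20, 60),
    "max_depth": randint(2, 10),
    "min_samples_leaf": randint(1, 5),
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,  # number of random configurations to try
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
    random_state=0,
)
search.fit(X, y)

# Flip the sign back to get a positive RMSE.
best_rmse = -search.best_score_
print("Best parameters:", search.best_params_)
print(f"Cross-validated RMSE: {best_rmse:.2f}")
```

Unlike an exhaustive grid search, the cost here is controlled by `n_iter`, so the search budget stays fixed even when more hyperparameters are added.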
- Data Preprocessing:
  - Handled missing values, encoded categorical variables, and normalized numerical features.
  - Split the data into training and testing sets.
- Exploratory Data Analysis (EDA):
  - Analyzed relationships between each feature and the target variable (price).
  - Identified high-impact features for the predictive models.
- Model Training and Evaluation:
  - Trained each model on the training set and evaluated it on the test set using metrics such as root mean squared error (RMSE).
  - Compared model performance to determine the best-fit model for price prediction.
- Hyperparameter Tuning:
  - Used RandomizedSearchCV to fine-tune each model's hyperparameters for optimal performance.
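The preprocessing and evaluation steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the project's actual code: the rows are randomly generated, the pricing rule is invented, only a subset of the dataset's columns is used, and the choice of median imputation, `StandardScaler`, and one-hot encoding is a guess at one reasonable way to handle missing values, scale numerical features, and encode categorical ones.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up rows for illustration only; the real dataset is not reproduced here.
rng = np.random.default_rng(0)
n_rows = 200
cars = pd.DataFrame({
    "brand": rng.choice(["Toyota", "Ford"], size=n_rows),
    "fuel_type": rng.choice(["petrol", "diesel", "electric"], size=n_rows),
    "transmission": rng.choice(["automatic", "manual"], size=n_rows),
    "model_year": rng.integers(2005, 2024, size=n_rows),
    "mileage": rng.integers(5_000, 200_000, size=n_rows).astype(float),
})
# Invented pricing rule so the target relates to the features.
cars["price"] = (
    30_000
    + 800 * (cars["model_year"] - 2005)
    - 0.08 * cars["mileage"]
    + rng.normal(0, 1_000, size=n_rows)
)
# Blank out some mileage values to simulate missing data.
cars.loc[cars.index % 25 == 0, "mileage"] = np.nan

categorical = ["brand", "fuel_type", "transmission"]
numerical = ["model_year", "mileage"]

# Impute and scale numerical columns; impute and one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numerical),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", RandomForestRegressor(n_estimators=50, random_state=0)),
])

X = cars.drop(columns="price")
y = cars["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Preprocessing is fitted on the training split only, then applied to the test split.
model.fit(X_train, y_train)
rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
print(f"Test RMSE: {rmse:.2f}")
```

Wrapping the preprocessing and the regressor in a single `Pipeline` keeps statistics such as medians and scaling factors from leaking from the test set into training.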
This project helped me build a strong understanding of machine learning models, especially ensemble models like XGBoost, CatBoost, and LightGBM. I also gained insight into how tuning hyperparameters and selecting relevant features impact predictive accuracy.
- Clone this repository:
git clone https://github.com/bPavan16/Used-Car-Price-Prediction.git