This project is a machine learning classifier that predicts the species of an Iris flower from its physical measurements. It uses the classic Iris Dataset and is implemented in a Jupyter Notebook.
- Overview
- Dataset
- Project Structure
- Requirements
- Installation
- Usage
- Model Performance
- Results
- Future Improvements
- References
This project demonstrates a supervised learning approach to classify the species of an Iris flower based on four key features:
- Sepal Length
- Sepal Width
- Petal Length
- Petal Width
Using these features, the classifier predicts one of three species: Setosa, Versicolor, or Virginica.
The Iris Dataset is widely used in data science and machine learning for classification tasks. It consists of 150 samples with the following columns:
- Features: Sepal Length, Sepal Width, Petal Length, Petal Width
- Target: Species (Setosa, Versicolor, Virginica)
The dataset is available in the UCI Machine Learning Repository.
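As a quick way to inspect the structure described above, the following is a minimal sketch that loads the dataset and checks its shape and class balance. It assumes scikit-learn's bundled copy (`load_iris`) rather than a file downloaded from the UCI repository; the notebook itself may load the data differently, and the `species` column name is chosen here for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy stands in for the UCI download.
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

# Map the integer labels (0, 1, 2) to species names for readability.
# "species" is an illustrative column name, not one defined by the dataset.
df["species"] = df["target"].map(dict(enumerate(iris.target_names)))

print(df.shape)                      # rows x columns, including the added column
print(df["species"].value_counts())  # number of samples per species
```

Each species contributes the same number of samples, so plain accuracy is a reasonable first metric for this dataset.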
Iris-Classification/
├── notebooks/
│ └── iris_classification.ipynb # Jupyter Notebook with data exploration, model training, and evaluation
├── README.md # Project documentation
└── requirements.txt # Python dependencies
To run this notebook, you’ll need Python 3.8+ and the following libraries:
- Jupyter Notebook
- NumPy
- Pandas
- Scikit-learn
- Matplotlib (for visualizations)
Install all dependencies using:
pip install -r requirements.txt
Clone the repository and navigate into the project directory:
git clone https://github.com/username/Iris-Classification.git
cd Iris-Classification
- Launch Jupyter Notebook: Open the Jupyter Notebook environment with `jupyter notebook`.
- Open the Notebook: In the Jupyter interface, navigate to the `notebooks/iris_classification.ipynb` file and open it.
- Run Cells Sequentially: Execute each cell to load the dataset, explore data, train the model, and evaluate its performance.
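For orientation, the load, train, and evaluate steps can be sketched in a few lines. This is an illustrative outline, not the notebook's actual code: the 80/20 split, the `random_state` value, and the choice of logistic regression are all assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load: four measurements per flower (X) and the species label (y).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for evaluation (assumed split);
# stratify=y keeps the three species equally represented in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Train: max_iter is raised so the solver converges on unscaled features.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate: fraction of held-out flowers whose species was predicted correctly.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```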
The classifier was evaluated primarily on accuracy. The table below lists example results for three common classifiers:
| Model | Accuracy |
|---|---|
| Logistic Regression | 95% |
| Support Vector Machine (SVM) | 96% |
| Decision Tree | 94% |
All three models classified the Iris species with high accuracy, between 94% and 96%. The SVM model performed best in this setup.
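A comparison of this kind can be sketched with 5-fold cross-validation, as below. The evaluation protocol and hyperparameters behind the figures above are not stated, so the fold count, feature scaling, and default model settings here are assumptions, and the scores this sketch prints will not necessarily match the table.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assumed configurations: library defaults, with feature scaling for the
# two models that are sensitive to feature magnitudes.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Support Vector Machine (SVM)": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds for each model.
scores = {}
for name, model in models.items():
    fold_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {scores[name]:.3f} (+/- {fold_scores.std():.3f})")
```

With only 150 samples, differences of one or two percentage points between models amount to a handful of flowers, so cross-validated averages are a steadier basis for ranking than a single train/test split.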
Consider the following potential improvements:
- Hyperparameter Optimization: Tune the model for better performance.
- Model Comparison: Experiment with ensemble methods like Random Forest and Gradient Boosting.
- Visualizations: Add more visualizations for feature importance and decision boundaries.
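As a starting point for the hyperparameter optimization item, the sketch below runs a small grid search over an SVM. The parameter grid is hypothetical and chosen only for illustration; the step name `svc` is the one `make_pipeline` generates automatically from the class name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline so it is refit on each training fold.
pipeline = make_pipeline(StandardScaler(), SVC())

# Hypothetical search grid; keys use the "<step name>__<parameter>" convention.
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
    "svc__gamma": ["scale", "auto"],
}

# Try every combination with 5-fold cross-validation and keep the best one.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.3f}")
```

The same pattern applies to the ensemble methods mentioned above by swapping `SVC` for `RandomForestClassifier` or `GradientBoostingClassifier` and adjusting the grid keys to match.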