This module covers the basics of machine learning (ML) pipelines: dealing with data shortcomings, selecting an appropriate model, and training and testing the model. A hands-on component walks you through building an ML pipeline for a realistic scenario.
You can find the slides for this module at AI for Software Engineering Slides.
We prepared a Kaggle notebook with a predefined scenario for implementing an ML pipeline for a classification problem. You can fork the notebook and experiment with it at SOEN691_GermanCreditReport.
Overview of the ML pipeline:
- The role of data in the quality of AI/ML systems
- Important factors to consider about data to avoid introducing significant bias into AI systems
- How to select an appropriate model for a learnable problem, considering assumptions, explainability, stability, and overfitting
- Metrics for measuring the performance of ML systems
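To make the last point concrete, here is a minimal sketch of how common classification metrics could be computed for a binary problem such as the one in the notebook. It uses only the Python standard library. The function name `binary_metrics`, the `BinaryMetrics` container, and the example labels are hypothetical and chosen for illustration; they are not taken from the slides or the notebook.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    `positive` is the label treated as the positive class (assumed to be 1).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return BinaryMetrics(accuracy, precision, recall, f1)


# Hypothetical, imbalanced labels: 3 positives out of 10 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

metrics = binary_metrics(y_true, y_pred)
print(f"accuracy={metrics.accuracy:.2f} precision={metrics.precision:.2f} "
      f"recall={metrics.recall:.2f} f1={metrics.f1:.2f}")
```

On these made-up labels the accuracy is 0.70 while the recall is only 0.33, which illustrates why accuracy alone can be misleading when the classes are imbalanced.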
Establishing the Course Project