Machine Learning and Object-Oriented Programming with Python
Spring 2018
Instructors:
- Alexander Goncearenco ([email protected])
- Ayal Gussow ([email protected])
Teaching assistant:
- Ryan Dhindsa ([email protected])
Important links:
- Slack workspace for announcements and assistance: <http://biof509.slack.com/>
- Class grading system: <https://okpy.org>
First class: 1st February 2018 at 5pm in building 10, room B1C205
Final class: 10th May 2018
This document is subject to revision. Last revised 1st February 2018.
By the end of this course you should be able to:
- Create working Python programs using the basic features of the Python language together with NumPy, pandas, and Biopython (a brief refresher)
- Demonstrate the tools commonly used in professional settings to aid development
- Describe the common types of machine learning tasks
- Implement a simple linear regression model using NumPy
- List the advantages and disadvantages of different machine learning algorithms
- Apply machine learning algorithms for both regression and classification
- Convert a data set into a form suitable for use by machine learning algorithms
- Apply dimensionality reduction to a data set for visualization and further processing
- Identify subpopulations using clustering algorithms
- Choose appropriate model parameters
- Evaluate the results of a machine learning model
- Integrate a machine learning model in a workflow
- Compare different programming paradigms, including procedural, functional, and object-oriented
- Define what an object is in the context of programming
- Identify the features of an object definition
- Contrast attributes, properties and methods
- Review special methods
- Design a public interface for a class
- Utilize inheritance and abstraction
- Choose when and how to raise and handle exceptions appropriately
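Several of the object-oriented objectives above (attributes versus properties versus methods, special methods, and raising exceptions) can be illustrated together in one small class. This is a minimal sketch only: the `DNASequence` class and all of its members are hypothetical and are not taken from the course materials.

```python
class DNASequence:
    """Hypothetical DNA sequence wrapper, for illustration only."""

    VALID_BASES = set("ACGT")  # class attribute shared by all instances

    def __init__(self, name, bases):
        bases = bases.upper()
        invalid = set(bases) - self.VALID_BASES
        if invalid:
            # Raise early with an informative message instead of storing bad data
            raise ValueError(f"invalid bases: {sorted(invalid)}")
        self.name = name      # public instance attribute
        self._bases = bases   # "private" by convention (leading underscore)

    @property
    def gc_content(self):
        """Read-only property: fraction of G and C bases, computed on access."""
        if not self._bases:
            return 0.0
        gc = sum(1 for b in self._bases if b in "GC")
        return gc / len(self._bases)

    def reverse_complement(self):
        """Method: returns a new DNASequence rather than modifying this one."""
        pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
        rc = "".join(pairs[b] for b in reversed(self._bases))
        return DNASequence(self.name + "_rc", rc)

    def __len__(self):
        # Special method: makes len(seq) work
        return len(self._bases)

    def __repr__(self):
        # Special method: unambiguous representation for debugging
        return f"DNASequence({self.name!r}, {self._bases!r})"


if __name__ == "__main__":
    seq = DNASequence("demo", "atgcgc")
    print(seq, len(seq), seq.gc_content)
    print(seq.reverse_complement())
    try:
        DNASequence("bad", "ATGX")
    except ValueError as err:
        print("Rejected:", err)
```

Note that `gc_content` is accessed like an attribute (`seq.gc_content`, no parentheses) but is computed on each access, while `reverse_complement()` is an ordinary method call.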
This is a 15-week course starting on 1st February 2018 and finishing on 10th May 2018. Classes will take place between 5:00pm and 7:00pm each Thursday in building 10, room B1C205, within the FAES Academic Center.
Attendance in class is strongly recommended; however, we realize other commitments will occasionally prevent attendance. Class materials will generally be distributed through the course website.
Most classes will have hands-on tutorials and assignments. Both practice and graded assignments will generally be provided. Graded assignments should be submitted before the following class. Bringing a laptop to each class is strongly encouraged so that you can follow along.
Important dates:
- 23 February 2018 - Last day to drop/withdraw
- 30 March 2018 - Last day to change status (credit or audit)
Each student is encouraged to bring their own laptop to each class. For the course, we will use Python 3. Any Python installation should work, but you must be able to install packages. If you do not already have Python installed, the Anaconda Scientific Python Distribution from Continuum Analytics is likely the easiest way to set it up. The Anaconda installer will automatically install many of the packages we will use during the course.
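Before the first class it may help to confirm that your Python 3 installation can find the main packages. The sketch below uses the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found. The package list is an assumption based on the libraries named in this syllabus; note that Biopython is imported as `Bio` and scikit-learn as `sklearn`.

```python
import importlib.util
import sys

# Assumed import names for packages mentioned in the syllabus; adjust as needed.
COURSE_PACKAGES = ("numpy", "pandas", "sklearn", "matplotlib", "seaborn", "Bio")


def check_environment(packages=COURSE_PACKAGES):
    """Map each top-level import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_environment().items():
        status = "ok" if found else "MISSING (install with conda or pip)"
        print(f"{name:12s} {status}")
```

Any package reported as missing can usually be added with `conda install` (in an Anaconda setup) or `pip install`.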
There is no required textbook for this course.
We will link to relevant online resources throughout the course.
If you would like a refresher on the basics, the following resources may be useful:
- Learn Python the Hard Way <http://learnpythonthehardway.org/book/> by Zed A. Shaw (ebook freely available from the author). A video course is also available at <http://learnpythonthehardway.org/>.
- Think Python <http://www.greenteapress.com/thinkpython/thinkpython.html> by Allen B. Downey (ebook freely available from the author).
Further material is included on the Week 1 page.
The following books cover some of the same material we will cover during the course. These books are not required; they are suggested only as an alternative starting point for the course objectives.
- The Elements of Statistical Learning <http://statweb.stanford.edu/~tibs/ElemStatLearn/> by Trevor Hastie, Robert Tibshirani, and Jerome Friedman (ebook freely available from the authors).
- Python Machine Learning <http://sebastianraschka.com/books.html> by Sebastian Raschka. A second edition is expected soon; its notebooks are already available at <https://github.com/rasbt/python-machine-learning-book-2nd-edition>.
- Python 3 Object Oriented Programming <https://www.packtpub.com/application-development/python-3-object-oriented-programming> by Dusty Phillips.
The emphasis of the course is on learning and mastering the skills covered. We hope that everyone will be able to complete the assignments and the project. If any of the material is unclear, please ask for clarification.
The final project is worth 50% of the course grade, with the weekly assignments making up the remaining 50%.
Weekly Assignments
Weekly assignments will generally consist of multiple components. Unless otherwise specified, each component will be graded pass/fail. A component is graded as "pass" if it runs and produces the expected results. An assignment's grade is the percentage of its components graded as "pass".
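As a worked example of this rule, an assignment with four components, three of which pass, receives 75%. The sketch below expresses the same arithmetic; the function name and the component names are hypothetical and not part of the okpy grading system.

```python
def assignment_grade(results):
    """Percentage of components graded "pass".

    `results` maps a (hypothetical) component name to True (pass) or False (fail).
    """
    if not results:
        raise ValueError("an assignment needs at least one component")
    passed = sum(1 for ok in results.values() if ok)
    return 100 * passed / len(results)


# Hypothetical assignment with four components, three of which pass
example = {"part_a": True, "part_b": True, "part_c": False, "part_d": True}
print(assignment_grade(example))  # 75.0
```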
Final Project
The final project will consist of the following components:
- Project documentation. Each project should have documentation explaining its goal and functionality. The code itself should be well documented, with comments wherever they aid understanding. Functions and classes should have docstrings describing their functionality, inputs, and outputs.
- Project code. The code should be well organized and easy to read. It should also be modular, so that each part of the code is reusable. The code should run and produce the correct output under different conditions, and it should include robust error checking.
- Project presentation. Each student will present their project at the end of the semester. The presentation should cover the project's goals, inputs, and outputs, preferably while showing snippets of code.
Project grades will be based on the components outlined above, with each component counting for one third (about 33%) of the project grade.
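Putting the weights together: the project grade is the equal-weight average of its three components, and the course grade is 50% project plus 50% weekly assignments. The sketch below is illustrative only. It assumes each score is on a 0 to 100 scale and that weekly assignment grades are combined with a simple average, which this syllabus does not specify; the function names are hypothetical.

```python
from statistics import mean


def project_grade(documentation, code, presentation):
    """Equal-weight average of the three project components (each 0-100)."""
    return mean([documentation, code, presentation])


def course_grade(weekly_grades, project):
    """50% final project + 50% weekly assignments.

    ASSUMPTION: weekly grades are combined with a simple average.
    """
    return 0.5 * project + 0.5 * mean(weekly_grades)


project = project_grade(documentation=90, code=80, presentation=100)  # 90
print(course_grade([100, 75, 50, 100], project))  # 85.625
```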
Course materials will be distributed on this website in the corresponding weekly sections.
Week 1 (01 February 2018): Course overview and a Python refresher.
Week 2 (08 February 2018): Different programming paradigms. The main object-oriented programming (OOP) concepts.
Week 3 (15 February 2018): Developing applications with OOP.
Week 4 (22 February 2018): Introduction to NumPy, Pandas and Scikit-Learn.
Week 5 (01 March 2018): Plotting in Python: Matplotlib, Pandas, Seaborn.
Week 6 (08 March 2018): Data retrieval and dataset preprocessing in Scikit-Learn.
Week 7 (15 March 2018): Regression with NumPy and Scikit-Learn.
Week 8 (22 March 2018): Classification with Scikit-Learn.
Week 9 (29 March 2018): Unsupervised learning and clustering with Scikit-Learn.
Week 10 (05 April 2018): Dimensionality reduction and feature selection with Scikit-Learn.
Week 11 (12 April 2018): Deep learning and other advanced ML tasks.
Week 12 (19 April 2018): The machine learning workflow with Scikit-Learn.
Week 13 (26 April 2018): Turning machine learning projects into software. Question-and-answer session.
Week 14 (03 May 2018): Project presentations and feedback. Part I.
Week 15 (10 May 2018): Project presentations and feedback. Part II.
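As a small preview of the Week 7 topic (and the objective of implementing a simple linear regression model with NumPy), the sketch below fits y = b0 + b1·x by ordinary least squares with `np.linalg.lstsq`. It is one possible approach, not the course's own implementation; the function name is hypothetical and the data are synthetic and noise-free.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones (intercept) next to x (slope)
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


# Synthetic, noise-free data generated from y = 2 + 3x
x = np.arange(10)
y = 2 + 3 * x
intercept, slope = fit_simple_linear_regression(x, y)
print(round(intercept, 6), round(slope, 6))  # approximately 2 and 3
```

Building an explicit design matrix generalizes directly to several predictors, which is the form Scikit-Learn's `LinearRegression` works with.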