CICF Week 11

This week we will look at some data wrangling on a tabular dataset. We will then fit a decision tree and a random forest model to the data.

Tutorial

This class focuses on the tools and concepts you are likely to encounter in cyberinfrastructure work, so we will not cover the mathematics behind the machine learning algorithms in much depth. I encourage you to look at the source materials if you find the techniques interesting. This tutorial and the following ones are based on the Practical Deep Learning for Coders lessons by Jeremy Howard. References are included in the Resources section at the end of this file.

Random Forests

This section is modeled after the excellent tutorial by Jeremy Howard titled "How Random Forests Really Work". I recommend reading it for more detail on how decision trees and random forests work.

Open your VM and run git pull in the cicf folder. Then install the required packages:

sudo apt install graphviz
pip install --upgrade jupyter-core nbconvert seaborn fastai

We are going to work with the Titanic dataset, which is a passenger manifest from the Titanic. Let's first look at the dataset.
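Before opening the notebook, it may help to see the core idea behind a decision tree at toy scale. The sketch below scores a few candidate yes/no splits by weighted Gini impurity and picks the lowest. It rests on several assumptions: the rows are made up for illustration and are not real manifest records, the helper names gini and split_score are hypothetical, and Gini impurity is just one common splitting criterion, so the notebook and the referenced tutorial may score splits differently.

```python
from collections import Counter

# Made-up rows for illustration only; these are NOT real Titanic records.
# Column names follow the commonly used Titanic CSV (Sex, Pclass, Survived).
rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_score(rows, predicate, target="Survived"):
    """Weighted Gini impurity of the two groups produced by a yes/no predicate."""
    left = [r[target] for r in rows if predicate(r)]
    right = [r[target] for r in rows if not predicate(r)]
    return (len(left) * gini(left) + len(right) * gini(right)) / len(rows)


# Hypothetical candidate splits; a real tree would enumerate these automatically.
candidates = {
    "Sex == female": lambda r: r["Sex"] == "female",
    "Pclass <= 1": lambda r: r["Pclass"] <= 1,
    "Pclass <= 2": lambda r: r["Pclass"] <= 2,
}

scores = {name: split_score(rows, pred) for name, pred in candidates.items()}
best_split = min(scores, key=scores.get)  # lower impurity is better

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("best split:", best_split)
```

A decision tree repeats this search inside each resulting group, and a random forest averages many such trees, each fit on a different random sample of the data. Which split wins here depends entirely on the toy rows, so it says nothing about the real dataset.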

The rest of this section is in the notebook random-forest.ipynb.

Resources

Sources for the tutorial notebook:

Other Interesting links: