Use Case 3 - Machine Learning with ABA Data

Overview

Description

Repository for Use Case 3, Machine Learning from Morgan Stanley's PADT Services team.

As part of the hackathon, we explored the following questions:

Can we find commonalities among cases to create segments and find benchmarks based on looking at the data alone?

and

Can we determine patterns in skill acquisition which can help segment and benchmark?

Feature Engineering

We observed that some fields, such as gender, appeared to be irrelevant to the actual percentage of successful trials.

A future goal is to use a built-in feature selector in Python to improve clustering results and make use of more features. The initial focus was to get a basic model set up to answer the questions above.

Other key feature selection tasks:

aggregating duration/time period to a single numeric value
replacing NaN values with the mean for continuous variables/features like lag
replacing NaN values with 0 or 1 (discrete values) for discrete variables
aggregating along trialIdx and sessionIdx to simplify the initial analysis
one-hot encoding for classification if not already present
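
The tasks above can be sketched with pandas roughly as follows. This is a minimal illustration, not the notebook's actual code: the data frame is made up, and every column name other than trialIdx and sessionIdx (caseId, duration, lag, success, skillDomain) is an assumption chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical trial-level records. Only trialIdx and sessionIdx are named in
# this README; the other columns are assumptions for illustration.
trials = pd.DataFrame({
    "caseId": ["a", "a", "a", "b", "b", "b"],
    "sessionIdx": [0, 0, 1, 0, 1, 1],
    "trialIdx": [0, 1, 0, 0, 0, 1],
    "duration": ["00:01:30", "00:02:00", None, "00:00:45", "00:01:15", "00:01:00"],
    "lag": [2.0, np.nan, 4.0, 1.0, np.nan, 3.0],
    "success": [1.0, np.nan, 0.0, 1.0, 1.0, np.nan],
    "skillDomain": ["language", "language", "social", "social", "motor", "motor"],
})

# 1. Collapse a duration string into a single numeric value (seconds).
trials["duration_s"] = pd.to_timedelta(trials["duration"]).dt.total_seconds()

# 2. Continuous features: replace NaN with the column mean.
for col in ["lag", "duration_s"]:
    trials[col] = trials[col].fillna(trials[col].mean())

# 3. Discrete features: replace NaN with a fixed discrete value (0 here).
trials["success"] = trials["success"].fillna(0).astype(int)

# 4. One-hot encode the categorical column.
encoded = pd.get_dummies(trials, columns=["skillDomain"], dtype=int)

# 5. Aggregate over trialIdx and sessionIdx to get one row per case.
features = encoded.groupby("caseId").agg(
    mean_lag=("lag", "mean"),
    total_duration_s=("duration_s", "sum"),
    success_rate=("success", "mean"),
    has_language=("skillDomain_language", "max"),
    has_motor=("skillDomain_motor", "max"),
    has_social=("skillDomain_social", "max"),
)

print(features)
```

Whether a missing discrete value should become 0 or 1 depends on what the column means, so that choice is best made per column rather than globally.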

Modeling:

K-Means clustering with clusteval to find the best number of clusters using the silhouette score.
PCA to get a 2D picture of the clusters through dimensionality reduction.
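
As a rough sketch of the cluster-selection idea, the example below picks the number of clusters with the highest silhouette score. It uses scikit-learn directly rather than clusteval, and runs on synthetic data from `make_blobs`, so the feature matrix, the range of k values, and the helper name `best_k_by_silhouette` are all assumptions for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the per-case feature matrix (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=6,
                  cluster_std=0.8, random_state=0)

# K-Means is distance based, so put all features on a comparable scale first.
X = StandardScaler().fit_transform(X)


def best_k_by_silhouette(X, k_values):
    """Fit K-Means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


best_k, scores = best_k_by_silhouette(X, range(2, 9))
print("best k:", best_k)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
```

The silhouette score ranges from -1 to 1, with higher values indicating tighter, better-separated clusters.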

Evaluation

Choosing optimal number of clusters

PCA visualization for different numbers of clusters
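
A visualization of this kind could be produced along the following lines. This is a sketch on synthetic data with scikit-learn and matplotlib, not the code that generated the project's images; the data, the choice of k values, the helper name `plot_clusters`, and the output file name are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic stand-in for the per-case feature matrix (assumption).
X, _ = make_blobs(n_samples=300, centers=4, n_features=6,
                  cluster_std=0.8, random_state=0)

# Project the features onto the first two principal components once.
pca = PCA(n_components=2)
coords = pca.fit_transform(X)

# Cluster in the original feature space for several values of k.
labels_by_k = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    for k in (2, 3, 4)
}


def plot_clusters(coords, labels_by_k, path="pca_clusters.png"):
    """Draw one 2D scatter panel per k, colored by cluster label."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    n = len(labels_by_k)
    fig, axes = plt.subplots(1, n, figsize=(5 * n, 4), squeeze=False)
    for ax, (k, labels) in zip(axes[0], labels_by_k.items()):
        ax.scatter(coords[:, 0], coords[:, 1], c=labels, s=10)
        ax.set_title(f"k = {k}")
        ax.set_xlabel("PC 1")
        ax.set_ylabel("PC 2")
    fig.savefig(path)
    plt.close(fig)


print("explained variance ratio:", pca.explained_variance_ratio_)
```

Calling `plot_clusters(coords, labels_by_k)` writes the figure to disk. The 2D picture only shows the variance captured by the first two components, so clusters that overlap in the plot may still be separated in the full feature space.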

Next Steps

Implement CART algorithm analysis to better estimate feature importance
Work on interpretability and evaluation of clustering
Try to answer the question on groupings based on goal/skill domain by framing it as a supervised learning problem and utilizing random forests/decision trees.
Explore density-based clustering methods to find other patterns in the data
Continue working on this project outside of the hackathon
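
For the planned feature-importance step, one possible starting point is scikit-learn's `DecisionTreeClassifier`, which its documentation describes as an optimised version of CART. The sketch below is illustrative only: the data is synthetic, and the feature names and binary target are assumptions standing in for real features and goal/skill domain labels.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic supervised problem (assumption): with shuffle=False the first two
# columns are the informative ones and the remaining four are noise.
X, y = make_classification(n_samples=400, n_features=6, n_informative=2,
                           n_redundant=0, n_classes=2, shuffle=False,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

# Fit a shallow tree and read off impurity-based feature importances.
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

ranked = sorted(zip(feature_names, tree.feature_importances_),
                key=lambda pair: pair[1], reverse=True)
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can be biased toward high-cardinality features, so they are best treated as a first estimate.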

Acknowledgements

Thanks to the fantastic organizers and tech leads in the machine learning group for answering all of our questions.

fsi-hack4autism/ms-fsi-hackathon-padtservices-team
