Repository for Use Case 3, Machine Learning from Morgan Stanley's PADT Services team.

fsi-hack4autism/ms-fsi-hackathon-padtservices-team

Use Case 3 - Machine Learning with ABA Data

Table of Contents

  1. Overview
  2. FeatureEngineering
  3. Modeling
  4. Evaluation
  5. NextSteps
  6. Acknowledgements

Overview

Description

Repository for Use Case 3, Machine Learning from Morgan Stanley's PADT Services team.

As part of the hackathon, we explored the following questions:

Can we find commonalities among cases to create segments and find benchmarks based on looking at the data alone?

and

Can we determine patterns in skill acquisition which can help segment and benchmark?

FeatureEngineering

We observed that some fields, such as gender, were irrelevant to the actual percentage of successful trials.

A future goal is to use a built-in feature selector in Python to improve the clustering results and make use of more features. The initial focus was to get a basic model set up to answer the questions above.

Other key feature selection tasks:

  • aggregating each duration/time period to a single numeric value
  • replacing NaN values with the mean for continuous variables/features such as lag
  • replacing NaN values with 0 or 1 for discrete variables
  • aggregating along trialIdx and sessionIdx to simplify the initial analysis
  • one-hot encoding for classification, if not already present
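The tasks listed above can be sketched with pandas as follows. This is a minimal illustration, not the team's actual pipeline: the toy data and every column name except trialIdx and sessionIdx (caseId, lag, prompted, success, domain, duration) are assumptions made for the example, and filling discrete NaN values with 0 rather than 1 is an arbitrary choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data. Only trialIdx and sessionIdx are named in the README;
# every other column name here is an assumption for illustration.
raw = pd.DataFrame({
    "caseId": [1, 1, 1, 2, 2, 2],
    "sessionIdx": [0, 0, 1, 0, 0, 1],
    "trialIdx": [0, 1, 0, 0, 1, 0],
    "lag": [1.5, np.nan, 2.5, 4.0, 2.0, np.nan],       # continuous
    "prompted": [1.0, np.nan, 0.0, 1.0, 1.0, np.nan],  # discrete (0/1)
    "success": [1, 0, 1, 0, 1, 1],
    "domain": ["language", "language", "social", "social", "motor", "motor"],
    "duration": ["0:01:30", "0:02:00", "0:00:45", "0:03:00", "0:01:00", "0:02:30"],
})


def engineer_features(df):
    """Return one numeric feature row per case."""
    out = df.copy()

    # Collapse each duration string into a single numeric value (seconds).
    out["duration_s"] = pd.to_timedelta(out["duration"]).dt.total_seconds()
    out = out.drop(columns=["duration"])

    # Continuous feature: replace NaN with the column mean.
    out["lag"] = out["lag"].fillna(out["lag"].mean())

    # Discrete feature: replace NaN with a fixed value (0 chosen here).
    out["prompted"] = out["prompted"].fillna(0).astype(int)

    # One-hot encode the categorical column.
    out = pd.get_dummies(out, columns=["domain"], dtype=int)

    # Aggregate over trials and sessions so each case becomes one row.
    return out.drop(columns=["trialIdx", "sessionIdx"]).groupby("caseId").mean()


features = engineer_features(raw)
print(features)
```

After aggregation, the success column holds each case's fraction of successful trials, and the one-hot columns hold the share of trials in each domain.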

Modeling

  1. K-Means clustering with clusteval to find the best number of clusters using the silhouette score.
  2. PCA for dimensionality reduction, to get a 2D picture of the clusters.
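To make the first step concrete, here is a rough sketch of picking the number of clusters by silhouette score. It calls scikit-learn directly instead of clusteval, so it shows the idea rather than the code used in the hackathon. The synthetic blobs stand in for the real ABA features, and the helper name best_k_by_silhouette is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def best_k_by_silhouette(X, k_values=range(2, 9), random_state=0):
    """Fit K-Means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        # Silhouette ranges from -1 to 1; higher means tighter, better-separated clusters.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data (assumption): three well-separated groups in four dimensions.
centers = [[-10, -10, -10, -10], [0, 10, 0, 10], [10, -10, 10, -10]]
X_raw, _ = make_blobs(n_samples=300, centers=centers, cluster_std=1.0, random_state=0)

# K-Means is distance based, so features are standardized first.
X = StandardScaler().fit_transform(X_raw)

best_k, scores = best_k_by_silhouette(X)
print("best k:", best_k)
for k, score in scores.items():
    print(k, round(score, 3))
```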

Evaluation

Choosing optimal number of clusters

PCA visualization for different numbers of clusters
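A visualization of this kind could be prepared roughly as below: project the standardized features onto the first two principal components once, then color the same 2D points by the K-Means labels for each candidate number of clusters. This is a sketch using scikit-learn on synthetic data. The function name project_clusters_2d and the choice of k values are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def project_clusters_2d(X, k_values=(2, 3, 4), random_state=0):
    """Return 2D PCA coordinates plus K-Means labels for each k in k_values."""
    X_scaled = StandardScaler().fit_transform(X)

    # Reduce to two principal components for plotting.
    pca = PCA(n_components=2, random_state=random_state)
    coords = pca.fit_transform(X_scaled)

    # Cluster in the full (scaled) feature space; PCA is used only for display.
    labels_by_k = {
        k: KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X_scaled)
        for k in k_values
    }
    return coords, labels_by_k, pca.explained_variance_ratio_


# Synthetic stand-in for the engineered features (assumption).
X_demo, _ = make_blobs(n_samples=200, n_features=5, centers=3, random_state=0)
coords, labels_by_k, explained = project_clusters_2d(X_demo)

print("variance explained by 2 components:", round(float(np.sum(explained)), 3))
for k, labels in labels_by_k.items():
    print(k, "clusters, sizes:", np.bincount(labels))
```

Each entry of labels_by_k can then be passed as the color argument of a scatter plot over coords[:, 0] and coords[:, 1], giving one panel per number of clusters.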

NextSteps

  • Implement CART algorithm analysis to better estimate feature importance
  • Work on interpretability and evaluation of the clustering
  • Try to answer the question on groupings based on goal/skill domain by framing it as a supervised learning problem and utilizing random forest/decision trees.
  • Explore density based clustering methods to find other patterns in the data
  • Continue working on this project outside of the hackathon
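As a starting point for the CART and supervised-learning items above, feature importance can be read from a fitted decision tree. The sketch below uses scikit-learn's DecisionTreeClassifier, which implements an optimized version of CART. The data is synthetic and the feature names are hypothetical. The label is constructed to depend on the first feature only, so that feature should dominate the importances.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature names, for illustration only.
feature_names = ["lag", "duration_s", "prompted"]

# Synthetic data (assumption): the target depends only on the first feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] > 0).astype(int)

# Fit a shallow tree to keep it readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Impurity-based importances; they sum to 1 across features.
importances = dict(zip(feature_names, tree.feature_importances_))
for name, value in sorted(importances.items(), key=lambda item: -item[1]):
    print(f"{name}: {value:.3f}")
```

The same feature_importances_ attribute is available on RandomForestClassifier, which averages the importances over many trees.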

Acknowledgements

Thanks to the fantastic organizers and tech leads in the machine learning group for answering all of our questions.
