G_store_revenue

Kaggle project 201809-201811

Files and Timeline

Updating

coding.R

All code, including data processing, EDA, and modeling.

Created 20180926

workspace_10000 samples in test, all train with y + datetime and NAs transfered to NAs.RData

-All training data with y; datetime fields processed and NA-like values converted to real NAs (variables that are entirely NA are removed);

-Test data: 10000 rows sampled from the full test set.
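The NA handling and test sampling described above could look roughly like the following base R sketch. The toy data frames, the placeholder strings in `na_markers`, and the seed are assumptions for illustration and are not taken from coding.R.

```r
# Toy stand-in for the training data (hypothetical columns and values).
train <- data.frame(
  country = c("Venezuela", "(not set)", "Japan"),
  browser = c("Chrome", "Safari", "not available in demo dataset"),
  adInfo  = c("(not set)", "(not set)", "(not set)"),
  y       = c(0, 12.5, 0),
  stringsAsFactors = FALSE
)

# Assumed placeholder strings that stand for missing values.
na_markers <- c("(not set)", "not available in demo dataset", "")

# Convert placeholder strings to real NA values, column by column.
to_na <- function(col) {
  if (is.character(col)) col[col %in% na_markers] <- NA
  col
}
train[] <- lapply(train, to_na)

# Drop variables that are entirely NA.
all_na <- vapply(train, function(col) all(is.na(col)), logical(1))
train <- train[, !all_na, drop = FALSE]

# Sample rows from a larger test set (10 of 100 here; 10000 in the project).
set.seed(1)
test_full <- data.frame(id = 1:100)
test_sample <- test_full[sample(nrow(test_full), 10), , drop = FALSE]
```

Here `adInfo` is removed because every value becomes NA, while `country` and `browser` keep their remaining values.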

Created 20180928

workspace_10000 samples in test, all train with y + datetime and NAs transfered to NAs 0.85 + adjustments.RData

-Threshold adjusted to 0.85: a feature is dropped when more than 15% of its values are NA;

-Kept a few variables where NA has a real meaning, such as isTrueDirect (NA means FALSE) and firstvisit (NA means 0);

-Deleted a duplicated value, the Zulia case.
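As a minimal sketch of the 0.85 threshold rule, assuming the meaningful NAs are recoded before the filter runs so those variables survive it: the toy values and the `keyword` and `browser` columns are hypothetical, and only isTrueDirect and firstvisit are named in the notes above.

```r
# Toy data (hypothetical values).
train <- data.frame(
  isTrueDirect = c(TRUE, NA, NA, TRUE, NA, NA, NA, NA, NA, NA),
  firstvisit   = c(1, NA, 1, NA, NA, NA, NA, NA, 1, NA),
  keyword      = c("a", NA, NA, NA, NA, NA, NA, NA, NA, NA),
  browser      = c("Chrome", "Safari", "Chrome", "Chrome", NA,
                   "Firefox", "Chrome", "Safari", "Chrome", "Chrome")
)

# Recode the variables where NA carries meaning.
train$isTrueDirect[is.na(train$isTrueDirect)] <- FALSE
train$firstvisit[is.na(train$firstvisit)] <- 0

# Keep a feature only if at least 85% of its values are present,
# i.e. drop it when more than 15% are NA.
threshold <- 0.85
present_share <- colMeans(!is.na(train))
train <- train[, present_share >= threshold, drop = FALSE]
```

In this toy case `keyword` is 90% NA and is dropped, while `browser` is 10% NA and is kept.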

Created 20181006

workspace_10000 samples in test, all train with y(unit $) + datetime and data class ad NA dealt with.RData

-EDA (not finished yet);

-Unit for y is now $;

-Corrected the data type of all columns;

-Now almost ready for modeling, except for dummy-variable transformations.
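The type correction and unit change might be sketched as follows. The column names are hypothetical, and the sketch assumes that raw revenue is stored in millionths of a dollar and that a missing revenue means no purchase; neither assumption is stated in the notes above.

```r
# Toy data as it might look when every column is read as character (hypothetical).
train <- data.frame(
  transactionRevenue = c("37860000", NA, "306670000"),
  pageviews          = c("12", "1", "30"),
  isMobile           = c("TRUE", "FALSE", "TRUE"),
  channelGrouping    = c("Organic Search", "Direct", "Referral"),
  stringsAsFactors   = FALSE
)

# Correct the column types.
train$pageviews       <- as.integer(train$pageviews)
train$isMobile        <- as.logical(train$isMobile)
train$channelGrouping <- factor(train$channelGrouping)

# Assumption: raw revenue is in millionths of a dollar and NA means no purchase.
revenue <- as.numeric(train$transactionRevenue)
revenue[is.na(revenue)] <- 0
train$y <- revenue / 1e6

# Inspect the resulting classes.
sapply(train, class)
```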

Created 20181019

workspace_10000 samples in test, all train with y(unit $) + datetime and data class ad NA dealt with+onehotecoding_2_versions.RData

-Deleted the uninformative column visits (all ones);

-Two versions of one-hot encoding: one with combined variables (onehottedtrain_com) and one without (onehottedtrain);

-A first modeling attempt (random forest) reached an R-squared of 15.9%;

-I think now it's ready for more models!
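One way to drop a constant column and one-hot encode factors in base R is sketched below. The toy columns are hypothetical, `model.matrix` is only one of several possible encoders and may not be what coding.R uses, and `r_squared` is a helper defined here for illustration. The combined-variables version is not shown.

```r
# Toy data (hypothetical columns); visits is constant, as in the notes above.
train <- data.frame(
  visits  = c(1, 1, 1, 1),
  browser = factor(c("Chrome", "Safari", "Chrome", "Firefox")),
  device  = factor(c("desktop", "mobile", "desktop", "desktop")),
  y       = c(0, 10, 0, 25)
)

# Drop columns that take a single value and so carry no information.
is_constant <- vapply(train, function(col) length(unique(col)) == 1, logical(1))
train <- train[, !is_constant, drop = FALSE]

# One-hot encode the factor columns. "- 1" removes the intercept, and
# contrasts = FALSE gives one indicator column per factor level.
predictors  <- train[, setdiff(names(train), "y"), drop = FALSE]
factor_cols <- vapply(predictors, is.factor, logical(1))
onehot <- model.matrix(
  ~ . - 1, data = predictors,
  contrasts.arg = lapply(predictors[factor_cols], contrasts, contrasts = FALSE)
)
onehotted <- data.frame(onehot, y = train$y)

# R-squared for a vector of predictions (hypothetical helper).
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}
```

Keeping every level as its own indicator is convenient for tree models; for linear models one level per factor is usually dropped to avoid collinearity.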

Created 20181020

[RF]workspace_10000 samples in test, all train with y(unit $) + datetime and data class ad NA dealt with+onehotecoding_2_versions+More Cleaning.RData

-Created ln(y) and decided to use it in place of y;

-More EDA, with a focus on correlation analysis;

-More data cleaning based on the EDA, e.g. removed highly correlated variables;

-RF model, with rf_models and rf_best (R-squared is now about 18%).
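The log transform and a correlation-based filter could be sketched like this. The toy columns, the 0.9 cutoff, and the choice of which column of a correlated pair to drop are assumptions. The sketch also uses `log1p`, i.e. ln(1 + y), so that zero revenue stays finite; the notes above only say ln(y).

```r
# Toy data (hypothetical columns and values).
hits      <- c(1, 5, 12, 3, 40, 8, 2, 20)
pageviews <- hits - c(0, 1, 2, 0, 5, 1, 0, 3)   # nearly a copy of hits
hour      <- c(3, 14, 9, 22, 11, 17, 6, 20)
y         <- c(0, 0, 25.5, 0, 310, 0, 0, 49.9)
train <- data.frame(hits, pageviews, hour, y)

# Assumption: log1p, i.e. ln(1 + y), keeps zero revenue finite.
train$ln_y <- log1p(train$y)

# Absolute pairwise correlations among the predictors.
predictors <- train[, c("hits", "pageviews", "hour")]
cor_mat <- abs(cor(predictors))

# Look at each pair once (upper triangle) and, for pairs above the cutoff,
# mark the later column for removal.
cutoff <- 0.9   # assumed cutoff
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- 0
to_drop <- colnames(cor_mat)[apply(cor_mat > cutoff, 2, any)]
train <- train[, setdiff(names(train), to_drop)]
```

Which member of a correlated pair to keep is a judgment call; a variable removed this way can be restored later if it turns out to be a strong predictor.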

Created 20181118

[RF+GBDT]workspace_10000 samples in test, all train with y(unit $) + datetime and data class ad NA dealt with+onehotecoding_2_versions+More Cleaning.RData

-Restored the hits variable because of its high predictive performance;

-GBDT model and some improvements to the RF model.


Repository: Maggie1216/G_store_revenue
