Minimal Variance Sampling booster #4266

kruda · 2021-05-09T08:40:04Z

Implementation of minimal variance sampling for stochastic gradient boosting. Contributes to #2644.
Note for reviewers

I've tested it on the binary and multiclass classification examples from the lightgbm repository, and tried training on the Higgs dataset using a slightly modified config from https://github.com/guolinke/boosting_tree_benchmarks.git. I also trained several times on https://www.kaggle.com/c/DontGetKicked .
On the tested datasets, this MVS implementation shows a better score and less overfitting than a simple bagging strategy with the same sampling rate.
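For readers unfamiliar with the technique, here is a minimal sketch of one MVS sampling step. It is not the code from this PR: the function name `mvs_sample`, the default `lam=0.1`, and the bisection search for the threshold are all assumptions made for illustration. The idea is to score each row by a regularized absolute gradient, pick a threshold so that the expected sample size matches the sampling rate, keep each row with probability proportional to its score (capped at 1), and reweight kept rows by the inverse of that probability.

```python
import math
import random


def mvs_sample(grad, hess, sample_rate, lam=0.1, seed=None):
    """Illustrative MVS step: return a list of (row_index, weight) pairs."""
    rng = random.Random(seed)
    # Regularized absolute gradient; the small floor avoids division by zero.
    score = [max(math.sqrt(g * g + lam * h * h), 1e-12)
             for g, h in zip(grad, hess)]
    target = sample_rate * len(score)  # expected number of sampled rows

    def expected_size(mu):
        # Expected sample size when row i is kept with prob min(1, score_i / mu).
        return sum(min(1.0, s / mu) for s in score)

    # expected_size decreases as mu grows, so bisection finds the threshold.
    # At mu = sum(score) / target the expected size is at most target.
    lo, hi = 0.0, sum(score) / target
    for _ in range(60):
        mu = 0.5 * (lo + hi)
        if expected_size(mu) > target:
            lo = mu
        else:
            hi = mu

    sampled = []
    for i, s in enumerate(score):
        p = min(1.0, s / hi)
        if rng.random() < p:
            # Inverse-probability weight keeps the gradient sum unbiased.
            sampled.append((i, 1.0 / p))
    return sampled
```

Rows with large gradients are always kept (probability 1, weight 1), while rows with small gradients are kept rarely and upweighted, which is what distinguishes this from uniform bagging at the same sampling rate.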
Update:
Mean relative error change table:

| Algorithm \ sampling fraction | 0.05 | 0.1625 | 0.275 | 0.3875 | 0.5 | 0.6125 | 0.725 | 0.8375 | 0.95 |
|---|---|---|---|---|---|---|---|---|---|
| mvs | 8.314 | 3.032 | 0.775 | 0.061 | -0.391 | -0.403 | -0.26 | -0.185 | 0.084 |
| sgb | 13.037 | 4.516 | 3.169 | 2.569 | 2.105 | 1.521 | 1.08 | 0.343 | -0.155 |
| goss | 17.263 | 4.672 | 3.285 | 2.223 | 1.196 | 0.488 | 0.007 | -0.05 | -0.079 |
| mvs_adaptive | 8.314 | 2.919 | 0.904 | -0.195 | -0.451 | -0.466 | -0.274 | -0.079 | 0.206 |
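To make the table easier to compare, the snippet below transcribes its values and picks the algorithm with the lowest value at each sampling fraction. It assumes that a lower mean relative error change is better, which the table itself does not state; the variable names are illustrative.

```python
# Values transcribed from the table above.
fractions = [0.05, 0.1625, 0.275, 0.3875, 0.5, 0.6125, 0.725, 0.8375, 0.95]
table = {
    "mvs":          [8.314, 3.032, 0.775, 0.061, -0.391, -0.403, -0.26, -0.185, 0.084],
    "sgb":          [13.037, 4.516, 3.169, 2.569, 2.105, 1.521, 1.08, 0.343, -0.155],
    "goss":         [17.263, 4.672, 3.285, 2.223, 1.196, 0.488, 0.007, -0.05, -0.079],
    "mvs_adaptive": [8.314, 2.919, 0.904, -0.195, -0.451, -0.466, -0.274, -0.079, 0.206],
}

# Assumption: lower is better. Ties go to the first algorithm listed.
best = {
    frac: min(table, key=lambda name: table[name][i])
    for i, frac in enumerate(fractions)
}

for frac, name in best.items():
    print(f"{frac}: {name}")
```

Under that assumption, one of the two MVS variants has the lowest value at every fraction except 0.95, where sgb is lowest.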

Example on adult dataset:

Code to reproduce:
https://github.com/kruda/lightgbm_mvs_experiments


ghost · 2021-05-09T08:40:16Z

All CLA requirements met.


jameslamb

Thanks very much for the contribution! I'll do a more complete review in the next day or two.

For now, I left two very minor suggestions for the R side.

Could you also please add some tests on this feature? As a start, you could try updating the Dask tests to cover this code:

LightGBM/tests/python_package_test/test_dask.py

Line 44 in a421217

boosting_types = ['gbdt', 'dart', 'goss', 'rf']

If you get stuck, please let me know and I'd be happy to help.
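A rough sketch of what that test change could look like. It assumes the new booster is selected with `boosting_type='mvs'`, which this thread does not confirm. `make_params` is a hypothetical helper written for this illustration and is not part of `test_dask.py`.

```python
# List as quoted above from tests/python_package_test/test_dask.py.
boosting_types = ['gbdt', 'dart', 'goss', 'rf']

# Assumption: the new booster is exposed under the name 'mvs'.
boosting_types.append('mvs')


def make_params(boosting_type):
    """Hypothetical helper: build a small parameter dict for one boosting type."""
    params = {
        'boosting_type': boosting_type,
        'n_estimators': 10,
        'num_leaves': 10,
    }
    if boosting_type == 'rf':
        # Random forest mode needs bagging to be switched on.
        params.update({'bagging_freq': 1, 'bagging_fraction': 0.9})
    return params


# Each entry would then feed a parametrized test case.
for boosting_type in boosting_types:
    print(make_params(boosting_type))
```

Extending the shared list means every test parametrized over `boosting_types` would exercise the new booster without further changes.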

R-package/src/Makevars.in

R-package/src/Makevars.win.in


jameslamb · 2021-09-10T14:19:37Z

@kruda are you still interested in pursuing this pull request?

jameslamb · 2021-10-30T05:07:03Z

@shiyu1994 what does it mean that this PR is listed in #4677 with your name next to it?

Does it mean you're planning a separate implementation of MVS from your own branch? And if that's it, should this PR be closed? It has been almost 4 months since it received a new commit.

shiyu1994 · 2021-10-30T08:16:07Z

Hi @jameslamb, sorry if that was confusing. No, we are just trying to merge this PR. If @kruda is not responding, can we merge this first (there are only minor issues to be fixed) and then open some subsequent PRs to complete it? What do you think about this?

I'll remove my name from the list to avoid ambiguity. Thank you.

StrikerRUS · 2021-10-30T15:02:52Z

I suppose it'll be better to continue the work (apply fixes) directly in this PR. If @kruda didn't uncheck the enabled-by-default option that lets maintainers push to this branch, then it'll be possible.
I'd prefer not to have the repository in an undesirable state if it can be avoided (my position hasn't changed since #2815 (comment)).
https://docs.github.com/en/github/collaborating-with-pull-requests/working-with-forks/allowing-changes-to-a-pull-request-branch-created-from-a-fork

shiyu1994 · 2021-10-30T15:55:05Z

Oh sure. It would be best if we can push directly to this branch. Thanks for your suggestion.

BTW, @StrikerRUS @jameslamb, I think both of you are more experienced than me in managing open-source projects. So if anything I do is confusing or seems bad for the project, please feel free to correct me. Thanks!

StrikerRUS · 2021-10-31T13:04:58Z

@shiyu1994 Thank you for your kindness and openness! I believe everything is fine, please don't worry.

jameslamb · 2021-11-01T22:04:54Z

> I suppose it'll be better to continue the work (apply fixes) directly in this PR

I agree with @StrikerRUS . If someone opens a pull request into this project from their fork (even master), unless they've explicitly opted out, I think it's ok for maintainers to push directly to that branch to ensure the PR gets finished in an acceptable amount of time.

It's ok for you to push changes directly here @shiyu1994 . Then @StrikerRUS , I, and others can review them when you're ready.

guolinke · 2022-03-09T16:06:44Z

I think it is better to merge this PR into a branch of the LightGBM repo first, then we continue to develop on that branch, and merge it to the master branch when it is ready.

StrikerRUS · 2022-03-10T00:40:14Z

@guolinke

> I think it is better to merge this PR into a branch of the LightGBM repo first, then we continue to develop on that branch, and merge it to the master branch when it is ready.

Just created a new mvs_dev branch and re-targeted this PR into it.

guolinke · 2022-03-10T05:53:57Z

The merge is blocked by conflicts 😅. It seems most of them are documentation-related. @shiyu1994 @jameslamb can you help to resolve them?


shiyu1994 · 2022-03-11T14:06:45Z

I'm picking this up. This PR is almost ready.

guolinke · 2022-03-12T02:47:22Z

@shiyu1994 then we still merge it to mvs_dev branch first, then you can develop based on it?

jameslamb · 2022-03-12T04:33:13Z

> then we still merge it to mvs_dev branch first, then you can develop based on it?

We should definitely do that. @kruda could delete their fork or this branch at any time. Let's merge this code to mvs_dev as soon as possible so that doesn't happen.

shiyu1994 · 2022-03-12T16:14:42Z

> Let's merge this code to mvs_dev as soon as possible

Done with that. Will work on microsoft:mvs_dev instead.

StrikerRUS · 2022-03-12T19:11:01Z

@kruda Thank you so much for your work!

github-actions · 2023-11-15T00:21:18Z

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

kruda added 10 commits April 12, 2021 13:09

Added base for minimal variance sampling booster

1af3f3e

Implemented MVS booster with support for multioutput targets, determi…

0ad2740

…nistic execution on small datasets/

Merge remote-tracking branch 'upstream/master'

08462f9

Updated documentation and fixed some linting errors

b067a5b

fixed python sklearn documentation, tryed to fix R Cran CI

8229008

Second attempt to fix R pipeline

0f2620e

Fixed R package build for windows and linting error

d50769e

Revert "Fixed R package build for windows and linting error"

f531f3a

This reverts commit d50769e

Revert "Revert "Fixed R package build for windows and linting error""

ef1a28c

This reverts commit f531f3a.

Fixed some documentation

c610035

kruda requested review from btrotta, chivee, guolinke, henry0312, jameslamb, Laurae2, shiyu1994, StrikerRUS and wxchan as code owners May 9, 2021 08:40

StrikerRUS added the feature label May 9, 2021

kruda added 4 commits May 9, 2021 15:14

Fixed intendation error in mvs.hpp, fixed some windows build issues, …

4425874

…added spinx version upper bound

Fixed intendation error in mvs.hpp, fixed some windows build issues, …

a5b72f8

…added spinx version upper bound

Merge branch 'master' into master

64a99f4

Update requirements_base.txt

fb8ff6e

jameslamb previously requested changes May 9, 2021

R-package/src/Makevars.in

R-package/src/Makevars.win.in

kruda and others added 4 commits May 10, 2021 16:28

Update R-package/src/Makevars.in

d499d15

Co-authored-by: James Lamb <[email protected]>

Update R-package/src/Makevars.win.in

8a01fb8

Co-authored-by: James Lamb <[email protected]>

Added MVS booster support for dask tests

4b630a1

Merge remote-tracking branch 'origin/master'

29fc099

Updated documentation, MVS::GetLambda, MVS::GetThreshold, updated MVS…

ddcab83

…::ResetConfig

StrikerRUS mentioned this pull request Sep 23, 2021

allow inclusion in C programs #4608

Merged

shiyu1994 mentioned this pull request Oct 14, 2021

[Draft] Oct~Nov iteration Plan #4677

Closed

16 tasks

StrikerRUS changed the base branch from master to mvs_dev March 10, 2022 00:40

StrikerRUS requested review from jmoralez, hzy46 and tongwu-sh as code owners March 10, 2022 00:40

[ci] fix current master fails with graphviz-related error (microsof…

31ab4d4

…t#5068) * Update test_windows.ps1 * Update .appveyor.yml * Update test_windows.ps1 * Update .appveyor.yml

sync with LightGBM/master

9ce06ae

shiyu1994 merged commit 86822d6 into microsoft:mvs_dev Mar 12, 2022

StrikerRUS mentioned this pull request Mar 22, 2022

Minimal Variance Sampling (MVS) booster #5091

Closed

shiyu1994 mentioned this pull request Aug 29, 2022

[ci][fix] Fix cuda_exp ci #5438

Merged

jameslamb removed the in progress label Aug 13, 2023

github-actions bot locked as resolved and limited conversation to collaborators Nov 15, 2023
