gbm.step() doesn't iterate for large continuous response variables #18

Open
aosmith16 opened this issue Jun 16, 2021 · 0 comments


@aosmith16

The tree-adding loop in gbm.step() never starts for some continuous response variables with large values.

library(dismo)
data(Anguilla_train)
Anguilla_train = Anguilla_train[1:200,]

fitcont = gbm.step(data = Anguilla_train, gbm.x = c(3:5, 7:14), gbm.y = 6, family = "gaussian",
                   tree.complexity = 5, learning.rate = 0.01, bag.fraction = 0.5)

#>  GBM STEP - version 2.9 
#>  
#> Performing cross-validation optimisation of a boosted regression tree model 
#> for DSDist and using a family of gaussian 
#> Using 200 observations and 11 predictors 
#> creating 10 initial models of 50 trees 
#> 
#>  folds are unstratified 
#> total mean deviance =  8013.378 
#> tolerance is fixed at  8.0134 
#> ntrees resid. dev. 
#> 50    5300.797 
#> now adding trees...
 

#> mean total deviance = 8013.378 
#> mean residual deviance = 4755.488 
#>  
#> estimated cv deviance = 5300.796 ; se = 365.367 
#>  
#> training data correlation = 0.848 
#> cv correlation =  0.707 ; se = 0.075 
#>  
#> elapsed time -  0.01 minutes

I poked around in gbm.step(), and I believe the cause is the delta.deviance variable, which is part of the condition of the while() loop that adds trees in increments of the step size. delta.deviance is hard-coded to 1 before the loop starts, which works well for family = "bernoulli" and for continuous responses with a smaller range.

For some continuous variables with a large range, the loop condition delta.deviance > tolerance.test can never be met, because delta.deviance starts at 1 while tolerance.test is mean.total.deviance * tolerance. In the example above, tolerance.test is 8013.378 * 0.001 = 8.0134 (the "tolerance is fixed at" line), which is greater than 1, so the while loop never starts and the model stays at the initial 50 trees.
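A minimal sketch of the loop-entry check described above, reduced to the single comparison delta.deviance > tolerance.test and ignoring any other terms in the real while() condition. It assumes the default tolerance of 0.001 (consistent with the 8.0134 printed in the output); loop_would_start() is a hypothetical helper, not part of dismo:

```r
# Hypothetical helper (not dismo code): reproduces only the first check of
# the while() condition, before any extra trees are added.
loop_would_start <- function(mean.total.deviance, tolerance = 0.001,
                             delta.deviance = 1) {
  # The tolerance scales with the total deviance of the response
  tolerance.test <- mean.total.deviance * tolerance
  delta.deviance > tolerance.test
}

loop_would_start(8013.378)  # FALSE: 1 is not greater than 8.013
loop_would_start(0.5)       # TRUE: 1 is greater than 0.0005
```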

I tried initialising delta.deviance to mean.total.deviance instead of 1, and this appeared to work for both bernoulli and gaussian models. However, I don't know what other side effects this change might have.
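A toy simulation of why starting delta.deviance at mean.total.deviance gets the loop going. The CV deviances in cv_dev are made up, and the stopping rule is simplified to the one-step drop in deviance; the real gbm.step() may compute delta.deviance differently once trees are being added, so count_steps() is a hypothetical illustration only:

```r
# Toy version of the tree-adding loop (hypothetical, not the dismo source).
# cv_dev[i] is a made-up CV deviance after the i-th batch of trees.
count_steps <- function(cv_dev, mean.total.deviance, tolerance = 0.001,
                        init = 1) {
  tolerance.test <- mean.total.deviance * tolerance
  delta.deviance <- init  # the starting value the issue proposes to change
  i <- 1
  while (delta.deviance > tolerance.test && i < length(cv_dev)) {
    i <- i + 1
    # Simplified stopping rule: improvement over the previous batch
    delta.deviance <- cv_dev[i - 1] - cv_dev[i]
  }
  i - 1  # number of batches added after the initial model
}

cv_dev <- c(5300.8, 5000, 4800, 4700, 4695, 4694)
count_steps(cv_dev, 8013.378)                   # 0: loop never entered
count_steps(cv_dev, 8013.378, init = 8013.378)  # 4
```

Because tolerance.test is mean.total.deviance * tolerance, initialising delta.deviance to mean.total.deviance passes the first check whenever tolerance < 1 and the deviance is positive, regardless of the response's scale.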

Workarounds that don't require changing the function are to set tolerance small enough that tolerance.test falls below 1 (though this may have other effects), or to rescale the response variable. If these are the recommended fixes, could they be added to the documentation as suggestions?
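Rescaling helps because, for a gaussian response, the total deviance grows with the square of the response's scale. A minimal base-R sketch, assuming the total mean deviance behaves like the mean squared deviation from the mean and the default tolerance of 0.001; mean_null_deviance() and the response y are hypothetical:

```r
# Hypothetical stand-in for the gaussian total mean deviance:
# the mean squared deviation of y from its mean.
mean_null_deviance <- function(y) mean((y - mean(y))^2)

y <- seq(0, 300, length.out = 200)  # made-up response with a large range
y_scaled <- as.numeric(scale(y))    # centred and scaled to unit variance

tolerance <- 0.001
# Does a hard-coded delta.deviance of 1 pass the first loop check?
1 > mean_null_deviance(y) * tolerance         # FALSE
1 > mean_null_deviance(y_scaled) * tolerance  # TRUE
# Tolerance below which the unscaled y would pass the check
1 / mean_null_deviance(y)
```

In the reprex, the equivalent would be rescaling column 6 (DSDist) before calling gbm.step(), or passing a smaller value to its tolerance argument; predictions from a rescaled model need to be back-transformed to the original units.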

Created on 2021-06-16 by the reprex package (v2.0.0)
