xgboost tweedie loss predictions do not match #690

Open
Maggie1216 opened this issue Mar 14, 2023 · 2 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed), needs investigation

Comments

Maggie1216 commented Mar 14, 2023

Hi team, I'm using hummingbird_ml==0.4.3 and xgboost==1.5.2, and I'm testing predictions from an XGBRegressor trained with the reg:tweedie objective.

import xgboost as xgb
from hummingbird.ml import convert
from sklearn.datasets import load_diabetes

# Train an XGBoost regressor with the Tweedie objective.
train_x, train_y = load_diabetes(return_X_y=True)
xgb_tweedie = xgb.XGBRegressor(objective='reg:tweedie', n_estimators=50, tweedie_variance_power=1.8)
xgb_tweedie.fit(train_x, train_y)
print(xgb_tweedie.predict(train_x[:10]))

# Convert the fitted model to PyTorch with Hummingbird and predict on the same rows.
xgb_tweedie_torch = convert(xgb_tweedie, 'pytorch', extra_config={'post_transform': 'TWEEDIE'})
print(xgb_tweedie_torch.predict(train_x[:10]))

It prints:
[160.32375 73.65087 140.53572 208.20435 115.15947 99.853676
125.59772 64.26746 110.12681 298.41394 ]
[528.6581 242.85928 463.40848 686.5432 379.7321 329.2624 414.15146
211.91847 363.13666 984.0033 ]

After some analysis (I generated 1000 different regression datasets, also tried different tweedie_variance_power, etc.), I found that the xgb_tweedie_torch (after conversion) predictions are always 3.2974 * xgb_tweedie (before conversion). For example, 160.32375 * 3.2974 = 528.6581. I wonder why this is the case?

ksaur (Contributor) commented Mar 15, 2023

Hi @Maggie1216 thank you for the detailed example! It's possible that our implementation of tweedie does not cover some case. We'll add it to the backlog!

ksaur added the help wanted, enhancement, and needs investigation labels Mar 15, 2023
gorkemozkaya commented

The constant 3.2974 happens to be 2 * exp(0.5), and 0.5 is the default base_score in XGBoost models. I suspect this discrepancy is related to how the base_score is handled in transforms.
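
The arithmetic behind this observation can be checked against the numbers printed in the original report. Below is a minimal sketch using only the Python standard library. The variable names are illustrative, and the check only confirms that the observed ratio matches 2 * exp(0.5); it does not show where the factor arises inside Hummingbird.

```python
import math

# Predictions copied from the output printed in the original report.
xgb_preds = [160.32375, 73.65087, 140.53572, 208.20435, 115.15947,
             99.853676, 125.59772, 64.26746, 110.12681, 298.41394]
hb_preds = [528.6581, 242.85928, 463.40848, 686.5432, 379.7321,
            329.2624, 414.15146, 211.91847, 363.13666, 984.0033]

# 0.5 is the default base_score cited in the comment above.
base_score = 0.5
factor = 2 * math.exp(base_score)

# Per-row ratio of converted-model predictions to native XGBoost predictions.
ratios = [h / x for h, x in zip(hb_preds, xgb_preds)]

# Largest relative gap between any observed ratio and the candidate constant.
max_rel_err = max(abs(r - factor) / factor for r in ratios)

print(f"2 * exp(base_score) = {factor:.6f}")
print(f"max relative deviation of observed ratios = {max_rel_err:.2e}")
```

If the deviation stays at the level of float32 rounding for every row, that is consistent with a constant multiplicative offset rather than a per-sample difference, which fits the base_score hypothesis.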


3 participants