Fixes for CHOP #14
base: main
Conversation
solvers/chop.py
Outdated
self.X = torch.tensor(X).to(device)
self.y = torch.tensor(y > 0, dtype=torch.float64).to(device)
scaler = StandardScaler()
X = scaler.fit_transform(X)
This should not be done in the solver but in the dataset, as it changes the problem being solved.
Can it be an option in the dataset?
The libsvmdata package does some preprocessing like this. I would just offer the correct dataset (normalized or not).
Right, it seems this was the cause of the infinite values (since the scaler wasn't applied at eval time). Modifying.
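A minimal sketch of the bug being discussed, using synthetic data in place of Covtype (names and scales are illustrative, not the benchmark's actual code): when the scaler is fit inside one solver, that solver's coefficients only make sense on the scaled features, while the objective is evaluated on the raw ones.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(scale=100.0, size=(500, 5))     # large-magnitude raw features
y = (X @ rng.normal(size=5) > 0).astype(int)

# the solver fits on scaled data...
w = LogisticRegression().fit(StandardScaler().fit_transform(X), y).coef_.ravel()

margin_scaled = StandardScaler().fit_transform(X) @ w
margin_raw = X @ w                              # ...but the objective sees raw X
# raw margins are roughly 100x larger; exponentiating them in a naive
# logistic loss can overflow, which would explain the infinite values
ratio = np.abs(margin_raw).max() / np.abs(margin_scaled).max()
```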
@GeoffNN don't change the solver but the dataset; otherwise the solvers will not be comparable.
Done; I added standardization as a parameter in Covtype, so it runs on both versions of the dataset. We can also just keep the standardized version if you prefer.
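A rough sketch of what a dataset-level `standardized` option could look like (the class and loader here are illustrative stand-ins, not the actual benchmark code; benchopt's real Dataset API differs in detail). The point is that preprocessing happens once, so every solver receives identical data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

class CovtypeDataset:
    # benchopt-style parameter grid: the benchmark runs on both variants
    parameters = {'standardized': [True, False]}

    def __init__(self, standardized=True):
        self.standardized = standardized

    def get_data(self):
        # stand-in for the real Covtype fetch
        rng = np.random.default_rng(0)
        X = rng.normal(loc=10.0, scale=5.0, size=(200, 4))
        y = (rng.random(200) > 0.5).astype(np.float64)
        if self.standardized:
            X = StandardScaler().fit_transform(X)
        return X, y

X, y = CovtypeDataset(standardized=True).get_data()
# every solver now sees features with ~zero mean and unit variance
```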
requirements = ['pip:https://github.com/openopt/chop/archive/master.zip',
                'pip:scikit-learn']
Suggested change:
- requirements = ['pip:https://github.com/openopt/chop/archive/master.zip',
-                 'pip:scikit-learn']
+ requirements = ['pip:https://github.com/openopt/chop/archive/master.zip']
if self.standardized:
    scaler = StandardScaler()
    X = scaler.fit_transform(X)
I would simplify and always standardize.
OK -- only for Covtype? It would make sense to do it for all non-sparse datasets, right?
I'm still working on the stochastic variant -- I'll share the figures for both ASAP.
I don't know about Madelon, but the others should already be scaled.
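On the sparse-dataset caveat raised above: a short sketch of why standardization needs care there. Centering a sparse matrix would densify it, so scikit-learn's `StandardScaler` rejects sparse input unless mean removal is disabled.

```python
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler

# stand-in for a sparse libsvm-style dataset
X_sparse = sp.random(100, 20, density=0.1, random_state=0, format='csr')

# StandardScaler with the default with_mean=True raises on sparse input;
# with_mean=False rescales each column to unit variance and keeps sparsity
X_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
```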
objective.py
This doesn't fix the issue mentioned here.
This still needs work for the stochastic case.