Scaling/Rescaling needed for regression outputs #73

Closed
EyalWirsansky opened this issue Oct 23, 2020 · 3 comments · Fixed by #113
Labels
enhancement New feature or request

Comments

@EyalWirsansky

Is your feature request related to a problem? Please describe.
Currently, scaling of training data (via transformations such as MeanStdDevTransformation) applies only to the features, not to the outputs. However, regression outputs also need to be scaled for some models to train properly. Specifically, this causes the RBF SVM to perform much worse than in scikit-learn, where it's easy to scale the entire dataset.

Describe the solution you'd like
Add an option to scale the outputs of a training dataset, in addition to the features, when training a regressor. This also means that the regressor's outputs will be inverse-scaled when performing predictions.
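As a rough illustration of the requested behaviour, here is a minimal plain-Java sketch. `OutputStandardizer` and its methods are hypothetical names invented for this example and are not part of Tribuo's API. It assumes a single regression dimension and uses the population standard deviation. The targets are standardised before training, and each prediction is mapped back to the original units afterwards.

```java
/**
 * Hypothetical helper (not a Tribuo class): standardises regression targets
 * to zero mean and unit variance, and inverts that scaling on predictions.
 */
final class OutputStandardizer {
    private final double mean;
    private final double stdDev;

    private OutputStandardizer(double mean, double stdDev) {
        this.mean = mean;
        this.stdDev = stdDev;
    }

    /** Estimates the mean and population standard deviation of the training targets. */
    static OutputStandardizer fit(double[] targets) {
        if (targets.length == 0) {
            throw new IllegalArgumentException("No targets supplied");
        }
        double sum = 0.0;
        for (double t : targets) {
            sum += t;
        }
        double mean = sum / targets.length;

        double squaredDiffs = 0.0;
        for (double t : targets) {
            squaredDiffs += (t - mean) * (t - mean);
        }
        double stdDev = Math.sqrt(squaredDiffs / targets.length);

        // Constant targets have zero variance; fall back to 1.0 to avoid dividing by zero.
        return new OutputStandardizer(mean, stdDev == 0.0 ? 1.0 : stdDev);
    }

    /** Applied to each training target before it is passed to the regressor. */
    double scale(double target) {
        return (target - mean) / stdDev;
    }

    /** Applied to each raw model prediction to return it to the original units. */
    double unscale(double prediction) {
        return prediction * stdDev + mean;
    }

    double mean() {
        return mean;
    }

    double stdDev() {
        return stdDev;
    }
}
```

In this sketch, a trainer would call `fit` and `scale` on the training outputs, keep the fitted statistics alongside the model, and call `unscale` on every prediction. Storing the statistics with the model is also what would let them be recorded in the provenance.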

Describe alternatives you've considered
'Manually' scaling and inverse-scaling outside Tribuo's training/prediction flow. This is cumbersome, and the scaling will not be recorded in the provenance.

Additional context

@EyalWirsansky EyalWirsansky added the enhancement New feature or request label Oct 23, 2020
@Craigacp
Member

Thanks for the report. There are a few ways we could integrate this support. Wrapping it via a StandardisingTrainer, similar to the TransformTrainer, would induce another dataset copy, whereas integrating it directly into the affected regression trainers would require considerably more code. We'll have a look and figure out which way seems most efficient.

@Craigacp
Member

Craigacp commented Nov 6, 2020

There's a prototype for LibSVM models here: https://github.com/oracle/tribuo/tree/regression-rescaling. We're currently trying to figure out whether there is a way to build that into all regression models without too much repeated code (and whether it's even necessary for things like XGBoost).

@Craigacp
Member

Craigacp commented Mar 4, 2021

After some checking, it didn't seem necessary in models other than the LibSVM regressors, so we just added an option to that trainer.
