Spike: Investigate increase in fit time for KDDCup dataset #2642

Closed
freddyaboulton opened this issue Aug 16, 2021 · 0 comments · Fixed by #2661
Labels: enhancement, performance


In our 0.31.1 release performance tests, @chukarsten noted that fitting on the KDDCup dataset is 10% slower. https://alteryx.atlassian.net/wiki/spaces/PS/pages/975536762/EvalML+v0.30.0+-+v0.30.1+Upgrade

@chukarsten thinks it's due to the changes we made to infer_feature_types in #2610, and I agree.

That being said, someone should take a closer look at the KDDCup dataset and find the real culprit. If infer_feature_types is causing the slowdown, we should speed it up. A 10% performance penalty is too steep a price to pay for the corner case of all-null columns.
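
A rough starting point for that investigation, sketched with the standard library only. It times a callable, profiles it with cProfile, and computes the relative slowdown between two measurements. `stand_in_type_inference` is a hypothetical placeholder and not EvalML's infer_feature_types; the idea would be to swap in the real call and the KDDCup data on both the old and new versions.

```python
import cProfile
import io
import pstats
import time


def time_call(func, *args, repeats=5, **kwargs):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best


def profile_call(func, *args, top=10, **kwargs):
    """Run `func` once under cProfile and return a text report
    of the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args, **kwargs)
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


def relative_slowdown(baseline_seconds, candidate_seconds):
    """Fractional slowdown of the candidate versus the baseline (0.10 means 10% slower)."""
    return (candidate_seconds - baseline_seconds) / baseline_seconds


def stand_in_type_inference(columns):
    """Hypothetical placeholder for the function under investigation.
    Flags which columns contain only None values."""
    return {name: all(v is None for v in values) for name, values in columns.items()}


if __name__ == "__main__":
    # Synthetic data (an assumption, not KDDCup): one all-null column, one populated.
    data = {"all_null": [None] * 10_000, "numbers": list(range(10_000))}
    print(f"best of 5: {time_call(stand_in_type_inference, data):.6f}s")
    print(profile_call(stand_in_type_inference, data))
```

Running the same harness against the release before and after #2610 and passing the two timings to `relative_slowdown` would show whether the type inference step alone accounts for the 10%.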

@freddyaboulton added the performance label Aug 16, 2021
@tyler3991 added the enhancement label Aug 18, 2021
@freddyaboulton self-assigned this Aug 19, 2021