Integration with imbalanced-learn #14
Hi @glemaitre, absolutely! I had been planning to contact you about this for a long time, but you were faster. I also fully agree with benchmarking other techniques; honestly, that would have been my next project on this topic. I can refine and generalize the evaluation framework quickly. If we select the scope (the methods of interest) properly, we could kick something like this off very quickly. I was also thinking about creating some sort of "super-wrapper" package that wraps oversampling, ensemble, and cost-sensitive learning techniques behind a somewhat standardized interface, exactly for ease of benchmarking and experimentation (see the sketch below). The benchmarking framework would fit this super-wrapper package pretty well. Any comments are welcome!
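Just to make this concrete, something like the following is what I have in mind for the standardized interface: a minimal sketch only, assuming scikit-learn-style estimators, and all class names here are hypothetical rather than an existing API.

```python
# Sketch of the "super-wrapper" idea: expose oversampling and cost-sensitive
# approaches behind one fit/predict interface so a benchmark can iterate over
# them uniformly. All class names are hypothetical illustrations.
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class OversamplingClassifier(BaseEstimator, ClassifierMixin):
    """Oversample with any object exposing fit_resample, then fit a classifier."""

    def __init__(self, sampler, classifier):
        self.sampler = sampler
        self.classifier = classifier

    def fit(self, X, y):
        # resample the training data, then train a fresh copy of the classifier
        X_res, y_res = self.sampler.fit_resample(X, y)
        self.classifier_ = clone(self.classifier).fit(X_res, y_res)
        return self

    def predict(self, X):
        return self.classifier_.predict(X)


class CostSensitiveClassifier(BaseEstimator, ClassifierMixin):
    """Handle the imbalance through class weights instead of resampling."""

    def __init__(self, classifier):
        self.classifier = classifier

    def fit(self, X, y):
        clf = clone(self.classifier)
        # assumes the wrapped estimator supports a class_weight parameter
        clf.set_params(class_weight="balanced")
        self.classifier_ = clf.fit(X, y)
        return self

    def predict(self, X):
        return self.classifier_.predict(X)
```

Both wrappers could then be dropped into the same benchmarking loop, e.g. `OversamplingClassifier(SMOTE(), KNeighborsClassifier())` next to `CostSensitiveClassifier(DecisionTreeClassifier())`.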
We are absolutely on the same page.
I think that this is the way to go. On our side, I think that we can become more conservative about including new SMOTE variants. We could first implement them in smote_variants.
It has always been an objective of @chkoar and myself, but we have lacked the time-bandwidth lately. Reusing some infrastructure would be really useful.
This would need to be discussed in more detail, but it could be one way to go. Regarding cost-sensitive methods, we were thinking about including some. In some way, we thought of using imbalanced-learn 1.0.0 as the trigger to reorganise the modules to take the different approaches into account.
Great! In order to improve the benchmarking, I will try to set up some sort of fully reproducible auto-benchmarking system as a CI/CD job. I feel like this would be the right way to keep the evaluation transparent and fully reproducible.
Regarding a continuous benchmark, this is really what I had in mind: scikit-learn-contrib/imbalanced-learn#646 (comment). How many resources does your benchmark require? How long does it take to run the experiment?
Well, the experiment I ran and described in the paper took something like 3 weeks on a 32-core AWS instance: 85 methods with 35 different parameter settings each, 4 classifiers on top of that with 6 different parameter settings each, and repeated k-fold cross-validation with 5 splits and 3 repeats, all of that over 104 datasets. EDIT: That's clearly too much computational work, but the majority of it was caused by 5-10 "large" datasets and 3-5 very slow, evolutionary oversampling techniques. I think that excluding these could reduce the work to a couple of hours on a 32-64 core instance.
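Just to make the protocol concrete, here is a toy, single-dataset sketch of the cross-validation loop I mean (repeated stratified k-fold with 5 splits and 3 repeats); the actual benchmark crosses this with the ~85 oversamplers, 4 classifiers, their parameter grids and the 104 datasets. The synthetic dataset, SMOTE and k-NN below are just placeholders for the sketch.

```python
# Toy illustration of the repeated k-fold evaluation protocol (5 splits, 3 repeats)
# on a single synthetic imbalanced dataset with one oversampler and one classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = []
for train_idx, test_idx in cv.split(X, y):
    # oversample only the training fold to avoid information leakage
    X_res, y_res = SMOTE(random_state=0).fit_resample(X[train_idx], y[train_idx])
    clf = KNeighborsClassifier().fit(X_res, y_res)
    proba = clf.predict_proba(X[test_idx])[:, 1]
    scores.append(roc_auc_score(y[test_idx], proba))

print(f"mean ROC AUC over {len(scores)} folds: {np.mean(scores):.3f}")
```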
@glemaitre @gykovacs IMHO the methods that we have to implement or include in imbalanced-learn should be prioritized according to how well established and cited they are.
@chkoar If we target well-described, established methods (those which appeared in highly cited journals), the number of potential techniques to include drops to about 20-30. On the other hand, in my experience, these are typically not the best performers on average; at the same time, "average performance" is always questionable because of the no-free-lunch theorem. Ultimately, the question is whether we trust the outcome of a reasonable benchmark. I think it makes sense to do so, as the methods users look for should perform well on the "smooth" problems arising in real classification datasets, and this might be captured by benchmark datasets. One more remark from my experience: it was usually the less-established, simple methods that turned out to be robust enough to provide acceptable performance on all datasets, and these are usually described in hard-to-access, very short conference papers.
As I said, I was not talking about inclusion but about prioritization, so that we do not end up with a bunch of methods initially, which is @glemaitre's concern if I understood correctly.
I totally agree. That's why I do not find a reason for a method not to be included in the package. If that was the case, the main
I did some experimentation with CircleCI; it doesn't seem to be suitable for automated benchmarking on the community subscription plan, as the workload is too high even if only one relatively small dataset is used. This also made me reconsider my earlier idea of using CI/CD for benchmarking in general. Instead, I can imagine a standalone benchmarking solution which can be installed on any machine, checks out packages and datasets through some quasi-standard benchmarking interfaces, reruns experiments only where the code has changed, and publishes the results on a local web server. Maintaining such a solution and linking it from any documentation page doesn't seem to be a burden, yet it is flexible and can easily be moved around in the cloud when needed; I think my company could even finance an instance like this. The main difference compared to CI/CD is that it would run the benchmarks on a regular schedule rather than on pull requests or other hooks (see the sketch below). Any comments are welcome! Do you have experience with, or anything particular in mind for, a proper benchmarking solution?
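To make the idea a bit more tangible, here is a very rough sketch of what the scheduled runner could look like; the tracked packages, file paths, the `run_benchmark()` callable and the report format are all hypothetical placeholders, not part of any existing tool.

```python
# Rough sketch of a standalone, scheduled benchmark runner: rerun only when an
# upstream package version changes, and write a static HTML report that a local
# web server can serve.
import json
from importlib.metadata import version
from pathlib import Path

STATE_FILE = Path("benchmark_state.json")       # remembers the last-seen versions
REPORT_FILE = Path("public_html/index.html")    # served by a local web server
PACKAGES = ["imbalanced-learn", "smote-variants", "scikit-learn"]


def current_versions():
    """Collect the installed versions of the tracked packages."""
    return {pkg: version(pkg) for pkg in PACKAGES}


def run_if_changed(run_benchmark):
    """Invoke run_benchmark() only when the tracked package versions changed."""
    previous = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    now = current_versions()
    if now == previous:
        return  # nothing changed since the last scheduled run

    results = run_benchmark()  # expected to return {method_name: score}
    rows = "".join(
        f"<tr><td>{method}</td><td>{score:.3f}</td></tr>"
        for method, score in results.items()
    )
    REPORT_FILE.parent.mkdir(parents=True, exist_ok=True)
    REPORT_FILE.write_text(f"<table>{rows}</table>")
    STATE_FILE.write_text(json.dumps(now))
```

A cron entry (or any other scheduler) would then call this periodically, which is exactly the difference from hook-driven CI/CD.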
@gykovacs Would you be interested in testing your benchmarks on the newer variants? They do seem promising, at least to my untrained eye.
@gykovacs I was wondering if you would be interested in an integration of some of the algorithms in imbalanced-learn. It would be really nice to have more variants in imbalanced-learn and to actually use your benchmark to get a better idea of what to include. I was also wondering if it would make sense to compare other methods (e.g. under-sampling) to have a big picture of what is actually working globally.