-
Notifications
You must be signed in to change notification settings - Fork 177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrades XGBoost to 1.3.1 #107
Conversation
I rolled this back to 1.3.1 as that version has a macOS binary in Maven Central, and the fixes in 1.3.2 are Python or Solaris related. |
I'm trying this out locally on macOS (11.2.1). Seems to work fine. With
Also multi-threading doesn't seem to be working (just based on looking at CPU usage - tried |
I believe the JVM builds of XGBoost for macOS are currently compiled by the developers rather than the CI, and the logger gives the line of the code and it's path for each logging message. So I think thats probably compiled by https://github.com/CodingCat who does a lot of the JVM work in XGBoost. This is probably changing in the next few releases, I've been talking to them about adding Windows support to the Maven Central builds and there is a CI build for macOS too - dmlc/xgboost#6630 (comment). If that lands in the upcoming 1.4.0 release (and that release happens before Tribuo's 4.1 release) we'll update to that version to make things easier for our users. I'm a little worried about the multithreading aspect, I'll see if I can replicate it. Roughly how big a problem were you using? |
Ah, so I'd missed the threading issue because internally we build XGBoost4j with OpenMP turned on for Windows, macOS and Linux, but it's not turned on in the builds provided by dmlc in Maven Central for macOS. I'll open an issue upstream as it looks like they build the Python macOS whl with OpenMP turned on. |
On Linux with this branch building a 500 tree model on MNIST (using Tribuo's default parameters) takes 30s with 1 thread and 10s with 6 threads (on an Intel Core i7-8700), so the Tribuo side of things is definitely passing down the right parameters. |
public float xbgAlpha = 0.0f; | ||
@Option(longName = "xgb-min-weight", usage = "Minimum sum of instance weights needed in a leaf (default 1, range [0,inf]).") | ||
@Option(longName = "xgb-min-weight", usage = "Minimum sum of instance weights needed in a leaf (range [0,inf]).") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does one specify inf here? Float.MAX_VALUE
? That maybe should be documented.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if you do specify inf I don't think it can make any splits, so it probably goes pop. I should check the XGBoost docs again, this comes from their CLI docs but I don't think that line has been updated since XGBoost 0.7 ish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this same questions works better on xgb-max-depth
, which has the same range.
Classification/XGBoost/src/main/java/org/tribuo/classification/xgboost/XGBoostOptions.java
Show resolved
Hide resolved
Classification/XGBoost/src/main/java/org/tribuo/classification/xgboost/XGBoostOptions.java
Show resolved
Hide resolved
public float xgbSubsample = 1.0f; | ||
@Option(longName = "xgb-num-threads", usage = "Number of threads to use (default 4, range (1, num hw threads)).") | ||
@Option(longName = "xgb-num-threads", usage = "Number of threads to use (range (1, num hw threads)).") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this need a default value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OLCUT auto-generates it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant doesn't the actual variable need a default value. Or at least, shouldn't it be required? Will xgboost crash if you give it zero threads or use all hardware threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you give it zero it uses all hardware threads. I should note that in the usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall, there are a couple of UI nitpicks in XGBoostOptions
I'd like clarification on. See inline comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
I got multi-threading working on macOS following https://xgboost.readthedocs.io/en/latest/jvm/index.html#enabling-openmp-for-mac-os directions and building that and this project with |
Excellent. Looks like the xgboost developers will discuss turning it on in xgboost builds after the upcoming 1.4.0 release. I'm hopeful that that 1.4.0 release will include windows binaries though which will make things much simpler for Tribuo. |
Description
Bumps XGBoost from 1.0.0 to 1.3.2. Also plumbs through additional XGBoost parameters, so now the booster type and tree method are controllable. Exposes the logging verbosity as a replacement for the deprecated silent parameter.