-
Notifications
You must be signed in to change notification settings - Fork 314
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to xgboost 1.0.0 - Use h2oai Predictor #708
Conversation
4e0a995
to
c56b8a9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, although I'm a little nervous about swapping the dataset. You mentioned the new dataset doesn't have nulls, and I know those are a big source of error possibilities. Were the test warnings new after upgrading, or pre-existing?
I was a bit nervous too but tests passed with both dataset (agaricus and diabetes). The problem are just those WARNINGs.
I also commented here on dmlc/xgboost and they suggested to add Example of the diabetes dataset:
All of the columns from 1 to 8 are defined, while in the agaricus, not all of the 127 features are always defined. The problem is that the logs become too long, both locally and in Travis. I'm not sure I can silence all of those warnings, I also don't want to silence all warnings. One way could be to load the data via a different method but SVM is so convenient What do you suggest? |
Seems like the warning just come from loading the libsvm file, so maybe its ok -- we're just relying on the upstream packages to have good test coverage for their null handling. Agree we don't want to just silence all the warnings. |
I think this issue was fixed 23 days ago here: dmlc/xgboost#5856 ? So we'd need to wait for xgboost 1.1.1 to be able to manually set the #features to 127 and silence the warning. I'm now wondering if we'd see all of those warnings in prod too with 1.0.0 (there's a warning for each line with a missing value) |
This only matters for reading libsvm format. We could even look at changing the test setup here to not read the libsvm files and just create a train/test dataset in another way. E.g., how xgboost's internal tests do it |
e2daeb5
to
489e95b
Compare
I fixed the WARNINGS by recreating agaricus in CSV and loading from CSV. I'd like to try out upgrading to 1.1.1 directly as suggested in this comment #697 (comment) because 1.1.1 should also fix those warnings and maybe it's better to jump to 1.1.1 as it's supposed to be more stable? I can also push this out for now and tackle 1.1.1 later @jsleight |
00aa066
to
fe986dc
Compare
…atching the number of features
fe986dc
to
01629c1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is working, then I'd ship the 1.0.0 migration first and do a follow up to bump up to later versions in the future.
This does work and tests now pass. I only had to add a 3MB csv file to git (which we might remove on 1.1.1) but eh, it's not too much I think. @ancasarb for final review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great overall, thank you! Can you please just make the small change to simplify the build scripts? Thanks again!
@@ -2,6 +2,14 @@ | |||
sudo: required | |||
dist: trusty | |||
|
|||
addons: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please remind me why these addons are needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh I wasn't able to build it without it. I can post the error message here by removing them and running tests again if you'd like. I also noticed that it was necessary in the spark-3.0.0 release branch here: https://github.com/combust/mleap/blob/spark-3.0.0/.travis.yml#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Makefile
Outdated
|
||
test_xgboost_spark: | ||
sbt "mleap-xgboost-spark/test" | ||
sbt "+ mleap-xgboost-spark/test" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove these separate tasks if we now add the mleap-xgboost-runtime
and mleap-xgboost-spark
subprojects in the aggregatedProjects
list in MleapProject.scala
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. What was the original reason for them to be separated ? I assume because xgboost wouldn't build on 2.12 ?
travis/travis_publish.sh
Outdated
"mleap-xgboost-runtime/publishSigned" \ | ||
"mleap-xgboost-spark/publishSigned" \ | ||
"+ mleap-xgboost-runtime/publishSigned" \ | ||
"+ mleap-xgboost-spark/publishSigned" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, these can be removed if we add the mleap-xgboost-runtime and mleap-xgboost-spark subprojects in the aggregatedProjects list in MleapProject.scala`.
val xgboostDep = "ml.dmlc" %% "xgboost4j" % xgboostVersion exclude("com.esotericsoftware.kryo", "kryo") | ||
val xgboostPredictorDep = "ai.h2o" % "xgboost-predictor" % "0.3.17" exclude("com.esotericsoftware.kryo", "kryo") | ||
|
||
val xgboostSparkDep = "ml.dmlc" %% "xgboost4j-spark" % xgboostVersion exclude("com.esotericsoftware.kryo", "kryo") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we run into issues with the kyro versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are brought in elsewhere and there is a version conflict. This is just making sure that others are chosen
d63e5a3
to
f444f21
Compare
Fixes #697
Major changes
Minor
sbt dependency-graph
)indexing_mode=1
.NOTE: this has been rebased on top of #709
@talalryz @voganrc for review