
[RFC] Improving continuous integration #4234

Closed
hcho3 opened this issue Mar 8, 2019 · 9 comments
hcho3 commented Mar 8, 2019

Now that we have two sponsors funding the continuous integration (CI) infrastructure (https://xgboost-ci.net), we should discuss ways to improve it.

Decouple builds from tests

Currently, we have a single Jenkins stage in which XGBoost is both built and tested. We should split this stage into two, one for builds and another for tests. The benefits of decoupling compilation from test runs are:

  • Eliminate redundant compilation: GPU code is slow to compile, and right now we compile XGBoost many times over. Instead, we could compile XGBoost only once per CUDA target.
  • Test cross-version CUDA support, e.g. whether an XGBoost package built with CUDA 8.x also runs on a machine with CUDA 10.x.
  • Save intermediate artifacts in an S3 bucket: if the tests pass, we can deploy the built artifacts immediately.
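A split along these lines could look roughly like the following declarative Jenkinsfile fragment. This is only a sketch: the stage names, agent labels, build script, and S3 bucket are hypothetical, not the project's actual configuration.

```groovy
pipeline {
  agent none
  stages {
    stage('Build CUDA 9.0') {
      agent { label 'linux && cpu' }            // hypothetical agent label
      steps {
        sh 'tests/ci_build/build_gpu.sh 9.0'    // hypothetical build script
        // Save the build output so the test stage does not recompile
        stash name: 'xgboost-cuda9.0', includes: 'build/**'
      }
    }
    stage('Test CUDA 9.0') {
      agent { label 'linux && gpu' }
      steps {
        unstash 'xgboost-cuda9.0'
        sh 'pytest tests/python-gpu'
        // If tests pass, copy the artifact to S3 for deployment
        // (bucket name is hypothetical):
        // sh 'aws s3 cp xgboost.whl s3://xgboost-ci-artifacts/'
      }
    }
  }
}
```

Note that `stash`/`unstash` only carries artifacts within one pipeline run; for the cross-version CUDA check, the test stage could simply run on an agent with a newer CUDA runtime than the one used to build.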

Add Windows target

Progress has been slow on this front. The main challenge is getting Jenkins to launch Windows workers and send them remote commands. We've run into issues compiling XGBoost on Windows a few times (#4139, #3869), so it would be nice to detect such problems early. In addition, we want to build Python wheels automatically (so far I've been building the Windows wheel manually).
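Once Windows workers are attached to Jenkins, the wheel build could hang off a dedicated stage. A sketch, assuming a `windows` agent label and illustrative CMake flags (neither is confirmed project configuration):

```groovy
stage('Build Python wheel (Windows)') {
  agent { label 'windows' }   // hypothetical agent label
  steps {
    // Build the native library with the MSVC generator,
    // then package the Python wheel from python-package/
    bat 'cmake .. -G "Visual Studio 14 2015 Win64" -DUSE_CUDA=ON'  // flags illustrative
    bat 'cmake --build . --config Release'
    bat 'cd python-package && python setup.py bdist_wheel'
    // Keep the wheel as a build artifact instead of uploading it by hand
    archiveArtifacts artifacts: 'python-package/dist/*.whl'
  }
}
```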

Migrate Python and Java tests to Jenkins

Regular performance tests

We should run a suite of performance tests on a regular basis (say, every two weeks). This way, we can detect performance regressions early.
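Jenkins can drive such a schedule with a cron trigger. A sketch, where the trigger spec, benchmark driver script, and output path are all assumptions for illustration:

```groovy
pipeline {
  agent { label 'linux && gpu' }   // hypothetical agent label
  triggers {
    // Run on roughly the 1st and 15th of each month (~every two weeks)
    cron('H H 1,15 * *')
  }
  stages {
    stage('Performance benchmarks') {
      steps {
        sh 'python tests/benchmark/benchmark.py'              // hypothetical driver
        // Archive timings so successive runs can be compared for regressions
        archiveArtifacts artifacts: 'benchmark_results/*.json' // hypothetical path
      }
    }
  }
}
```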

@dmlc/xgboost-committer


hcho3 commented Mar 11, 2019

ETA for the first pull request for this RFC: end of this week (March 15, 2019)

hcho3 commented Apr 3, 2019

It turns out that it is possible to compile CUDA code on machines without NVIDIA GPUs: nvcc only needs the CUDA toolkit installed at compile time, not a physical GPU device.

The benefit is that we can use more powerful CPU instances (e.g. c5d.18xlarge) to compile CUDA code faster.
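Concretely, the build stage could be pinned to a CPU-only agent while only the test stages require GPU agents. A sketch; the agent label and CMake flags are illustrative assumptions:

```groovy
stage('Build CUDA (CPU-only instance)') {
  // e.g. backed by c5d.18xlarge; the label itself is hypothetical
  agent { label 'linux && cpu-large' }
  steps {
    // nvcc compiles fine without a GPU present; flags are illustrative
    sh 'cmake .. -DUSE_CUDA=ON'
    sh 'make -j$(nproc)'
    // Hand the binaries off to a GPU agent for the actual test stage
    stash name: 'cuda-build', includes: 'build/**'
  }
}
```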

cc @trivialfis @RAMitchell

terrytangyuan (Member) commented

Definitely +1 for running performance tests regularly. Probably more frequent than two weeks though.


hcho3 commented Apr 4, 2019

@terrytangyuan Any suggestions for performance tests?

terrytangyuan (Member) commented

@terrytangyuan Any suggestions for performance tests?

Not off the top of my head. I've only done this for internal datasets before, but here we should probably pick some good public (Kaggle?) datasets. Note that performance tests cover both statistical and computational performance, so we may want to consider use cases from both perspectives.


hcho3 commented Apr 4, 2019

@terrytangyuan Thanks. Let me think this over during the weekend.

RAMitchell (Member) commented

Would something like this be okay for performance benchmarking? We have a more polished NVIDIA version that could be open-sourced. All of the dataset loading/processing is automatic, and it runs in a Docker container.

trivialfis (Member) commented

Preferably move R tests to Jenkins where we can cache all built dependencies. ;-)


hcho3 commented Apr 15, 2019

@trivialfis Yes, yes, yes! Docker is a great invention.
