Improve CI speed #7992

Closed
wangkuiyi opened this issue Jan 31, 2018 · 2 comments

Comments

@wangkuiyi
Collaborator

wangkuiyi commented Jan 31, 2018

Our CI has been running slowly recently. Qing-qing, Yu Yang, Helin, Chen Xi, Ya-ming, Yi-bing, and I discussed this issue. Here is what we learned and what we are going to do:

A. Reduce the number of SM architectures

  1. We are building many SM architectures in the CI: https://github.com/PaddlePaddle/Paddle/blob/develop/cmake/cuda.cmake.
  2. According to Qing-qing's experiment in "[Speed up compiling]: reduce the NVCC compiling (some .cu operators can be compiled by G++)" (#5491), nvcc runs faster when it generates code for fewer SM architectures.

Helin is going to configure the CI system to generate code for only one SM architecture when checking PRs, and to generate code for all SM architectures in the nightly build of the develop branch.
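A minimal sketch of how the PR/nightly split could be expressed, assuming the build kind is passed in through an environment variable. The variable names CI_BUILD_KIND and CI_PR_SM_ARCH and the architecture list are hypothetical; the real list lives in cmake/cuda.cmake.

```python
import os

# Hypothetical list of SM architectures; the real list lives in cmake/cuda.cmake.
ALL_SM_ARCHS = ["30", "35", "50", "52", "60", "61", "70"]


def gencode_flags(archs):
    """Return one nvcc -gencode flag per SM architecture."""
    return [f"-gencode arch=compute_{a},code=sm_{a}" for a in archs]


def archs_for_build(env=None):
    """Pick one architecture for PR checks and all of them for nightly builds.

    CI_BUILD_KIND and CI_PR_SM_ARCH are hypothetical variable names.
    """
    env = os.environ if env is None else env
    if env.get("CI_BUILD_KIND", "pr") == "nightly":
        return list(ALL_SM_ARCHS)
    return [env.get("CI_PR_SM_ARCH", "61")]
```

With no variables set, `gencode_flags(archs_for_build({}))` yields a single flag, so nvcc would compile each .cu file for one target instead of seven.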

B. Migrate the CI system to two servers

We are running four TeamCity agents on four GPU desktops, each with one GPU and a desktop-level CPU (a few cores). We have two idle servers, each with 6 GPUs and a powerful CPU with 56 cores.

Helin will migrate the CI system to the servers.

C. Distribute unit tests to multiple GPUs

Our CI system runs unit tests by calling ctest -j N, where N is the number of processes that run unit tests in parallel. However, all these N processes are using the same GPU.

Qing-qing is going to study whether we can make cmake/ctest use more than one GPU.
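One possible approach, sketched here as a standalone wrapper rather than as a ctest feature, is to pin each test process to one GPU through the CUDA_VISIBLE_DEVICES environment variable and hand out GPUs in round-robin order. The function names are hypothetical, and each test command is assumed to be an argument list.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def assign_gpus(tests, num_gpus):
    """Map each test to a GPU index in round-robin order."""
    return [(test, i % num_gpus) for i, test in enumerate(tests)]


def run_on_gpu(cmd, gpu):
    """Run one test command that can only see the given GPU."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    return subprocess.run(cmd, env=env).returncode


def run_all(test_cmds, num_gpus, jobs):
    """Run test commands in parallel, spread across num_gpus devices."""
    plan = assign_gpus(test_cmds, num_gpus)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda item: run_on_gpu(*item), plan))
```

Round-robin assignment ignores how long each test runs, so a real setup might need to balance by test duration instead.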

D. Add an environment variable to distinguish unit tests and regression tests

Both unit tests and regression tests currently run on the CI server for every PR. They should be distinguished: only unit tests should run for every PR, and nightly builds should run all tests. We should add an environment variable to control this.
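A rough sketch of such a switch, assuming a flag named RUN_REGRESSION_TESTS. The flag name and the test names below are hypothetical and only illustrate the selection logic.

```python
import os

# Hypothetical test registry: name -> kind ("unit" or "regression").
TESTS = {
    "test_add_op": "unit",
    "test_mul_op": "unit",
    "test_image_classification_train": "regression",
}


def selected_tests(env=None):
    """Return the tests to run, based on the hypothetical RUN_REGRESSION_TESTS flag."""
    env = os.environ if env is None else env
    run_all = env.get("RUN_REGRESSION_TESTS", "0") == "1"
    return sorted(name for name, kind in TESTS.items() if run_all or kind == "unit")
```

PR checks would leave the flag unset and get only the unit tests; the nightly build would set it to 1 and get everything.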

@putcn
Contributor

putcn commented Jan 31, 2018

Action item B is done; 198 and 199 have been added to the CI pool.

@Yancey1989
Contributor

After discussing with @dzhwinter, we have another simple idea.
We can cache the third-party dependencies in a Docker image based on paddle:latest-dev, so that we don't need to rebuild them for each PR.

The steps might be as follows:

  1. Check out the code and check whether the cmake files under cmake/external or the Dockerfile in the root folder have changed. If so:
    1. Rebuild a new Docker image named paddle:teamcity that contains only the third-party dependencies.
    2. Push paddle:teamcity to Docker Hub.
  2. Build and run all the unit tests with paddle:teamcity.
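The change check in step 1 could be sketched as a content fingerprint over cmake/external and the Dockerfile, compared against the value saved by the previous build. The function names and the stamp file are hypothetical; only the two paths come from the steps above.

```python
import hashlib
from pathlib import Path


def thirdparty_fingerprint(repo_root):
    """Hash the Dockerfile and every file under cmake/external."""
    root = Path(repo_root)
    files = sorted(p for p in (root / "cmake" / "external").rglob("*") if p.is_file())
    dockerfile = root / "Dockerfile"
    if dockerfile.is_file():
        files.append(dockerfile)
    digest = hashlib.sha256()
    for path in files:
        # Include the relative path so that renames also change the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_rebuild(repo_root, stamp_file):
    """True when the fingerprint differs from the one stored at stamp_file."""
    stamp = Path(stamp_file)
    previous = stamp.read_text().strip() if stamp.is_file() else ""
    return thirdparty_fingerprint(repo_root) != previous
```

When `needs_rebuild` returns True, the CI job would rebuild and push paddle:teamcity (for example with `docker build` and `docker push`) and then write the new fingerprint to the stamp file.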
