Our CI has been running slowly recently. Qing-Qing, Yu Yang, Helin, Chen Xi, Ya-ming, Yi-bing, and I discussed this issue. Here is what we learned and what we are going to do:
A. Reduce the number of SM architectures
Helin is going to configure the CI system to generate code for only one SM architecture when checking PRs, and to generate code for all SM architectures only in the nightly build of the develop branch.
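To illustrate why fewer SM architectures shortens a PR build, here is a minimal sketch of how the nvcc `-gencode` flags grow with the architecture list: each flag pair asks nvcc to compile the device code once more. The architecture numbers and the names `PR_ARCHS`, `NIGHTLY_ARCHS`, and `archs_for_build` are assumptions for illustration, not the values used by our build.

```python
# Hypothetical architecture lists; the real lists are not stated in this issue.
PR_ARCHS = ["52"]
NIGHTLY_ARCHS = ["35", "50", "52", "60", "61"]


def archs_for_build(is_nightly):
    """Pick the SM architectures to compile for, depending on the build type."""
    return NIGHTLY_ARCHS if is_nightly else PR_ARCHS


def gencode_flags(sm_archs):
    """Build one nvcc -gencode flag pair per SM architecture."""
    flags = []
    for arch in sm_archs:
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    return flags


if __name__ == "__main__":
    print("PR build:     ", " ".join(gencode_flags(archs_for_build(False))))
    print("Nightly build:", " ".join(gencode_flags(archs_for_build(True))))
```

With these assumed lists, a PR build passes one flag pair to nvcc and the nightly build passes five.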
B. Migrate the CI system to two servers
We are running four TeamCity agents on four GPU desktops, each with one GPU and a desktop-level CPU (a few cores). We have two idle servers, each with 6 GPUs and a powerful CPU with 56 cores.
Helin will migrate the CI system to the servers.
C. Distribute unit tests to multiple GPUs
Our CI system runs unit tests by calling ctest -j N, where N is the number of processes that run unit tests in parallel. However, all these N processes are using the same GPU.
Qing-Qing is going to study whether we can make cmake/ctest use more than one GPU.
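One possible approach, sketched below, is to split the tests into one bucket per GPU and start a separate ctest process for each bucket with `CUDA_VISIBLE_DEVICES` set, so that each process sees only its own GPU. This is only a sketch under assumptions: the helper names are hypothetical, and test names are assumed to be plain identifiers that can be joined into a `ctest -R` regular expression.

```python
import os


def assign_tests_to_gpus(test_names, num_gpus):
    """Distribute test names round-robin across GPU indices 0..num_gpus-1."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    buckets = {gpu: [] for gpu in range(num_gpus)}
    for i, name in enumerate(test_names):
        buckets[i % num_gpus].append(name)
    return buckets


def env_for_gpu(gpu_index, base_env=None):
    """Copy the environment and expose a single GPU to the child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def ctest_command(test_names, jobs):
    """Build a ctest command that runs only the given tests, `jobs` at a time."""
    pattern = "^(" + "|".join(test_names) + ")$"
    return ["ctest", "-j", str(jobs), "-R", pattern]


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    tests = ["test_a", "test_b", "test_c", "test_d", "test_e"]
    for gpu, names in assign_tests_to_gpus(tests, 2).items():
        print(gpu, ctest_command(names, jobs=4))
```

Each command could then be started with `subprocess.Popen(cmd, env=env_for_gpu(gpu))`, so the buckets run concurrently on different GPUs.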
D. Add an environment variable to distinguish unit tests and regression tests.
Currently, both unit tests and regression tests run on the CI server for every PR. They should be distinguished: only unit tests should run for every PR, while nightly builds should run all tests. We should add an environment variable to control this.
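A minimal sketch of how such a switch might look, assuming regression tests carry a CTest label named `regression` and the CI sets a variable named `RUN_ALL_TESTS` for nightly builds. Both names are hypothetical; `-LE` is the ctest option that excludes tests whose label matches a regular expression.

```python
import os


def ctest_args(env=None, jobs=4):
    """Build the ctest command line for a PR build or a nightly build."""
    env = os.environ if env is None else env
    # RUN_ALL_TESTS is a hypothetical variable that the nightly build would set to "1".
    run_all = env.get("RUN_ALL_TESTS", "0") == "1"
    args = ["ctest", "-j", str(jobs)]
    if not run_all:
        # Skip every test labeled "regression" (an assumed label name).
        args += ["-LE", "regression"]
    return args


if __name__ == "__main__":
    print(" ".join(ctest_args()))
```

The label itself would be attached in the cmake files through the `LABELS` test property.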
After discussing with @dzhwinter, we have another simple idea.
We can cache the third-party dependencies in a Docker image based on paddle:latest-dev, so that we don't need to rebuild them for each PR.
Maybe the steps are as follows:
1. Check out the code and check whether the cmake files under cmake/external or the Dockerfile in the root folder have changed. If so:
   a. Rebuild a new Docker image named paddle:teamcity, which contains only the third-party dependencies.
   b. Push paddle:teamcity to Docker Hub.
2. Build and run all the unit tests with paddle:teamcity.
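The change check in the first step could be sketched as a content fingerprint over the relevant files, compared against the fingerprint recorded when paddle:teamcity was last built. This is a sketch under assumptions: the function names are hypothetical, only `*.cmake` files directly under cmake/external are considered, and where the previous fingerprint is stored is left open.

```python
import hashlib
from pathlib import Path


def thirdparty_fingerprint(repo_root):
    """Hash the cmake/external/*.cmake files and the root Dockerfile."""
    root = Path(repo_root)
    files = sorted((root / "cmake" / "external").glob("*.cmake"))
    dockerfile = root / "Dockerfile"
    if dockerfile.is_file():
        files.append(dockerfile)
    digest = hashlib.sha256()
    for path in files:
        # Include the relative path so that renames also change the fingerprint.
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(repo_root, previous_fingerprint):
    """Return True when the third-party inputs differ from the cached image's."""
    return thirdparty_fingerprint(repo_root) != previous_fingerprint
```

When `needs_rebuild` returns True, the CI would rebuild and push paddle:teamcity and record the new fingerprint; otherwise it would go straight to building and testing with the existing image.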