ROL: Reduce cost of BASIC pre-push test suite a little #462
Comments
You can also see that ROL takes the longest to test of all Trilinos CI packages as shown here:
http://testing.sandia.gov/cdash/index.php?display=project&project=Trilinos&parentid=2485447
Other than for Zoltan tests, ROL takes almost 10x longer than any other package involved in CI testing. That is way too expensive for CI tests.
No objections,
Denis
From: Roscoe A. Bartlett [email protected]
Next Action Status: ???
CC: @trilinos/rol @trilinos/framework
Description:
The test ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 times out at 10 minutes in automated Nightly testing as shown here:
http://my.cdash.org/queryTests.php?project=Trilinos&date=2016-06-23&limit=200
In my local testing on muir just now, several of these stefan-boltzmann tests take in excess of 300s to complete:
121/126 Test #100: ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 .............................. Passed 303.38 sec
The total amount of time taken by all ROL tests (if run in serial) would be 54 minutes as shown by ctest output on my run of muir:
Label Time Summary:
Total Test time (real) = 1402.27 sec
Therefore, I would like to propose to take tests ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 and ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 and promote them from BASIC to NIGHTLY. That would leave the test ROL_example_PDE-OPT_stefan-boltzmann_example_03_MPI_4 to run in pre-push CI testing (once we make ROL a PT package, see #410). This will avoid these expensive tests in pre-push testing but they will still get run in Nightly testing.
Any objections?
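Thanks Denis. These two tests mentioned above:
ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4
ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4
just timed out at 3 minutes and stopped my push. Here was the ctest output:
...
99% tests passed, 2 tests failed out of 259
Label Time Summary:
Total Test time (real) = 423.25 sec
The following tests FAILED:
Now, I don't think that Trilinos has an official policy (yet) on how long a single CI test can be allowed to run, but over the years we have been able to maintain a 3 min max (i.e. 180 sec) for BASIC tests (I will have to verify that by running more tests). I sorted all of the most expensive tests from this ctest run and got:
$ grep " Test " MPI_RELEASE_DEBUG_ST/ctest.out | grep Passed | sort -rn -k 7
As you can see, ROL accounts for 18 of the 23 more expensive tests that take over 10 seconds to run for this set of packages. If you take out the ROL tests, the remaining most expensive tests over 10 sec in this set of packages are:
259/259 Test #258: TrilinosCouplings_Example_CurlLSFEM_MPI_1 .......................................... Passed 50.53 sec
To address this, I will promote these two tests ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 and ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 from the default BASIC (i.e. pre-push) category to the CONTINUOUS category. Also, since the test ROL_example_topology-optimization_example_01_MPI_1 takes almost 2 mins to run, I will promote that one to CONTINUOUS as well.
NOTE: This change will just result in excluding these more expensive tests from running in pre-push testing (i.e. using checkin-test.py), but they will still get run in post-push CI and Nightly testing. And since the default test category for Trilinos in a regular CMake build is NIGHTLY, they will also still run by default when you do a basic local cmake configure and then run ctest (i.e. nothing will change for ROL developers or post-push automated testing).
But in general I think the ROL developers might spend a little time with these tests to see if they can be made a little faster without reducing the quality of testing. Unless you are trying to resolve physics or doing formal verification work, you can typically get pretty good code and mathematical coverage with much faster running tests.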
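The sorting step described in this comment (listing passed tests by run time and keeping those over 10 seconds) can be sketched in Python as well. This is only an illustrative sketch: the function name `slowest_passed_tests` and the regular expression are assumptions, and the two sample lines are the only ctest result lines quoted in this thread (they come from different runs).

```python
import re

# Matches ctest result lines such as:
#   259/259 Test #258: Some_Test_Name ........ Passed 50.53 sec
LINE_RE = re.compile(
    r"^\s*\d+/\d+\s+Test\s+#\d+:\s+(?P<name>\S+)\s+\.+\s*Passed\s+"
    r"(?P<sec>\d+(?:\.\d+)?)\s+sec"
)

def slowest_passed_tests(ctest_output, min_seconds=10.0):
    """Return (seconds, name) pairs for passed tests slower than min_seconds, slowest first."""
    results = []
    for line in ctest_output.splitlines():
        match = LINE_RE.match(line)
        if match and float(match.group("sec")) > min_seconds:
            results.append((float(match.group("sec")), match.group("name")))
    return sorted(results, reverse=True)

# Sample lines copied from this thread (hypothetical as a single ctest run).
sample = """\
259/259 Test #258: TrilinosCouplings_Example_CurlLSFEM_MPI_1 .......................................... Passed 50.53 sec
121/126 Test #100: ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 .............................. Passed 303.38 sec
"""

for seconds, name in slowest_passed_tests(sample):
    print(f"{seconds:8.2f}  {name}")
```

Passing `min_seconds=180.0` would instead list only the tests that exceed the 3 minute limit mentioned above.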
Agreed. For the two longest-running tests, we are pulling in a lot of other Trilinos packages (Tpetra, Amesos2, etc.) to perform PDE-constrained optimization. We can reduce the domain size on these types of tests and still get decent coverage. However, the fact remains that the new Tpetra-based stack has been very slow for us, and we don't know at this point whether we are using these packages suboptimally or whether we have uncovered performance bugs. A year or two ago, similar examples based on Epetra and Amesos were running in a small fraction of the currently needed time. This is worrisome.
From: Roscoe A. Bartlett [email protected]
No objection
Thanks Denis. These two tests mentioned above:
just timed out at 3 minutes and stopped my push. Here was the ctest output:
...
99% tests passed, 2 tests failed out of 259
Label Time Summary:
Total Test time (real) = 423.25 sec
The following tests FAILED:
Now, I don't think that Trilinos has an official policy (yet) on how long a single CI test can be allowed to run, but over the years we have been able to maintain a 3 min max (i.e. 180 sec) for BASIC tests (I will have to verify that by running more tests). I sorted all of the most expensive tests from this ctest run and got:
$ grep " Test " MPI_RELEASE_DEBUG_ST/ctest.out | grep Passed | sort -rn -k 7
As you can see, ROL accounts for 18 of the 23 more expensive tests that take over 10 seconds to run for this set of packages. If you take out the ROL tests, the remaining most expensive tests over 10 sec in this set of packages are:
259/259 Test #258: TrilinosCouplings_Example_CurlLSFEM_MPI_1 .......................................... Passed 50.53 sec
To address this, I will promote these two tests ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 and ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 from the default BASIC (i.e. pre-push) category to the CONTINUOUS category. Also, since the test ROL_example_topology-optimization_example_01_MPI_1 takes almost 2 mins to run, I will promote that one to CONTINUOUS as well.
NOTE: This change will just result in excluding these more expensive tests from running in pre-push testing (i.e. using checkin-test.py), but they will still get run in post-push CI and Nightly testing. And since the default test category for Trilinos in a regular CMake build is NIGHTLY, they will also still run by default when you do a basic local cmake configure and then run ctest (i.e. nothing will change for ROL developers or post-push automated testing).
But in general I think the ROL developers might spend a little time with these tests to see if they can be made a little faster without reducing the quality of testing. Unless you are trying to resolve physics or doing formal verification work, you can typically get pretty good code and mathematical coverage with much faster running tests.
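The category behaviour described in the NOTE above (tests moved out of BASIC no longer run pre-push but still run in post-push CI and Nightly testing) can be modelled with a small sketch. It assumes the categories are ordered BASIC < CONTINUOUS < NIGHTLY and that enabling a category also runs every test tagged with a lower one; `CATEGORY_ORDER` and `runs_in` are hypothetical names for illustration, not part of TriBITS or checkin-test.py.

```python
# Assumed ordering, from least to most inclusive, of the categories named in this thread.
CATEGORY_ORDER = ["BASIC", "CONTINUOUS", "NIGHTLY"]

def runs_in(test_category, enabled_category):
    """True if a test tagged test_category runs when enabled_category is the active category."""
    return CATEGORY_ORDER.index(test_category) <= CATEGORY_ORDER.index(enabled_category)

# Category assignments as proposed in this thread.
tests = {
    "ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4": "CONTINUOUS",
    "ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4": "CONTINUOUS",
    "ROL_example_topology-optimization_example_01_MPI_1": "CONTINUOUS",
    "ROL_example_PDE-OPT_stefan-boltzmann_example_03_MPI_4": "BASIC",
}

def selected_tests(enabled_category):
    """Names of the tests that would run under the given active category."""
    return [name for name, cat in tests.items() if runs_in(cat, enabled_category)]

for enabled in CATEGORY_ORDER:
    print(enabled, len(selected_tests(enabled)))
```

Under this model only the `example_03` test runs in the BASIC (pre-push) set, while all four run under CONTINUOUS and under the default NIGHTLY category.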
Thanks!
That is worrisome. But I think the milestone they had a few years ago should have addressed the major performance issues of the Tpetra stack w.r.t. the Epetra stack (at least w.r.t. the important SNL customer the milestone was targeting). This is something someone should look into at some point. For now, it is not the end of the world for these couple of tests to only run in post-push testing.
I pushed the commit 748666a which made these three tests CONTINUOUS. As shown at http://testing.sandia.gov/cdash/viewTest.php?onlypassed&buildid=2486328, the three tests are still getting run in post-push CI testing. You can also see that the test For now you can just leave these as (expensive) I am turning this back over to the ROL developers. If you want to just leave these as
Related to this issue, I was on a loaded machine trying to push and I got a timeout of the test ??? at 3 minutes. A prior run with the machine unloaded passed (but a different test failed, see #495). As a result I had to do a forced push as shown below. Not sure why only this ROL test timed out and not any other pre-push BASIC test.
@dridzal if you are satisfied leaving these tests as CONTINUOUS tests, you can close this ticket.
The @trilinos/rol team will revamp the ROL test setup and assignment in the next month. For now, we'll leave these 'continuous', and close the ticket.
Next Action Status: Demoted 3 ROL tests from BASIC to CONTINUOUS (still running in post-push CI testing, see below). Next: ROL developers review and close?
CC: @trilinos/rol @trilinos/framework
Related To: #442
Description:
The test ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 times out at 10 minutes in automated Nightly testing as shown here:
http://my.cdash.org/queryTests.php?project=Trilinos&date=2016-06-23&limit=200
http://my.cdash.org/testSummary.php?project=896&name=ROL_example_PDE-OPT_elasticitySIMP_topologyOptimization_example_01_MPI_1&date=2016-06-23
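In my local testing on muir just now, several of these stefan-boltzmann tests take in excess of 300s to complete:
121/126 Test #100: ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 .............................. Passed 303.38 sec
The total amount of time taken by all ROL tests (if run in serial) would be 54 minutes as shown by ctest output on my run of muir:
Label Time Summary:
Total Test time (real) = 1402.27 sec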
Therefore, I would like to propose to take tests ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 and ROL_example_PDE-OPT_stefan-boltzmann_example_02_MPI_4 and promote them from BASIC to NIGHTLY. That would leave the test ROL_example_PDE-OPT_stefan-boltzmann_example_03_MPI_4 to run in pre-push CI testing (once we make ROL a PT package, see #410).
This will avoid these expensive tests in pre-push testing but they will still get run in Nightly testing.
Any objections?
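A per-package total like the one quoted in the description could be approximated from raw ctest result lines with a sketch along these lines. It assumes that the package name is the part of the test name before the first underscore (as in the `ROL_...` and `TrilinosCouplings_...` names in this thread); `time_by_package` is a hypothetical helper, and the sample lines are the two result lines quoted in this thread, not a complete run.

```python
import re
from collections import defaultdict

# Captures the test name and the reported time from a ctest result line.
# The status word may carry a leading "***" (e.g. for failed or timed-out tests).
RESULT_RE = re.compile(
    r"Test\s+#\d+:\s+(\S+)\s+\.+\s*(?:\*\*\*)?\w+\s+(\d+(?:\.\d+)?)\s+sec"
)

def time_by_package(ctest_output):
    """Sum reported test times per package, keyed by the prefix before the first underscore."""
    totals = defaultdict(float)
    for name, seconds in RESULT_RE.findall(ctest_output):
        package = name.split("_", 1)[0]  # assumption: names look like Package_testname
        totals[package] += float(seconds)
    return dict(totals)

# Sample lines copied from this thread (hypothetical as a single ctest run).
sample = """\
121/126 Test #100: ROL_example_PDE-OPT_stefan-boltzmann_example_01_MPI_4 .............................. Passed 303.38 sec
259/259 Test #258: TrilinosCouplings_Example_CurlLSFEM_MPI_1 .......................................... Passed 50.53 sec
"""

for package, seconds in sorted(time_by_package(sample).items()):
    print(f"{package}: {seconds / 60:.1f} min")
```

Run against a full `ctest.out`, the `ROL` entry would give the serial cost of the ROL tests in that run.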