ci: use mpich built with ch3:sock to speed up tests #3883
@vicentebolea Do you know anything about this failing test

1: terminate called after throwing an instance of 'std::runtime_error'
1: what(): [Mon Nov 13 15:47:31 2023] [ADIOS2 EXCEPTION] <Plugins> <PluginManager> <GetOperatorCreateFun> : Couldn't find operator plugin named MyOperator

and

2: IO System base failure exception, STOPPING PROGRAM
2: [Mon Nov 13 15:47:31 2023] [ADIOS2 EXCEPTION] <Toolkit> <transport::file::FilePOSIX> <CheckFile> : couldn't open file testOperator.bp, in call to POSIX open: errno = 2: No such file or directory
It's also a little hard to see the benefits here, as the results are muddied by the following two tests timing out after 2 minutes (that's 2 tests x 5 tries each x 2 min per try = 20 min):
Oddly, those timeouts seem to have started only today. But it looks like it affected
Oh wait, that part is not weird: GHA tested a merge commit made by GitHub merging my PR head (fd111d462 Merge pull request #3913 from anagainaru/perfstub-fix
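The 20-minute figure quoted above for the two timing-out tests is simply the product of test count, retries, and per-try timeout. A minimal sketch of that arithmetic follows; the function name `worst_case_retry_minutes` is hypothetical and not part of the CI scripts.

```python
def worst_case_retry_minutes(num_tests: int, tries_per_test: int, timeout_min: float) -> float:
    """Upper bound on wall time lost when every try of every test times out."""
    return num_tests * tries_per_test * timeout_min


# Values taken from the comment above: 2 tests, 5 tries each, 2 min per try.
lost = worst_case_retry_minutes(num_tests=2, tries_per_test=5, timeout_min=2)
print(f"worst-case time lost to timeouts: {lost} min")
```

This is an upper bound for a serial run; if the timing-out tests run in parallel with others, the added wall time may be smaller.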
As I recall, this was resolved by @spyridon97 in the past weeks.
Yep, we are having issues with those tests (Engine.BP./BPChangingShapeWithinStep.MultiBlock/.BP3.MPI). I have a PR trying to figure out the reason at #3908
If you have a link to the PR, I'll take a look.
1847e39 to be7e5fa
f6d01f7 to 84a321e
Table showing test times and number of tests for each compiler/parallel pair for the most recent CI run:

compiler | parallel | # tests run | elapsed time (tests only)
-----------------------------------------------------------------------
clang6 | ompi | 1269 | 21m
clang6 | mpich | 1257 | 7m
clang10 | mpich | 2029 | 11m
gcc8 | ompi | 1296 | 30m
gcc8 | mpich | 1284 | 12m
gcc9 | mpich | 2029 | 14m
gcc10 | mpich | 2027 | 14m
gcc11 | mpich | 2029 | 14m

Anecdotally, I know the
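To make the table easier to compare, here is a rough sketch that computes the ompi-to-mpich elapsed-time ratio for the two compilers that ran with both (clang6 and gcc8). The numbers are copied from the table; the variable and function names are illustrative only. Since the elapsed times are rounded to whole minutes and the test counts differ slightly between the ompi and mpich jobs, the ratios are approximate.

```python
# (compiler, parallel, tests run, elapsed minutes), copied from the table above.
runs = [
    ("clang6", "ompi", 1269, 21),
    ("clang6", "mpich", 1257, 7),
    ("clang10", "mpich", 2029, 11),
    ("gcc8", "ompi", 1296, 30),
    ("gcc8", "mpich", 1284, 12),
    ("gcc9", "mpich", 2029, 14),
    ("gcc10", "mpich", 2027, 14),
    ("gcc11", "mpich", 2029, 14),
]

# Index elapsed minutes by (compiler, parallel) for easy lookup.
elapsed = {(compiler, mpi): minutes for compiler, mpi, _, minutes in runs}


def ompi_to_mpich_ratio(compiler: str) -> float:
    """How many times longer the ompi job took than the mpich job."""
    return elapsed[(compiler, "ompi")] / elapsed[(compiler, "mpich")]


# Only compilers that appear with both MPI implementations can be compared.
compared = sorted(c for (c, mpi) in elapsed if mpi == "ompi" and (c, "mpich") in elapsed)
for compiler in compared:
    print(f"{compiler}: ompi took {ompi_to_mpich_ratio(compiler):.1f}x as long as mpich")
```

The other rows have no ompi counterpart in this run, so they are left out of the comparison rather than guessed at.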
@vicentebolea Thanks for your help with this. If you approve of these changes, I'll rebuild/tag/push the images to the correct location and rewrite the history to a single commit with those changes. Otherwise, we can iterate on anything you disagree with while still using the test images.
This looks fantastic, great work there! How about the image building time? Did it increase significantly?
I don't think it increased much, if at all, but I'll let you know for sure when I build them all cleanly in just a few minutes here.
33 minutes to build the
Good enough. There has been a 50% increase in image building time, but we get much faster test execution. Sounds good! Feel free to merge after pushing the images and updating the image names in this PR. I will re-approve then.
11e6db7 to d6733a3
Try an mpich built using a possible future version of spack (spack PR here) that supports building --with-device=ch3:sock:tcp.