Runtime error when turning on threading in a CAM simulation #941
Comments
I don't believe threading is supported when the SE dycore is used. Could you try threading with the FV dycore (--res f09_f09_mg17)?
Thanks @fvitt for your suggestion. Yes, switching to the FV dycore works with threading. Is there a plan to support the SE dycore with threading in the future, or will it stay with the MPI-exclusive configuration?
I don't know of plans to support threading for the SE dycore.
Got it! Thanks @adamrher .
Hi @fvitt , it seems that I can only run the FV dycore with 2 threads per MPI task on Derecho. When I increase the number to 4, the simulation fails again with errors coming from the dycore. Is this expected, or should I set something specific for a larger thread count?
In principle, you should be able to use 4 threads per MPI task. When I have tried threading on Derecho I noticed the performance was quite poor, but the runs did not fail. There is some discussion of how to run hybrid MPI+OpenMP jobs on slide 96 here: I just have not tried the suggestions for process binding. Do we have the arguments to mpiexec correct for threading?
Hi @fvitt , thanks a lot for your suggestion. It turns out that I need to manually add the arguments you pointed to for a hybrid MPI/OpenMP job when I use more than 2 threads per MPI task; with them in place, it is now working. The poor threading performance of CAM on Derecho has the same cause: those hybrid MPI/OpenMP arguments are not set in CAM by default. When I manually change the MPI command to include them, the performance of the hybrid MPI/OpenMP job is restored to be comparable to the MPI-exclusive configuration. CISL is working on a wrapper script that will avoid adding this long argument list by hand; I will test it in CAM once it is in good shape.
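For reference, a hybrid MPI/OpenMP launch on Derecho along the lines discussed above might look like the sketch below. The flags follow the PALS mpiexec style described in the CISL material; the rank and thread counts are illustrative for one 128-core node and are not the exact settings used in this case.

```
# Illustrative hybrid launch: 32 MPI ranks x 4 OpenMP threads on one
# 128-core Derecho node. The binding flags are assumed from the PALS
# mpiexec style and may need adjusting for a given case.
export OMP_NUM_THREADS=4
mpiexec -n 32 -ppn 32 --depth ${OMP_NUM_THREADS} --cpu-bind depth ./cesm.exe
```

In a CESM case the launch command comes from the machine configuration rather than being typed by hand, which is why the wrapper script mentioned above would avoid editing it manually.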
What happened?
I tried to turn on the threading option in a CAM simulation (F2000climo compset, ne30pg3 resolution). I used one compute node on Derecho with 64 MPI tasks and 2 threads per MPI task. It built successfully, but I encountered many runtime errors (a partial list is below). The complete list of errors can be found on Derecho at
/glade/derecho/scratch/sunjian/cam6_run/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel.gpu00_pcols00016_mpi0064_thread002_rrtmgp/run/cesm.log.2648024.desched1.231212-143239
What are the steps to reproduce the bug?
To reproduce the error on Derecho, you can do:
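A minimal sketch of the standard CIME workflow for this configuration follows; the compset, resolution, compiler, and PE layout are taken from the case directory above, while the case name and exact commands are illustrative rather than the original command list.

```
# Illustrative CIME commands; the case name is a placeholder.
./create_newcase --case F2000climo.ne30pg3.thread_test \
    --compset F2000climo --res ne30pg3_ne30pg3_mg17 \
    --machine derecho --compiler intel
cd F2000climo.ne30pg3.thread_test
./xmlchange NTASKS=64     # 64 MPI tasks on one node
./xmlchange NTHRDS=2      # 2 OpenMP threads per MPI task
./case.setup
./case.build
./case.submit
```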
What CAM tag were you using?
cam6_3_139
What machine were you running CAM on?
CISL machine (e.g. cheyenne)
What compiler were you using?
Intel
Path to a case directory, if applicable
/glade/derecho/scratch/sunjian/cam6/F2000climo.ne30pg3_ne30pg3_mg17.derecho.intel.gpu00_pcols00016_mpi0064_thread002_rrtmgp
Will you be addressing this bug yourself?
No
Extra info
No response