Add support for MSU Hercules #1732
Comments
Some sacct commands are not working as expected on Hercules (on login nodes 1, 3, and 4, at least). Both of the commands
Simply entering
Since some
@jkbk2004 @ulmononian A new hpc-stack has been built on Hercules. The stack can be loaded as follows:
Please see below the modules in the stack as shown by "module list":
Feel free to test it if possible. There could be errors with ESMF at run time; let me know if any errors result!
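For context, loading a test hpc-stack installation of this kind typically looks like the following sketch. The `module use` path and the module names below are placeholders, since the actual installation path announced in this comment was not preserved in the thread:

```shell
# Hedged sketch: loading a test hpc-stack on Hercules.
# The path is a PLACEHOLDER -- substitute the location announced by the
# installer; the hpc-* module names are illustrative only.
module purge
module use /path/to/test/hpc-stack/modulefiles/stack   # placeholder path
module load hpc hpc-intel hpc-impi                      # illustrative names
module list                                             # confirm what was loaded
```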
@natalie-perlin I was able to create a new modulefile from the
I do see that
Any clarification on why the ufs-weather-model might not be accepting
Please see
@MichaelLueken would you be willing to open up permissions to your RT path? i haven't been testing with hpc-stack on hercules (focusing on spack-stack), but i'd be happy to take a look for you. one thing to note is that if you're interested in testing spack-stack on hercules (currently 112/126 RTs pass w/ spack-stack/1.3.1 there), feel free to check out my fork branch at #1733.
@ulmononian I thought that I opened up the permissions to my RT path - I did a second pass and it looks like everything in my
I'm certainly willing to test your fork's branch at #1733 as well!
just to note: the sacct issue re-emerged on hercules login-3. this is with
this issue causes failures in job status checking in the RT scripts. for example, even if the compile step completes successfully, the run step won't proceed because the job monitoring logic hits this
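As a hedged sketch (not the actual RT monitoring code, which is not shown in this thread), job-state polling can be made tolerant of intermittently empty `sacct` output by falling back to `squeue`. The `job_state` helper below is hypothetical; it is fed pre-captured command output so the parsing logic can be exercised without a Slurm cluster:

```shell
# Hedged sketch: tolerate intermittently-empty sacct output when polling a job.
# job_state is a hypothetical helper, fed raw command output for illustration.
job_state() {
  sacct_out="$1"   # e.g. output of: sacct -n -j "$JOBID" -o State
  squeue_out="$2"  # e.g. output of: squeue -h -j "$JOBID" -o %T
  state="$sacct_out"
  if [ -z "$state" ]; then
    state="$squeue_out"   # fall back when sacct returns nothing
  fi
  # keep only the first whitespace-delimited token (the primary job record)
  set -- $state
  echo "${1:-UNKNOWN}"
}
```

A real RT script would capture the command output itself and loop with a sleep between polls; the fallback only papers over the symptom, so the underlying `sacct` outage on the login nodes still needs a sys-admin fix.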
@MichaelLueken i think i found the cause of the netcdf issue you mentioned: it looks like in
@MichaelLueken i can confirm that after doing
for some reason, the srun command w/ --mpi=pmi2 that was added to fix previously reported srun errors started failing with
to fix this,
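For reference, the workaround under discussion looks like the following sketch. The task count variable is illustrative, and `fv3.exe` is the usual UFS WM executable name rather than something quoted from the actual RT scripts:

```shell
# Hedged sketch: launching the model with an explicit PMI plugin, per the
# --mpi=pmi2 workaround discussed above. Run `srun --mpi=list` to see which
# PMI implementations the site's Slurm build supports.
srun --mpi=pmi2 -n "$TASKS" ./fv3.exe
```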
Hi @ulmononian, with respect to the requirement to include --mpi=pmi2 in the srun command, have you opened a Hercules helpdesk ticket to see if they would be able to build the MPI on the machine with this flag turned on? This seems to be something that the sys admins should include in the MPI build, rather than requiring users to add these extra steps to the srun command or adding
@MichaelLueken this is a great question and point. i am not sure why this is necessary on the user side, so i can certainly put a helpdesk ticket in to inquire. i should note that i have only tested the ufs-wm w/ spack-stack-built executables on hercules, so i am not sure if it is a general issue or spack-specific. i should also note that, in the MSU hercules docs, there is an explicit mention of using
perhaps more bizarre to me is that, for a month or more, the
@MichaelLueken i contacted msu hercules helpdesk. i will let you know what i hear!
@ulmononian Thanks! I'm definitely interested in hearing why they want users to add --mpi=pmi2 to their srun command.
i forgot to follow up on this, but the hercules team made some changes and the
i opened an issue on the mapl repo regarding newer intel compiler/mapl compatibility issues: GEOS-ESM/MAPL#2213
cpld_control_p8 passes when RT is run against a test installation of spack-stack that was built w/ intel 2023.1.0 (which includes ifort 2021.9.0 rather than ifort 2021.7.1). the msu sys admins recently (7/14/23) installed intel 2023 by request, as the mapl team had alluded that there were known issues with ifort 2021.7.1 (the fortran compiler included w/ the original intel 2022 installation on hercules).
Description
The WM needs to be updated to support MSU's new HPC, Hercules.
Solution
Update all necessary files to enable WM functionality on Hercules. spack-stack/1.3.1 is currently being installed, so testing can begin there shortly.
Relates to
PR #1707, Issue #1651