Port UFS-WM to Ursa #2471
base: develop
Conversation
quick update: this work is on hold while we wait for ursa's spack-stack installation to be completed.
we are still waiting for internet connectivity to be enabled on ursa so that the spack-stack build can be completed. @RaghuReddy-NOAA do you have any update on this?
@ulmononian The login node nfe91 is now able to access the external network. Please note that it is still behind a firewall; any site that is reachable by Hera/Niagara should be reachable from nfe91 too.
currently testing ufs-wm w/ spack-stack/1.8.0 on ursa. some issues with cmake locating the mpi libs right now: when trying to run cmake for the atm-only model, i get an error during the find-MPI section of the top-level ufs-wm CMakeLists.txt. i've also tried changing the envvars set in the ufs_ursa modulefile (mirroring some of the approaches in the hercules/hera/gaeac6 llvm lua files), to no avail so far. note that we are using icx, icpx, and ifort for now, though we do have an ifx-based stack as well. i've also tried w/ the system cmake, but no difference.
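for concreteness, the kind of hints i've been experimenting with (purely a sketch — wrapper names assume Intel MPI's LLVM-compiler wrappers are present in our stack, which may not hold on ursa):

```bash
# mirror what the hera/hercules llvm lua modulefiles do via setenv():
# point Intel MPI's wrappers at the LLVM compilers
export I_MPI_CC=icx
export I_MPI_CXX=icpx
export I_MPI_F90=ifort

# and/or hand the wrappers to cmake directly so FindMPI stops guessing
cmake .. -DMPI_C_COMPILER=mpiicx \
         -DMPI_CXX_COMPILER=mpiicpx \
         -DMPI_Fortran_COMPILER=mpiifort
```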
update: testing w/ the newly built stack.
baseline comparison failed because we don't have the newest bl data on ursa yet; should just run with `-c` to create new baselines.
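for anyone following along, the invocation i mean (account name is a placeholder):

```bash
# create new baselines on ursa rather than comparing against ones we don't have yet
./rt.sh -a <account> -c -l rt.conf
```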
i tried to build the WM once more: i checked some things, purged modules and reran, only to see that same cmake error.
we are testing different stacks & different compilers, hitting the same cmake error, once in an ATM build and once in a coupled build. i am not sure what is going on. @rickgrubin-noaa @RatkoVasic-NOAA @RaghuReddy-NOAA @DusanJovic-NOAA @climbfuji fyi
now seeing this when trying to compile S2SWA (it finds MPI_C, but i'm not sure it is finding these correctly):
cmake fails now in the FV3 ccpp_prebuild step. seems like something weird is going on here... the ccpp_prebuild.err file shows it trying to parse a suite file that doesn't exist.
From the log:
lol, i was wondering why it was trying to parse a suite that did not exist / was not being used in my cmake command. classic case of a typo in the suite name i had entered. now just back to the same MOM6 issue:
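to spell it out for anyone who hits the same ccpp_prebuild parse error — a misspelled suite name in the cmake command is all it takes (suite names here are just examples, not the one i actually typo'd):

```bash
# typo'd suite: ccpp_prebuild goes looking for a suite XML that doesn't exist
cmake .. -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v17_p8_typo

# corrected
cmake .. -DAPP=ATM -DCCPP_SUITES=FV3_GFS_v17_p8
```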
have you recursed all submodules in your git clone?
seems like my recent pull from develop was not successful and messed up the MOM6 src... cloned fresh and it gets past this. thanks @BrianCurtis-NOAA.
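for reference, the recipe that sorted it out (standard git; repo URL assumed to be the usual upstream):

```bash
# fresh clone with all nested submodules pulled in
git clone --recursive https://github.com/ufs-community/ufs-weather-model
cd ufs-weather-model

# or, to repair an existing checkout after a bad pull
git submodule sync --recursive
git submodule update --init --recursive
```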
using the ifx-based stack, i see a bunch of compiler remarks during the build:
Those are remarks and I don't believe they cause the make to fail. Look earlier for errors.
you're right. it was some issue in cmeps. this was with ifx... which i'm going to bypass for 1.6.0 testing for now.
@DeniseWorthen I have no access to that platform. If you want, I could run cpld_control_p8_lnd_intel vs cpld_control_p8_intel and check the timings to see the extra overhead from the land component.
@uturuncoglu Sorry for not being clear. It seems we're having this same issue on other platforms. I brought it up here because extending the wall clock is not the solution to this sort of problem. Could you maybe check derecho and/or hercules? Are they running close to the wall-clock limit there?
@DeniseWorthen Okay. Let me check on Hercules. I'll update you soon.
@DeniseWorthen If I remember correctly, I did a couple of tests before, and reducing the output interval of the land component improved its performance. Here are the results of my previous tests for land DA:
Of course, this is for the configuration coupled with DATM. I'll do a similar test with the control_p8 configuration.
@DeniseWorthen I ran control_p8 with and without the land component. The standalone atmosphere case (original control_p8) took 223 sec in total, and the one with the external land component took 286 sec. Both fluctuate a little due to load on the system, but not too much. In the land-coupled case, FV3 writes output every hour, which is not the case for the standalone run, so I/O probably plays a role in the timing difference. In any case, the run finishes in around 5 min, so I am not sure why it's timing out. Maybe comparing files is taking time. Not sure. Anyway, I think the best way to speed up the case (if you want more performance) is to use the same output interval as control_p8. So, I made the following modification in the test:
This minimizes the I/O footprint of the land-coupled case.
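(The exact diff didn't survive above. As a rough sketch — variable name and interval value assumed from the usual tests/tests/* conventions, not the exact change — aligning the output interval with control_p8 would look something like:)

```bash
# in the land-coupled test definition: write history less often than hourly
# (interval value is illustrative, not the exact one used)
export OUTPUT_FH="3 -1"
```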
thanks for this testing @uturuncoglu! @DeniseWorthen @jkbk2004 any preference here for these land tests on ursa?
just to note: |
is there some memory issue? Does a TPN of 192 make sense for Ursa's hardware? |
@uturuncoglu Thanks for your testing. I don't see any issue w/ that sort of timing (~300s), so I don't think there is any cause to change your tests.
@DeniseWorthen Okay. Let me know if you need help. We could make them more efficient in the future. I am not sure at this point who will maintain land-coupling-related issues, since we are in the process of finalizing the JTTI project.
@BrianCurtis-NOAA moving to 128 TPN resolves the issues for all the tests that were failing.
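for reference, the change itself is tiny — just the per-node task count in ursa's machine defaults (exact file assumed here, wherever the RT scripts keep per-machine settings):

```bash
# in the ursa block of the RT machine defaults (e.g. tests/default_vars.sh, assumed):
# drop from 192 to 128 tasks per node to leave some memory headroom
export TPN=128
```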
wallclock time is still at 30 mins; it times out every time. @chan-hoo mentioned you maintain this test, so i wanted to check if you had any insight.
just a note (in case anyone knows a fix): rocoto will hang indefinitely when running rt.sh for multiple tests.
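the kind of invocation that hangs, for the record (account is a placeholder; rt.conf being the multi-test list):

```bash
# multi-test run driven through rocoto (this is the case that hangs)
./rt.sh -a <account> -l rt.conf

# for comparison, a single test can be launched with -n
./rt.sh -a <account> -n "control_p8 intel"
```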
compile failures in the following configurations using the gnu stack:
all other gnu configurations & tests turned on for both hera/hercules succeed on ursa, i.e.:
the error i reported above seems to be resolved if i include
all compile jobs pass with
Commit Queue Requirements:
Description:
Enable the UFS-WM to run on Ursa (Hera's follow-on machine). Currently this uses just the pre-TDS, accessed via Niagara, so configurations will likely need to be updated as the machine comes closer to full implementation.
spack-stack installation to support UFS applications using Intel LLVM compilers is in progress; see JCSDA/spack-stack#1297.
UFS-WM_RT data will be staged on the shared Niagara/Ursa disk space for now, until a dedicated Ursa filesystem is made available; once the stack installation is finished, RTs will be run using the 8 available nodes (1 service, 7 compute).
Commit Message:
Priority:
Git Tracking
UFSWM:
Sub component Pull Requests:
UFSWM Blocking Dependencies:
Changes
Regression Test Changes (Please commit test_changes.list):
Input data Changes:
Library Changes/Upgrades:
Testing Log: