Mesh file for sparse grid for the NUOPC coupler #1731
@ekluzek in case it's relevant: Very briefly:
Using the new mesh_modifier tool, I was able to get a mesh file from the domain file. The mesh file for the atm forcing is different, though, in that it's not modifying the 2D grid: it's a simple list of 400 points. So I need to create a SCRIP grid file that describes that list of points from the domain file, and then convert it into ESMF mesh format.
So this is what you need to do:
(This mesh file’s mask = 1 everywhere)
Awesome, thanks @slevisconsulting, the above helped me get mesh files created. I got everything set up Friday, but when I run the case it fails, so I need to debug what's happening and get a working case. The mesh files I created are in...
/glade/work/erik/ctsm_worktrees/main_dev/cime_config/usermods_dirs/sparse_grid400_f19_mv17
Hopefully, the crash I'm seeing is something simple I can figure out.
The crash has to do with the connectivity on the forcing grid, which is just a list of the 400 points. The suggestion from ESMF is that I make the vertices of the forcing grid points extend just a tiny bit around the cell centers. Because of the time this project is taking, I also plan to bring this work to master as a user-mod and add a test for it.
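The ESMF suggestion above can be sketched as follows. This is a minimal illustration, not code from any CTSM tool: each forcing point gets four corner vertices forming a tiny box around its cell center, so no two points share nodes. The names (`tiny_box_corners`, `eps`, the fake center coordinates) are all hypothetical.

```python
# Sketch: build four corner vertices for each forcing point as a tiny box
# around the cell center, so that no two points share nodes.
def tiny_box_corners(center_lon, center_lat, eps=1e-4):
    """Return the 4 (lon, lat) corners of a tiny quad around a cell center."""
    return [
        (center_lon - eps, center_lat - eps),
        (center_lon + eps, center_lat - eps),
        (center_lon + eps, center_lat + eps),
        (center_lon - eps, center_lat + eps),
    ]

# 400 sparse-grid points -> 4 corners each, none shared between points.
# Centers here are made up purely for illustration.
centers = [(float(i % 20), float(i // 20)) for i in range(400)]
corners = [tiny_box_corners(lon, lat) for lon, lat in centers]
n_nodes = sum(len(c) for c in corners)  # 1600 when nothing is shared
```

In a real workflow these corners would become `grid_corner_lon`/`grid_corner_lat` in the SCRIP file before conversion to an ESMF mesh.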
We talked about this at the standup this morning. An idea I got there was to try it with the new land mesh, but without the atm forcing mesh. I tried that and it works. So there's something going on with the new forcing that only has the 400 points. This is something I did suspect. |
OK, I got a case to work! I couldn't use ncks to make the SCRIP grid file, as it would "correct" my vertices to turn it into a regular grid. I was able to use curvilinear_to_SCRIP inside of NCL to write out a SCRIP grid file that I could then convert to a working mesh file. Using unstructured_to_ESMF inside of NCL didn't generate a mesh that I could use. One clue I could see in the final mesh file is that the nodeCount was 1600 (4x the number of points [400]), which shows that all of the points are isolated from each other. The mesh files that did NOT work all had a smaller total number of nodes, which meant that points shared nodes with each other.
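The nodeCount clue above makes a handy sanity check. A hedged sketch (the function name is illustrative, not from any tool): for fully isolated quad elements, nodeCount should be exactly 4 times elementCount; anything smaller means elements share nodes, which is what broke the earlier meshes.

```python
# Sketch: sanity-check an ESMF mesh built from an unstructured list of points.
# Isolated quad elements => nodeCount == 4 * elementCount.
def points_are_isolated(node_count, element_count, corners_per_element=4):
    return node_count == corners_per_element * element_count

# The working mesh in this thread had elementCount=400 and nodeCount=1600:
assert points_are_isolated(1600, 400)      # the mesh that worked
assert not points_are_isolated(1500, 400)  # shared nodes, like the failures
```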
@ekluzek I understand that you used
Seems like this connects with #1919 too. In general we need better documentation on how to do this. |
@ekluzek and I met to compare notes (this issue #1731 vs. discussion #1919):
@ekluzek and @slevis-lmwg I tried running a sparse grid simulation using the steps we talked about on the call today. I got a bunch of ESMF errors. It seems like I should be following what @ekluzek did above? I'm not sure how to do what you did above, Erik. Do you remember the steps you took? My log files for the case can be found:
It looks like the issue is in datm, from the PET file, as you point out. So one thing to check would be whether just the change for MASK_MESH works. I think it should, so that would be good to try. Another thing to try would be the datm mesh file I created, which does differ from your file. As I noted above, I ran into trouble just using ncks, because it would change the vertices on me, so I couldn't use files created with it. I think that might explain the warnings we saw when we worked on this that showed issues with the south pole. (By the way, the unclear messages from ESMF are another example of error checking that doesn't help you figure out the problem. This might be a place where better error checking could help us figure out what's wrong.)
You can also try the land mesh file I created...
/glade/work/erik/ctsm_worktrees/main_dev/cime_config/usermods_dirs/sparse_grid400_f19_mv17/fv1.9x2.5_sparse400_181205v4_ESMFmesh_c20220929.nc
NOTE: for this you would use it for ATM_DOMAIN_MESH and leave MASK_MESH as it was before. OR you would reverse the mask and use it for MASK_MESH, and leave LND_DOMAIN_MESH/ATM_DOMAIN_MESH as they were. I set it up changing ATM_DOMAIN_MESH/LND_DOMAIN_MESH because that is what made sense to me. But, as we saw when talking with @slevis-lmwg, it's more general and simpler to swap out the MASK_MESH file. That mesh file is also different from yours, and it's not just the mask. So again, maybe there was something going on with the ncks conversion?
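The "reverse the mask" option above amounts to flipping the 0/1 element mask. A minimal sketch, assuming the mask is the usual 0/1 land mask (plain lists here for illustration; in a real mesh file this would be the element mask variable):

```python
# Sketch: invert a 0/1 mask, as in "reverse the mask and use it for MASK_MESH".
def reverse_mask(mask):
    return [1 - m for m in mask]

# A toy 0/1 mask; real masks come from the mesh/domain file.
assert reverse_mask([1, 0, 1, 1]) == [0, 1, 0, 0]
```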
@adrifoster I do see, four posts up, that I wrote, "We think that his ncks attempt failed because he applied it to a 'domain' file." If things continue to fail, let's meet again and go through the full process as I recommended above. Let's plan on an hour.
Okay I removed the info in
Will try your next suggestion next. |
That still failed... Thanks @slevis-lmwg for joining the meeting with @mvertens this afternoon. @ekluzek do you want to join as well?
@adrifoster I hope the run still works when you now point to the dense400 datm files and the datm mesh that I generated (previous post).
Unfortunately that did not work. See logs: /glade/scratch/afoster/ctsm51FATES_SP_OAAT_Control_2000_nuopctest/run
I'm using DATM data in:
Okay, thanks to @slevis-lmwg for helping me fix a separate error. This seemed to work! And the timing is faster! Updated table:
Thank you @slevis-lmwg!!!
Here are the updated results for the new PE layout @slevis-lmwg @ekluzek @jedwards4b and @mvertens and I discussed today
The land and atm are much closer now. @jedwards4b does this seem like a good layout to land on? To create these cases I used the script
@adrifoster I think that layout is pretty good.
I will organize my instructions for generating mesh files into one post and hide (not delete) the corresponding obsolete posts:
Generating mesh files for sparse grid I-cases where land and datm are on different grids
1) Generate sparse grid mesh for the land model
a) In matlab (% are comments)
b) After matlab
In a run where you point to the default atmosphere drivers (not the sparse version), set the three mesh paths to this lnd_mesh.nc in env_run.xml. @adrifoster showed that this works. It's not the full story if you want to run faster, BUT I believe we have not seen correct output from simulations that followed the next step. The time savings have not been sufficient motivation to continue problem solving. @adrifoster points out that, if using xarray for (1a), then you have to set the encoding correctly, otherwise nco complains:
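A hedged sketch of the kind of encoding you would pass to xarray's `to_netcdf()` to keep nco happy about `_FillValue` and dtypes. The variable names below are illustrative mesh/SCRIP variables, not necessarily the ones in your file; the actual write call is shown as a comment so this runs without xarray installed.

```python
# Sketch: an explicit encoding dict for xarray's Dataset.to_netcdf(),
# suppressing automatic _FillValue insertion and pinning dtypes.
# Variable names are illustrative; match them to your own file.
mesh_vars = ["grid_center_lat", "grid_center_lon",
             "grid_corner_lat", "grid_corner_lon"]
encoding = {v: {"_FillValue": None, "dtype": "float64"} for v in mesh_vars}
encoding["grid_imask"] = {"_FillValue": None, "dtype": "int32"}

# Usage (assuming `ds` is an xarray.Dataset holding these variables):
# ds.to_netcdf("lnd_mesh.nc", format="NETCDF3_CLASSIC", encoding=encoding)
```

Writing `format="NETCDF3_CLASSIC"` in the first place also sidesteps the netcdf4-to-cdf5 conversion step mentioned later in this thread.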
2) Generate the sparse grid mesh for datm
The 1D datm domain file for mct runs came from combining the 2D gswp3 data and the dense400 mask. So I will take the 1D mask from the domain file. In matlab:
In a copy of the datm file, so as to avoid overwriting the original, still in matlab:
After matlab
@adrifoster showed that this mesh file works when still pointing to the global datm data. Next I modify this mesh file. In matlab, I read variables from the global datm_mesh/lnd_mesh.nc file and the 400-element domain file...
To work, this file needs all the same variable attributes as found in other working mesh files. I copied the attributes manually from the original file to an ascii version of the new file and used ncgen to generate a new netcdf. In this step I corrected some vars from double to int and added global attributes. This is the mesh file for running with the 400-element datm files. If necessary, convert netcdf4-type files to cdf5 (type classic is fine). To check and to change the type:
@adrifoster will test. To run faster, perform load balancing (discussion above).
You can avoid the last step of translating from netcdf4 files by creating the file in the right format in the first place. Example: nccreate("myFile.nc","Var1",Datatype="double",Format="classic")
From conversation with @oehmke @mvertens @adrifoster @ekluzek @slevis-lmwg:
nodeCount = 1600
@oehmke thank you for meeting with us this morning. The new file is
/glade/scratch/slevis/temp_work/sparse_grid/datm_mesh/lnd_mesh_400.nc (ascii version available in same directory)
Let me know if you see anything wrong with this one.
As far as I can tell, this file should work to be read in as a mesh and used in the nearest neighbor. It looks like the node coordinates for the elements are just repeats of the first one. I think that should be ok as long as you aren’t using those node coordinates for anything (e.g. conservative regridding). Let me know how it goes. Thanks!
So I tested this new mesh but the output looks the same to me. I ran it twice just to make sure. The datm.streams.xml is pointing to that new file. I didn't update LND_DOMAIN_MESH, ATM_DOMAIN_MESH, or MASK_MESH; should I have?
<https://user-images.githubusercontent.com/13225250/282911764-df98340a-57dd-4749-905d-e4d065e3e512.png>
rundir: /glade/scratch/afoster/ctsm51FATES_SP_OAAT_Control_testCLIM_2000/run
casedir: /glade/work/afoster/FATES_calibration/FATES_SP_OAAT/ctsm51FATES_SP_OAAT_Control_testCLIM_2000
Hmmm. Maybe that was a problem, but not the only one. Let me experiment a bit with the file here and see if I find anything.
Okay, thank you! If it helps at all, I used this script:
@adrifoster
Also, for when we have time to work on the sparse datm further:
With these lines changed, we would compare what we get from an MCT case versus a NUOPC case. If I remember right, you don't have any working MCT cases, so we'd need to start over by checking out some old version of the model. Again, this is for when we have time/motivation to get the sparse datm working.
@mvertens and @jedwards4b I've moved over to Derecho and am having trouble optimizing my PE layout... Right now the timing and throughput are not as good as on cheyenne for a PE layout suggested by @ekluzek. Should I increase the ATM ntasks?
Sorry, here is the time for each component:
@adrifoster One issue may be that with only 4 tasks for atm you are using the serial netcdf interface. Try changing PIO_STRIDE_ATM=1 and see if that helps. I'll build a case from your sandbox and play around with the pelayout a bit. |
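One way to see @jedwards4b's point is with rough arithmetic: PIO assigns roughly one IO task per PIO_STRIDE MPI tasks, so a component with only 4 tasks and a stride of 32 ends up with a single (serial) IO task. The formula below is a simplified approximation of PIO's task selection, not its exact implementation.

```python
# Sketch: approximate number of PIO IO tasks for a component,
# given its MPI task count and PIO_STRIDE.
def num_io_tasks(ntasks, pio_stride):
    return max(1, ntasks // pio_stride)

assert num_io_tasks(4, 32) == 1  # effectively serial netcdf for atm
assert num_io_tasks(4, 1) == 4   # PIO_STRIDE_ATM=1 -> one IO task per task
```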
Thanks @jedwards4b, just FYI the script I used to build this case is here:
Model Cost: 10.94 pe-hrs/simulated_year
TOT Run Time: 3076.618 seconds 0.843 seconds/mday 280.83 myears/wday
This is with
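As a quick cross-check of the numbers quoted above, the two throughput figures are consistent with each other: 0.843 wallclock seconds per model day works out to roughly 280.8 model years per wallclock day.

```python
# Cross-check the reported throughput: seconds/mday vs. myears/wday.
seconds_per_mday = 0.843
mdays_per_wday = 86400.0 / seconds_per_mday  # 86400 s in a wallclock day
myears_per_wday = mdays_per_wday / 365.0     # model years per wallclock day

# Agrees with the reported 280.83 myears/wday to within rounding.
assert abs(myears_per_wday - 280.83) < 0.5
```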
Thank you @jedwards4b!!
@jedwards4b I am revisiting this PE layout for a different 400-point sparse grid application and wanted to ask about PIO_STRIDE=32:
Thanks!
SAM - there is no ASYNCIO in this case so I'm confused about your question. |
Sorry to confuse you. I, too, was confused, thinking I would find PIO_STRIDE in env_mach_pes.xml, but I have located it in env_run.xml, so I think I'm all set. Thanks @jedwards4b. |
I suggest you use xmlchange and xmlquery so that you don't need to know which file to look in.
@adrifoster you got something working here. Is there more that needs to be done in this space? It does seem like we should make it easy to do this and to get the right PE layout and mesh file needed for NUOPC.
We need a mesh file that can be used with the NUOPC coupler for the sparse grid.
Here's a sample case for the MCT coupler:
/glade/work/oleson/PPE.n11_ctsm5.1.dev030/cime/scripts/ctsm51c8BGC_PPEn11ctsm51d030_2deg_GSWP3V1_Sparse400_Control_2000