-
Notifications
You must be signed in to change notification settings - Fork 318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate @bekozi's ocgis tool in mkmapdata.sh #815
Conversation
Add config_archive.xml to cime_config for CTSM. Update the CODE_OF_CONDUCT to the August/2019 UCAR version. No testing done.
@slevisconsulting looking at your PR comment, it looks like GitHub is trying to be too clever, and thinks that this PR resolves some issues that aren't really resolved. It looks like this is truly supposed to close #806, but the way you reference #645 and #286 makes GitHub think that this PR closes those two issues, too. Can you please reword your PR comment to avoid this? For example, maybe use passive voice (e.g., "once #645 is resolved"). |
@bekozi I've reached the point where I need the weight files from ocgis to contain the same metadata as NCAR's weight files in order to test running them through our script If you look in one of the NCAR weight files that I've shared with you in the past, you'll see what I mean. Pls let me know if you need anything from me. Thanks! |
@slevisconsulting Would you mind pointing me towards one (Cheyenne is fine)? I just want to make sure we are on the same page. |
@bekozi some of the data in the NCAR weight files is redundant now, so I will first confirm what's really needed and then point you to a file. |
@bekozi I take it back. Removing things that I thought might be redundant caused me problems, so let's keep everything. Here's a weight file with everything in it: |
Got it. You only need the global attributes added, correct? |
As far as I can tell, the code that makes the surface datasets needs all the variables and dimensions that you see in there, where
In terms of the global attributes, some now already make it in my ocgis-made weight files. See here: |
Change a few uses of shr_kind Change a few uses of shr_kind to work with latest cime. These changes are from Mariana Vertenstein.
ESMF is moving away from including center coordinates and other auxiliary variables in the weight files. The concurrent weight file write routine (used by ESMPy) also does not support them. The reason for the removal is the increasing size of grids and associated coordinate arrays. If this is a deal breaker, I can definitely investigate getting them output in the concurrent routine. I'll add the additional global attributes no problem! |
@ekluzek I will let you decide how to approach this. |
...so as to compare map_ files generated by ocgis (this PR) vs. generated by the default mkmapdata.sh tool in ESCOMP#823.
...also first (incomplete) draft of ChangeLog
Only one src/dst does not and will not work because the region in the dst file does not overlap the src file at all: SRC=UGRID_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc DST=SCRIPgrid_1x1pt_urbanc_alpha_nomask_c110308.nc
Thanks for pointing this out @slevisconsulting. I'm checking with Jim Edwards and Mariana V. about this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some questions, comments, and change requests. Thanks.
@@ -446,17 +429,9 @@ if [ "$interactive" = "NO" ]; then | |||
mpirun=$MPIEXEC | |||
echo "Running in batch mode" | |||
else | |||
mpirun="" | |||
mpirun="mpirun -np 1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better for the serial case to NOT be required to run with MPI. Is always running with MPI a requirement of OCGIS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a requirement for ocgis. ESMF must be built with MPI support for IO operations...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for that @bekozi. On cheyenne there are versions of ESMF that are built without MPI (the "uni" versions). So we used them for the serial cases. Can ocgis choose to point to a version of ESMF that is built without MPI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we will encounter a problem in the weight file write routine since it uses PIO v1 which ESMF does not wrap with mpiuni. This could be another reason to fall back on the other weight file write routine which, while not writing concurrently, uses netcdf directly so supports mpiuni (I'm 99% sure this is the case). I'll confirm this with the ESMF group.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for pointing that out. If we need to require MPI, I think that's OK. But, because it's so much simpler to run without MPI, I like to have it as an option. It's often easier to setup the non-MPI version on a machine first, rather than having to also bring in the extra complexity for MPI. Practically, I'm not sure you can run the regridding for a non-MPI case at least for any sizable grid. It would be hopeful (and helpful) that you could run a single point grid without MPI though for example though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll talk with @rokuingh about adding another option for writing weight files in ESMPy. This should allow us to handle the mpiuni
case. We will also be able to maintain the center coordinates in the weight file for the interim. An option to switch between the write routines will be available. I created an issue on the ocgis side (NCPP/ocgis#511) to track this capability.
I did want to amend my comment on ESMF weight file center coordinates from earlier. The center coordinates in the ESMF weight file are not repeated for each source/destination index combination. The center coordinates are copied over from the source/destination grids. This can be inefficient for larger grids, but not as inefficient as my earlier comment would make it appear.
Ping @oehmke
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rokuingh is planning to work on these two features (verbose weight file and arbitrary sequence indices for factor lists) hopefully next week.
tools/mkmapdata/mkmapdata.sh
Outdated
@@ -288,7 +280,7 @@ else | |||
fi | |||
|
|||
# Set timestamp for names below | |||
CDATE="c"`date +%y%m%d` | |||
CDATE="c"`date -d "0 days" +%y%m%d` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment about this in the other PR. Is this for setting of daylight savings time? If so I especially think we shouldn't mess with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding comment explaining the use of the date -d
flag, as discussed here.
tools/mkmapdata/mkmapdata.sh
Outdated
if [ -z "$MPIEXEC" ]; then | ||
MPIEXEC="mpiexec_mpt -np $REGRID_PROC" | ||
fi | ||
|
||
module load ncarenv/1.3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it critical to use specific versions of software packages? This can be something difficult to stay up to date with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bekozi, before attempting to remove specific version numbers by trial-and-error, I returned to my previously working script on cheyenne:
/glade/work/slevis/ocgis_work/subset_1x1_20191213.sh
and found that it now fails with this error in the PET0.ESMF_LogFile:
20191229 114147.072 ERROR PET0 ESMF_Array.F90:3962 ESMF_ArrayWrite Unable to write to file - Internal subroutine call returned Error
20191229 114147.073 ERROR PET0 ESMF_IOScrip.F90:367 ESMF_OutputWeight File Unable to write to file - Internal subroutine call returned Error
20191229 114147.073 ERROR PET0 ESMF_IOScrip.F90:136 ESMF_SparseMatrix Write Unable to write to file - Internal subroutine call returned Error
20191229 114147.073 ERROR PET0 ESMF_Field_C.F90:1269 f_esmf_regridstorefile Unable to write to file - Internal subroutine call returned Error
It still writes out the same subset file as before:
/glade/work/slevis/ocgis_work/subsets/spatial_subset_191209.nc
but fails to generate the weight file that it had previously:
/glade/work/slevis/ocgis_work/weights/weights_191209_after_subsetg_1x1.nc
I tried two things that didn't help.
In /glade/work/slevis/git_ocgis/ocgis
and
in /glade/work/slevis/git_esmf/esmf
I typed
git fetch
git pull
to get caught up with the latest codes.
What else am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the job fails when the weight file already exists. I'm surprised that I had not run into this before :-) Thanks for troubleshooting with me, @bekozi. Reminder for you to put a more explanatory error message in the code about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced specific module loads with generic module loads now.
tools/mkmapdata/mkmapdata.sh
Outdated
cmd="$cmd -d $GRIDFILE -m conserve -w ${OUTFILE[nfile]}" | ||
if [ $type = "regional" ]; then | ||
cmd="$cmd --dst_regional" | ||
PATH_PREFIX=/glade/work/slevis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have this outside of a particular user's space. On cheyenne it could be under: /glade/tss/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I changed these path assignments to be based off of $dir
because this is already assigned at the top of the script. This way I avoid using slevis
in the path.
I am open to changing to something like /glade/tss/
later.
tools/mkmapdata/mkmapdata.sh
Outdated
CHUNKDIR=${WD}/chunking | ||
rm -rf ${CHUNKDIR} | ||
OCLI_FILE=${PATH_PREFIX}/git_ocgis/ocgis/src/ocgis/ocli.py | ||
CRWG="python ${OCLI_FILE} chunked-rwg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these names are a bit short and cryptic. I get WD for Working Directory. And I suppose from context SS_PATH is Spatial Subset path. But, I'm not sure what CRWG means? What does the "chunked -rwg" option mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this variable for the development script. It allows switching the CLI executable easily. For production code, I expect you could just use ocli chunked-rwg
(chunked-rwg is the CLI subcommand).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm mainly complaining about the names being too short to be meaningful and obvious (at least to me). So if it could be a little longer and more descriptive that would be useful. Or adding a comment that describes what it's doing and what its purpose is. But, typically it's better to just have a longer more descriptive name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Names are indeed a bit cryptic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed CRWG to CLI_EXECUTABLE.
Renamed SS_PATH to SUBSETS_PATH.
Removed PATH_PREFIX and WD.
These are minor, but I have not tested them, yet, because my simple ocgis script that last worked on 12/13/2019 is now failing. I will test when that is resolved.
This error came up while running my ocgis script on izumi. I confirmed that the same SRC/DST combination leads to the same error on cheyenne: @bekozi this is the script that gives the error on cheyenne: Thanks, again, for setting up the call today! |
@bekozi sorry... I renamed the file to |
@slevisconsulting The two issues we identified yesterday are fixed in
|
Thank you, @bekozi , I pulled your mods and the test script runs. |
@slevisconsulting I'm doing pretty well thanks for asking. I hope it's the same for you! My last update for the issue is here: NCPP/ocgis#494 (comment). I should have pinged you on it. @rokuingh and I got the arbitrary sequence indices for grids working in both ESMF and ocgis. The mesh is still outstanding. We are aiming to do it the right way, and once the MOAB mesh in ESMF is implemented fully, we can change the sequence indices in-memory without rebuilding the mesh. I know this has taken awhile. It ended up being more complex than I had expected. Given the success with the grid, I expect once the new mesh capabilities are available, we can move on it pretty quickly. Is this holding you up in anyway? The mesh changes are getting close, but it's difficult to give you a reliable time estimate... |
Thank you @bekozi for this update. One more piece of work discussed above was the "weight file with auxiliary variables." Is that one done? I will let somebody from the NCAR group respond to your question about timing. |
It is!: NCPP/ocgis#511 The argument |
Great, thank you! |
I'm not able to follow all the technical details enough to even know what
part of our project is or is not being held up. I suggest that we discuss
at a forthcoming WRF-CTSM meeting.
…On Fri, Jul 24, 2020 at 1:27 PM Samuel Levis ***@***.***> wrote:
Great, thank you!
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#815 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AFABYVCNWF7RJVVY4474ERTR5HODDANCNFSM4I5VZVVQ>
.
|
These mods came in with ESCOMP@ef5f3d5 mkmapdata.sh with ocgis still runs for certain SRC/DST combinations and not for others. Will keep working on it.
NCHUNKS_DST=10 works for all DST files. NCHUNKS_DST=20 failed for 10x15 DST files.
<unstructdata hgrid="1km-merge-10min" lmask="HYDRO1K-merge-nomask" | ||
>lnd/clm2/mappingdata/grids/UGRID_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc</unstructdata> | ||
<unstructdata_type hgrid="1km-merge-10min" lmask="HYDRO1K-merge-nomask">UGRID</unstructdata_type> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kept the above as placeholders. I'm ok with removing if that's preferred.
input_pathname="abs" group="clmexp" valid_values="" > | ||
UNSTRUCT format grid data file | ||
</entry> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placeholder. I'm ok with removing if that's preferred.
Flag to pass to the ESMF mapping utility, telling it what kind of grid | ||
file this is. | ||
</entry> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Placeholder. I'm ok with removing if that's preferred.
tools/mkmapdata/mkmapdata.sh
Outdated
"5x5min_IGBP-GSDP" \ | ||
"5x5min_ISRIC-WISE" \ | ||
"5x5min_ORNL-Soil" \ | ||
"10x10min_nomask" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I missed two lines. I will put them back in the next commit.
Update: The latter SRC files return this error: I request 10 chunks and end up with 1 for these SRC files. As a test I requested 1 chunk but the error remains the same. @bekozi suggested pointing to older esmf builds, so I tried changing my ESMFMKFILE to @bekozi will be taking time off from today until 8/17. |
Latest: So, back to using the newer esmf build The WITHAUX option is what will allow us to generate fsurdat files from the map_files. |
Good news: |
Update
There are 34 DST files per SRC file:
I have found that the 1km SRC file works with 8 ncpus/mpiprocs but not with 12, 16, 20, 36, or 72. I will let @ekluzek respond to Ben's questions. |
Reduce duplication between caps Eliminate duplication of the "derived quantities for required fields" and corresponding error checking codes that repeat across the mct, nuopc, and lilac caps. This consolidates the code and reduces maintenance requirements.
At Sep 24 meeting we decided to move fwd with the same testing that @ekluzek and @billsacks had proposed for #823 . The first test-suite is |
OK, note the relevant tests are passing. If you look at /glade/work/slevis/git/mkmapdata_ocgis/test/tools/td.2615.status you'll see the summary of the tests. They all pass, except the PTCLM tests, but they don't matter for this work. Likely you would see the same for running the baseline version. But, the problem with those tests don't matter for this work. |
Thanks @ekluzek and you are right that tests 31-35 fail in the baseline, as well. |
Ben removed the cause of blank I have now generated the first surface dataset using map_files that I generated with ocgis in mkmapdata.sh. Although the surface dataset seems good, the corresponding log file shows an error-checking issue where all area variables (input and output) are zero. I have posted this here. |
The work on parallel mksurfdata probably means this work will be closed without coming in. |
With PE #1663 this is something that we just aren't going to do. |
Description of changes
Modify mkmapdata.sh to run with @bekozi's ocgis tool because it allows for
Specific notes
Contributors other than yourself, if any:
@bekozi @negin513 @ekluzek @billsacks
CTSM Issues Fixed (include github issue #):
Begins to address #643 #644
Fixes #806
I could see it possibly fixing #648
Are answers expected to change (and if so in what way)?
I expect roundoff differences in output weight (aka map) files from testing this far.
I use roundoff differences as a minimum requirement for my ocgis tests to pass.
Any User Interface Changes (namelist or namelist defaults changes)?
None once #645 is resolved
Testing performed, if any: