-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
llcreader: get grid info from portal #197
Conversation
Tim thanks SO much for doing this! I plan to review this PR in detail ASAP.
…Sent from my iPhone
On Apr 4, 2020, at 3:44 PM, Timothy Smith ***@***.***> wrote:
This PR is designed to give llcreader access to grid information on the ECCO portal (or elsewhere), to hopefully to close #161, close #166, and close #158 (possibly more).
Right now it gets the full grid for LLC 2160 and gets the available grid info on the ECCO portal for LLC 4320, which is some but not all. I was not able access the grid info mentioned by @rabernat in #161, which was here
vertical grid variables and coordinates
this got a little tricky since everything is designed to deal with faceted data. I wrote some functions to read these in differently. I adapted the same infrastructure to the 1D variables, even though they could probably be dealt with eagerly (see _get_1d_chunk and _dask_array_vgrid in llcmodel.py).
k_p1 index
I changed this implementation as follows. Currently, the code grabs k_levels for k_p1 where k_levels is the cell centered index. Even if the user does not select levels, this still results in len(k)==len(k_p1), which is technically not right, unless I'm missing something... It gets tricky though when the user selects individual, nonconsecutive levels. I wrote this PR so that, if the user selects cell centered k_levels=[0,10,11] then the output dataset has k_p1=[0,1,10,11,12] - so that k_p1 has the upper and lower indices for each vertical grid cell. I found this easier to do in order to correctly assign Zu,Zl, and Zp1. This implementation "seems right", but I'm interested if anyone has an idea for how to handle it more elegantly or if it's unnecessary/confusing. It also felt like a somewhat hacky solution, so again, feedback welcome. (see _get_kp1_levels in llcmodel.py). With this new implementation I had to change the expected size of k_p1 to be 51 and 91 for llc90 and llc2160/llc4320.
tests
I just appended checks for this stuff in current tests rather than adding new ones. Tests for the k_p1 levels is in test_llc90_local_faces_load and 2160/4320 grids checked for in test_ecco_portal_faces/latlon. Perhaps could add more, testing the added read_grid and grid_vars_to_coords options added to get_dataset.
Some of the implementation started to feel a little hacky, and I'm open to anyone's suggestion on making it cleaner. Let me know what you think!
You can view, comment on, or merge this pull request online at:
#197
Commit Summary
access to llc2160 grid is a go, with a couple caveats for now
use grid var names not filenames, get vertical grid vars working
lil typo
fixes loading grid dir issues
it works for llc2160
add directory to some LLC4320 grid vars
separate grid metadata function, add shrunk_grid flag
add grid_varnames to paramter list
test that grid variables show up in portal datasets
prefixes->variables
try Zl...
extra logic to create Zl, Zu, and Zp1
make reading grid optional
wow, not copying dicts really messes things up
only build Zu, Zl if RF is there
shouldn't kp1 have nz+1 levels
only read grid if grid_varnames has entries
also test selected k_p1 levels
typo...
... more typos...
rhoref, extra dx/dy metrics
add new metrics to test
typos, and add rhoRef to vgrid prefixes
File Changes
M xmitgcm/llcreader/known_models.py (17)
M xmitgcm/llcreader/llcmodel.py (201)
M xmitgcm/llcreader/stores.py (41)
M xmitgcm/test/test_llcreader.py (28)
M xmitgcm/variables.py (28)
Patch Links:
https://github.com/MITgcm/xmitgcm/pull/197.patch
https://github.com/MITgcm/xmitgcm/pull/197.diff
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
FYI this has moved here: https://catalog.pangeo.io/browse/master/ocean/LLC4320/LLC4320_grid/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tim this looks great. Very thorough.
The main theme in my comments is that I think we should not ever be using multiple chunks for 1D vertical variables. This gives us the opportunity to remove some code from you PR, which is always a good thing.
This PR sets the stage for us to completely replace the original backend with the llcreader approach. So happy you did this.
grid_vars.update(vertical_grid_variables) | ||
grid_vars.update(volume_grid_variables) | ||
grid_vars.update(mask_variables) | ||
grid_vars.update(extra_grid_variables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super happy about how repetitive this has to be, but it is consistent with the rest of the code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, a slight change to _get_all_grid_variables
got rid of most of this
xmitgcm/llcreader/llcmodel.py
Outdated
@@ -87,6 +126,8 @@ def _decompress(data, mask, dtype): | |||
_facet_reshape = (False, False, False, True, True) | |||
_nfaces = 13 | |||
_nfacets = 5 | |||
_vgrid_prefixes = ['DRC','DRF','PHrefC','PHrefF','RC','RF','RhoRef'] | |||
_vgrid_p1_prefixes = ['DRC','PHrefF','RF'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way we could infer that these are vgrid variables from the metadata (e.g. dims), rather than hard coding it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, definitely. I added a little _is_vgrid
routine for this and it seems much cleaner
xmitgcm/llcreader/llcmodel.py
Outdated
data.shape = facet_shape | ||
level_data.append(data) | ||
|
||
return np.concatenate(level_data, axis=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic feels overwrought for 1D vertical variables. And it will be very slow to issue 90 1-byte http read requests. I recommend we always read the entire array for such variables and then subset klevels after.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my updated implementation should do the trick, and yep some quick tests showed that it's much faster to do it this way. It definitely makes sense... I also appended to test_llcreader.test_ecco_portal_faces/latlon
tests to make sure that vertical grid coordinates are a single chunk that is the length of the data
xmitgcm/llcreader/llcmodel.py
Outdated
def _dask_array_vgrid(self, varname, klevels, k_chunksize): | ||
# return a dask array for a 1D vertical grid var | ||
chunks = (tuple([len(c) | ||
for c in _chunks(klevels, k_chunksize)]),) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned above, I don't think it ever makes sense to use chunks for 1D variables. We should always be returning a single contiguous chunk.
Thanks for the review @rabernat! I will get to this in the next couple of nights. I agree, writing out chunking 1D variables felt ... awkward ... but it was a start. I'll change that around and address the comments as well. Should I point the LLC4320 grid to the link you mentioned? https://catalog.pangeo.io/browse/master/ocean/LLC4320/LLC4320_grid/ I hope you're doing OK - staying safe, sane, & healthy! |
Thanks for the update Tim! It looks like we are getting some test failures on python 3.6 only. I'm not sure what's behind this. They appear unrelated to your PR, but it would be good to try to resolve before merging. |
For sure! I'll start cracking into it tonight. I have been lazily ignoring it, but I agree it would be good to get to the bottom of before getting a new version on pip. |
Now that I've merged #200, could you rebase this? Hopefully the tests will all go green. |
9c25904
to
ca80a05
Compare
Looks as good as green to me :) One last thing: should we keep the LLC4320 grid link to the ECCO portal, which has oddly an incomplete set of the coordinate files or switch to the Pangeo link? The ECCO portal is currently missing e.g. Depth, but containing the seemingly necessary ones like dx/dy, xc/yc. Or maybe the ECCO portal could get updated ... |
Are you referring to this directory: https://data.nas.nasa.gov/ecco/data.php?dir=/eccodata/llc_4320/grid Let's ping @menemenlis to see if we can get them to update the grid directory to include the complete set of files. Do we have a full inventory of the missing files? |
Yep that's what I'm referring to and currently referencing in
I think the |
I've followed up with Dimitris over email to try to get his input on this. |
Are you looking for missing grid at ECCO NAS data portal? |
When we developed this tool, only ECCO Data Portal existed. @hongandyan -- could you and / or @menemenlis explain the difference between "ECCO Drive" and "ECCO Data Portal"? Why would we use one vs. the other? |
sorry for the confusion. |
I just got the following info from @menemenlis:
@timothyas - can you check that things are working on your branch with these new additions? |
Codecov Report
@@ Coverage Diff @@
## master #197 +/- ##
==========================================
- Coverage 88.09% 79.68% -8.41%
==========================================
Files 12 11 -1
Lines 1646 1275 -371
Branches 338 255 -83
==========================================
- Hits 1450 1016 -434
- Misses 158 226 +68
+ Partials 38 33 -5
Continue to review full report at Codecov.
|
Thanks @menemenlis! The files listed look good to me, and once they are on the portal everything should be working smoothly. Right now though I'm running into a strange 404 Error whenever the data are actually called/loaded. This is happening on travis with my latest commit, see here at |
@timothyas It's ECCO-Data portal temporary issue. |
Thanks @hongandyan! I'll check back in a bit then. |
FYI.. The missing grid data was just exported and is available for download: |
Woohoo! Thanks @menemenlis and @ryanspaulding! This seems good to go now, I'm getting a passing build and my personal test notebook is lookin' good. |
Fantastic! We should do a new release of xmitgcm asap. |
* access to llc2160 grid is a go, with a couple caveats for now * use grid var names not filenames, get vertical grid vars working * lil typo * fixes loading grid dir issues * it works for llc2160 * add directory to some LLC4320 grid vars * separate grid metadata function, add shrunk_grid flag * add grid_varnames to paramter list * test that grid variables show up in portal datasets * prefixes->variables * try Zl... * extra logic to create Zl, Zu, and Zp1 * make reading grid optional * wow, not copying dicts really messes things up * only build Zu, Zl if RF is there * shouldn't kp1 have nz+1 levels * only read grid if grid_varnames has entries * also test selected k_p1 levels * typo... * ... more typos... * rhoref, extra dx/dy metrics * add new metrics to test * typos, and add rhoRef to vgrid prefixes * load vgrid as one big chunk, all at once. test it too! * oops, fixed test * fix for global vgrid vars... what about layers.. tbd * cleanup vgrid check * cleanup grid var dict situation * also test latlon vgrid coord chunking * faces->ll * update docs * tiny typo * added more 4320 grid files
This PR is designed to give llcreader access to grid information on the ECCO portal (or elsewhere), to hopefully to close #161, close #166, and close #158 (possibly more).
Right now it gets the full grid for LLC 2160 and gets the available grid info on the ECCO portal for LLC 4320, which is some but not all. I was not able access the grid info mentioned by @rabernat in #161, which was here
vertical grid variables and coordinates
this got a little tricky since everything is designed to deal with faceted data. I wrote some functions to read these in differently. I adapted the same infrastructure to the 1D variables, even though they could probably be dealt with eagerly (see
_get_1d_chunk
and_dask_array_vgrid
in llcmodel.py).k_p1
indexI changed this implementation as follows. Currently, the code grabs
k_levels
fork_p1
wherek_levels
is the cell centered index. Even if the user does not select levels, this still results inlen(k)==len(k_p1)
, which is technically not right, unless I'm missing something... It gets tricky though when the user selects individual, nonconsecutive levels. I wrote this PR so that, if the user selects cell centeredk_levels=[0,10,11]
then the output dataset hask_p1=[0,1,10,11,12]
- so thatk_p1
has the upper and lower indices for each vertical grid cell. I found this easier to do in order to correctly assignZu
,Zl
, andZp1
. This implementation "seems right", but I'm interested if anyone has an idea for how to handle it more elegantly or if it's unnecessary/confusing. It also felt like a somewhat hacky solution, so again, feedback welcome. (see_get_kp1_levels
in llcmodel.py). With this new implementation I had to change the expected size ofk_p1
to be 51 and 91 for llc90 and llc2160/llc4320.tests
I just appended checks for this stuff in current tests rather than adding new ones. Tests for the
k_p1
levels is intest_llc90_local_faces_load
and 2160/4320 grids checked for intest_ecco_portal_faces/latlon
. Perhaps could add more, testing the addedread_grid
andgrid_vars_to_coords
options added toget_dataset
.Some of the implementation started to feel a little hacky, and I'm open to anyone's suggestion on making it cleaner. Let me know what you think!