Draft of eNATL60 recipe #24
base: master
And I should mention that these instructions are really not clear yet or up to date. Mostly we just need the python code to generate a |
Based on the tutorial, the recipe should no longer be a python script but rather be executed via a Jupyter notebook...? |
No it's a python file. See #20. |
Thanks Takaya!
Have you actually tried running this recipe locally? I can't imagine it would have worked as is, due to the different size of the regions.
If I run this locally and get it working, will it push the data to OSN? |
Locally you would assign other (local) storage targets and step through the recipe one bit at a time, as in the tutorial. You don't need to run the whole thing. But this allows you to debug your own recipe rather than just guess. |
I'm getting the following import error from rechunker when trying to run the recipe locally but is this version specific...?
|
Ah, you have to install both pangeo-forge and rechunker from GitHub master. Thanks for your patience experimenting with bleeding-edge software! |
This is useful. We can stop here for now. Once we have this recipe working, we can add others. Many recipes can live in the same .py file. |
Is there a flexible way to prescribe the chunk size? It seems that the flag |
I'm getting the following error when running:
I'm assuming this is due to the chunk sizes being too large... but is this understanding correct? |
The recipe will first cache the entire dataset by downloading the files. It looks like you've filled up your hard disk. You probably don't want to run the entire recipe locally. Instead, try to just run a few steps, as in this example: https://pangeo-forge.readthedocs.io/en/latest/tutorials/netcdf_zarr_sequential.html#step-5-create-storage-targets I'm also going to play with this recipe today to see if I can get it working. |
Could you let me know if you found further fixes I'd need for the recipe? Also Yuvi asked us whether there's any data on OSN he could test with for the SWOT-AdAC Jupyterhub deployment. |
Co-authored-by: Ryan Abernathey <[email protected]>
The region 1 dataset is now on OSN! 🎉
import xarray as xr
osn_url = 'https://ncsa.osn.xsede.org/Pangeo/pangeo-forge/swot_adac/eNATL60_surface_region_1'
ds = xr.open_zarr(osn_url, consolidated=True)
print(ds)
I have to say, pangeo forge worked like a charm! |
The latest version of the full recipe I am using is
from itertools import product

import pandas as pd
from pangeo_forge.patterns import pattern_from_file_sequence
from pangeo_forge.recipes import XarrayZarrRecipe

regions = [1, 2, 3]
season_months = {
    'fma': pd.date_range("2010-02", "2010-05", freq="M"),
    'aso': pd.date_range("2009-08", "2009-10", freq="M")
}

url_base = (
    "https://ige-meom-opendap.univ-grenoble-alpes.fr"
    "/thredds/fileServer/meomopendap/extract/SWOT-Adac"
)


def make_recipe_surface(region, season):
    input_url_pattern = url_base + "/Surface/eNATL60/Region{reg:02d}-surface-hourly_{yymm}.nc"
    months = season_months[season]
    input_urls = [input_url_pattern.format(reg=region, yymm=date.strftime("%Y-%m"))
                  for date in months]
    file_pattern = pattern_from_file_sequence(input_urls, "time_counter")
    recipe = XarrayZarrRecipe(
        file_pattern,
        target_chunks={'time_counter': 72}
    )
    return recipe


def make_recipe_interior(region, season):
    input_url_pattern = url_base + "/Interior/eNATL60/Region{reg:02d}-interior-daily_{yymm}.nc"
    months = season_months[season]
    input_urls = [input_url_pattern.format(reg=region, yymm=date.strftime("%Y-%m"))
                  for date in months]
    file_pattern = pattern_from_file_sequence(input_urls, "time_counter")
    recipe = XarrayZarrRecipe(
        file_pattern,
        target_chunks={'time_counter': "50MB"}
    )
    return recipe


recipes = {f'eNATL60/Region{reg:02d}/surface_hourly/{season}': make_recipe_surface(reg, season)
           for reg, season in product(regions, season_months)}
recipes.update(
    {f'eNATL60/Region{reg:02d}/interior_daily/{season}': make_recipe_interior(reg, season)
     for reg, season in product(regions, season_months)}
)
cc @cisaacstern |
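For anyone who wants to see which dictionary keys and input URLs the recipe file generates without installing pangeo-forge, here is a minimal stdlib-only sketch. It copies the key and URL patterns from the recipe above. The helper names `preview_keys` and `surface_url` are hypothetical, and the month string is passed in by hand rather than derived from `pd.date_range`.

```python
from itertools import product

regions = [1, 2, 3]
seasons = ["fma", "aso"]  # same keys as season_months in the recipe file
kinds = ["surface_hourly", "interior_daily"]

url_base = (
    "https://ige-meom-opendap.univ-grenoble-alpes.fr"
    "/thredds/fileServer/meomopendap/extract/SWOT-Adac"
)


def preview_keys():
    """Hypothetical helper: list recipe keys in the order the file builds them."""
    return [
        f"eNATL60/Region{reg:02d}/{kind}/{season}"
        for kind in kinds
        for reg, season in product(regions, seasons)
    ]


def surface_url(region, yymm):
    """Hypothetical helper: format one surface input URL for a 'YYYY-MM' string."""
    return url_base + f"/Surface/eNATL60/Region{region:02d}-surface-hourly_{yymm}.nc"


keys = preview_keys()
print(len(keys))   # 3 regions x 2 seasons x 2 kinds
print(keys[0])
print(surface_url(1, "2010-02"))
```

Three regions, two seasons and two dataset kinds give twelve recipes in total, which is a quick sanity check against the number of datasets expected on OSN.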
Noting that pangeo-forge/pangeo-forge-recipes#164 is a blocker for running the interior recipes. Working on that issue now and hope to have some solutions soon. |
eNATL60 interior data is now on OSN, thanks to @rabernat's efforts on pangeo-forge/pangeo-forge-recipes#171 and pangeo-forge/pangeo-forge-recipes#166. It can be accessed via the |
I'm getting the following error for the summer interior data:
Winter data seems to be fine :) |
@roxyboy, oops! Looks like I'd forgotten to consolidate zarr metadata for that dataset. It should be corrected now. |
@cisaacstern I've noticed that the summer interior data only has 61 time steps, which is weird for three months of daily data... I'll check if I made a mistake in extracting the data but could you also check if the zarr metadata consolidation was correct? |
@roxyboy I've rebuilt all six of the eNATL60 summer datasets (both interior and surface, for all three regions) to include the previously missing October data. The length of |
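For reference, the 61 time steps reported above are exactly what two months of daily data (August and September 2009) would give. A likely cause, offered here as an assumption rather than something confirmed in this thread, is that `pd.date_range("2009-08", "2009-10", freq="M")` only returns month-end dates on or before the end bound, so October is dropped. The stdlib-only sketch below just checks the day counts; `expected_daily_steps` is a hypothetical helper name.

```python
import calendar


def expected_daily_steps(year, months):
    """Hypothetical helper: number of daily time steps covering the given months."""
    return sum(calendar.monthrange(year, month)[1] for month in months)


aug_sep = expected_daily_steps(2009, [8, 9])          # what the first build contained
aug_sep_oct = expected_daily_steps(2009, [8, 9, 10])  # full 'aso' season
print(aug_sep, aug_sep_oct)
```

Comparing the length of the time dimension against a count like this is a cheap check that can be run on each rebuilt dataset.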
hi folks, i'm curious... is this recipe related to or a duplicate of https://github.com/pangeo-forge/eNATL60-feedstock (maintained by @auraoupa) |
Not a duplicate. It is the same simulation but with different extractions: here we have extracted some subregions in 3D (300 levels), whereas https://github.com/pangeo-forge/eNATL60-feedstock covers the whole North Atlantic at some depths. |
Added edits to pipeline.py to ingest eNATL60 data.