CMORizer for ESACCI-SEAICE #3821
base: main
Conversation
I am looking at it now
@axel-lauer: Thanks for the well-working CMORizer!
I tried the downloader, formatter and the check-obs recipe. All works fine. Data looks good. All files that need to be updated are modified/created.
Good to be merged from my side if you fix the little mistake that I commented on in one of the files.
Great work! 🍾
Hey @rbeucher: I finished the scientific review. Would be great if you could do the technical review for this! Thanks!
Thanks for reviewing! The typo has been fixed in 524a186
Thanks @axel-lauer for the CMORiser and @hb326, for the scientific review!
I used the downloader and formatter for the entire dataset. It works well, but I think the code could be significantly simplified.
It might also be useful to add an option to select a single region at a time.
frequency1: day
frequency2: mon
Are these used? I can't find them in the formatter code.
regions = ('NH', 'SH')
basepath = 'sea_ice_concentration/L4/ssmi_ssmis/12.5km/v3.0'
It might be a good idea to have the option to download a specified region.
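One possible shape for such an option, sketched in Python. The function name `select_regions` and the `region` config key are illustrative assumptions, not part of the actual downloader API:

```python
def select_regions(cfg):
    """Return the regions to download, defaulting to both hemispheres.

    A sketch: ``cfg`` stands for the downloader's config dict and
    ``'region'`` is a hypothetical key for restricting the download.
    """
    all_regions = ('NH', 'SH')
    requested = cfg.get('region')
    if requested is None:
        return all_regions
    if requested not in all_regions:
        raise ValueError(
            f"Unknown region {requested!r}, expected one of {all_regions}")
    return (requested,)
```

With this, `select_regions({})` keeps the current behavior (both regions), while `select_regions({'region': 'SH'})` limits the download to one hemisphere.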
if is_daily:
    hrs = 12
else:
    hrs = 0
newtime = datetime(year=year, month=month, day=day,
                   hour=hrs, minute=0, second=0, microsecond=0)
newtime_num = cf_units.date2num(newtime, dataset_time_unit,
                                dataset_time_calender)
nan_cube.coord('time').points = float(newtime_num)

# remove existing time bounds and create new bounds
coord = nan_cube.coord('time')
if is_daily:
    bnd1 = newtime + relativedelta.relativedelta(hours=-12)
    bnd2 = bnd1 + relativedelta.relativedelta(days=1)
else:
    bnd1 = newtime + relativedelta.relativedelta(days=-day + 1)
    bnd2 = bnd1 + relativedelta.relativedelta(months=1)
coord.bounds = [cf_units.date2num(bnd1, dataset_time_unit,
                                  dataset_time_calender),
                cf_units.date2num(bnd2, dataset_time_unit,
                                  dataset_time_calender)]

return nan_cube
The raw data is daily; do we need to keep the code for monthly data?
if is_daily:
    loop_date = datetime(year0, 1, 1)
    while loop_date <= datetime(year0, 12, 31):
        date_available = False
        for idx, cubetime in enumerate(time_list):
            if loop_date == cubetime:
                date_available = True
                break
        if date_available:
            full_list.append(new_list[idx])
        else:
            logger.debug("No data available for %d/%d/%d", loop_date.month,
                         loop_date.day, loop_date.year)
            nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                                        loop_date.month, loop_date.day,
                                        is_daily)
            full_list.append(nan_cube)
        loop_date += relativedelta.relativedelta(days=1)
else:
    loop_date = datetime(year0, 1, 15)
    while loop_date <= datetime(year0, 12, 31):
        date_available = False
        for idx, cubetime in enumerate(time_list):
            if loop_date == cubetime:
                date_available = True
                break
        if date_available:
            full_list.append(new_list[idx])
        else:
            logger.debug("No data available for %d/%d", loop_date.month,
                         loop_date.year)
            nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                                        loop_date.month, loop_date.day,
                                        is_daily)
            full_list.append(nan_cube)
        loop_date += relativedelta.relativedelta(months=1)
Same here: the data is daily by default, so should we keep the code for monthly frequency? It is not used.
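If the monthly branch is kept, the duplication between the two loops could at least be reduced by factoring out the date iteration. A sketch, mirroring the loop bounds in the diff above (the function name `iter_expected_dates` is illustrative):

```python
from datetime import datetime

from dateutil import relativedelta


def iter_expected_dates(year0, is_daily):
    """Yield every expected time step of the year.

    A sketch: daily data runs Jan 1 - Dec 31 in one-day steps; monthly
    data uses the 15th of each month, as in the two loops above.
    """
    if is_daily:
        loop_date = datetime(year0, 1, 1)
        step = relativedelta.relativedelta(days=1)
    else:
        loop_date = datetime(year0, 1, 15)
        step = relativedelta.relativedelta(months=1)
    while loop_date <= datetime(year0, 12, 31):
        yield loop_date
        loop_date += step
```

The shared body (look up the date in `time_list`, append either the real cube or a NaN cube) would then appear only once.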
logger.debug("Saving cube\n%s", cube)
logger.debug("Setting time dimension to UNLIMITED while saving!")
version = attributes['version']
attributes['mip'] = var['mip2']
This assumes that mip2 has a monthly frequency...
daily = True
while loop_date <= end_date:
    filepattern = os.path.join(
        in_dir, region,
I suggest checking missing dates and output warnings. This could be done for the downloader as well.
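A minimal sketch of such a check, assuming the dates present in the input files have already been collected into a set (the helper name `warn_missing_days` is illustrative):

```python
import logging
from datetime import date, timedelta


def warn_missing_days(found_dates, year, logger):
    """Warn about every day of ``year`` without an input file.

    A sketch: ``found_dates`` is assumed to be a set of ``datetime.date``
    objects extracted from the downloaded/input file names.
    """
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day not in found_dates:
            missing.append(day)
        day += timedelta(days=1)
    if missing:
        logger.warning("%d day(s) missing in %d, first missing: %s",
                       len(missing), year, missing[0])
    return missing
```

Running the same check in the downloader would surface incomplete downloads before the formatter is even started.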
        var['file'].format(year=loop_date.year, region=region)
    )
    in_files = glob.glob(filepattern)
    if not in_files:
This would work even if only one file is available for the year... I believe it could be improved with a more thorough validation.
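One way to make the validation stricter is to compare the number of globbed files against the number of expected time steps for the year. A sketch (function name and error message are illustrative; argument names follow the diff):

```python
import calendar


def check_file_count(in_files, year, is_daily, region):
    """Raise if the file count doesn't match the expected time steps.

    A sketch: assumes one input file per day (or per month for monthly
    data), which matches the one-file-per-time-step layout of the
    dataset as described in this PR.
    """
    if is_daily:
        expected = 366 if calendar.isleap(year) else 365
    else:
        expected = 12
    if len(in_files) != expected:
        raise ValueError(
            f"{region} {year}: expected {expected} input files, "
            f"found {len(in_files)}")
```

This turns a silent partial year into a hard error; a softer variant could log a warning and continue.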
    return cube


def _extract_variable(in_files, var, cfg, out_dir, is_daily, year0, region):
As flagged by Codacy, the complexity of the function is too high. I believe the code for non-daily data is not needed and could be removed.
    logger.info('%d: no data found for '
                'variable %s', loop_date.year, short_name)
else:
    _extract_variable(in_files, var, cfg, out_dir, daily,
I suggest adding region as an option and passing it via the cfg object.
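A sketch of how the call site could be driven by the config instead of a hard-coded tuple. The `'regions'` key and the helper name `process_regions` are assumptions for illustration:

```python
def process_regions(cfg, handle_region):
    """Call ``handle_region`` once per requested region.

    A sketch: ``cfg`` stands for the formatter's config dict;
    ``'regions'`` is a hypothetical key defaulting to both hemispheres.
    """
    for region in cfg.get('regions', ('NH', 'SH')):
        handle_region(region)
```

At the call site, `handle_region` would wrap the existing per-region glob and `_extract_variable` call, so a user could run a single hemisphere by setting `regions: [NH]` in the config.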
logger.debug("No data available for %d/%d/%d", loop_date.month,
             loop_date.day, loop_date.year)
nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                            loop_date.month, loop_date.day,
                            is_daily)
full_list.append(nan_cube)
loop_date += relativedelta.relativedelta(days=1)
OK, I see that the check for missing days is done here. My problem is that it happens after the data processing has already started: NaN cubes are added by default. This is error-prone if, for some reason, there was an issue with the download.
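One way to address this is to fail fast before formatting begins if the available time steps fall below a threshold, rather than silently padding with NaN cubes. A sketch; the function name and the 5% default threshold are illustrative assumptions:

```python
def assert_coverage(available_dates, expected_dates, max_missing_frac=0.05):
    """Raise if too many expected time steps are missing.

    A sketch: ``expected_dates`` would come from iterating over the
    year's calendar, ``available_dates`` from the input file names.
    Returns the missing dates when coverage is acceptable.
    """
    available = set(available_dates)
    missing = [d for d in expected_dates if d not in available]
    frac = len(missing) / len(expected_dates)
    if frac > max_missing_frac:
        raise RuntimeError(
            f"{len(missing)}/{len(expected_dates)} time steps missing "
            f"({frac:.1%}); check the download before formatting")
    return missing
```

Called once per region and year before `_extract_variable`, this would catch a broken download up front while still tolerating the occasional missing day in the raw dataset.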
Description
This PR adds downloading and formatting scripts for the ESACCI-SEAICE dataset version L4-SICONC-RE-SSMI-12.5kmEASE2-fv3.0. The scripts process daily and monthly mean sea ice concentration (siconc).
Checklist
New or updated data reformatting script