
CMORizer for ESACCI-SEAICE #3821

Open · wants to merge 11 commits into main
Conversation

@axel-lauer (Contributor)

Description

This PR adds downloading and formatting scripts for the ESACCI-SEAICE dataset version L4-SICONC-RE-SSMI-12.5kmEASE2-fv3.0. The scripts process daily and monthly mean sea ice concentration (siconc).

Checklist

New or updated data reformatting script

@axel-lauer axel-lauer marked this pull request as ready for review January 29, 2025 09:09
@axel-lauer axel-lauer requested a review from a team as a code owner January 29, 2025 09:10
@axel-lauer axel-lauer added the REF Important for the CMIP Rapid Evaluation Framework (REF) label Jan 29, 2025
@axel-lauer axel-lauer self-assigned this Jan 29, 2025
@rbeucher (Contributor)

I am looking at it now

@hb326 (Contributor) left a comment

@axel-lauer: Thanks for the well-working CMORizer!
I tried the downloader, the formatter, and the check-obs recipe. Everything works fine and the data looks good. All files that need to be updated are modified/created.
Good to be merged from my side once you fix the small mistake I commented on in one of the files.

Great work! 🍾

@hb326 (Contributor) commented Feb 3, 2025

> I am looking at it now

Hey @rbeucher: I finished the scientific review. Would be great if you could do the technical review for this! Thanks!

@axel-lauer (Contributor, Author) commented Feb 3, 2025

> @axel-lauer: Thanks for the well-working CMORizer! I tried the downloader, the formatter, and the check-obs recipe. Everything works fine and the data looks good. All files that need to be updated are modified/created. Good to be merged from my side once you fix the small mistake I commented on in one of the files.
>
> Great work! 🍾

Thanks for reviewing! The typo has been fixed in commit 524a186.

@rbeucher rbeucher self-assigned this Feb 3, 2025
@rbeucher (Contributor) left a comment

Thanks @axel-lauer for the CMORizer and @hb326 for the scientific review!
I used the downloader and formatter for the entire dataset; it works well, but I think the code could be significantly simplified.
It might also be useful to add an option to select a single region at a time.
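
A minimal sketch of what that could look like in the formatter, assuming a hypothetical `regions` key in the cmorizer configuration (the key name and helper are illustrative, not existing ESMValTool API):

```python
# Hypothetical sketch: restrict processing to the regions requested in the
# cmorizer configuration, defaulting to both hemispheres.
ALL_REGIONS = ('NH', 'SH')


def get_regions(cfg):
    """Return the regions to process, validated against the known set."""
    requested = cfg.get('regions', ALL_REGIONS)  # e.g. ['NH'] for one region
    unknown = set(requested) - set(ALL_REGIONS)
    if unknown:
        raise ValueError(f"Unknown region(s): {sorted(unknown)}")
    return tuple(requested)
```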

Comment on lines +21 to +22
frequency1: day
frequency2: mon

Are these used? I can't find them in the formatter code.

Comment on lines +42 to +43
regions = ('NH', 'SH')
basepath = 'sea_ice_concentration/L4/ssmi_ssmis/12.5km/v3.0'

It might be a good idea to have the option to download a specified region.

Comment on lines +49 to +72
if is_daily:
hrs = 12
else:
hrs = 0
newtime = datetime(year=year, month=month, day=day,
hour=hrs, minute=0, second=0, microsecond=0)
newtime_num = cf_units.date2num(newtime, dataset_time_unit,
dataset_time_calender)
nan_cube.coord('time').points = float(newtime_num)

# remove existing time bounds and create new bounds
coord = nan_cube.coord('time')
if is_daily:
bnd1 = newtime + relativedelta.relativedelta(hours=-12)
bnd2 = bnd1 + relativedelta.relativedelta(days=1)
else:
bnd1 = newtime + relativedelta.relativedelta(days=-day + 1)
bnd2 = bnd1 + relativedelta.relativedelta(months=1)
coord.bounds = [cf_units.date2num(bnd1, dataset_time_unit,
dataset_time_calender),
cf_units.date2num(bnd2, dataset_time_unit,
dataset_time_calender)]

return nan_cube

The raw data is daily; do we need to keep the code for monthly data?
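
If the monthly branch is dropped, the block above could collapse to a daily-only version along these lines (a sketch reusing the PR's variable names; `nan_cube`, `year`, `month`, `day`, `dataset_time_unit` and `dataset_time_calender` come from the surrounding function):

```python
# Daily-only sketch: centre the time point at 12:00 and set day-long bounds.
newtime = datetime(year=year, month=month, day=day, hour=12)
newtime_num = cf_units.date2num(newtime, dataset_time_unit,
                                dataset_time_calender)
nan_cube.coord('time').points = float(newtime_num)

coord = nan_cube.coord('time')
bnd1 = newtime + relativedelta.relativedelta(hours=-12)
bnd2 = bnd1 + relativedelta.relativedelta(days=1)
coord.bounds = [cf_units.date2num(bnd, dataset_time_unit,
                                  dataset_time_calender)
                for bnd in (bnd1, bnd2)]

return nan_cube
```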

Comment on lines +180 to +215
if is_daily:
loop_date = datetime(year0, 1, 1)
while loop_date <= datetime(year0, 12, 31):
date_available = False
for idx, cubetime in enumerate(time_list):
if loop_date == cubetime:
date_available = True
break
if date_available:
full_list.append(new_list[idx])
else:
logger.debug("No data available for %d/%d/%d", loop_date.month,
loop_date.day, loop_date.year)
nan_cube = _create_nan_cube(new_list[0], loop_date.year,
loop_date.month, loop_date.day,
is_daily)
full_list.append(nan_cube)
loop_date += relativedelta.relativedelta(days=1)
else:
loop_date = datetime(year0, 1, 15)
while loop_date <= datetime(year0, 12, 31):
date_available = False
for idx, cubetime in enumerate(time_list):
if loop_date == cubetime:
date_available = True
break
if date_available:
full_list.append(new_list[idx])
else:
logger.debug("No data available for %d/%d", loop_date.month,
loop_date.year)
nan_cube = _create_nan_cube(new_list[0], loop_date.year,
loop_date.month, loop_date.day,
is_daily)
full_list.append(nan_cube)
loop_date += relativedelta.relativedelta(months=1)

Same here: the data is daily by default, so should we keep the code for the monthly frequency? It is not used.
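
With the monthly branch removed, this would reduce to a single loop, roughly as follows (a sketch reusing the names from the excerpt above and assuming `time_list` is a plain list of datetimes):

```python
# Daily-only sketch: walk through every day of the year and insert a NaN
# placeholder cube for dates without data.
loop_date = datetime(year0, 1, 1)
while loop_date <= datetime(year0, 12, 31):
    if loop_date in time_list:
        full_list.append(new_list[time_list.index(loop_date)])
    else:
        logger.debug("No data available for %s", loop_date.date())
        full_list.append(_create_nan_cube(new_list[0], loop_date.year,
                                          loop_date.month, loop_date.day,
                                          True))
    loop_date += relativedelta.relativedelta(days=1)
```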

logger.debug("Saving cube\n%s", cube)
logger.debug("Setting time dimension to UNLIMITED while saving!")
version = attributes['version']
attributes['mip'] = var['mip2']

This assumes that mip2 has a monthly frequency...

daily = True
while loop_date <= end_date:
filepattern = os.path.join(
in_dir, region,

I suggest checking for missing dates and outputting warnings; see the sketch after the next comment below. This could be done for the downloader as well.

var['file'].format(year=loop_date.year, region=region)
)
in_files = glob.glob(filepattern)
if not in_files:

This check would pass even if only one file is available for the year... I believe it could be improved with a more thorough validation.
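
A sketch of a more thorough check, building on the excerpt above (variable names are taken from the excerpt; the expected-count logic is an illustration, not part of the PR):

```python
# Sketch: instead of only testing "if not in_files", compare the number of
# files found for the year against the number of days and warn about gaps.
import calendar

in_files = sorted(glob.glob(filepattern))
expected = 366 if calendar.isleap(loop_date.year) else 365
if not in_files:
    logger.warning("%d: no input files found for region %s",
                   loop_date.year, region)
elif len(in_files) < expected:
    logger.warning("%d: only %d of %d expected daily files found for "
                   "region %s", loop_date.year, len(in_files), expected,
                   region)
```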

return cube


def _extract_variable(in_files, var, cfg, out_dir, is_daily, year0, region):

As flagged by Codacy, the complexity of the function is too high. I believe the code for non-daily data is not needed and could be removed.

logger.info('%d: no data not found for '
'variable %s', loop_date.year, short_name)
else:
_extract_variable(in_files, var, cfg, out_dir, daily,

I suggest adding region as an option and passing it via the cfg object.
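
A sketch of how that could look, assuming a hypothetical `regions` key in the cfg object (names mirror the excerpts above and are illustrative):

```python
# Hypothetical sketch: read the requested regions from the cmorizer cfg and
# hand each one to _extract_variable instead of hard-coding the region list.
for region in cfg.get('regions', ('NH', 'SH')):
    filepattern = os.path.join(
        in_dir, region,
        var['file'].format(year=loop_date.year, region=region))
    in_files = glob.glob(filepattern)
    if not in_files:
        logger.info('%d: no data found for variable %s (region %s)',
                    loop_date.year, short_name, region)
    else:
        _extract_variable(in_files, var, cfg, out_dir, daily,
                          loop_date.year, region)
```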

Comment on lines +191 to +197
logger.debug("No data available for %d/%d/%d", loop_date.month,
loop_date.day, loop_date.year)
nan_cube = _create_nan_cube(new_list[0], loop_date.year,
loop_date.month, loop_date.day,
is_daily)
full_list.append(nan_cube)
loop_date += relativedelta.relativedelta(days=1)

OK, I see that the check for missing days is done here. My problem is that it happens after data processing has already started, and NaN cubes are added by default. This is error prone if, for some reason, there was an issue with the download.
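
One way to make this less error prone would be to validate the date coverage up front, before any cube is built, so a download problem surfaces as an explicit warning (or error) rather than silently becoming NaN placeholder cubes. A sketch, reusing `year0`, `time_list` and `logger` from the formatter:

```python
# Sketch: warn loudly (or raise) about missing days before processing starts.
from datetime import datetime, timedelta

days_in_year = (datetime(year0 + 1, 1, 1) - datetime(year0, 1, 1)).days
all_days = [datetime(year0, 1, 1) + timedelta(days=i)
            for i in range(days_in_year)]
missing = [day for day in all_days if day not in time_list]
if missing:
    logger.warning("%d: %d missing day(s) will be filled with NaN cubes "
                   "(first missing: %s)", year0, len(missing),
                   missing[0].date())
```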

Labels: approved by scientific reviewer · looking for technical reviewer · REF Important for the CMIP Rapid Evaluation Framework (REF)
3 participants