CMORizer for ESACCI-SEAICE #3821
base: main
Conversation
I am looking at it now
@axel-lauer: Thanks for the well-working CMORizer!
I tried the downloader, formatter and the check-obs recipe. All works fine. Data looks good. All files that need to be updated are modified/created.
Good to be merged from my side if you fix the little mistake that I commented on in one of the files.
Great work! 🍾
Hey @rbeucher: I finished the scientific review. Would be great if you could do the technical review for this! Thanks!
Thanks for reviewing! The typo has been fixed in 524a186
Thanks @axel-lauer for the CMORiser and @hb326, for the scientific review!
I used the downloader and formatter for the entire dataset. It works well, but I think the code could be significantly simplified.
It might also be useful to add an option to select a single region at a time.
frequency1: day
frequency2: mon
Are these used? I can't find them in the formatter code.
regions = ('NH', 'SH')
basepath = 'sea_ice_concentration/L4/ssmi_ssmis/12.5km/v3.0'
It might be a good idea to have the option to download a specified region.
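One possible shape for such an option, sketched in Python. The function name `select_regions` and the `region` config key are illustrative assumptions, not part of the actual downloader API:

```python
def select_regions(cfg):
    """Return the regions to download, defaulting to both hemispheres.

    A sketch: ``cfg`` stands for the downloader's config dict and
    ``'region'`` is a hypothetical key for restricting the download.
    """
    all_regions = ('NH', 'SH')
    requested = cfg.get('region')
    if requested is None:
        return all_regions
    if requested not in all_regions:
        raise ValueError(
            f"Unknown region {requested!r}, expected one of {all_regions}")
    return (requested,)
```

With this, `select_regions({})` keeps the current behavior (both regions), while `select_regions({'region': 'SH'})` limits the download to one hemisphere.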
if is_daily:
    hrs = 12
else:
    hrs = 0
newtime = datetime(year=year, month=month, day=day,
                   hour=hrs, minute=0, second=0, microsecond=0)
newtime_num = cf_units.date2num(newtime, dataset_time_unit,
                                dataset_time_calender)
nan_cube.coord('time').points = float(newtime_num)

# remove existing time bounds and create new bounds
coord = nan_cube.coord('time')
if is_daily:
    bnd1 = newtime + relativedelta.relativedelta(hours=-12)
    bnd2 = bnd1 + relativedelta.relativedelta(days=1)
else:
    bnd1 = newtime + relativedelta.relativedelta(days=-day + 1)
    bnd2 = bnd1 + relativedelta.relativedelta(months=1)
coord.bounds = [cf_units.date2num(bnd1, dataset_time_unit,
                                  dataset_time_calender),
                cf_units.date2num(bnd2, dataset_time_unit,
                                  dataset_time_calender)]

return nan_cube
The raw data is daily; do we need to keep the code for monthly data?
if is_daily:
    loop_date = datetime(year0, 1, 1)
    while loop_date <= datetime(year0, 12, 31):
        date_available = False
        for idx, cubetime in enumerate(time_list):
            if loop_date == cubetime:
                date_available = True
                break
        if date_available:
            full_list.append(new_list[idx])
        else:
            logger.debug("No data available for %d/%d/%d", loop_date.month,
                         loop_date.day, loop_date.year)
            nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                                        loop_date.month, loop_date.day,
                                        is_daily)
            full_list.append(nan_cube)
        loop_date += relativedelta.relativedelta(days=1)
else:
    loop_date = datetime(year0, 1, 15)
    while loop_date <= datetime(year0, 12, 31):
        date_available = False
        for idx, cubetime in enumerate(time_list):
            if loop_date == cubetime:
                date_available = True
                break
        if date_available:
            full_list.append(new_list[idx])
        else:
            logger.debug("No data available for %d/%d", loop_date.month,
                         loop_date.year)
            nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                                        loop_date.month, loop_date.day,
                                        is_daily)
            full_list.append(nan_cube)
        loop_date += relativedelta.relativedelta(months=1)
Same here: the data is daily by default, so should we keep the code for monthly frequency? It is not used.
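If the monthly branch is kept, the duplication between the two loops could at least be reduced by factoring out the date iteration. A sketch, mirroring the loop bounds in the diff above (the function name `iter_expected_dates` is illustrative):

```python
from datetime import datetime

from dateutil import relativedelta


def iter_expected_dates(year0, is_daily):
    """Yield every expected time step of the year.

    A sketch: daily data runs Jan 1 - Dec 31 in one-day steps; monthly
    data uses the 15th of each month, as in the two loops above.
    """
    if is_daily:
        loop_date = datetime(year0, 1, 1)
        step = relativedelta.relativedelta(days=1)
    else:
        loop_date = datetime(year0, 1, 15)
        step = relativedelta.relativedelta(months=1)
    while loop_date <= datetime(year0, 12, 31):
        yield loop_date
        loop_date += step
```

The shared body (look up the date in `time_list`, append either the real cube or a NaN cube) would then appear only once.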
logger.debug("Saving cube\n%s", cube)
logger.debug("Setting time dimension to UNLIMITED while saving!")
version = attributes['version']
attributes['mip'] = var['mip2']
This assumes that mip2 has a monthly frequency...
daily = True
while loop_date <= end_date:
    filepattern = os.path.join(
        in_dir, region,
I suggest checking missing dates and output warnings. This could be done for the downloader as well.
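A minimal sketch of such a check, assuming the dates present in the input files have already been collected into a set (the helper name `warn_missing_days` is illustrative):

```python
import logging
from datetime import date, timedelta


def warn_missing_days(found_dates, year, logger):
    """Warn about every day of ``year`` without an input file.

    A sketch: ``found_dates`` is assumed to be a set of ``datetime.date``
    objects extracted from the downloaded/input file names.
    """
    day = date(year, 1, 1)
    missing = []
    while day.year == year:
        if day not in found_dates:
            missing.append(day)
        day += timedelta(days=1)
    if missing:
        logger.warning("%d day(s) missing in %d, first missing: %s",
                       len(missing), year, missing[0])
    return missing
```

Running the same check in the downloader would surface incomplete downloads before the formatter is even started.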
        var['file'].format(year=loop_date.year, region=region)
    )
    in_files = glob.glob(filepattern)
    if not in_files:
This would work even if only one file is available for the year... I believe it could be improved with a more thorough validation.
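One way to make the validation stricter is to compare the number of globbed files against the number of expected time steps for the year. A sketch (function name and error message are illustrative; argument names follow the diff):

```python
import calendar


def check_file_count(in_files, year, is_daily, region):
    """Raise if the file count doesn't match the expected time steps.

    A sketch: assumes one input file per day (or per month for monthly
    data), which matches the one-file-per-time-step layout of the
    dataset as described in this PR.
    """
    if is_daily:
        expected = 366 if calendar.isleap(year) else 365
    else:
        expected = 12
    if len(in_files) != expected:
        raise ValueError(
            f"{region} {year}: expected {expected} input files, "
            f"found {len(in_files)}")
```

This turns a silent partial year into a hard error; a softer variant could log a warning and continue.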
    return cube


def _extract_variable(in_files, var, cfg, out_dir, is_daily, year0, region):
As flagged by Codacy, the complexity of the function is too high. I believe the code for non-daily data is not needed and could be removed.
    logger.info('%d: no data found for '
                'variable %s', loop_date.year, short_name)
else:
    _extract_variable(in_files, var, cfg, out_dir, daily,
I suggest adding region as an option and passing it via the cfg object.
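A sketch of how the call site could be driven by the config instead of a hard-coded tuple. The `'regions'` key and the helper name `process_regions` are assumptions for illustration:

```python
def process_regions(cfg, handle_region):
    """Call ``handle_region`` once per requested region.

    A sketch: ``cfg`` stands for the formatter's config dict;
    ``'regions'`` is a hypothetical key defaulting to both hemispheres.
    """
    for region in cfg.get('regions', ('NH', 'SH')):
        handle_region(region)
```

At the call site, `handle_region` would wrap the existing per-region glob and `_extract_variable` call, so a user could run a single hemisphere by setting `regions: [NH]` in the config.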
logger.debug("No data available for %d/%d/%d", loop_date.month,
             loop_date.day, loop_date.year)
nan_cube = _create_nan_cube(new_list[0], loop_date.year,
                            loop_date.month, loop_date.day,
                            is_daily)
full_list.append(nan_cube)
loop_date += relativedelta.relativedelta(days=1)
OK, I see that the check for missing days is done here. My problem is that it happens after the data processing has already started: NaN cubes are added by default. This is error-prone if, for some reason, there was an issue with the download.
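One way to address this is to fail fast before formatting begins if the available time steps fall below a threshold, rather than silently padding with NaN cubes. A sketch; the function name and the 5% default threshold are illustrative assumptions:

```python
def assert_coverage(available_dates, expected_dates, max_missing_frac=0.05):
    """Raise if too many expected time steps are missing.

    A sketch: ``expected_dates`` would come from iterating over the
    year's calendar, ``available_dates`` from the input file names.
    Returns the missing dates when coverage is acceptable.
    """
    available = set(available_dates)
    missing = [d for d in expected_dates if d not in available]
    frac = len(missing) / len(expected_dates)
    if frac > max_missing_frac:
        raise RuntimeError(
            f"{len(missing)}/{len(expected_dates)} time steps missing "
            f"({frac:.1%}); check the download before formatting")
    return missing
```

Called once per region and year before `_extract_variable`, this would catch a broken download up front while still tolerating the occasional missing day in the raw dataset.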
Description
This PR adds downloading and formatting scripts for the ESACCI-SEAICE dataset version L4-SICONC-RE-SSMI-12.5kmEASE2-fv3.0. The scripts process daily and monthly mean sea ice concentration (siconc).
Checklist
New or updated data reformatting script