Add noaa-cdr datasets #82

Merged
merged 26 commits into from
Apr 28, 2023

Conversation

gadomski
Contributor

@gadomski gadomski commented Nov 10, 2022

Description

Collections:

  • noaa-cdr-ocean-heat-content
  • noaa-cdr-ocean-heat-content-netcdf
  • noaa-cdr-sea-ice-concentration (not included because of version mismatch between Azure blob storage assets and NOAA's assets)
  • noaa-cdr-sea-surface-temperature-optimum-interpolation
  • noaa-cdr-sea-surface-temperature-whoi
  • noaa-cdr-sea-surface-temperature-whoi-netcdf

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Item(s) from all six collections have been ingested into the test instance.

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review
  • Changelog has been updated
  • Documentation has been updated
  • Unit tests pass locally (./scripts/test)
  • Code is linted and styled (./scripts/format)

Screenshots

Ocean heat content

image

Sea surface temperature optimum interpolation

image

Sea surface temperature WHOI

image

@gadomski gadomski force-pushed the noaa-cdr branch 2 times, most recently from 7e29016 to 234b3fe Compare November 10, 2022 17:58
@gadomski gadomski self-assigned this Jan 12, 2023
@gadomski gadomski marked this pull request as ready for review February 28, 2023 15:35
@gadomski gadomski marked this pull request as draft February 28, 2023 15:45
Collections:
- noaa-cdr-ocean-heat-content
- noaa-cdr-ocean-heat-content-netcdf
- noaa-cdr-sea-ice-concentration
- noaa-cdr-sea-surface-temperature-optimum-interpolation
- noaa-cdr-sea-surface-temperature-whoi
- noaa-cdr-sea-surface-temperature-whoi-netcdf
@gadomski gadomski marked this pull request as ready for review March 31, 2023 13:40
@gadomski gadomski requested a review from TomAugspurger April 18, 2023 13:18
@gadomski gadomski requested a review from TomAugspurger April 25, 2023 16:39
@gadomski
Contributor Author

@TomAugspurger media type and updated link fixed.

@gadomski gadomski requested a review from TomAugspurger April 25, 2023 21:03
Tom Augspurger added 2 commits April 26, 2023 09:29
* Added Dockerfile
* Updated image
* Added app insights string
* Added chunking to sea-ice
@TomAugspurger
Contributor

TomAugspurger commented Apr 27, 2023

I've had a handful of troubles with the full run.

  • 395bc3e adds some chunking to the larger collections.
  • Some of the sea-surface collections are occasionally hanging in the read from blob storage. To attempt to fix that, I've
    • Updated the version of Python in the task runner to 3.11, which includes some fixes in the ssl module.
    • Explicitly set a timeout in the call to download_blob (2cac208)
    • Set a global default timeout (unsure if this does anything)
  • A task failed while reading a NetCDF file. The traceback is below, but I haven't been able to reproduce that locally. It might be worth explicitly setting engine="h5netcdf" in the call to xarray.open_dataset in stactools-noaacdr, but I'm pessimistic that would have helped here.
pctasks.dataset.items.task.CreateItemsError: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc

[INFO]:2023-04-26 21:23:11,698: (039.00%) [5.37s]  - blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970804.nc (117 of 300)
/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py:159: RuntimeWarning: 'h5netcdf' fails while guessing
  warnings.warn(f"{engine!r} fails while guessing", RuntimeWarning)
[INFO]:2023-04-26 21:23:25,027:  === PCTasks: Task Failed! ===
[ERROR]:2023-04-26 21:23:25,028: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 174, in create_items
    result = self._create_item(asset_uri, storage_factory)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/noaa_cdr.py", line 161, in create_item
    item = stactools.noaa_cdr.stac.add_cogs(item, temporary_directory)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/stac.py", line 94, in add_cogs
    assets = cog.cogify(
             ^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/cog.py", line 25, in cogify
    with xarray.open_dataset(file, mask_and_scale=False) as ds:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/api.py", line 509, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py", line 197, in guess_engine
    raise ValueError(error_msg)
ValueError: found the following matches with the input file in xarray's IO backends: ['h5netcdf']. But their dependencies may not be installed, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/run.py", line 138, in run_task
    result = task.parse_and_run(task_data, task_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/task.py", line 53, in parse_and_run
    output = self.run(args, context)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 203, in run
    results = self.create_items(input, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 176, in create_items
    raise CreateItemsError(
pctasks.dataset.items.task.CreateItemsError: Failed to create item from blob://noaacdr/sea-surface-temp-optimum-interpolation/data/v2.1/avhrr/199708/oisst-avhrr-v02r01.19970805.nc
[INFO]:2023-04-26 21:23:25,086: Task run complete.
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 174, in create_items
    result = self._create_item(asset_uri, storage_factory)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/noaa_cdr.py", line 161, in create_item
    item = stactools.noaa_cdr.stac.add_cogs(item, temporary_directory)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/stac.py", line 94, in add_cogs
    assets = cog.cogify(
             ^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/stactools/noaa_cdr/cog.py", line 25, in cogify
    with xarray.open_dataset(file, mask_and_scale=False) as ds:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/api.py", line 509, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/noaa-cdr-process-chunk-087ca945--b2-b2a7-82031c5bc775-tsk_gen_gr/job-1/create-items-19/wd/_code/xarray/backends/plugins.py", line 197, in guess_engine
    raise ValueError(error_msg)
ValueError: found the following matches with the input file in xarray's IO backends: ['h5netcdf']. But their dependencies may not be installed, see:
https://docs.xarray.dev/en/stable/user-guide/io.html 
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/bin/pctasks", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/cli/cli.py", line 140, in cli
    pctasks_cmd(prog_name="pctasks")
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/cli.py", line 50, in run_cmd
    _cli.run_cmd(
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/_cli.py", line 32, in run_cmd
    output = run_task(msg)
             ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/run.py", line 138, in run_task
    result = task.parse_and_run(task_data, task_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/task.py", line 53, in parse_and_run
    output = self.run(args, context)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 203, in run
    results = self.create_items(input, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 176, in create_items

FWIW, the compute is in West Europe, but this dataset resides in East US, which might contribute to the networking / download issues.
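The suggestion above to pass engine="h5netcdf" explicitly could look roughly like the sketch below. It is only an illustration: `open_netcdf` is a hypothetical helper, not a function from stactools-noaacdr, and it assumes the h5netcdf backend is installed. Naming the engine skips the `guess_engine` step that raised in the traceback, though as noted it may not have prevented this particular failure.

```python
def open_netcdf(path, engine="h5netcdf"):
    """Open a NetCDF file with an explicitly named xarray backend.

    `open_netcdf` is a hypothetical helper for illustration. Passing
    `engine` means xarray does not have to guess a backend from the file.
    """
    # Imported lazily so this module can be loaded without xarray installed.
    import xarray

    # mask_and_scale=False mirrors the call shown in the traceback.
    return xarray.open_dataset(path, engine=engine, mask_and_scale=False)
```

If the named backend is missing, xarray should report that directly rather than failing while guessing.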

gadomski and others added 4 commits April 27, 2023 08:49
This adds a last-ditch timeout to the user-defined create_items.
It *should* reliably interrupt functions that run longer than the
user-specified `timeout` (unset by default, i.e. no timeout).

It works by registering a signal handler for `SIGALRM`
(https://www.man7.org/linux/man-pages/man7/signal.7.html,
https://www.man7.org/linux/man-pages/man2/alarm.2.html) and setting an
alarm for `timeout` seconds. If more than that passes, the kernel takes
care of interrupting our thread, and we'll raise a `TimeoutError`.

To handle the "common" case of something in `ssl` being stuck, we'll also
retry `TimeoutError`s multiple times.
@TomAugspurger
Contributor

e5b60cc has (yet another) attempt at handling the timeouts. It's running now on a pair of runs that have gotten stuck 2/2 times now.

I'm a bit... unsure about this strategy, but I think it'll be OK. The tl;dr is that if you specify a timeout, we'll register an alarm signal. If the user-provided create_items function doesn't complete in timeout seconds, that alarm fires and (hopefully?) the hanging function call is interrupted. pctasks will retry these timeouts (on top of whatever retry logic is in create_items) a couple of times before bailing.

That said, we should always prefer to solve these problems within the create_items function if possible. This should be a last resort.
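A minimal sketch of the alarm-based approach described above, for readers who want to see the mechanism. This is not the pctasks implementation; `alarm_timeout` and `run_with_timeout` are hypothetical names, and the retry count is an arbitrary choice. It assumes a Unix platform and that the call runs in the main thread, since Python only delivers signal handlers there.

```python
import signal
from contextlib import contextmanager


@contextmanager
def alarm_timeout(seconds):
    """Raise TimeoutError if the body runs longer than `seconds` (whole seconds)."""

    def _handler(signum, frame):
        raise TimeoutError(f"timed out after {seconds} seconds")

    # Install our handler and remember the previous one so it can be restored.
    previous = signal.signal(signal.SIGALRM, _handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


def run_with_timeout(func, timeout=None, retries=3):
    """Call `func`, retrying up to `retries` times when it raises TimeoutError."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for _ in range(retries):
        try:
            if timeout is None:
                return func()  # no timeout requested: call directly
            with alarm_timeout(timeout):
                return func()
        except TimeoutError as exc:
            last_error = exc
    raise last_error
```

A blocking call such as a stuck socket read is interrupted because the handler raises inside the main thread. Code that catches and swallows broad exceptions can still defeat this, which is one reason to treat it as a last resort.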

I'll split it out into its own PR, but wanted to "test" it out on noaa-cdr since it's been giving problems.

@TomAugspurger
Contributor

Depending on your point of view, we got unlucky with the latest round and didn't get any hangs (or the logs I added didn't work properly).

@TomAugspurger
Contributor

I did manage to reproduce the issue locally by running this in a loop. This is my understanding:

We were specifying timeout_seconds in the call to download_blob. This is put in the querystring and is used as a server-side timeout. Our hang is on the client side, which was picking up the default of

Timeout(connect=20, read=80000, total=None)

That's in seconds, so we would have eventually interrupted it (in 22.2 hours). I'm not sure who is setting that, but it isn't us.

The correct way to set these client-side socket timeouts is to pass read_timeout and connection_timeout to download_blob. I've added that here, and reverted the signal-based interrupt mechanism (which is in #190).
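As a rough illustration of the client-side timeouts described above: the sketch below passes `connection_timeout` and `read_timeout` to `download_blob`. The helper name `download_with_client_timeouts` and the default values are assumptions for this example, not the values used in this PR. `blob_client` is expected to behave like an azure-storage-blob `BlobClient`.

```python
def download_with_client_timeouts(blob_client, connection_timeout=20, read_timeout=60):
    """Download a blob's bytes with explicit client-side socket timeouts.

    Hypothetical helper: `blob_client` is any object with a BlobClient-style
    `download_blob` method. Both timeouts are in seconds and the defaults
    here are illustrative only.
    """
    downloader = blob_client.download_blob(
        connection_timeout=connection_timeout,  # time allowed to establish the connection
        read_timeout=read_timeout,  # time allowed to wait on a socket read
    )
    return downloader.readall()
```

With a read timeout of this order, a stalled read should fail within about a minute instead of waiting out the 80000-second default.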

Contributor

@TomAugspurger TomAugspurger left a comment

Thanks!

@TomAugspurger TomAugspurger merged commit bbc8cf9 into main Apr 28, 2023
@TomAugspurger TomAugspurger deleted the noaa-cdr branch April 28, 2023 17:06