User/tom/fix/s2 pctasks perf #291

Merged
merged 16 commits into from
May 28, 2024


TomAugspurger
Contributor

Description

We've observed some issues running the sentinel-2 STAC pipeline on pctasks. The orchestrator pod had high memory and CPU usage.

This PR adjusts the workflow orchestrator to free up some memory (by releasing final tasks) and adds a setting to limit the number of concurrent tasks.
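As a rough illustration of those two ideas (not the code in this PR), here is a minimal sketch that caps the number of in-flight tasks and drops references to finished futures as soon as their results are consumed. The name `run_bounded` and its `max_concurrent` parameter are hypothetical and are not pctasks settings.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def run_bounded(jobs: Iterable[Callable[[], T]], max_concurrent: int) -> Iterator[T]:
    """Run callables with at most `max_concurrent` in flight (hypothetical helper).

    Results are yielded as tasks complete, and finished futures are not
    retained, so their results can be garbage collected by the caller.
    """
    pending = set()
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        for job in jobs:
            # Block until a slot frees up before submitting more work.
            if len(pending) >= max_concurrent:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    yield future.result()
            pending.add(pool.submit(job))
        # Drain whatever is still running.
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                yield future.result()
```

Because `jobs` is consumed lazily, a large workflow never has more than `max_concurrent` task objects alive at once in this sketch.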

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

Checklist:

Please delete options that are not relevant.

  • I have performed a self-review
  • Changelog has been updated
  • Documentation has been updated
  • Unit tests pass locally (./scripts/test)
  • Code is linted and styled (./scripts/format)

@TomAugspurger
Contributor Author

CI failure:

  <Code>InvalidHeaderValue</Code>
  <Message>The API version 2024-05-04 is not supported by Azurite. Please upgrade Azurite to latest version and retry. If you are using Azurite in Visual Studio, please check you have installed latest Visual Studio patch. Azurite command line parameter "--skipApiVersionCheck" or Visual Studio Code configuration "Skip Api Version Check" can skip this error. 
RequestId:bf0d31a2-dc74-449c-aef9-a25a6fca48fb

Fixed in 2bf2bb3 by bumping the version of azurite we use.

@TomAugspurger TomAugspurger force-pushed the user/tom/fix/s2-pctasks-perf branch 3 times, most recently from 957f7b2 to 7bb3371 Compare May 23, 2024 20:08
The BlobStorage.file_exists method had a lot of overhead
compared to azure.storage.blob. Tracked down to the ContainerClientWrapper
context manager.

Cache that, and don't explicitly close stuff.

```
storage = BlobStorage("pctasksteststaging", "taskio")
%timeit storage.file_exists("status/0a3e7fd0-a9ae-43ea-9d1f-fc4e8cb30ef3/copy/0/copy/status-00b39.txt")
1.4 s ± 107 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

client = azure.storage.blob.ContainerClient.from_container_url("https://pctasksteststaging.blob.core.windows.net/taskio", credential=azure.identity.DefaultAzureCredential())

%timeit client.get_blob_client("status/0a3e7fd0-a9ae-43ea-9d1f-fc4e8cb30ef3/copy/0/copy/status-00b39.txt").exists()
239 ms ± 34.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

```
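A minimal sketch of the caching idea from the commit message above, assuming the cost comes from constructing a client on every call. `FakeContainerClient`, `get_container_client`, and `Storage` are hypothetical stand-ins so the example runs without Azure; they are not the pctasks or Azure SDK classes.

```python
from functools import lru_cache


class FakeContainerClient:
    """Stand-in for an SDK container client; counts how often it is built."""

    created = 0

    def __init__(self, account: str, container: str) -> None:
        type(self).created += 1
        self.account = account
        self.container = container

    def exists(self, path: str) -> bool:
        # A real client would make a network request here.
        return True


@lru_cache(maxsize=None)
def get_container_client(account: str, container: str) -> FakeContainerClient:
    """Build one client per (account, container) and reuse it afterwards."""
    return FakeContainerClient(account, container)


class Storage:
    """Hypothetical storage wrapper that reuses the cached client."""

    def __init__(self, account: str, container: str) -> None:
        self.account = account
        self.container = container

    def file_exists(self, path: str) -> bool:
        # No per-call construction and no explicit close of the client.
        client = get_container_client(self.account, self.container)
        return client.exists(path)
```

The trade-off in a sketch like this is that cached clients live for the lifetime of the process, which is usually acceptable for a long-running orchestrator but worth keeping in mind.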
@TomAugspurger TomAugspurger force-pushed the user/tom/fix/s2-pctasks-perf branch from 7bb3371 to 136ee66 Compare May 23, 2024 21:36
Profiling with memray showed a lot of allocations (leaks? not sure)
under `process_output_if_available`. Some of this is inevitable: we are
reading data from the network. But some of it seems to be overhead from
creating very many ContainerClient / BlobClient objects.

We'll see if this helps. Uncertain.
@TomAugspurger
Contributor Author

TomAugspurger commented May 28, 2024

I think that e93c2bf fixed another (hopefully last?) memory "leak" that was causing issues in the orchestrator pod. The fix there is relatively simple: share the BlobStorage objects we use for task IO / logs across calls to complete_job_partition_group. I'm not 100% sure of the mechanism, but after staring at the memray flamegraph for a while, I noticed that there are a lot of allocations related to BlobStorage underneath complete_job_partition_group.

[Image: memray flamegraph]

Some of those allocations are necessary (we are reading data after all). But some of them seemed to be overhead from establishing connections and other HTTP infrastructure ("pipelines" / "transports" in Azure SDK docs) that could maybe be shared.

It might be premature, but this query against our Container Insights maybe shows that we've got it fixed?

let FilteredPerf = Perf
| where ObjectName == "K8SContainer" and CounterName == "memoryRssBytes"
| where InstanceName endswith "main";
FilteredPerf
| summarize min(CounterValue) by InstanceName
| join kind=inner (FilteredPerf) on InstanceName
| project InstanceName, MemoryGrowth=CounterValue - min_CounterValue, TimeGenerated
| summarize avg(MemoryGrowth) by InstanceName, bin(TimeGenerated, 1m)
| render timechart

[Image: timechart of average memory growth per instance]

Both of those were generated with pctasks workflow submit sentinel-2-sentinel-2-l2a-process-items -a since 2024-05-27T10:00:00Z. The first line (blue) is without e93c2bf while the second line (red) is with it.
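For anyone less familiar with KQL, here is a rough Python equivalent of what the query above computes: for each container whose name ends in "main", subtract that container's minimum RSS from every sample, then average the growth in 1-minute bins. The function name and the `(instance_name, time, rss_bytes)` tuple layout are assumptions for illustration only.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, datetime, float]  # (instance_name, time, rss_bytes)


def memory_growth(samples: Iterable[Sample]) -> Dict[Tuple[str, datetime], float]:
    """Average RSS growth over each instance's minimum, per 1-minute bin."""
    # Mirrors: where InstanceName endswith "main"
    filtered = [s for s in samples if s[0].endswith("main")]

    # Mirrors: summarize min(CounterValue) by InstanceName
    baseline: Dict[str, float] = {}
    for name, _, rss in filtered:
        baseline[name] = min(rss, baseline.get(name, rss))

    # Mirrors: MemoryGrowth = CounterValue - min_CounterValue, binned to 1m
    bins = defaultdict(list)
    for name, ts, rss in filtered:
        minute = ts.replace(second=0, microsecond=0)
        bins[(name, minute)].append(rss - baseline[name])

    # Mirrors: summarize avg(MemoryGrowth) by InstanceName, bin(TimeGenerated, 1m)
    return {key: sum(values) / len(values) for key, values in bins.items()}
```

Subtracting the per-instance minimum puts runs that started at different absolute memory levels on a common baseline, which is what makes the two lines comparable.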

@TomAugspurger
Contributor Author

One of the sentinel-2 item creation tasks failed with an odd error:

pctasks runs get task-log 0bf5d0cd-bca7-40fb-9129-a484eae0ba37 process-chunk create-items -p 136

 ___   ___  _____            _
| _ \ / __||_   _| __ _  ___| |__ ___
|  _/| (__   | |  / _` |(_-/| / /(_-/
|_|   \___|  |_|  \__/_|/__/|_\_\/__/


<LOG for task create-items>
[INFO] 2024-05-28 15:17:07,062 -  === PCTasks ===
[INFO] 2024-05-28 15:17:07,062 -   == 0bf5d0cd-bca7-40fb-9129-a484eae0ba37/process-chunk/136/create-items
[INFO] 2024-05-28 15:17:07,062 -   -- PCTasks: Setting up task...
[INFO] 2024-05-28 15:17:07,529 -   -- PCTasks: Running task...
[INFO] 2024-05-28 15:17:07,529 - Creating items...
[WARNING] 2024-05-28 15:17:07,597 - Unable to initialize AzureLogHandler
[INFO] 2024-05-28 15:17:08,055 - (001.15%) [0.46s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VH/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XVH_20240527T161012.SAFE/manifest.safe (1 of 87)
[INFO] 2024-05-28 15:17:08,055 - Created item
[INFO] 2024-05-28 15:17:08,524 - (002.30%) [0.37s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VF/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XVF_20240527T171500.SAFE/manifest.safe (2 of 87)
[INFO] 2024-05-28 15:17:08,524 - Created item
[INFO] 2024-05-28 15:17:08,955 - (003.45%) [0.41s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VH/2024/05/27/S2A_MSIL2A_20240527T132721_N0510_R024_T33XVH_20240527T191613.SAFE/manifest.safe (3 of 87)
[INFO] 2024-05-28 15:17:08,955 - Created item
[INFO] 2024-05-28 15:17:09,506 - (004.60%) [0.53s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VL/2024/05/27/S2A_MSIL2A_20240527T132721_N0510_R024_T33XVL_20240527T191606.SAFE/manifest.safe (4 of 87)
[INFO] 2024-05-28 15:17:09,507 - Created item
[INFO] 2024-05-28 15:17:09,933 - (005.75%) [0.41s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/UF/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XUF_20240527T160455.SAFE/manifest.safe (5 of 87)
[INFO] 2024-05-28 15:17:09,934 - Created item
[INFO] 2024-05-28 15:17:10,366 - (006.90%) [0.41s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VF/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XVF_20240527T183320.SAFE/manifest.safe (6 of 87)
[INFO] 2024-05-28 15:17:10,366 - Created item
[INFO] 2024-05-28 15:17:10,803 - (008.05%) [0.42s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VF/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XVF_20240527T160826.SAFE/manifest.safe (7 of 87)
[INFO] 2024-05-28 15:17:10,803 - Created item
[INFO] 2024-05-28 15:17:11,341 - (009.20%) [0.51s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VL/2024/05/27/S2A_MSIL2A_20240527T150801_N0510_R025_T33XVL_20240527T201619.SAFE/manifest.safe (8 of 87)
[INFO] 2024-05-28 15:17:11,341 - Created item
[INFO] 2024-05-28 15:17:11,845 - (010.34%) [0.48s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VL/2024/05/27/S2B_MSIL2A_20240527T141739_N0510_R096_T33XVL_20240527T181921.SAFE/manifest.safe (9 of 87)
[INFO] 2024-05-28 15:17:11,845 - Created item
[INFO] 2024-05-28 15:17:12,235 - (011.49%) [0.37s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/UE/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XUE_20240527T183628.SAFE/manifest.safe (10 of 87)
[INFO] 2024-05-28 15:17:12,235 - Created item
[INFO] 2024-05-28 15:17:12,689 - (012.64%) [0.43s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/UE/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XUE_20240527T160454.SAFE/manifest.safe (11 of 87)
[INFO] 2024-05-28 15:17:12,689 - Created item
[INFO] 2024-05-28 15:17:13,181 - (013.79%) [0.47s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WA/2024/05/27/S2B_MSIL2A_20240527T105619_N0510_R094_T33XWA_20240527T151913.SAFE/manifest.safe (12 of 87)
[INFO] 2024-05-28 15:17:13,181 - Created item
[INFO] 2024-05-28 15:17:13,674 - (014.94%) [0.47s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WE/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XWE_20240527T171459.SAFE/manifest.safe (13 of 87)
[INFO] 2024-05-28 15:17:13,674 - Created item
[INFO] 2024-05-28 15:17:14,270 - (016.09%) [0.57s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WE/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XWE_20240527T183548.SAFE/manifest.safe (14 of 87)
[INFO] 2024-05-28 15:17:14,271 - Created item
[INFO] 2024-05-28 15:17:14,904 - (017.24%) [0.61s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VJ/2024/05/27/S2A_MSIL2A_20240527T132721_N0510_R024_T33XVJ_20240527T191643.SAFE/manifest.safe (15 of 87)
[INFO] 2024-05-28 15:17:14,904 - Created item
[INFO] 2024-05-28 15:17:15,616 - (018.39%) [0.69s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VJ/2024/05/27/S2B_MSIL2A_20240527T141739_N0510_R096_T33XVJ_20240527T181916.SAFE/manifest.safe (16 of 87)
[INFO] 2024-05-28 15:17:15,616 - Created item
[INFO] 2024-05-28 15:17:16,017 - (019.54%) [0.38s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VJ/2024/05/27/S2A_MSIL2A_20240527T150801_N0510_R025_T33XVJ_20240527T201603.SAFE/manifest.safe (17 of 87)
[INFO] 2024-05-28 15:17:16,017 - Created item
[INFO] 2024-05-28 15:17:16,529 - (020.69%) [0.49s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VJ/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XVJ_20240527T160450.SAFE/manifest.safe (18 of 87)
[INFO] 2024-05-28 15:17:16,529 - Created item
[INFO] 2024-05-28 15:17:16,985 - (021.84%) [0.43s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WG/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XWG_20240527T160933.SAFE/manifest.safe (19 of 87)
[INFO] 2024-05-28 15:17:16,985 - Created item
[INFO] 2024-05-28 15:17:17,393 - (022.99%) [0.39s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WG/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XWG_20240527T171949.SAFE/manifest.safe (20 of 87)
[INFO] 2024-05-28 15:17:17,393 - Created item
[INFO] 2024-05-28 15:17:17,789 - (024.14%) [0.37s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VE/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XVE_20240527T183608.SAFE/manifest.safe (21 of 87)
[INFO] 2024-05-28 15:17:17,790 - Created item
[INFO] 2024-05-28 15:17:18,294 - (025.29%) [0.48s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VE/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XVE_20240527T160442.SAFE/manifest.safe (22 of 87)
[INFO] 2024-05-28 15:17:18,294 - Created item
[INFO] 2024-05-28 15:17:18,747 - (026.44%) [0.43s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WK/2024/05/27/S2A_MSIL2A_20240527T150801_N0510_R025_T33XWK_20240527T201635.SAFE/manifest.safe (23 of 87)
[INFO] 2024-05-28 15:17:18,747 - Created item
[INFO] 2024-05-28 15:17:19,446 - (027.59%) [0.67s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WK/2024/05/27/S2B_MSIL2A_20240527T141739_N0510_R096_T33XWK_20240527T181921.SAFE/manifest.safe (24 of 87)
[INFO] 2024-05-28 15:17:19,446 - Created item
[INFO] 2024-05-28 15:17:19,892 - (028.74%) [0.42s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WK/2024/05/27/S2A_MSIL2A_20240527T132721_N0510_R024_T33XWK_20240527T191617.SAFE/manifest.safe (25 of 87)
[INFO] 2024-05-28 15:17:19,892 - Created item
[INFO] 2024-05-28 15:17:20,377 - (029.89%) [0.46s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WK/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XWK_20240527T160450.SAFE/manifest.safe (26 of 87)
[INFO] 2024-05-28 15:17:20,377 - Created item
[INFO] 2024-05-28 15:17:20,822 - (031.03%) [0.42s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WJ/2024/05/27/S2A_MSIL2A_20240527T150801_N0510_R025_T33XWJ_20240527T201555.SAFE/manifest.safe (27 of 87)
[INFO] 2024-05-28 15:17:20,823 - Created item
[INFO] 2024-05-28 15:17:21,350 - (032.18%) [0.50s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WJ/2024/05/27/S2B_MSIL2A_20240527T123659_N0510_R095_T33XWJ_20240527T160456.SAFE/manifest.safe (28 of 87)
[INFO] 2024-05-28 15:17:21,350 - Created item
[INFO] 2024-05-28 15:17:21,737 - (033.33%) [0.37s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WJ/2024/05/27/S2A_MSIL2A_20240527T132721_N0510_R024_T33XWJ_20240527T191644.SAFE/manifest.safe (29 of 87)
[INFO] 2024-05-28 15:17:21,738 - Created item
[INFO] 2024-05-28 15:17:22,189 - (034.48%) [0.43s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WJ/2024/05/27/S2B_MSIL2A_20240527T141739_N0510_R096_T33XWJ_20240527T171942.SAFE/manifest.safe (30 of 87)
[INFO] 2024-05-28 15:17:22,189 - Created item
[INFO] 2024-05-28 15:17:22,664 - (035.63%) [0.45s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VM/2024/05/27/S2B_MSIL2A_20240527T141739_N0510_R096_T33XVM_20240527T172917.SAFE/manifest.safe (31 of 87)
[INFO] 2024-05-28 15:17:22,664 - Created item
[INFO] 2024-05-28 15:17:23,283 - (036.78%) [0.60s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/VM/2024/05/27/S2A_MSIL2A_20240527T150801_N0510_R025_T33XVM_20240527T201617.SAFE/manifest.safe (32 of 87)
[INFO] 2024-05-28 15:17:23,283 - Created item
[INFO] 2024-05-28 15:17:23,771 - (037.93%) [0.46s]  - blob://sentinel2l2a01/sentinel2-l2/33/X/WD/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XWD_20240527T183538.SAFE/manifest.safe (33 of 87)
[INFO] 2024-05-28 15:17:23,771 - Created item
[INFO] 2024-05-28 15:17:24,019 -  === PCTasks: Task Failed! ===
[ERROR] 2024-05-28 15:17:24,019 - Failed to create item from blob://sentinel2l2a01/sentinel2-l2/33/X/VD/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XVD_20240527T183526.SAFE/manifest.safe
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 211, in create_items
    result = self._create_item(asset_uri, storage_factory)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/batch/tasks/workitems/sentinel-2-process-chunk-0bf5d0c-fb-9129-a484eae0ba37-tasks_pool/job-1/create-items-136/wd/_code/sentinel2.py", line 95, in create_item
    item = with_backoff(
           ^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/core/utils/backoff.py", line 149, in with_backoff
    return fn()
           ^^^^
  File "/mnt/batch/tasks/workitems/sentinel-2-process-chunk-0bf5d0c-fb-9129-a484eae0ba37-tasks_pool/job-1/create-items-136/wd/_code/sentinel2.py", line 87, in get_item
    item: pystac.Item = stac.create_item(
                        ^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/stactools/sentinel2/stac.py", line 55, in create_item
    product_metadata = ProductMetadata(safe_manifest.product_metadata_href,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/stactools/sentinel2/product_metadata.py", line 47, in __init__
    self._root = XmlElement.from_file(href, read_href_modifier)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/stactools/core/io/xml.py", line 74, in from_file
    text = read_text(href, read_href_modifier)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/stactools/core/io/__init__.py", line 20, in read_text
    return StacIO.default().read_text(read_href_modifier(href))
                                      ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/core/storage/base.py", line 320, in sign
    return self.get_authenticated_url(path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/core/storage/blob.py", line 376, in get_authenticated_url
    sas_token = self._generate_container_sas(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/core/storage/blob.py", line 353, in _generate_container_sas
    key = self._get_client()._account_client.get_user_delegation_key(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/azure/core/tracing/decorator.py", line 78, in wrapper_use_tracer
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/azure/storage/blob/_blob_service_client.py", line 230, in get_user_delegation_key
    process_storage_error(error)
  File "/opt/conda/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py", line 182, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 1, in <module>
azure.core.exceptions.HttpResponseError: The value for one of the XML nodes is not in the correct format.
RequestId:15cd9f24-601e-0015-7812-b1fd1d000000
Time:2024-05-28T15:17:23.9986581Z
ErrorCode:InvalidXmlNodeValue
xmlnodename:2024-06-04T15:17:24Z
xmlnodevalue:2024-06-04T15:17:24Z
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidXmlNodeValue</Code><Message>The value for one of the XML nodes is not in the correct format.
RequestId:15cd9f24-601e-0015-7812-b1fd1d000000
Time:2024-05-28T15:17:23.9986581Z</Message><XmlNodeName>2024-06-04T15:17:24Z</XmlNodeName><XmlNodeValue>2024-06-04T15:17:24Z</XmlNodeValue></Error>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/run.py", line 154, in run_task
    result = task.parse_and_run(task_data, task_context)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/task/task.py", line 53, in parse_and_run
    output = self.run(args, context)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 244, in run
    results = self.create_items(input, context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/pctasks/dataset/items/task.py", line 213, in create_items
    raise CreateItemsError(
pctasks.dataset.items.task.CreateItemsError: Failed to create item from blob://sentinel2l2a01/sentinel2-l2/33/X/VD/2024/05/27/S2A_MSIL2A_20240527T114641_N0510_R023_T33XVD_20240527T183526.SAFE/manifest.safe
</LOG>

Trying to make sense of that. Something went wrong when we tried to generate that SAS token:

<XmlNodeName>2024-06-04T15:17:24Z</XmlNodeName>
<XmlNodeValue>2024-06-04T15:17:24Z</XmlNodeValue>
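One way to read those values is to compare the requested expiry with the server-side `Time` reported in the error. The sketch below does that arithmetic, assuming the service caps a user delegation key at 7 days measured from its own clock; the timestamps are copied from the log above, with the fractional seconds truncated to microseconds.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the error above (fraction truncated to microseconds).
server_time = datetime(2024, 5, 28, 15, 17, 23, 998658, tzinfo=timezone.utc)
requested_expiry = datetime(2024, 6, 4, 15, 17, 24, tzinfo=timezone.utc)

# Assumption: the key lifetime is capped at 7 days from the service's clock.
max_lifetime = timedelta(days=7)

# How far past the cap the requested expiry lands.
excess = (requested_expiry - server_time) - max_lifetime
print(f"requested expiry exceeds the 7-day cap by {excess.total_seconds() * 1000:.3f} ms")
```

Under that assumption the request overshoots the limit by roughly a millisecond, which would fit a small clock or rounding difference between client and service.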

@TomAugspurger
Contributor Author

I can produce a similar error with this diff, which exaggerates the duration of the SAS token by extending the end_datetime:

diff --git a/pctasks/core/pctasks/core/storage/blob.py b/pctasks/core/pctasks/core/storage/blob.py
index 059152e6..20c00f79 100644
--- a/pctasks/core/pctasks/core/storage/blob.py
+++ b/pctasks/core/pctasks/core/storage/blob.py
@@ -343,7 +343,7 @@ class BlobStorage(Storage):
         attached credentials) to generate a container-level SAS token.
         """
         start = Datetime.utcnow() - timedelta(hours=10)
-        expiry = Datetime.utcnow() + timedelta(hours=24 * 7)
+        expiry = Datetime.utcnow() + timedelta(hours=24 * 8)
         permission = ContainerSasPermissions(
             read=read,
             write=write,

that gives

In [1]: from pctasks.core.storage.blob import *
In [2]: storage = BlobStorage("sentinel2l2a01", "sentinel2-l2")

In [3]: _ = storage._generate_container_sas()
---------------------------------------------------------------------------
HttpResponseError                         Traceback (most recent call last)
Cell In[3], line 1
----> 1 _ = storage._generate_container_sas()

File ~/src/Microsoft/planetary-computer-tasks/pctasks/core/pctasks/core/storage/blob.py:353, in BlobStorage._generate_container_sas(self, read, list, write, delete)
    346 expiry = Datetime.utcnow() + timedelta(hours=24 * 8)
    347 permission = ContainerSasPermissions(
    348     read=read,
    349     write=write,
    350     delete=delete,
    351     list=list,
    352 )
--> 353 key = self._get_client()._account_client.get_user_delegation_key(
    354     key_start_time=start, key_expiry_time=expiry
    355 )
    356 sas_token = generate_container_sas(
    357     self.storage_account_name,
    358     self.container_name,
   (...)
    362     expiry=expiry,
    363 )
    364 return sas_token

File ~/src/Microsoft/planetary-computer-tasks/.direnv/python-3.10.14/lib/python3.10/site-packages/azure/core/tracing/decorator.py:78, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
     76 span_impl_type = settings.tracing_implementation()
     77 if span_impl_type is None:
---> 78     return func(*args, **kwargs)
     80 # Merge span is parameter is set, but only if no explicit parent are passed
     81 if merge_span and not passed_in_parent:

File ~/src/Microsoft/planetary-computer-tasks/.direnv/python-3.10.14/lib/python3.10/site-packages/azure/storage/blob/_blob_service_client.py:225, in BlobServiceClient.get_user_delegation_key(self, key_start_time, key_expiry_time, **kwargs)
    221     user_delegation_key = self._client.service.get_user_delegation_key(key_info=key_info,
    222                                                                        timeout=timeout,
    223                                                                        **kwargs)  # type: ignore
    224 except HttpResponseError as error:
--> 225     process_storage_error(error)
    227 return parse_to_internal_user_delegation_key(user_delegation_key)

File ~/src/Microsoft/planetary-computer-tasks/.direnv/python-3.10.14/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py:184, in process_storage_error(storage_error)
    181 error.args = (error.message,)
    182 try:
    183     # `from None` prevents us from double printing the exception (suppresses generated layer error context)
--> 184     exec("raise error from None")   # pylint: disable=exec-used # nosec
    185 except SyntaxError as exc:
    186     raise error from exc

File <string>:1

HttpResponseError: The value for one of the XML nodes is not in the correct format.
RequestId:ca881233-e01e-0024-5317-b11c0e000000
Time:2024-05-28T15:55:00.0351181Z
ErrorCode:InvalidXmlNodeValue
xmlnodename:2024-06-05T15:54:50Z
xmlnodevalue:2024-06-05T15:54:50Z
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidXmlNodeValue</Code><Message>The value for one of the XML nodes is not in the correct format.
RequestId:ca881233-e01e-0024-5317-b11c0e000000
Time:2024-05-28T15:55:00.0351181Z</Message><XmlNodeName>2024-06-05T15:54:50Z</XmlNodeName><XmlNodeValue>2024-06-05T15:54:50Z</XmlNodeValue></Error>

Under the assumption of "something weird about clocks" I'm going to adjust the duration of that token to be the start + 7 days - a few hours.
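A minimal sketch of that adjustment, assuming the limit is 7 days from the current time and that a fixed safety margin is enough to absorb clock skew. The helper name `delegation_key_window` and the 2-hour default margin are hypothetical; the 10-hour lookback matches the `start` in the diff above.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Assumption: the service rejects keys that expire more than 7 days out.
MAX_KEY_LIFETIME = timedelta(days=7)


def delegation_key_window(
    now: Optional[datetime] = None,
    lookback: timedelta = timedelta(hours=10),
    margin: timedelta = timedelta(hours=2),
) -> Tuple[datetime, datetime]:
    """Return (start, expiry) with the expiry kept safely under the 7-day cap."""
    if now is None:
        now = datetime.now(timezone.utc)
    start = now - lookback
    # Back off from the cap so small clock differences cannot push us over it.
    expiry = now + MAX_KEY_LIFETIME - margin
    return start, expiry
```

The resulting pair would be passed as `key_start_time` and `key_expiry_time` to `get_user_delegation_key`, as in the traceback above.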

Tom Augspurger added 2 commits May 28, 2024 10:57
Hopefully avoids the invalid token we ran into on the last run.
@TomAugspurger
Contributor Author

This is running now with a run ID c8db9b06-4157-4324-ab1b-3ac60bb71464.

If / when that finishes, I think this is ready. Might want to wait on #294, since I think both are removing the sp-related env vars from this dataset.yaml (I've been testing on top of #294).

@TomAugspurger
Contributor Author

That finished. Planning to merge this tonight and start the process of rolling it and #294 out to production Thursday.

@TomAugspurger TomAugspurger merged commit d550656 into main May 28, 2024
2 checks passed
@TomAugspurger TomAugspurger deleted the user/tom/fix/s2-pctasks-perf branch May 28, 2024 20:22
2 participants