[dataset]: Added tracing to create_item #185

TomAugspurger · 2023-04-25T14:46:09Z

This adds some tracing to monitor item creating time to the create_item task. After an item is created, we'll log a messaage to an application insights instance with some custom dimensions

type: pctasks.create_item for filtering
collection_id: The collection ID, for filtering
asset_uri: unique(ish) identifier for what assets were cataloged
duration_seconds: The time it took to create the item

For collections to use this, they just need to include APPLICATIONINSIGHTS_CONNECTION_STRING in the environment section of the dataset.yaml. We might consider automatically loading that into the environment.

This adds some tracing to monitor item creating time to the create_item task. After an item is created, we'll log a messaage to an application insights instance with some custom dimensions * `type`: `pctasks.create_item` for filtering * `collection_id`: The collection ID, for filtering * `asset_uri`: unique(ish) identifier for what assets were cataloged * `duration_seconds`: The time it took to create the item

TomAugspurger · 2023-04-25T15:37:28Z

pctasks/dataset/pctasks/dataset/items/task.py

@@ -34,6 +39,61 @@ def asset_chunk_id_to_ndjson_chunk_id(asset_chunk_id: str) -> str:
    return os.path.join(folder_name, "items.ndjson")


+def _init_azlogger() -> None:


This logger initialization is a bit ugly. I see that the pctasks CLI has a setup_logging method, but I haven't fully investigated where that's called. This shouldn't be called on import, since the AzureLogHandler is call pretty slow (I think it makes a network request in the background).

TomAugspurger · 2023-04-25T15:37:56Z

pctasks/dataset/pctasks/dataset/items/task.py

+    if i is not None and asset_count is not None:
+        # asset_chunk_info case
+        logger.info(
+            f"({((i+1)/asset_count)*100:06.2f}%) "
+            f"[{end_time - start_time:.2f}s] "
+            f" - {asset_uri} "
+            f"({i+1} of {asset_count})"
+        )
+    else:
+        # asset_uri case
+        logger.info(
+            f"Created items from {asset_uri} in " f"{end_time - start_time:.2f}s"
+        )


This is copy-pasted from below. It preserves the logs to stderr that we're used to.

lossyrob

We'll want to refactor/generalize this once we are looking to collect other task data, but this is a good first pass.

TomAugspurger · 2023-04-25T16:17:29Z

I'm pretty stumped by that CI failure at https://github.com/microsoft/planetary-computer-tasks/actions/runs/4799407304/jobs/8538988373. I can't reproduce it locally.

>       assert len(logger.handlers) == 1
E       assert 0 == 1
E        +  where 0 = len([])
E        +    where [] = <Logger monitor.pctasks.dataset.items.task (INFO)>.handlers

Debugging in b1abb2b.

TomAugspurger · 2023-04-25T17:02:24Z

Failed with the same issue. I'm hoping that 96068d8 fixes it by cleaning up some state that potentially leaks between tests. We don't want to try to initialize the Azure logger in every single create item task. I've set it up to only try initializing once.

I think an earlier test might have tried to initialize it, but failed since there was no APPLICATIONINSIGHTS_CONNECTION_STRING set. And then our test didn't even try to initialize it again.

Tom Augspurger added 3 commits April 25, 2023 09:43

lint

7f24ad6

fixups

0fb414e

TomAugspurger commented Apr 25, 2023

View reviewed changes

lossyrob approved these changes Apr 25, 2023

View reviewed changes

debug

b1abb2b

fixup state leaking between tests

96068d8

Tom Augspurger added 2 commits April 25, 2023 12:48

fixup

acdf913

lint again

8753ad4

TomAugspurger merged commit 24bb839 into main Apr 25, 2023

TomAugspurger deleted the tom/feature/monitor branch April 25, 2023 19:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[dataset]: Added tracing to create_item #185

[dataset]: Added tracing to create_item #185

TomAugspurger commented Apr 25, 2023

TomAugspurger Apr 25, 2023

TomAugspurger Apr 25, 2023

lossyrob left a comment

TomAugspurger commented Apr 25, 2023

TomAugspurger commented Apr 25, 2023

		@@ -34,6 +39,61 @@ def asset_chunk_id_to_ndjson_chunk_id(asset_chunk_id: str) -> str:
		return os.path.join(folder_name, "items.ndjson")


		def _init_azlogger() -> None:

[dataset]: Added tracing to create_item #185

[dataset]: Added tracing to create_item #185

Conversation

TomAugspurger commented Apr 25, 2023

TomAugspurger Apr 25, 2023

Choose a reason for hiding this comment

TomAugspurger Apr 25, 2023

Choose a reason for hiding this comment

lossyrob left a comment

Choose a reason for hiding this comment

TomAugspurger commented Apr 25, 2023

TomAugspurger commented Apr 25, 2023