Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dataset]: Added tracing to create_item #185

Merged
merged 7 commits into from
Apr 25, 2023
Merged

Conversation

TomAugspurger
Copy link
Contributor

This adds some tracing to monitor item creating time to the create_item task. After an item is created, we'll log a messaage to an application insights instance with some custom dimensions

  • type: pctasks.create_item for filtering
  • collection_id: The collection ID, for filtering
  • asset_uri: unique(ish) identifier for what assets were cataloged
  • duration_seconds: The time it took to create the item

For collections to use this, they just need to include APPLICATIONINSIGHTS_CONNECTION_STRING in the environment section of the dataset.yaml. We might consider automatically loading that into the environment.

Tom Augspurger added 3 commits April 25, 2023 09:43
This adds some tracing to monitor item creating time to the create_item
task. After an item is created, we'll log a messaage to an application
insights instance with some custom dimensions

* `type`: `pctasks.create_item` for filtering
* `collection_id`: The collection ID, for filtering
* `asset_uri`: unique(ish) identifier for what assets were cataloged
* `duration_seconds`: The time it took to create the item
@@ -34,6 +39,61 @@ def asset_chunk_id_to_ndjson_chunk_id(asset_chunk_id: str) -> str:
return os.path.join(folder_name, "items.ndjson")


def _init_azlogger() -> None:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logger initialization is a bit ugly. I see that the pctasks CLI has a setup_logging method, but I haven't fully investigated where that's called. This shouldn't be called on import, since the AzureLogHandler is call pretty slow (I think it makes a network request in the background).

Comment on lines +72 to +84
if i is not None and asset_count is not None:
# asset_chunk_info case
logger.info(
f"({((i+1)/asset_count)*100:06.2f}%) "
f"[{end_time - start_time:.2f}s] "
f" - {asset_uri} "
f"({i+1} of {asset_count})"
)
else:
# asset_uri case
logger.info(
f"Created items from {asset_uri} in " f"{end_time - start_time:.2f}s"
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copy-pasted from below. It preserves the logs to stderr that we're used to.

Copy link
Contributor

@lossyrob lossyrob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to refactor/generalize this once we are looking to collect other task data, but this is a good first pass.

@TomAugspurger
Copy link
Contributor Author

I'm pretty stumped by that CI failure at https://github.com/microsoft/planetary-computer-tasks/actions/runs/4799407304/jobs/8538988373. I can't reproduce it locally.

>       assert len(logger.handlers) == 1
E       assert 0 == 1
E        +  where 0 = len([])
E        +    where [] = <Logger monitor.pctasks.dataset.items.task (INFO)>.handlers

Debugging in b1abb2b.

@TomAugspurger
Copy link
Contributor Author

Failed with the same issue. I'm hoping that 96068d8 fixes it by cleaning up some state that potentially leaks between tests. We don't want to try to initialize the Azure logger in every single create item task. I've set it up to only try initializing once.

I think an earlier test might have tried to initialize it, but failed since there was no APPLICATIONINSIGHTS_CONNECTION_STRING set. And then our test didn't even try to initialize it again.

Tom Augspurger added 2 commits April 25, 2023 12:48
@TomAugspurger TomAugspurger merged commit 24bb839 into main Apr 25, 2023
@TomAugspurger TomAugspurger deleted the tom/feature/monitor branch April 25, 2023 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants