-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[dataset]: Added tracing to create_item #185
Conversation
This adds some tracing to monitor item creating time to the create_item task. After an item is created, we'll log a messaage to an application insights instance with some custom dimensions * `type`: `pctasks.create_item` for filtering * `collection_id`: The collection ID, for filtering * `asset_uri`: unique(ish) identifier for what assets were cataloged * `duration_seconds`: The time it took to create the item
@@ -34,6 +39,61 @@ def asset_chunk_id_to_ndjson_chunk_id(asset_chunk_id: str) -> str: | |||
return os.path.join(folder_name, "items.ndjson") | |||
|
|||
|
|||
def _init_azlogger() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logger initialization is a bit ugly. I see that the pctasks
CLI has a setup_logging
method, but I haven't fully investigated where that's called. This shouldn't be called on import, since the AzureLogHandler
is call pretty slow (I think it makes a network request in the background).
if i is not None and asset_count is not None: | ||
# asset_chunk_info case | ||
logger.info( | ||
f"({((i+1)/asset_count)*100:06.2f}%) " | ||
f"[{end_time - start_time:.2f}s] " | ||
f" - {asset_uri} " | ||
f"({i+1} of {asset_count})" | ||
) | ||
else: | ||
# asset_uri case | ||
logger.info( | ||
f"Created items from {asset_uri} in " f"{end_time - start_time:.2f}s" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copy-pasted from below. It preserves the logs to stderr that we're used to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to refactor/generalize this once we are looking to collect other task data, but this is a good first pass.
I'm pretty stumped by that CI failure at https://github.com/microsoft/planetary-computer-tasks/actions/runs/4799407304/jobs/8538988373. I can't reproduce it locally. > assert len(logger.handlers) == 1
E assert 0 == 1
E + where 0 = len([])
E + where [] = <Logger monitor.pctasks.dataset.items.task (INFO)>.handlers Debugging in b1abb2b. |
Failed with the same issue. I'm hoping that 96068d8 fixes it by cleaning up some state that potentially leaks between tests. We don't want to try to initialize the Azure logger in every single create item task. I've set it up to only try initializing once. I think an earlier test might have tried to initialize it, but failed since there was no |
This adds some tracing to monitor item creating time to the create_item task. After an item is created, we'll log a messaage to an application insights instance with some custom dimensions
type
:pctasks.create_item
for filteringcollection_id
: The collection ID, for filteringasset_uri
: unique(ish) identifier for what assets were catalogedduration_seconds
: The time it took to create the itemFor collections to use this, they just need to include
APPLICATIONINSIGHTS_CONNECTION_STRING
in theenvironment
section of the dataset.yaml. We might consider automatically loading that into the environment.