
Emit structured log events to ApplicationInsights #308

Merged
merged 13 commits into from
Jul 11, 2024


@TomAugspurger TomAugspurger commented Jul 8, 2024

Description

This adds some structured logging to Application Insights for various events ("thing" Created / Finished for Workflow Run, Job, Job Partition, and Task).
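Below is a minimal sketch of what one such lifecycle event could look like, built only on the standard library `logging` module. The `JsonFormatter` class, the `log_event` helper and the logger name are hypothetical and are not this PR's code. The field names (`type`, `recordLevel`, `datasetId`, `runId`, `jobId`, `partitionId`, `taskId`, `status`, `errors`) come from the sample logs later in this PR. A `StringIO` stream stands in for whatever exporter actually ships records to Application Insights.

```python
import io
import json
import logging
from typing import Any


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON line: level, message, plus structured fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload: dict[str, Any] = {
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # `fields` is attached via `extra=`; plain log calls fall back to {}.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


# Assumption: StringIO replaces the real Application Insights exporter here.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("structured-events-sketch")  # hypothetical name
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the sketch's output out of the root logger


def log_event(event_type: str, record_level: str, level: int, **ids: Any) -> None:
    """Emit one lifecycle event, e.g. WorkflowRunCreated or TaskFinished."""
    fields = {"type": event_type, "recordLevel": record_level, **ids}
    logger.log(level, event_type, extra={"fields": fields})


log_event("TaskCreated", "Task", logging.INFO,
          datasetId="test-bug", runId="run-1", jobId="a", partitionId=0,
          taskId="b", status="submitted")
log_event("TaskFinished", "Task", logging.WARNING,
          datasetId="test-bug", runId="run-1", jobId="a", partitionId=0,
          taskId="b", status="failed", errors="exit code indicated failure")

# Each handler write is one JSON document followed by a newline.
events = [json.loads(line) for line in stream.getvalue().splitlines()]
```

The identifiers travel in a single `fields` dict passed through `extra=`. Spreading them as top-level `extra` keys would also work, but any key that collides with a built-in `LogRecord` attribute (such as `message`) makes `logging` raise.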

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Deployed to staging. See #308 (comment) for some sample results.

@TomAugspurger
Contributor Author

The test failure was:

Job test-job failed. Status: failed
 -- Job partitions thread failed with update_job_partition_run_status() missing 4 required positional arguments: 'workflow_id', 'dataset_id', 'job_id', and 'partition_id'
--------------------------- Captured stdout teardown ---------------------------

I'm surprised that mypy didn't catch that.
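The output above doesn't show the call site, so the cause is a guess. One plausible reason mypy stayed quiet is that the call went through `threading.Thread(target=..., args=...)`. typeshed types that `args` parameter as `Iterable[Any]`, so mypy never compares the tuple against the target's signature. A second possibility is that the call sits inside an unannotated function, whose body mypy skips unless `--check-untyped-defs` is on. The sketch below reproduces the first case. `update_job_partition_run_status` here is a hypothetical stand-in: the four missing parameters match the error, while `run_id` and `status` are assumed.

```python
import threading


def update_job_partition_run_status(
    run_id: str,
    status: str,
    workflow_id: str,
    dataset_id: str,
    job_id: str,
    partition_id: str,
) -> None:
    """Hypothetical stand-in; the project's real signature may differ."""
    print(f"{dataset_id}/{workflow_id}/{run_id}/{job_id}/{partition_id} -> {status}")


def run_thread(thread: threading.Thread) -> list[BaseException]:
    """Start and join a thread, returning any exception raised inside it."""
    captured: list[BaseException] = []
    original_hook = threading.excepthook

    def record(hook_args) -> None:  # hook_args is a threading.ExceptHookArgs
        if hook_args.exc_value is not None:
            captured.append(hook_args.exc_value)

    # Threads look up threading.excepthook when they fail, so swap it in briefly.
    threading.excepthook = record
    try:
        thread.start()
        thread.join()
    finally:
        threading.excepthook = original_hook
    return captured


# mypy accepts this call: Thread's `args` is Iterable[Any], so the two-element
# tuple is never matched against the six-parameter target.
bad = threading.Thread(
    target=update_job_partition_run_status, args=("run-1", "running")
)
errors = run_thread(bad)
print(errors[0])
```

Running it raises the same kind of `TypeError` as the CI failure, but only at runtime and only inside the worker thread.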

@TomAugspurger TomAugspurger force-pushed the user/tom/feature/minimal-telemetry branch from 4f3a47c to 8fa0d3c July 9, 2024 12:57

TomAugspurger commented Jul 9, 2024

I've deployed this to staging. Here are some sample logs:

WorkflowRun

| timestamp | severityLevel | type | recordLevel | datasetId | level | runId |
|---|---|---|---|---|---|---|
| 2024-07-10T18:53:35.21652Z | 1 | WorkflowRunCreated | WorkflowRun | test-bug | INFO | 3392a265-43a3-40bb-a493-d22afe7c427d |
| 2024-07-10T18:59:10.174577Z | 2 | WorkflowRunFinished | WorkflowRun | test-bug | WARNING | 3392a265-43a3-40bb-a493-d22afe7c427d |
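In these samples `severityLevel` follows the `level` column: INFO becomes 1 and WARNING becomes 2. That is consistent with Application Insights' SeverityLevel values (Verbose 0, Information 1, Warning 2, Error 3, Critical 4). Below is a sketch of that mapping from standard library logging levels. The exporter used in this PR presumably does its own conversion, and `severity_level` is a hypothetical helper.

```python
import logging

# Application Insights SeverityLevel values keyed by stdlib logging level.
SEVERITY_BY_LEVEL = {
    logging.DEBUG: 0,     # Verbose
    logging.INFO: 1,      # Information
    logging.WARNING: 2,   # Warning
    logging.ERROR: 3,     # Error
    logging.CRITICAL: 4,  # Critical
}


def severity_level(levelno: int) -> int:
    """Map a stdlib level (including custom ones) to the highest threshold it reaches."""
    for threshold in sorted(SEVERITY_BY_LEVEL, reverse=True):
        if levelno >= threshold:
            return SEVERITY_BY_LEVEL[threshold]
    return 0  # anything below DEBUG is treated as Verbose
```

Rounding custom levels down (e.g. 25 to Information) keeps a record from being promoted to a more alarming severity than it was logged at.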

Job

| timestamp | severityLevel | type | recordLevel | datasetId | level | runId | jobId | partitionId | status |
|---|---|---|---|---|---|---|---|---|---|
| 2024-07-10T18:53:35.276653Z | 1 | JobCreated | Job | test-bug | INFO | 3392a265-43a3-40bb-a493-d22afe7c427d | a | null | running |
| 2024-07-10T18:59:10.005458Z | 2 | JobFinished | Job | test-bug | WARNING | 3392a265-43a3-40bb-a493-d22afe7c427d | a | null | failed |
| 2024-07-10T19:49:40.634756Z | 1 | JobCreated | Job | test-bug | INFO | 6cb80e84-18d7-4f6e-86e4-fc7415823eaa | a | null | running |

Task

| timestamp | severityLevel | type | recordLevel | datasetId | level | runId | jobId | partitionId | taskId | status | errors |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-07-10T19:49:43.284043Z | 1 | TaskCreated | Task | test-bug | INFO | 6cb80e84-18d7-4f6e-86e4-fc7415823eaa | a | 0 | b | submitted | null |
| 2024-07-10T19:54:15.014362Z | 2 | TaskFinished | Task | test-bug | WARNING | 6cb80e84-18d7-4f6e-86e4-fc7415823eaa | a | 0 | b | failed | The task exited with an exit code representing a failure |

Unfortunately the errors field isn't very helpful. This was for a workflow that deliberately failed with a ZeroDivisionError, and I think "The task exited with an exit code representing a failure" is all that's available from the Batch API.


I'm not sure why JobPartition events aren't showing up, but IMO they're much less valuable than the WorkflowRun and Task events.

Tom Augspurger added 2 commits July 9, 2024 09:12
(cherry picked from commit fd941cd31eb17295e0209d72066d23349d702d1f)
@TomAugspurger TomAugspurger changed the title Emit structloged log events to ApplicationInsights Emit structured log events to ApplicationInsights Jul 10, 2024
@TomAugspurger TomAugspurger force-pushed the user/tom/feature/minimal-telemetry branch from 8fa0d3c to a73a1f4 July 10, 2024 15:56
Tom Augspurger added 4 commits July 10, 2024 12:05
* added to values.yaml
* fixed message / level for task
* ensure workflow run creation is logged
@TomAugspurger TomAugspurger merged commit a1d85b4 into main Jul 11, 2024
5 checks passed
@TomAugspurger TomAugspurger deleted the user/tom/feature/minimal-telemetry branch July 11, 2024 02:28