fix: workflow.status is now set properly in metrics. Fixes #8895 #8939

Merged (5 commits) on Jun 23, 2022

@dpadhiar dpadhiar commented Jun 9, 2022

Signed-off-by: Dillen Padhiar [email protected]

Fixes #8895

In Argo Workflows v3.3.6, running an example workflow with custom metrics at the workflow and template levels published {{workflow.status}} as an empty value instead of the actual status. The cause is that the global runtime parameters are not accurately reflected in the stored workflow spec.

Example workflow (original issue):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: dag-task-
spec:
  entrypoint: dag-task
  metrics: # Custom metric workflow level
    prometheus:
      - name: playground_workflow_duration
        help: "Duration gauge by workflow level"
        labels:
          - key: "playground_id_workflow"
            value: "test"
          - key: status
            value: "{{workflow.status}}"
        gauge:
          realtime: false
          value: "{{workflow.duration}}"
      - name: playground_workflow_result_counter
        help: "Count of workflow execution by result status  - workflow level"
        labels:
          - key: "playground_id_workflow_counter"
            value: "test"
          - key: status
            value: "{{workflow.status}}"
        counter:
          value: "1"
  templates:
  - name: dag-task
    dag:
      tasks:
      - name: TEST-ONE
        template: echo
        arguments:
          parameters: 
            - name: message
              value: "console output-->TEST-{{item.command}}"
            - name: tag
              value: "{{item.tag}}"
        withItems:
          - { tag: TEST-ONE-A, command: ONE-A }
          - { tag: TEST-ONE-B, command: ONE-B }

      - name: TEST-TWO
        template: echo
        arguments:
          parameters: 
            - name: message
              value: "console output-->TEST-{{item.command}}"
            - name: tag
              value: "{{item.tag}}"
        withItems:
          - { tag: TEST-TWO-A, command: TWO-A }
          - { tag: TEST-TWO-B, command: TWO-B }

  - name: echo
    inputs:
      parameters:
      - name: message
      - name: tag
    metrics: # Custom metric template level
      prometheus:
        - name: playground_workflow_duration_task_seconds
          help: "Duration gauge by task name in seconds - task level"
          labels:
            - key: "playground_task_name"
              value: "{{inputs.parameters.tag}}"
            - key: status
              value: "{{status}}"
          gauge:
            realtime: false
            value: "{{duration}}"
        - name: playground_workflow_result_task_counter
          help: "Count of task execution by result status  - task level"
          labels:
            - key: "playground_task_name_counter"
              value: "{{inputs.parameters.tag}}"
            - key: status
              value: "{{status}}"
          counter:
            value: "1"
    container:
      image: alpine:3.7
      command: [echo, "{{inputs.parameters.message}}"]

Completed workflow spec:

spec:
  activeDeadlineSeconds: 300
  arguments: {}
  entrypoint: dag-task
  metrics:
    prometheus:
    - gauge:
        realtime: false
        value: '{{workflow.duration}}'
      help: Duration gauge by workflow level
      labels:
      - key: playground_id_workflow
        value: test
      - key: status
        value: Succeeded
      name: playground_workflow_duration
    - counter:
        value: "1"
      help: Count of workflow execution by result status  - workflow level
      labels:
      - key: playground_id_workflow_counter
        value: test
      - key: status
        value: Succeeded
      name: playground_workflow_result_counter
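
The symptom described above can be illustrated with a minimal sketch. It assumes a simplified `{{name}}` substitution step; `substitute` and `tagPattern` are hypothetical names, and this is not Argo's actual template engine. The idea is that a label resolves to whatever value the parameter map holds at substitution time, so a map captured before the phase is known yields an empty status.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagPattern matches simple {{name}} placeholders. It is a hypothetical
// stand-in for the controller's template engine, not Argo's real one.
var tagPattern = regexp.MustCompile(`\{\{\s*([\w.]+)\s*\}\}`)

// substitute replaces each placeholder with its value from params.
// Placeholders whose key is missing from params are left untouched.
func substitute(s string, params map[string]string) string {
	return tagPattern.ReplaceAllStringFunc(s, func(m string) string {
		key := strings.TrimSpace(strings.Trim(m, "{}"))
		if v, ok := params[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	label := "{{workflow.status}}"

	// Assumption: parameters captured before the phase was recorded
	// hold an empty status.
	stale := map[string]string{"workflow.status": ""}
	fmt.Printf("stale: %q\n", substitute(label, stale))

	// Parameters refreshed from the finished workflow.
	fresh := map[string]string{"workflow.status": "Succeeded"}
	fmt.Printf("fresh: %q\n", substitute(label, fresh))
}
```

With the stale map the label is emitted as an empty string, which matches the reported behaviour; with the refreshed map it is emitted as `Succeeded`.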

@dpadhiar dpadhiar marked this pull request as ready for review June 9, 2022 21:23

@tczhao tczhao left a comment

Perhaps we should add something to workflow/controller/operator_metrics_test.go as well.

@sarabala1979

@dpadhiar can you add a test?

@dpadhiar

@dpadhiar can you add a test?

Yes, will do. @tczhao Will add in metrics test.

Signed-off-by: Dillen Padhiar <[email protected]>
@dpadhiar dpadhiar requested a review from tczhao June 10, 2022 21:36
	return nil
}

func (woc *wfOperationCtx) setGlobalRuntimeParameters() {
	woc.globalParams[common.GlobalVarWorkflowStatus] = string(woc.wf.Status.Phase)
}

@tczhao tczhao Jun 11, 2022

Maybe the duration can be part of this function as well

// Update workflow duration variable
if woc.wf.Status.StartedAt.IsZero() {
	woc.globalParams[common.GlobalVarWorkflowDuration] = fmt.Sprintf("%f", time.Duration(0).Seconds())
} else {
	woc.globalParams[common.GlobalVarWorkflowDuration] = fmt.Sprintf("%f", time.Since(woc.wf.Status.StartedAt.Time).Seconds())
}
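
A rough, self-contained sketch of what a combined helper might look like, setting both status and duration in one place. The names `runtimeParams`, `keyStatus` and `keyDuration` are hypothetical, the key strings are assumed to match the `{{workflow.status}}` and `{{workflow.duration}}` variables used in the example workflow, and the current time is passed in rather than read with `time.Since` so the result is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical stand-ins for common.GlobalVarWorkflowStatus and
// common.GlobalVarWorkflowDuration; the values are assumptions.
const (
	keyStatus   = "workflow.status"
	keyDuration = "workflow.duration"
)

// Sample timestamps used by main. notStarted is the zero time, which
// models a workflow whose StartedAt has not been set yet.
var (
	notStarted  time.Time
	sampleStart = time.Date(2022, 6, 10, 20, 29, 42, 0, time.UTC)
	sampleNow   = sampleStart.Add(90 * time.Second)
)

// runtimeParams returns the values that change while a workflow runs.
// phase and startedAt stand in for woc.wf.Status.Phase and
// woc.wf.Status.StartedAt.Time; now is injected to keep it testable.
func runtimeParams(phase string, startedAt, now time.Time) map[string]string {
	elapsed := time.Duration(0)
	if !startedAt.IsZero() {
		elapsed = now.Sub(startedAt)
	}
	return map[string]string{
		keyStatus:   phase,
		keyDuration: fmt.Sprintf("%f", elapsed.Seconds()),
	}
}

func main() {
	// Not started yet: duration is reported as zero seconds.
	fmt.Println(runtimeParams("Running", notStarted, sampleNow))
	// Started 90 seconds before "now".
	fmt.Println(runtimeParams("Succeeded", sampleStart, sampleNow))
}
```

Keeping both values in one function means any caller that refreshes the status before emitting metrics also refreshes the duration.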

@dpadhiar

Will add.

- key: playground_id_workflow_counter
  value: test
- key: status
  value: Succeeded

@tczhao tczhao Jun 11, 2022

should be "{{workflow.status}}"?

startedAt: "2022-06-10T20:29:42Z"
`

func TestRuntimeMetrics(t *testing.T) {

@tczhao tczhao Jun 11, 2022

I don't think this tests the original bug.

You can write wf.yaml without the status, call operate twice with a pod phase change in between, and check the metric before and after the second operate to make sure it changes:

	makePodsPhase(ctx, woc, apiv1.PodSucceeded)
	woc = newWorkflowOperationCtx(woc.wf, controller)
	woc.operate(ctx)
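
To show why a two-pass test is more telling than a fixture with the status baked in, here is a toy model. Everything in it is hypothetical: `workflow` and `operate` only imitate the shape of the controller code, and the assumption that the status was captured before the phase update is one plausible reading of the bug, not a confirmed description of it.

```go
package main

import "fmt"

// workflow is a hypothetical stand-in for the Argo Workflow object.
type workflow struct{ phase string }

// operate models one reconciliation pass and returns the status label
// that would be attached to the emitted metric.
// refresh=true models re-reading the status after the phase update.
func operate(wf *workflow, podPhase string, refresh bool) string {
	// Assumption: parameters are captured at the start of the pass.
	params := map[string]string{"workflow.status": wf.phase}

	// The pass then updates the workflow phase from the pod phase.
	if podPhase == "Succeeded" {
		wf.phase = "Succeeded"
	} else {
		wf.phase = "Running"
	}

	if refresh {
		params["workflow.status"] = wf.phase
	}
	return params["workflow.status"]
}

func main() {
	for _, refresh := range []bool{false, true} {
		wf := &workflow{} // no status in the fixture
		first := operate(wf, "Running", refresh)
		second := operate(wf, "Succeeded", refresh) // pod phase changed in between
		fmt.Printf("refresh=%v first=%q second=%q\n", refresh, first, second)
	}
}
```

In this model the stale variant reports `Running` after the second pass even though the workflow has succeeded, so asserting on the label after the second operate distinguishes the two behaviours.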

@dpadhiar

@tczhao When I test with {{workflow.status}} in the workflow, this is the error I see from this test:

Error Trace:	operator_metrics_test.go:920
Error:      	Expected value not to be nil.
Test:       	TestRuntimeMetrics

The line it fails on is metric := controller.metrics.GetCustomMetric(metricDesc), which returns a nil metric.
However, printing metricDesc shows this string: playground_workflow_new{playground_id_workflow_counter=test,status={{workflow.status}},}
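
One possible explanation for the nil result, sketched under assumptions: if metrics are stored under a description built from the resolved labels, a lookup with the unresolved `{{workflow.status}}` label uses a different key and finds nothing. The `describe` function and the map-based registry are hypothetical; the key format is only inferred from the string printed above.

```go
package main

import (
	"fmt"
	"strings"
)

// label is a hypothetical key/value pair attached to a metric.
type label struct{ key, value string }

// describe builds a lookup key shaped like the printed metricDesc:
// name{key=value,key=value,}. The format is inferred, not confirmed.
func describe(name string, labels []label) string {
	var b strings.Builder
	b.WriteString(name + "{")
	for _, l := range labels {
		b.WriteString(l.key + "=" + l.value + ",")
	}
	b.WriteString("}")
	return b.String()
}

func main() {
	// Hypothetical registry keyed by description.
	registry := map[string]float64{}

	resolved := describe("playground_workflow_new", []label{
		{"playground_id_workflow_counter", "test"},
		{"status", "Succeeded"},
	})
	registry[resolved] = 1

	unresolved := describe("playground_workflow_new", []label{
		{"playground_id_workflow_counter", "test"},
		{"status", "{{workflow.status}}"},
	})

	_, foundUnresolved := registry[unresolved]
	_, foundResolved := registry[resolved]
	fmt.Println(unresolved, foundUnresolved)
	fmt.Println(resolved, foundResolved)
}
```

If this is what happens, the test would need to build its expected description from the resolved status rather than from the raw spec.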

@dpadhiar dpadhiar requested a review from sarabala1979 June 14, 2022 23:44
@tczhao

tczhao commented Jun 15, 2022

Curious, I ran the failed e2e test locally and it worked fine

@dpadhiar

Curious, I ran the failed e2e test locally and it worked fine

If I run the test again on GitHub, it may pass. I've had multiple PRs where an e2e test I had not touched occasionally failed because it took too long.

Signed-off-by: Dillen Padhiar <[email protected]>
@sarabala1979 sarabala1979 self-assigned this Jun 22, 2022
@sarabala1979 sarabala1979 merged commit 89f3433 into argoproj:master Jun 23, 2022
@sarabala1979 sarabala1979 mentioned this pull request Jun 23, 2022
51 tasks
@rvignesh89

Is there any plan to include this in the upcoming release? I have seen this issue since v3.3.1 and have been waiting for the fix to be merged. Thanks!

Labels
area/metrics area/templating Templating with `{{...}}`
Development

Successfully merging this pull request may close these issues.

{{workflow.status}} is not working in metrics
5 participants