MetricsError: unable to compute workflow.status in real-time metrics block with 'when' clause
#12693
Comments
Looks like a follow-up to #9424 (comment) |
Hmm, this seems to have regressed twice despite the tests in #8939 🤔 Those might need adjusting. The code from that PR still exists as-is, so something else is causing the regression. |
This is still relevant in v3.5.5. |
When a metric is processed, the controller substitutes the variables in If you try an example with However, in your example, you don't need the |
I'm removing the Right now variables used in |
As part of implementing #12589 these need to meet the requirements imposed by opentelemetry as well. I'll try and remember to comment properly when I am not on mobile. |
Great root cause analysis here! ❤️
I didn't even notice that the difference between #9424 and this issue is that this one is real-time. Good catch!
Honestly, I haven't worked with them enough to have a solid opinion. The time of processing is relatively well defined in the docs. Alan maybe has put some more thought into this?
Since this behavior is well defined right now, at the very least I would think that it would throw a validation error on submission, since Putting it a different way, if there were a validation error, this wouldn't have popped up as a bug. Instead it may have popped up as, for instance, a feature to add Some other scattered, preliminary thoughts below:
Removing the |
Oh, I just realized the expression here is using the old |
Realtime metrics with a The
The labels and their values can change, this is fine. Increasing the number of things you can do in real time metrics is really a question of looking at how variable scopes are handled. Getting other kinds of variables from the workflow (imagine that the controller has just restarted) is problematic. Side note: |
I'm relatively sure this is because most installation paths (including the Helm chart) only use the minimal CRDs and not the full CRDs. And I believe the minimal ones are used because the full ones are too big (see also #11266). If we don't already have a validating admission webhook, that's something we could use for Argo to deny creation of invalid resources. That can have any custom logic in it, but it is latency sensitive. Notably though, webhook configs may not be installed by users, and so the Controller and Server still have to handle invalid resources. But yes definitely a separate thread 😅 |
Feel free to tag me whenever you see fit |
should we close this? |
As I wrote above, I would think this would throw a validation error as it uses an unsupported variable, rather than attempt to process it and then fail with a confusing error. |
I think the hardest part is to know the list of unsupported variables. The user might want to use |
Having lists of supported and unsupported variables might actually be useful for validation purposes in general; very declarative & deterministic. But I was thinking the inverse -- if a variable is not in the expression scope, it can throw a validation error. This issue sounds like the expression scope is broader than it should be for a realtime metric, no? |
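To make the "not in the expression scope means validation error" idea above concrete, here is a minimal sketch in Python. It is not Argo's implementation: the allow-list `REALTIME_SCOPE`, the function names, and the error message are all hypothetical, and the set of variables actually available to a real-time metric should be taken from the controller code or the docs.

```python
import re

# Hypothetical allow-list of variables a real-time metric may reference.
# Assumed for illustration only; not taken from the Argo source.
REALTIME_SCOPE = frozenset({"workflow.duration", "duration"})

# Matches simple template tags such as {{workflow.status}} or {{ duration }}.
TAG = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def unsupported_variables(expression, scope=REALTIME_SCOPE):
    """Return the template variables in `expression` that are not in `scope`."""
    found = []
    for name in TAG.findall(expression):
        if name not in scope and name not in found:
            found.append(name)
    return found


def validate_realtime_when(expression, scope=REALTIME_SCOPE):
    """Raise ValueError at submission time instead of failing during processing."""
    bad = unsupported_variables(expression, scope)
    if bad:
        raise ValueError(
            "real-time metric 'when' clause uses unsupported variable(s): "
            + ", ".join(bad)
        )
```

With a check like this, a `when` clause that references `workflow.status` would be rejected up front with a message naming the variable, rather than surfacing later as a MetricsError.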
Pre-requisites
:latest
What happened/what did you expect to happen?
Using the workflow.status variable in the Prometheus metrics definition results in an error:
Version
v3.5.4
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
Logs from the workflow controller