Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

argo cron should report when CronWorkflow names exceed the length limit #6781

Open
crenshaw-dev opened this issue Sep 21, 2021 · 14 comments
Open
Labels

Comments

@crenshaw-dev
Copy link
Member

Summary

The argo cron command faithfully lists CronWorkflows, even ones that are rendered invalid due to the name exceeding the 52-character limit.

The workflow controller does log the issue. But argo cron gives the impression that nothing is wrong.

Use Cases

My team would benefit from this when a dev (who is unfamiliar with the workflow controller and its logs) deploys a CronWorkflow and then uses argo cron to maintain that CronWorkflow.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@crenshaw-dev crenshaw-dev added the type/feature Feature request label Sep 21, 2021
@tyrken
Copy link

tyrken commented Jun 6, 2022

No error seen in Argo Web UI either, and it's only in the workflow controller logs when first created, not when you expect it to be scheduled.

@terrytangyuan
Copy link
Member

terrytangyuan commented Jun 6, 2022

Would anyone want to submit a PR to improve this experience?

@tyrken
Copy link

tyrken commented Jun 6, 2022

Sure, but as a non-golang, non-JS dev it might be a little clunky. Guessing I'd need to add a generic message field to the status like Workflow has, or maybe better a NameInvalid condition given you already seem to show that in the Web UI at least.

Know any existing code that might do this I can look at for inspiration?

@agilgur5
Copy link

agilgur5 commented Sep 1, 2023

Related: #10264 for the UI.

It would be nice if the Controller could statically reject a CronWorkflow if it can tell that its generated name would be too long. That would prevent issues like this from happening in the first place.

@bkanuka
Copy link

bkanuka commented Mar 7, 2024

My team just spent far too long diagnosing why a cron workflow was not running. Everything in the UI looked good, including saying that the workflow would run in X minutes and then it just wouldn't.

ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.

I believe this is actually a pretty high priority issue because it (almost) silently prevents a CronWorkflow from running, while simultaneously indicating that all is okay and that it will run. I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.

@alexec @terrytangyuan could you take another look at this issue and adjust labels if you see fit? I'm happy to provide more details if it helps.

@agilgur5
Copy link

agilgur5 commented Mar 8, 2024

could you take another look at this issue and adjust labels if you see fit?

the labels are correct. we do not currently have a priority label for features, but can look at upvotes to determine popularity (this issue has 4 upvotes at the time of writing).

ArgoCD (which deploys the CronWorkflows) also showed that CronWorkflow was properly deployed.

Did you use Server-Side Apply for this? Because there is also argoproj/argo-cd#16817 when using client-side apply

I completely agree that the CronWorkflow manifest should have been rejected when submitted to the cluster.

It seems that Workflows does not seem to have a validating admission controller, which is something we could use to reject invalid resources. I mentioned this in #12693 (comment) as well, and the caveat still applies -- if a validating webhook is not installed or bypassed, the case of invalid resources still has to be handled in some way, and so this issue would still be relevant for that case.

@bkanuka
Copy link

bkanuka commented Mar 8, 2024

I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.

My team deployed a CronWorkflow and expected it to run and it didn't - plus because there was no Failure message from Argo, it didn't trigger an alert.

We do have the validating webhook and Server-Side apply.

@Joibel
Copy link
Member

Joibel commented Mar 8, 2024

Did the metric argo_workflows_error_count with the cause CronWorkflowSubmissionError go up?

I'm not saying this is sufficient indication, but it is the indication I would expect.

@Joibel
Copy link
Member

Joibel commented Mar 8, 2024

I am unaware of a validating admission controller option on argo workflows. I don't think this is currently possible, I'm intrigued by what you have enabled there.

@agilgur5
Copy link

agilgur5 commented Mar 8, 2024

I guess I would argue that a silent failure that stops CronWorkflows from running (while indicating all is good), is actually a bug rather than an enhancement.

The Controller does log that it is invalid, per the issue description. And that is the only thing it can do at this time, as it does not handle admission.

To have other tooling like the CLI -- what this issue requests -- show that it is invalid would be a feature. They are not capable of that right now and so that would be adding new behavior.
A bug would be existing, incorrect behavior, e.g. showing it is invalid when it is actually valid. Right now it has no indication whatsoever, the CLI does not validate while listing.

@jessesuen
Copy link
Member

Possible improvements:

  1. display message in status that the cronworkflow is invalid due to its length
  2. CLI validation upon creation attempt and rejecting it if length is to great

@tooptoop4
Copy link
Contributor

i know this won't help creation via kubectl but do u think a cli flag that runs argo lint before accepting the submission could prevent this being created via argo cli?

@carlosrodfern
Copy link
Contributor

carlosrodfern commented Nov 21, 2024

@jessesuen

I wonder if CRD validation rules would help: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-rules

Here is an example:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.bar.test.io
spec:
  group: bar.test.io
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-validations:
          - rule: "size(self.metadata.name) <= 52"
            message: "name must be less than or equal 52 characters."
          properties:
            metadata:
              type: object
            spec:
              type: object
              properties:
                someProp:
                  type: string
  names:
    kind: Foo
    plural: foos
  scope: Namespaced
apiVersion: bar.test.io/v1alpha1
kind: Foo
metadata:
  name: this-is-a-very-very-very-very-very-very-very-long-string
spec:
  someProp: test
$ kubectl apply -f test-for-crd.yaml

The Foo "this-is-a-very-very-very-very-very-very-very-long-string" is invalid: <nil>: Invalid value: "object": name must be less than 52 characters.

That feature is only stable since kubernetes 1.29, though.

carlosrodfern added a commit to carlosrodfern/argo-workflows that referenced this issue Nov 22, 2024
Fixes: argoproj#6781

Signed-off-by: Carlos Rodriguez-Fernandez <[email protected]>
@carlosrodfern
Copy link
Contributor

@jessesuen

I wonder if CRD validation rules would help: https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#validation-rules

Here is an example:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.bar.test.io
spec:
  group: bar.test.io
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-validations:
          - rule: "size(self.metadata.name) <= 52"
            message: "name must be less than or equal 52 characters."
          properties:
            metadata:
              type: object
            spec:
              type: object
              properties:
                someProp:
                  type: string
  names:
    kind: Foo
    plural: foos
  scope: Namespaced
apiVersion: bar.test.io/v1alpha1
kind: Foo
metadata:
  name: this-is-a-very-very-very-very-very-very-very-long-string
spec:
  someProp: test
$ kubectl apply -f test-for-crd.yaml

The Foo "this-is-a-very-very-very-very-very-very-very-long-string" is invalid: <nil>: Invalid value: "object": name must be less than 52 characters.

That feature is only stable since kubernetes 1.29, though.

I actually prefer the programmatic solution because the ValidateCronWorkflow(..) function has some valuable checks beyond the character count.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

10 participants