Upgrade Argo to v2.11+ #4553

Closed
Ark-kun opened this issue Sep 27, 2020 · 33 comments

Labels: area/backend, help wanted

Comments

@Ark-kun
Contributor

Ark-kun commented Sep 27, 2020

Issues fixed:

Improvements:

@xinbinhuang

xinbinhuang commented Sep 28, 2020

@Ark-kun This is currently blocking my work. Though I can work around it somehow, I would like to get this addressed quickly. Do you have a timeline for the fix, or can I help with the upgrade?

Bobgy added the help wanted label Sep 29, 2020
@Bobgy
Contributor

Bobgy commented Sep 29, 2020

Will the upgrade need upgrading argo client? If not, I think you can try upgrading the argo installation in your own cluster and see if it fixes the problem.

@Bobgy
Contributor

Bobgy commented Sep 29, 2020

Upgrading the client is a lot harder; there are some Go module dependency issues to fix. There's an ongoing PR working on this, and you may help there: #4498.

@xinbinhuang

Will the upgrade need upgrading argo client? If not, I think you can try upgrading the argo installation in your own cluster and see if it fixes the problem.

It seems like the KFP images only include additional licenses on top of the original Argo images.
Are there any other changes that I need to be aware of?

If there are only additional licenses, I can switch to the official Argo images to see if that works.

@Bobgy
Contributor

Bobgy commented Sep 30, 2020

Yes, it's just additional licenses. You can just switch to official argo images.
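
A minimal sketch of the image switch discussed here, assuming the official images are published under the argoproj repository with the same image names (argoexec, workflow-controller). The helper name to_official_image and the workflow-controller image path are illustrative assumptions, not part of KFP. The controller log further down this thread shows that the executor image is set separately through executorImage in workflow-controller-configmap, so that reference likely needs the same rewrite.

```python
def to_official_image(image, new_tag, official_repo="argoproj"):
    """Rewrite e.g. gcr.io/ml-pipeline/<name>:<tag> to <official_repo>/<name>:<new_tag>.

    Assumption: the official image keeps the same <name> as the KFP rebuild.
    """
    # Last path segment, e.g. "argoexec:v2.7.5-license-compliance".
    name_and_tag = image.rsplit("/", 1)[-1]
    # Drop the old tag (if any), keeping only the image name.
    name = name_and_tag.split(":", 1)[0]
    return f"{official_repo}/{name}:{new_tag}"


if __name__ == "__main__":
    for old in (
        "gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance",
        # Hypothetical path for the controller image, by analogy with argoexec.
        "gcr.io/ml-pipeline/workflow-controller:v2.7.5-license-compliance",
    ):
        print(old, "->", to_official_image(old, "v2.11.1"))
```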

@xinbinhuang

@Bobgy I tried to update the Argo image to v2.11.1, but now I get this error from the workflow-controller repeatedly, and new pipeline runs seem to get into an infinite loop with unknown status. Any ideas?

time="2020-10-02T16:27:22Z" level=info msg="config map" name=workflow-controller-configmap
time="2020-10-02T16:27:22Z" level=info msg="Configuration:\nartifactRepository:\n  archiveLogs: true\n  s3:\n    accessKeySecret:\n      key: accesskey\n      name: mlpipeline-minio-artifact\n    bucket: parala-kfp-artifacts\n    endpoint: minio-service.kubeflow:9000\n    insecure: true\n    keyPrefix: artifacts\n    secretKeySecret:\n      key: secretkey\n      name: mlpipeline-minio-artifact\nexecutorImage: gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance\nmetricsConfig: {}\nnamespace: kubeflow\nnodeEvents: {}\npodSpecLogStrategy: {}\nsso:\n  clientId:\n    key: \"\"\n  clientSecret:\n    key: \"\"\n  issuer: \"\"\n  redirectUrl: \"\"\ntelemetryConfig: {}\n"
time="2020-10-02T16:27:22Z" level=info msg="Persistence configuration disabled"
time="2020-10-02T16:27:22Z" level=info msg="Starting Workflow Controller" version=v2.11.1
time="2020-10-02T16:27:22Z" level=info msg="Workers: workflow: 32, pod: 32"
time="2020-10-02T16:27:22Z" level=info msg="Performing periodic GC every 5m0s"
time="2020-10-02T16:27:22Z" level=info msg="Persistence disabled - so archived workflow GC disabled - you must restart the controller if you enable this"
time="2020-10-02T16:27:22Z" level=info msg="Starting workflow TTL controller (resync 20m0s)"
time="2020-10-02T16:27:22Z" level=info msg="Starting prometheus metrics server at localhost:9090/metrics"
time="2020-10-02T16:27:22Z" level=info msg="Starting CronWorkflow controller"
E1002 16:27:22.063415       1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *unstructured.Unstructured: workflowtemplates.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:argo" cannot list resource "workflowtemplates" in API group "argoproj.io" at the cluster scope
time="2020-10-02T16:27:22Z" level=info msg="Started workflow TTL worker"
E1002 16:27:23.068520       1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *unstructured.Unstructured: workflowtemplates.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:argo" cannot list resource "workflowtemplates" in API group "argoproj.io" at the cluster scope
E1002 16:27:24.073525       1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *unstructured.Unstructured: workflowtemplates.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:argo" cannot list resource "workflowtemplates" in API group "argoproj.io" at the cluster scope
E1002 16:27:25.078845       1 reflector.go:153] pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:105: Failed to list *unstructured.Unstructured: workflowtemplates.argoproj.io is forbidden: User "system:serviceaccount:kubeflow:argo" cannot list resource "workflowtemplates" in API group "argoproj.io" at the cluster scope

@xinbinhuang

Solved: adding --namespaced to the workflow-controller args
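
The repeated error in the log says the controller's service account is not allowed to list workflowtemplates at the cluster scope. With --namespaced the controller restricts itself to its own namespace, so the cluster-scope permission is no longer needed. The sketch below pulls the relevant fields out of such a log line and adds the flag to an argument list. The function names and the sample argument list are hypothetical, and the sample line is shortened from the log above.

```python
import re

# Matches the Kubernetes RBAC "forbidden" message as it appears in the log above.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" '
    r'(?:at the (?P<cluster>cluster) scope|in the namespace "(?P<namespace>[^"]+)")'
)


def parse_forbidden(line):
    """Return the denied request described by a log line, or None if no match."""
    m = FORBIDDEN_RE.search(line)
    if m is None:
        return None
    return {
        "user": m.group("user"),
        "verb": m.group("verb"),
        "resource": m.group("resource"),
        "api_group": m.group("group"),
        "scope": "cluster" if m.group("cluster") else "namespace",
        "namespace": m.group("namespace"),
    }


def ensure_namespaced(args):
    """Return a copy of the controller args with --namespaced appended if missing."""
    return list(args) if "--namespaced" in args else [*args, "--namespaced"]


if __name__ == "__main__":
    sample = (
        'workflowtemplates.argoproj.io is forbidden: User '
        '"system:serviceaccount:kubeflow:argo" cannot list resource '
        '"workflowtemplates" in API group "argoproj.io" at the cluster scope'
    )
    print(parse_forbidden(sample))
    # Hypothetical existing args; only the appended flag comes from this thread.
    print(ensure_namespaced(["--configmap", "workflow-controller-configmap"]))
```

The alternative would be granting the service account cluster-wide list access to workflowtemplates, which this thread did not pursue.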

@xinbinhuang

@Bobgy Some changes are needed to use the latest version of the Argo workflow-controller, and I can create a PR for that. Should it wait until the CLI client upgrade is merged?

@Bobgy
Contributor

Bobgy commented Oct 13, 2020

@xinbinhuang You can check whether upgrading Argo itself solves your problem. If it passes all of our e2e tests, we can get it merged before the CLI client.

@NikeNano
Member

If we'd like to update the Argo version to 2.11.x, I can look into it. I guess it will be similar to the update to 2.7, @Bobgy?

@NikeNano
Member

/assign

@Bobgy
Contributor

Bobgy commented Oct 26, 2020

@NikeNano thank you for offering help! That'll be great!

What's even better is using the chance to document how to upgrade argo, so others can learn from you next time.

@NikeNano
Member

What's even better is using the chance to document how to upgrade argo, so others can learn from you next time.

Sounds like a good idea, will include it!

@Bobgy
Contributor

Bobgy commented Oct 29, 2020

FYI, when upgrading to 2.11.6, be aware that Google requires every Docker image to contain the necessary license information.
That's why we built gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance from https://github.com/kubeflow/pipelines/tree/master/third_party/argo. We might be missing some documentation there, so feel free to ask me about that part when you start.

I think we can split into two PRs, one upgrading the image and one upgrading the go package.

EDIT: I built https://github.com/kubeflow/testing/tree/master/py/kubeflow/testing/go-license-tools to automatically collect go dependency licenses from GitHub.

Bobgy unassigned rmgogogo and Bobgy Oct 29, 2020
@NikeNano
Member

I think we can split into two PRs, one upgrading the image and one upgrading the go package.

EDIT: I built https://github.com/kubeflow/testing/tree/master/py/kubeflow/testing/go-license-tools to automatically collect go dependency licenses from GitHub.

Cool, I will look into it once I have managed to fix the dependencies correctly.

@xinbinhuang

Sorry, the issue drifted away from my focus previously. I managed to switch over to the official argo version 2.11.6 server side for my deployment, and everything has been running smoothly. It seems that server side is more straightforward.

@NikeNano Have you started on this? If not, I can create a PR tonight to summarize what I did, and you can look into it and add the extra dependencies and licenses as needed.

@NikeNano
Member

I have done some initial work, but go ahead and make a PR with your solution, @xinbinhuang, and I can help out :)

@NikeNano
Member

FYI, related work on Argo to update the dependencies: argoproj/argo-workflows#4426

@NikeNano
Member

NikeNano commented Jan 9, 2021

This was just merged into Argo: argoproj/argo-workflows#4810 (comment). I will start looking at it as well to see whether we can push this forward.

This will be part of Argo v3.

@capri-xiyue
Contributor

capri-xiyue commented Jan 13, 2021

@NikeNano Is there any ETA on this? And what are the remaining items of upgrading Argo to v2.11+? Do we have a task list for upgrading Argo to v2.11+?

@NikeNano
Member

@NikeNano Is there any ETA on this? And what are the remaining items of upgrading Argo to v2.11+? Do we have a task list for upgrading Argo to v2.11+?

If we want to go for version 3, as far as I can see we have to wait for its release before we can update, which should hopefully be at the end of January: argoproj/argo-workflows#4425 (comment). Waiting might not be necessary, but last time I looked at it there were some dependency issues that I couldn't solve without upgrading Argo; maybe you could figure it out, @capri-xiyue? When I did the update to 2.7 there were a lot of issues with collisions between dependencies. I suggest we wait for the release before we try to do the update.

@capri-xiyue
Contributor

argoproj/argo-workflows#4425 (comment)

I think it will be fine to wait until the end of January for Argo version 3 if it makes updating the dependency easier.
@Bobgy https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/ Docker support won't be removed until late 2021. I think it should be safe to update Argo after version 3 is released.
@NikeNano How do we usually update dependencies like Argo in KFP? Are there any docs on updating dependencies like Argo? For example, do we have a script for updating dependencies in KFP, like https://github.com/knative/eventing/blob/master/hack/update-deps.sh? I'm just wondering why there would be dependency issues when upgrading Argo. I thought Go modules should be able to resolve the dependencies automatically.

@Bobgy
Contributor

Bobgy commented Jan 14, 2021

Thanks, waiting for Argo v3 makes sense to me if that's the end of Jan.

@NikeNano How do we usually update dependencies like Argo in KFP? Are there any docs on updating dependencies like Argo? For example, do we have a script for updating dependencies in KFP, like https://github.com/knative/eventing/blob/master/hack/update-deps.sh? I'm just wondering why there would be dependency issues when upgrading Argo. I thought Go modules should be able to resolve the dependencies automatically.

We do not usually update dependencies, so by the time we do, they are already pretty old, and many things can break once some of them are updated.

It's worth discussing the upgrade strategy in a project health issue.

@NikeNano
Member

... I'm just wondering why there would be dependency issues when upgrading Argo. I thought Go modules should be able to resolve the dependencies automatically.

@capri-xiyue I am pretty new to Go in general and have always found Go dependency management a bit unclear, especially how it sometimes tries to resolve issues automatically; see https://github.com/golang/go/wiki/Modules#can-i-control-when-gomod-gets-updated-and-when-the-go-tools-use-the-network-to-satisfy-dependencies. I think it would be great if we documented this, which I remember you also asked for as part of the Argo upgrade, @Bobgy. But let's create a separate issue and continue the discussion there.
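
On the question of why Go modules do not simply resolve everything: Go's minimal version selection always produces a build list, taking for each module the highest version that anything in the requirement graph asks for. What it cannot guarantee is that code written against an older API still compiles against the version that ends up selected, which is one plausible source of the collisions mentioned in this thread. Below is a simplified sketch of that selection rule. All module names and versions are hypothetical, and replace/exclude directives, pseudo-versions and major-version suffixes are ignored.

```python
def parse_version(v):
    """Turn 'v1.2.3' into (1, 2, 3); pre-release and pseudo-versions are out of scope."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


def build_list(root, requirements):
    """Select one version per module, in the spirit of minimal version selection.

    root:         (module, version) of the main module.
    requirements: maps (module, version) -> list of (module, version) it requires.
    Returns a dict module -> selected version: the highest version of each
    module that is reachable from root through the requirement graph.
    """
    selected = {}
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        module, version = node
        current = selected.get(module)
        if current is None or parse_version(version) > parse_version(current):
            selected[module] = version
        # Follow the requirements declared by this exact module version.
        stack.extend(requirements.get(node, []))
    return selected


if __name__ == "__main__":
    # Hypothetical graph: the app asks for client v0.17.0 directly, but the
    # workflow library it depends on asks for client v0.17.8.
    reqs = {
        ("example.com/app", "v0.0.0"): [
            ("example.com/workflow", "v2.12.0"),
            ("example.com/client", "v0.17.0"),
        ],
        ("example.com/workflow", "v2.12.0"): [
            ("example.com/client", "v0.17.8"),
        ],
    }
    for module, version in sorted(build_list(("example.com/app", "v0.0.0"), reqs).items()):
        print(module, version)
```

In this sketch the app is built against example.com/client v0.17.8 even though it asked for v0.17.0. If the newer version changed an API the app relies on, the build breaks and needs a manual fix.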

@capri-xiyue
Contributor

@NikeNano @Bobgy I created an issue to discuss the upgrade strategy: #4999

@Bobgy
Contributor

Bobgy commented Jan 27, 2021

Progress for argo v3: https://github.com/argoproj/argo/milestone/20

@Bobgy
Contributor

Bobgy commented Jan 27, 2021

Asked about upstream updates in argoproj/argo-workflows#4953

EDIT: I got a reply from an Argo maintainer. The suggestion is to upgrade to v2.12 now; v3 will be backward compatible, but it will still take a while, and the first RC hasn't been released yet (but will be soon).

@NikeNano
Member

I will make another attempt at updating to 2.12.

@alexec

alexec commented Jan 28, 2021

Let us know if you want help!

google-oss-robot pushed a commit that referenced this issue Mar 3, 2021
* tests

* added go mod file

* updated go.mod

* argo latest stable

* upgrade argo

* clean up

* go mod tidy to clean up

* fixed test after backend

* go mod tidy clean up

* more clean up

* added helper function and updated after feedback

* updated k8s.io/kubernetes to version 0.17.9

* updated go dependencies
@Bobgy
Contributor

Bobgy commented Mar 10, 2021

Resolved by #5232.

Thank you to everyone who helped with this issue!
@NikeNano @alexec @xinbinhuang
