feat(backend): Upgrade argo to v3.4.16 #10568

Merged
merged 7 commits into from
Apr 16, 2024

Conversation

gmfrasca
Member

@gmfrasca gmfrasca commented Mar 13, 2024

Description of your changes:
Implements #10469, also Fixes #8935

  • Upgrades Argo backend to v3.4.16, latest in the argo v3.4 release stream
  • Implements RetrievePodName function to work around NodeStatus.ID issue (see below)
  • Updates unit tests accordingly

Note: This PR supersedes #9301. The following description is copied from that PR:
Fixes the following CVEs:

https://github.com/advisories/GHSA-4f9f-mpmj-4c52
https://github.com/advisories/GHSA-98w6-hw73-ph8m
CVE-2022-23521
CVE-2022-41903
https://github.com/advisories/GHSA-grfr-78m7-q35q
https://github.com/advisories/GHSA-cfmr-vrgj-vqwv
https://github.com/advisories/GHSA-75qm-2q4j-qx6g

Breaking changes:

- Argo 3.4 removed support for choosing a container runtime executor; emissary is the only option left: https://argoproj.github.io/argo-workflows/workflow-executors/
- Argo 3.4 no longer guarantees that NodeStatus.ID and NodeStatus.Name are identical. This is addressed by the code changes in `workflow.go`
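
For readers unfamiliar with the naming change, here is a minimal sketch of the kind of mapping such a workaround has to perform. It is not the code from this PR. It assumes that Argo 3.4 pod names follow the pattern `<workflow>-<template>-<hash>`, that `<hash>` is the trailing segment of the node ID, and that the root node's pod keeps the workflow name. The `nodeStatus` type and the `podNameFor` function are hypothetical stand-ins, and prefix-length truncation is ignored.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeStatus is a hypothetical stand-in for the few Argo NodeStatus
// fields this sketch needs; it is not the real Argo type.
type nodeStatus struct {
	ID           string
	Name         string
	TemplateName string
}

// podNameFor derives a pod name from a node, ASSUMING the convention
// <workflow>-<template>-<hash>, where <hash> is the last "-"-separated
// segment of the node ID. Long-name truncation is not handled.
func podNameFor(workflowName string, n nodeStatus) string {
	// Assumption: the root node shares the workflow's name, and so does its pod.
	if n.Name == workflowName {
		return workflowName
	}
	idx := strings.LastIndex(n.ID, "-")
	if idx < 0 || n.TemplateName == "" {
		// Not enough information to rebuild the name; fall back to the ID.
		return n.ID
	}
	hash := n.ID[idx+1:]
	return fmt.Sprintf("%s-%s-%s", workflowName, n.TemplateName, hash)
}
```

Keeping this logic in a single function means that only one place needs to change if upstream later stores the pod name on the node status directly.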

Notes:

  • The workaround for the NodeStatus naming bug/breaking change was extracted into a separate function so that, once it is fixed upstream, it can be updated in one place. Track the fix in Argo here: feat: store podname in nodestatus argoproj/argo-workflows#12503
  • I believe the containerRuntimeExecutor value in the WorkflowController ConfigMap needs to be removed from all manifests; otherwise the WC Pod will fail with an invalid input JSON error. I may need assistance testing across all platforms
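
As an illustration of the manifest cleanup described above, the following sketch flags ConfigMap keys that the newer controller would reject. It is a hypothetical helper, not part of this PR: `staleKeys` and `removedIn34` are made-up names, and the only key taken from this discussion is `containerRuntimeExecutor`.

```go
package main

// removedIn34 lists ConfigMap keys assumed to be rejected by the
// Argo 3.4 WorkflowController. Only containerRuntimeExecutor comes
// from this PR's description; extend the list at your own risk.
var removedIn34 = []string{"containerRuntimeExecutor"}

// staleKeys returns the removed keys that are still present in the
// given ConfigMap data, so they can be deleted before upgrading.
func staleKeys(data map[string]string) []string {
	var found []string
	for _, k := range removedIn34 {
		if _, ok := data[k]; ok {
			found = append(found, k)
		}
	}
	return found
}
```

A check like this could be run over each platform's rendered ConfigMap data before rolling out the upgraded controller.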

Checklist:


Hi @gmfrasca. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gmfrasca gmfrasca force-pushed the argo-3416 branch 2 times, most recently from 93933f8 to dd939f3 Compare March 14, 2024 00:05
@chensun chensun requested a review from Tomcli March 14, 2024 01:00
@HumairAK
Collaborator

/ok-to-test

@rimolive
Member

/test kubeflow-pipeline-e2e-test

@gmfrasca
Member Author

gmfrasca commented Mar 14, 2024

I believe the CI errors occur because the respective Argo images do not yet exist in the gcr.io/ml-pipeline image repository: the WorkflowController pod fails to come up with an ImagePullBackOff error. That is expected, since this PR is the one that updates them.

I'm assuming I don't have permission to build and publish the images to the repository, so who would be responsible for taking that action?

@gmfrasca
Member Author

Another quick note: when testing this, one unit test was failing (Test_executeV2_Parameters), but it was already failing before any changes made by this PR, so presumably that's out of scope here. All other unit tests should pass.

@rimolive
Member

@chensun @zijianjoy Can you push Argo 3.4 images to gcr to ensure test will pass?

@chensun
Member

chensun commented Mar 26, 2024

@chensun @zijianjoy Can you push Argo 3.4 images to gcr to ensure test will pass?

Done. gcr.io/ml-pipeline/argoexec:v3.4.16-license-compliance and gcr.io/ml-pipeline/workflow-controller:v3.4.16-license-compliance are now available.

#10618

@chensun
Member

chensun commented Mar 26, 2024

/retest

@gmfrasca gmfrasca changed the title WIP: feat(backend): Upgrade argo to v3.4.16 feat(backend): Upgrade argo to v3.4.16 Mar 26, 2024
@gmfrasca gmfrasca force-pushed the argo-3416 branch 2 times, most recently from fe7c39f to b2fbc1c Compare March 26, 2024 22:05
@rimolive
Member

/test kubeflow-pipeline-backend-test

@gmfrasca
Member Author

gmfrasca commented Apr 3, 2024

Looking into the e2e failure. In the meantime, the mkp failure looks like a network flake, so I'm retesting to see whether investigation is needed.

/test kubeflow-pipeline-mkp-test

Member

@Tomcli Tomcli left a comment

thanks @gmfrasca
/lgtm

Signed-off-by: Giulio Frasca <[email protected]>
Signed-off-by: Giulio Frasca <[email protected]>
Signed-off-by: Giulio Frasca <[email protected]>
- Argo 3.4.16 upgrade introduces a breaking change with inconsistent node.ID vs
  node.Name
- introduce a function in workflow.go to conditionally handle this

Signed-off-by: Giulio Frasca <[email protected]>
- PNS Executor was removed in Argo v3.4, so manifests no longer valid
- WorkflowController will fail to start if `containerRuntimeExecutor`
  provided as input parameter, so remove from WC ConfigMap and CM
  patches

Signed-off-by: Giulio Frasca <[email protected]>
- Stemming from upgrade to argo 3.4, Pod Name is no longer always the
  same as NodeID, which breaks a few tabs (PodInfo, PodEvents and
  PodLogs).  Add function to address this

Signed-off-by: Giulio Frasca <[email protected]>
@gmfrasca
Member Author

^just addresses the rebase/merge conflicts (only affects go.mod and licenses CSVs)

@HumairAK
Collaborator

@chensun / @zijianjoy bump, is there anything else we can do to help get this PR merged?

@rimolive
Member

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Apr 16, 2024
Member

@chensun chensun left a comment

/lgtm
/approve

Thanks!


[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun, Tomcli

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Development

Successfully merging this pull request may close these issues.

[frontend] ui can't retrieve pod logs when using argo-workflows 3.4+ (POD_NAMES change)
5 participants