feat: store podname in nodestatus #12503
base: main
Conversation
@isubasinghe why do we need the pod name in node status? Can you provide more details about the use case / scenario?
@sarabala1979 the issue isn't created yet, but this PR addresses this comment by Alex, where we decided to store the podName in the node status: #10267 (comment)
```go
if node.PodName != nil {
	podName = *node.PodName
} else {
	// No pod name recorded on the node; reconstruct what it should have
	// been and fail loudly.
	expectedPodName := util.GeneratePodName(wfName, node.Name, templateName, node.ID, podNameVersion)
	panic("podName absent expected " + expectedPodName + " for " + node.Name)
}
```
This is probably not the desirable behaviour; there might be cases where the pod has not been created yet.
What should we do in this case?
I only took a quick glance at this, but doesn't this need to modify the UI code as well?
Also, for backward-compatibility, we'd still have to generate the pod names for Workflows created before this change.
We'll need to test this thoroughly this time
I thought I replied to this; I must have just not pressed the comment button. This was on purpose: I wanted to make the backend changes first and follow them up with the frontend changes, but I'm happy to make the UI changes here as well if you'd like me to.
Hmmm yeah you are right. What I might do is also keep the previous pod name generation code.
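A minimal sketch of that fallback, assuming the `util.GeneratePodName` call shown in the snippet under review; the helper name `podNameForNode`, the import paths, and the `util.PodNameVersion` type are illustrative assumptions, not the PR's actual diff:

```go
package controller

import (
	wfv1 "github.com/argoproj/argo-workflows/v3/pkg/apis/workflow/v1alpha1"
	"github.com/argoproj/argo-workflows/v3/workflow/util"
)

// podNameForNode prefers the pod name recorded on the node status (the
// field this PR adds) and only falls back to the legacy derivation --
// rather than panicking -- for nodes whose pod has not been created yet
// or that belong to Workflows created before this change.
func podNameForNode(wfName, templateName string, version util.PodNameVersion, node wfv1.NodeStatus) string {
	if node.PodName != nil {
		return *node.PodName
	}
	return util.GeneratePodName(wfName, node.Name, templateName, node.ID, version)
}
```

Keeping the derivation as a fallback rather than deleting it is also what makes the backward-compat concerns raised above workable.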
Bumping this. @isubasinghe, do you think you can rebase and resolve the conflicts? @terrytangyuan, this is something we use in Kubeflow Pipelines to retrieve pod names (we are currently planning to use a workaround you prescribed here). We are very much interested in seeing this merged so we can rely on Argo to provide this value; what can we do to get this some more traction?
We can discuss this in the upcoming contributors meeting to prioritize.
@HumairAK this is currently a feature, so it wouldn't land until the next minor (currently 3.6). If KFP is only bumping to 3.4, this PR wouldn't solve your problem.
@agilgur5 that is fine, hence why we plan to go with the workaround for now until we can upgrade to the version that (hopefully) has this change included, at which point we'll switch to relying on Argo for this.
So this certainly simplifies things moving forward (not during a backward-compat window though), but in recent months I had found older comments from Alex C about trying to store as little as possible in the `status`, due to our existing issues with large Workflows (`nodeStatusOffload` + general memory usage), such as #9193 (comment):

> Please don't add new fields to `NodeStatus`, as each new field needs `O(n)` extra storage, where `n` is the number of nodes. It is better to traverse `status` and build a new structure if needed. A one-time traverse would be `O(n)`.

I'm wondering if that was perhaps the original rationale behind some of the derivations in the codebase -- derive the field instead of storing it in `status`.

Effectively, this is a trade-off between storage and determinism. It actually reminds me quite heavily of the evolution of lockfiles in the JS ecosystem. There, though, determinism is simple to favor over storage, as hard disk/FS storage is cheap and plentiful (although VCS storage and diffs are a bit more complicated), compared to in-memory or etcd storage, where we have a hard limit.
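For contrast, a rough sketch of the derive-on-demand pattern that quote describes: one `O(n)` traversal building a lookup, instead of persisting a new field on every node. The function name is hypothetical and the template-name handling is simplified (it ignores `templateRef` resolution); imports are as in the earlier sketch:

```go
// buildPodNameIndex walks status.nodes exactly once and returns a
// nodeID -> generated pod name map, so nothing extra is stored in the
// Workflow object itself (and hence in etcd).
func buildPodNameIndex(wf *wfv1.Workflow, version util.PodNameVersion) map[string]string {
	index := make(map[string]string, len(wf.Status.Nodes))
	for id, node := range wf.Status.Nodes {
		if node.Type != wfv1.NodeTypePod {
			continue // only pod-type nodes have a backing pod
		}
		index[id] = util.GeneratePodName(wf.Name, node.Name, node.TemplateName, id, version)
	}
	return index
}
```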
@HumairAK ok. That does affect our prioritization though, as it means KFP won't be using this functionality in the short or even near term.
To be clear, I don't have a strong opinion on that per se; but I think we should be explicit about that -- larger workflows would be affected by this feature. I probably lean toward agreeing with this feature, as it would simplify areas of the codebase where we do have some bugs / edge-cases currently that are handled in different ways in different places.
The deprecation window is actually a bit complicated now that I think about it. If we ever want to remove the pod name generation code, we would break backward-compat with Workflows that were created in a version that doesn't contain the pod name in the `status`.

For instance, say 3.6 puts pod names into the `status`. Then say, in 3.7, we were to remove the backward-compat -- all Workflows created with Argo <3.6 would no longer be processable, e.g. for retries or Pod logs in the UI, and possibly other features.

With that in mind, the pod name derivation code is going to still have to be around for a while. Defaulting to this feature means that we could eliminate bugs for newer Workflows, but might still have them present in older Workflows. So the frequency of the old bugs would decrease, but we may want to still fix them.
One concern is that this will increase the object size.
Discussed in today's Contributor Meeting. The storage concern that I brought up earlier is still relevant, especially as we've had more users recently reporting etcd getting full (particularly on managed k8s providers, where you cannot configure your etcd space), such as #12802. But given that I was the one who brought it up, and I personally think the trade-off is worthwhile in this case -- determinism over storage, treating storage as a separate problem to optimize independently of any one field -- this has the green light to go. We do need to implement the backward-compat in this PR, as well as decide whether pod names will be on all nodes or just pod nodes.
This PR has been automatically marked as stale because it has not had recent activity and needs further changes. It will be closed if no further activity occurs.
This PR has been closed due to inactivity and lack of changes. If you would like to still work on this PR, please address the review comments and re-open.
Hmm, I would still be interested in seeing this change. @isubasinghe, do you have the bandwidth to continue work on this?
@HumairAK will continue on this when I get the chance
Fixes #12528
Motivation
Modifications
The changes are relatively simple: I reduced the pod name generation code as much as possible and stored the PodName in the node itself.
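A rough sketch of what that implies for the API type, assuming the new field is an optional pointer on `NodeStatus` (placement and naming here are assumptions based on the discussion above, not the PR's exact diff):

```go
package v1alpha1

// NodeStatus with the new field (all existing fields elided). Using a
// pointer plus omitempty means Workflows serialized before this change
// still deserialize cleanly, with PodName simply left nil.
type NodeStatus struct {
	// PodName records the name of the pod created for this node. It stays
	// nil until the controller has actually created the pod.
	PodName *string `json:"podName,omitempty"`
}
```

The controller would then set the field at pod-creation time (e.g. `name := pod.Name; node.PodName = &name`), which is why a podName only exists once the pod does -- see Verification below.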
Verification
Verified via testing. Note that this introduces a slight change to existing behaviour: a podName is only assigned once the pod has actually been created.