feat(executor): Minimize the number of Kubernetes API requests made by executors #4954
Conversation
Force-pushed from fa9ede2 to 3bafd3e
Signed-off-by: Alex Collins <[email protected]>
Before:
Good. The builds are failing as they should.
 	return *ctr
 }

 func (woc *wfOperationCtx) newWaitContainer(tmpl *wfv1.Template) (*apiv1.Container, error) {
 	ctr := woc.newExecContainer(common.WaitContainerName, tmpl)
-	ctr.Command = []string{"argoexec", "wait"}
+	ctr.Command = []string{"argoexec", "wait", "--loglevel", getExecutorLogLevel()}
nice
 func (p *PNSExecutor) Wait(ctx context.Context, containerNames, sidecarNames []string) error {

+	allContainerNames := append(containerNames, sidecarNames...)
+	go p.pollRootProcesses(ctx, allContainerNames)
I believe this happens sufficiently early along in the process to remove the need for WaitInit().
 		log.Infof("Ignoring %d exit code of container '%s'", t.ExitCode, ctr.Name)
 		continue
 	} else {
 		return wfv1.NodeFailed, msg
this code is much cleaner now
 	// https://github.com/kubernetes/kubernetes/blob/ca6bdba014f0a98efe0e0dd4e15f57d1c121d6c9/pkg/kubelet/dockertools/labels.go#L37
 	"--filter=label=io.kubernetes.pod.namespace="+d.namespace,
 	"--filter=label=io.kubernetes.pod.name="+d.podName,
Neat!
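As a hypothetical sketch (function name and extra flags are illustrative, not the PR's actual code), the label filters above can be assembled so `docker ps` lists only this pod's containers, which is what lets the executor find container IDs without a Kubernetes API request:

```go
package main

import "fmt"

// buildDockerPsArgs sketches how the `docker ps` label filters shown above
// restrict the listing to the current pod's containers. The kubelet labels
// every container it creates with the owning pod's namespace and name, so
// filtering on those labels selects exactly our containers.
func buildDockerPsArgs(namespace, podName string) []string {
	return []string{
		"ps", "--all", "--no-trunc", // illustrative flags, not from the diff
		"--filter=label=io.kubernetes.pod.namespace=" + namespace,
		"--filter=label=io.kubernetes.pod.name=" + podName,
	}
}

func main() {
	fmt.Println(buildDockerPsArgs("argo", "my-workflow-pod"))
}
```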
-func (p *PNSExecutor) pollRootProcesses(timeout time.Duration) {
-	log.Warnf("Polling root processes (%v)", timeout)
-	deadline := time.Now().Add(timeout)
+func (p *PNSExecutor) pollRootProcesses(ctx context.Context, containerNames []string) {
The previous implementation limited our hard tight-looped polling to 1 minute, because it was quite aggressive. How do we limit this in the new implementation?
Good point. I’ve removed this by accident and need to add it back.
This is fixed now. I find it easier to do timeouts using context.WithTimeout, because you usually have a context available.
@jessesuen I've addressed your comment.
This PR minimizes the number of Kubernetes API requests made by the executor.

Today, on start-up, the executors make 2 Kubernetes API requests immediately: `get pod` and `watch pod`. Tomorrow, a maximum of 1 request is made, and only under two circumstances, one of them being the `k8sapi` executor (obviously). I've added logging of K8S requests to help diagnose regressions in the future.

- Remove the `get pod` used to get annotations on start-up (read from the mounted annotation path instead).
- Remove the `watch pod` on start-up completely.
- `docker`: use `docker ps` to determine container IDs instead of an API request.
- `k8sapi`: business as usual.
- `kubelet`: use the `localhost/pods` API request to determine container IDs.
- `pns`: introduce `ARGO_CONTAINER_NAME` to identify the container name for processes; works on long-running containers.

Testing Notes
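The `pns` change described above can be sketched as follows. This is a hypothetical illustration (the function and parsing details are assumptions, not the PR's actual code) of the idea: if each container's environment carries `ARGO_CONTAINER_NAME`, the PNS executor can map a host-visible process to its container by parsing the process's environment (NUL-separated `KEY=VALUE` pairs, as in `/proc/<pid>/environ`) instead of calling the Kubernetes API:

```go
package main

import (
	"fmt"
	"strings"
)

// containerNameFromEnviron extracts ARGO_CONTAINER_NAME from a raw,
// NUL-separated environment blob. The second return value reports whether
// the variable was present at all.
func containerNameFromEnviron(environ string) (string, bool) {
	const prefix = "ARGO_CONTAINER_NAME="
	for _, kv := range strings.Split(environ, "\x00") {
		if strings.HasPrefix(kv, prefix) {
			return strings.TrimPrefix(kv, prefix), true
		}
	}
	return "", false
}

func main() {
	// Simulated contents of /proc/<pid>/environ for a wait container.
	environ := "PATH=/usr/bin\x00ARGO_CONTAINER_NAME=wait\x00HOME=/root"
	name, ok := containerNameFromEnviron(environ)
	fmt.Println(name, ok) // prints "wait true"
}
```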