Fix context issue during cleanup of kind clusters #6771
base: main
Conversation
ci/kind/kind-setup.sh
Outdated
creationTimestamp=$(kubectl get nodes --context kind-$kind_cluster_name -o json -l node-role.kubernetes.io/control-plane | \
for context in $(kubectl config get-contexts -o name | grep 'kind-'); do
cluster_name=$(echo $context | sed 's/^kind-//')
if docker ps --format '{{.Names}}' | grep -q "$cluster_name"; then
Does this list all the containers for the given cluster name: docker ps --format '{{.Names}}' | grep -q "$cluster_name"?
docker ps --format '{{.Names}}' | grep '$cluster_name'
if docker ps --format '{{.Names}}' | grep -q "$cluster_name"; then
if docker ps --format '{{.Names}}' | grep '$cluster_name'; then
We don't need to list the cluster names; we just want to check whether they are present or not, so -q is required here.
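For illustration, a minimal sketch of the presence-check pattern (the container name here is hypothetical, not from the script):
# grep -q prints nothing and only sets the exit status, which is all the if needs.
if docker ps --format '{{.Names}}' | grep -q "my-cluster-control-plane"; then
  echo "a matching container is running"
fi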
ci/kind/kind-setup.sh
Outdated
creationTimestamp=$(kubectl get nodes --context kind-$kind_cluster_name -o json -l node-role.kubernetes.io/control-plane | \
for context in $(kubectl config get-contexts -o name | grep 'kind-'); do
cluster_name=$(echo $context | sed 's/^kind-//')
if docker ps --format '{{.Names}}' | grep -q "$cluster_name"; then
Why are we relying on this docker command instead of kind get clusters?
And kind get nodes can also be used to list all the node names.
We are not relying on kind get clusters because it also lists clusters whose creation is not yet completed or successful (still mid-way), which creates a problem: when we try to get the contexts of such clusters, it returns an error.
We may encounter this error in the case of aborted jobs and multiple jobs running in parallel.
In the case of an aborted job, suppose we abort the job as soon as cluster creation starts. When we run kind get clusters we will see that cluster from the aborted job in the list, but since its context is not available, the cleanup will panic and the job will fail.
In the case of multiple jobs, suppose two jobs are running in parallel: one job has just triggered the cluster creation step while the other job triggers the cleanup function. The cleanup will list the cluster that is not yet created, try to fetch the context for that cluster, and the job will fail because of the panic.
Sometimes, because of these stale clusters present in the environment, the testbed also becomes unhealthy.
@jainpulkit22 Thanks for the explanation. My concern is whether docker ps --format '{{.Names}}' | grep -q "$cluster_name" is a sufficient basis to determine whether a cluster is ready. Is there something like a status.conditions field in the context that would allow us to accurately determine whether the cluster is ready?
I was trying to point out that at this stage we are already iterating over known contexts (for context in $(kubectl config get-contexts -o name | grep 'kind-')), so we know that the context exists. It is not the same as the previous solution, which was first calling kind get clusters, and then calling kubectl config get-contexts.
In your comment and in the original issue, you talk about a "panic", but it's not clear to me what failure you are referring to exactly.
Note that even with your proposed solution, races seem possible:
- job 1 calls kubectl config get-contexts and gets context1
- job 2 calls kubectl config get-contexts and gets context1
- job 1 calls docker ps and finds the container it is looking for
- job 2 calls docker ps and finds the container it is looking for
- job 1 calls kubectl get nodes --context context1 ... successfully
- job 1 deletes the kind cluster successfully
- job 2 calls kubectl get nodes --context context1 ... but context1 does not exist anymore!
Either the code should be best effort and tolerate failures, or maybe you should use a lock (see flock) for correctness, and then you don't need to worry about concurrent executions and you can use the most appropriate commands for the task.
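For reference, a minimal sketch of the flock option (the lock file path is an assumption, and the per-cluster logic is elided):
(
  flock -x 9   # block until this job holds the exclusive lock on fd 9
  for context in $(kubectl config get-contexts -o name | grep 'kind-'); do
    cluster_name=${context#kind-}
    # ... existing per-cluster checks and kind delete cluster --name "$cluster_name" ...
    :
  done
) 9>/tmp/kind-cleanup.lock   # assumed lock file path, shared by the jobs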
In your comment and in the original issue, you talk about a "panic", but it's not clear to me what failure you are referring to exactly.
The failure I talked about in the issue can be explained with the following example.
Suppose there are two jobs running in parallel, JobA and JobB. JobA has just triggered the cluster creation, and JobB is in the initial cleanup phase. When JobB runs the kind get clusters command, it will list the cluster whose creation has just been started by JobA, and then when it tries to run kubectl get nodes --context ... it will panic, because the context does not exist yet, i.e. the cluster creation is not yet completed.
Now, coming to the race condition you mentioned: I think this race condition will not occur in practice, because there will be a few seconds of lag between two parallel runs, so if one has already deleted the cluster then docker ps will not list that container.
So the race condition you describe seems possible in theory but not in practice.
I still believe that the docker command is unnecessary:
We are not relying on kind get clusters because it also lists clusters whose creation is not yet completed or successful (still mid-way), which creates a problem: when we try to get the contexts of such clusters, it returns an error.
The outer loop is still for context in $(kubectl config get-contexts -o name | grep 'kind-'); do, so we already have the context.
it will panic
Do you mean a Go panic in kubectl?
So the race condition you describe seems possible in theory but not in practice.
2 jobs can be triggered at the same time. We should probably be tolerant of failures. If the kubectl command (kubectl get nodes) fails or the kind command fails (kind delete cluster --name $cluster_name), we don't have to fail the script.
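A sketch of what that tolerance could look like (just the pattern, not the final script):
for context in $(kubectl config get-contexts -o name | grep 'kind-'); do
  cluster_name=${context#kind-}
  # A stale context (cluster already gone) makes kubectl fail; log it and move on.
  if ! kubectl get nodes --context "$context" > /dev/null 2>&1; then
    echo "Context $context has no reachable cluster, skipping"
    continue
  fi
  kind delete cluster --name "$cluster_name" || echo "Failed to delete cluster $cluster_name, continuing"
done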
I still believe that the docker command is unnecessary:
The docker command is necessary here because sometimes a context is present in kubectl config get-contexts but the cluster has been deleted or is still being created.
The outer loop is still for context in $(kubectl config get-contexts -o name | grep 'kind-'); do, so we already have the context.
If there is any leftover cluster that was not deleted or created successfully because the job was aborted or terminated, its context may still be present even though there is no running cluster for it. So, just to be careful during deletion, I used an extra check with the docker command.
Do you mean a Go panic in kubectl?
Yes
2 jobs can be triggered at the same time. We should probably be tolerant of failures. If the kubectl command (kubectl get nodes) fails or the kind command fails (kind delete cluster --name $cluster_name), we don't have to fail the script.
Okay, we can add fault tolerance for this.
Updated the code; this ensures fault tolerance. I haven't used flock because I guess that would affect parallel job runs: if one job is in cluster creation or is running e2e and another has acquired the flock over the script file, then the other will not be able to proceed.
Force-pushed from c7011d3 to 057ba01.
timeout=$(($UNTIL_TIME_IN_MINS*60))
if [[ $diff -gt $timeout ]]; then
echo "=== Cluster ${cluster_name} is present for more than $UNTIL_TIME_IN_MINS minutes ==="
kind delete cluster --name $cluster_name
Considering the race condition, if one job retrieves the cluster name using kubectl config get-contexts -o name before kind delete cluster deletes the same cluster in another job, will kubectl get nodes --context $context get stuck, or immediately return an error if the cluster has already been deleted?
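(For reference, the call can also be bounded so it cannot get stuck either way; a hedged sketch, with an arbitrary timeout value:)
# Bound the API call: a deleted or unreachable cluster then produces a prompt error
# instead of a hang, and the error can be treated as "cluster already gone".
if ! kubectl get nodes --context "$context" --request-timeout=10s > /dev/null 2>&1; then
  echo "Cluster for $context is unreachable or already deleted, skipping"
fi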
2 jobs can be triggered at the same time. We should probably be tolerant of failures. If the kubectl command (kubectl get nodes) fails or the kind command fails (kind delete cluster --name $cluster_name), we don't have to fail the script.
@antoninbas I feel we still need a lock to fully avoid race conditions and ensure the cluster information remains consistent during timestamp calculations and cluster deletion 🤔️.
I tend to agree. Honestly I think the core issue is that we try to use kubeconfig and other sources of truth to determine which clusters to delete. We rely on some specific behavior from the tools and we have to tolerate failures. It is not very important in the grand scheme of things as long as we don't corrupt anything or cause jobs to fail while garbage-collecting clusters. But I do feel like a lot of pain could be avoided by maintaining our own file with a list of clusters, instead of relying on kind or kubeconfig.
Before creating a cluster: write the cluster name and creation timestamp to a file, using a file lock
: "${CLUSTER_NAME:=foobar}"
flock .clusters.lock --command "echo \"$CLUSTER_NAME $(date +%s)\" >> .clusters"
# kind create cluster --name $CLUSTER_NAME
When doing garbage collection, use the file as the only source of truth, and also use an exclusive file lock:
(
  flock -x 200
  current_timestamp=$(date +%s)
  # Start with an empty swap file so the final mv succeeds even if every cluster is deleted.
  : > .clusters.swp
  while IFS=' ' read -r name creationTimestamp; do
    if [[ -z "$name" || -z "$creationTimestamp" ]]; then
      continue
    fi
    # Calculate the time difference
    time_difference=$((current_timestamp - creationTimestamp))
    # Check if the creation happened more than 1 hour ago (3600 seconds)
    if (( time_difference > 3600 )); then
      echo "The creation of $name happened more than 1 hour ago."
      kind delete cluster --name "$name" || echo "Cluster could not be deleted"
    else
      echo "The creation of $name happened within the last hour."
      echo "$name $creationTimestamp" >> .clusters.swp
    fi
  done < .clusters
  mv .clusters.swp .clusters
) 200>>.clusters.lock
IMO, correctness is more "obvious" this way, and it's not more brittle than relying on kubeconfig.
The directory where we write .clusters and .clusters.lock needs to persist across Jenkins jobs.
What do you think, @XinShuYang @jainpulkit22?
IMO, correctness is more "obvious" this way, and it's not more brittle than relying on kubeconfig. The directory where we write .clusters and .clusters.lock needs to persist across Jenkins jobs. What do you think, @XinShuYang @jainpulkit22?
Agree. To ensure persistence, the directory for .clusters and .clusters.lock can be set to WORKDIR, which is consistent across different Jenkins jobs. We can add a new parameter for this path in the script, making it adaptable to different environments.
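A minimal sketch of such a parameter (the variable names here are assumptions, not existing script options):
# Make the state directory configurable, defaulting to the Jenkins WORKDIR so that
# .clusters and .clusters.lock persist across jobs.
: "${CLUSTERS_STATE_DIR:=${WORKDIR:-$PWD}}"
CLUSTERS_FILE="$CLUSTERS_STATE_DIR/.clusters"
CLUSTERS_LOCK="$CLUSTERS_STATE_DIR/.clusters.lock"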
Instead of creating a separate file named ".clusters" in the testbed, I have an alternative: we can acquire the lock on the default kubeconfig file so that the other process will not be able to fetch the contexts until the first one releases the lock; this can prevent the race condition too.
What's your take on this, @antoninbas @XinShuYang?
Instead of creating a separate file named ".clusters" in the testbed, I have an alternative: we can acquire the lock on the default kubeconfig file so that the other process will not be able to fetch the contexts until the first one releases the lock; this can prevent the race condition too. What's your take on this, @antoninbas @XinShuYang?
I am okay with placing this lock in the .kube folder, but could you verify whether THIS_DIR is the correct path that is shared by different jobs? Additionally, I recommend using trap to release the lock in case the section it protects exits unexpectedly, to prevent a deadlock issue.
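A hedged sketch of that combination (the lock file path and fd number are assumptions):
KUBECONFIG_LOCK="$HOME/.kube/config.lock"   # assumed lock file, shared by the jobs

exec 200>>"$KUBECONFIG_LOCK"       # open the lock file on fd 200
trap 'exec 200>&-' EXIT            # closing the fd releases the flock when the script exits
flock -x 200                       # block until this job holds the exclusive lock

# ... read contexts and delete stale kind clusters while holding the lock ...
Strictly speaking, the kernel also drops an flock when the holding process exits, so the trap mainly releases the lock promptly and makes the intent explicit.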
okay
Signed-off-by: Pulkit Jain <[email protected]>
Force-pushed from 057ba01 to 8612cb2.
Fix context issue during cleanup of kind clusters.
Fixes #6768.