There is a potential issue with our StatefulSets (STS), which is as follows:
Our STSs use an updateStrategy of RollingUpdate, which means that when you make certain edits to the STS (for example, changing the image version, or the pod template's labels or annotations), the STS begins a rolling update of its pods.
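For context, the relevant pieces of the STS spec look roughly like this (an illustrative sketch, not our full manifest; the image tag and container name are just examples):

```yaml
# Illustrative sketch only; field names are standard Kubernetes API, values are examples.
apiVersion: apps/v1
kind: StatefulSet
spec:
  updateStrategy:
    type: RollingUpdate          # pods are replaced one at a time, highest ordinal first
  template:
    spec:
      containers:
      - name: cockroachdb
        image: cockroachdb/cockroach:v20.2.0   # changing this (or pod-template labels/annotations)
                                               # triggers a rolling update of the pods
```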
As the STS cycles through the pods, each pod's status moves from ContainerCreating to Running. Then the readinessProbe kicks in and, once it passes, the pod is marked Ready (the livenessProbe also starts at this point, but only restarts the container if it fails).
As soon as the pod is Ready, the STS moves on and starts terminating and updating the next pod in the STS.
At this point, from the CockroachDB cluster's standpoint, the node has rejoined, but the cluster may still be resolving under-replicated ranges, etc. From the CRDB perspective, it would be better to wait for these issues to be resolved and for the cluster to stabilize before taking the next pod down, especially when the cluster is under load.
To remedy this, I propose that we add a startupProbe to the STS. The startupProbe is supported in k8s 1.16+. When defined, it delays the start of the livenessProbe and readinessProbe; once it succeeds, the other probes take over. If it never succeeds (i.e., it fails failureThreshold times), the container is killed and becomes subject to the pod's restartPolicy.
Here is a startupProbe that I tested successfully:

```yaml
startupProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - |
      # Poll for under-replicated ranges for up to 30 iterations (10s apart)
      # before handing control over to the liveness/readiness probes.
      # Note: {1..30} relies on brace expansion; a strictly POSIX sh would run the body once.
      for i in {1..30};
      do
        UR=$(/cockroach/cockroach sql \
          --certs-dir=/cockroach/cockroach-certs/ \
          -e "SELECT SUM((metrics->>'ranges.underreplicated')::DECIMAL)::INT8 AS ranges_underreplicated FROM crdb_internal.kv_store_status S INNER JOIN crdb_internal.gossip_liveness L ON S.node_id = L.node_id WHERE L.decommissioning <> true;" \
          --format raw \
          --host=cockroachdb-public | awk '{if(NR>3)print}' | awk '{if(NR==1)print}'
        );
        echo "Under-replicated ranges: $UR" >> /usr/share/message;
        if [ -z "$UR" ];
        then
          echo "No under-replicated ranges reported. Sleeping for 10 seconds - iteration $i" >> /usr/share/message;
          sleep 10;
          continue;
        fi
        if [ "$UR" -gt 0 ];
        then
          echo "Sleeping for 10 seconds - iteration $i" >> /usr/share/message;
          sleep 10;
        else
          echo "breaking out of loop" >> /usr/share/message;
          break;
        fi
      done
      exit 0;
  failureThreshold: 1
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 1  # exec probe timeouts were not enforced before k8s 1.20, so the long loop above is not cut short there
```
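Once the probe is in the pod spec, the rollout can be triggered and watched in the usual way. A sketch, assuming the STS is named `cockroachdb` and the manifest lives in `cockroachdb-statefulset.yaml`:

```sh
# Apply the updated manifest (or edit the STS in place), then watch the rolling update.
kubectl apply -f cockroachdb-statefulset.yaml
kubectl rollout status statefulset/cockroachdb
```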
Here is a script that can be used to monitor pods in the STS as they are cycled:
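As a rough sketch of such a watcher (not the original script; it assumes the pods carry the usual `app=cockroachdb` label):

```sh
#!/bin/sh
# Rough sketch: print each pod's phase and readiness every 5 seconds
# while the STS cycles through its pods during the rolling update.
while true; do
  date
  kubectl get pods -l app=cockroachdb \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,READY:.status.containerStatuses[0].ready
  echo "---"
  sleep 5
done
```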
A few things I haven't totally worked through that need further consideration and testing:

- How well does it handle long-running startups (for instance, downloading an image for the first time)?
- Various cluster configs (single-region, multi-region, single-node)
- How does it react when running on k8s versions prior to 1.16?
- Besides under-replicated ranges, are there other scenarios that the probe should consider? Resources available? Running jobs? Storage capacity? Gossip established with x% of the nodes in the cluster? (One possible extension is sketched right after this list.)
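On that last point, one possible extension (a sketch only, untested) is to have the probe also look at unavailable ranges, using the same crdb_internal tables the current query reads:

```sql
-- Sketch only (untested): count unavailable ranges alongside under-replicated ranges.
SELECT
  SUM((metrics->>'ranges.underreplicated')::DECIMAL)::INT8 AS ranges_underreplicated,
  SUM((metrics->>'ranges.unavailable')::DECIMAL)::INT8 AS ranges_unavailable
FROM crdb_internal.kv_store_status S
INNER JOIN crdb_internal.gossip_liveness L ON S.node_id = L.node_id
WHERE L.decommissioning <> true;
```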
Tested the script on EKS 1.17 and found that EKS doesn't support alpha features in 1.17. startupProbe is an alpha feature in 1.16 and became a beta feature in 1.18 (aws/containers-roadmap#947).
Jira Issue: DOC-1071