akash tx update response & behavior when provider is out of resources #82
Comments
I am that user. I filled up a provider's machines with some xmrig deployments to over 90% fill rate, had 2 of them crash, went to re-deploy a new image using the update button in Akashlytics, and was unable to do so.
As a provider, I have the same situation: the same user has 2 deployments and is unable to update them. A new replica set is created, but because of the lack of resources the new replica set never comes online. Note that there is no disruption of service, as the old replica set stays up.
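For context, this matches how a Kubernetes Deployment behaves under a rolling update: the updated pod template gets a new ReplicaSet, and the old pod is only removed once the replacement can be scheduled. Below is a minimal illustrative sketch, not the manifest the Akash provider actually generates; the names, image, and resource figures are placeholders, and for a single replica the default 25% surge/unavailability settings effectively round to the values shown.

```yaml
# Illustrative only: placeholder names/values, not Akash's actual manifest.
# With RollingUpdate and maxUnavailable: 0, the old pod is kept Running until
# the replacement pod can be scheduled somewhere -- so on a nearly full node
# the new ReplicaSet's pod just sits in Pending while the old one keeps serving.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload            # hypothetical name
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # never remove the old pod first
      maxSurge: 1                   # the new pod must fit on a node before the old one goes away
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      containers:
        - name: app
          image: example/miner:latest     # placeholder image
          resources:
            requests:
              cpu: "2"                    # placeholder request; on a ~90% full node
              memory: 4Gi                 # this request is what keeps the new pod Pending
```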
Thanks all for the report. Interesting case. A few thoughts:
All of them have drawbacks, of course. In the meantime, for mining and other stateless workloads, I suggest closing the deployment and creating a new one if you hit the described scenario.
I think having a default option which uses --force is a good solution. The majority of deployments can probably be forced. For the rare cases where high availability is needed, a special option could be used, but with the understanding that updates may be more difficult. Currently, as a provider, when I see a deployment stuck because of this (generally a miner), I just force delete the old replica set and make sure the deployment works again.
I've hit this today; the bug is still in place.
This is still happening with provider-services 0.1.0, akash 0.20.0.
akash force new replicasets workaround
A workaround has been added to the docs by @chainzero: https://akash.network/docs/providers/provider-faq-and-guide/#force-new-replicaset-workaround
Let the user define what kind of SLA they want on their deployments. Another thing is EvictionPolicy. And the last thing: I am mentioning all this because a shell script as a cronjob doesn't really feel like the right "cloud-native" way of working. The other thing is that this issue should be an edge case, as node fill-ups should not happen.
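One existing Kubernetes primitive that points in the direction of per-deployment SLAs is pod priority and preemption. The sketch below is purely hypothetical (Akash does not expose this today, and the class name and value are made up); it only illustrates how a "best-effort" tier could be expressed natively rather than via a cronjob script. A pod would opt in through spec.priorityClassName.

```yaml
# Hypothetical sketch only: not something Akash currently exposes.
# A low-priority class that stateless/mining workloads could opt into, so the
# scheduler may preempt them when higher-priority (higher-SLA) pods need capacity.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-tier        # hypothetical name
value: 100                      # low relative to a higher "guaranteed" tier
globalDefault: false
description: "Preemptible tier for stateless workloads that accept eviction"
```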
Thank you @sterburg for your observations; they all make total sense to me.
When a user updates their deployment, they may get the following message, which confuses them:
This is happening because K8s won't destroy an old pod instance until it ensures the new one has been created.
Since there is no available node for deploying the new pod, the rollout gets stuck: the old pod stays "Running" while the new one sits in "Pending".
Things will move on as soon as one of the nodes gets enough CPU, RAM & disk requested by the deployment.
This is how K8s works in order to prevent a service outage; however, the user might want to get a better message. Alternatively, a user could be granted an option such as --force, which would destroy the previously running pod, i.e. something similar to a destroy & recreate method.
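For reference, the Kubernetes-level analogue of such a --force option is the Recreate deployment strategy, which terminates the old pod first, freeing its CPU/RAM/disk before the replacement is scheduled, at the cost of a brief outage. A minimal sketch is below; the names are placeholders, and this is not an existing Akash flag or the provider's actual manifest.

```yaml
# Sketch of the Kubernetes-level analogue of the proposed --force behaviour.
# Placeholder names; not an option Akash currently generates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  strategy:
    type: Recreate        # terminate the old pod before creating the new one,
                          # freeing its resources on the node first (brief downtime)
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      containers:
        - name: app
          image: example/miner:latest    # placeholder image
```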