akash tx update response & behavior when provider is out of resources #82
Comments
I am that user. I filled up a provider's machines with some xmrig deployments to over 90% fill rate, had 2 of them crash, went to re-deploy a new image using the update button in Akashlytics, and was unable to do so.
As a provider, I have the same situation: the same user has 2 deployments and is unable to update them. A new replica set is created, but because of the lack of resources the new replica set never comes online. Note that there is no disruption of service, as the old replica set stays up.
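For context, this matches how a Kubernetes Deployment behaves under a rolling update: the updated pod template gets a new ReplicaSet, and the old pod is only removed once the replacement can be scheduled. Below is a minimal illustrative sketch, not the manifest the Akash provider actually generates; the names, image, and resource figures are placeholders, and for a single replica the default 25% surge/unavailability settings effectively round to the values shown.

```yaml
# Illustrative only: placeholder names/values, not Akash's actual manifest.
# With RollingUpdate and maxUnavailable: 0, the old pod is kept Running until
# the replacement pod can be scheduled somewhere -- so on a nearly full node
# the new ReplicaSet's pod just sits in Pending while the old one keeps serving.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload            # hypothetical name
spec:
  replicas: 1
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0             # never remove the old pod first
      maxSurge: 1                   # the new pod must fit on a node before the old one goes away
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      containers:
        - name: app
          image: example/miner:latest     # placeholder image
          resources:
            requests:
              cpu: "2"                    # placeholder request; on a ~90% full node
              memory: 4Gi                 # this request is what keeps the new pod Pending
```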
Thanks all for the report. Interesting case. A few thoughts:
All of them have drawbacks, of course. In the meantime, for mining and other stateless workloads, I suggest closing the deployment and creating a new one if you hit the described scenario.
I think having a default option which uses --force is a good solution. The majority of deployments can probably be forced. For the rare cases where high availability is needed, a special option could be used, but with the understanding that updates may be more difficult. Currently, as a provider, when I see a deployment stuck because of this (generally a miner), I just force delete the old replica set and make sure the deployment works again.
I've hit this today; the bug is still in place.
This is still happening with provider-services 0.1.0, akash 0.20.0.
akash force new replicasets workaround
A workaround has been added to the docs by @chainzero: https://akash.network/docs/providers/provider-faq-and-guide/#force-new-replicaset-workaround
Let the user define what kind of SLA they want on their deployments. Another thing is EvictionPolicy. And the last thing: I am mentioning all this because a shell script as a cronjob doesn't really feel like the right "cloud-native" way of working. The other thing is that this issue should be an edge case, as node fill-ups should not happen.
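One existing Kubernetes primitive that points in the direction of per-deployment SLAs is pod priority and preemption. The sketch below is purely hypothetical (Akash does not expose this today, and the class name and value are made up); it only illustrates how a "best-effort" tier could be expressed natively rather than via a cronjob script. A pod would opt in through spec.priorityClassName.

```yaml
# Hypothetical sketch only: not something Akash currently exposes.
# A low-priority class that stateless/mining workloads could opt into, so the
# scheduler may preempt them when higher-priority (higher-SLA) pods need capacity.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: best-effort-tier        # hypothetical name
value: 100                      # low relative to a higher "guaranteed" tier
globalDefault: false
description: "Preemptible tier for stateless workloads that accept eviction"
```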
Thank you @sterburg for your observations; they all make total sense to me.
When a user updates their deployment, they may get the following message, which confuses them:
This is happening because K8s won't destroy an old pod instance until it ensures the new one has been created.
Since there is no available node for deploying the new pod, the rollout gets stuck: the old pod stays "Running" while the new one sits in "Pending".
Things will move on as soon as one of the nodes gets enough CPU, RAM & disk requested by the deployment.
This is how K8s works in order to prevent a service outage; however, the user might want to get a better message. Alternatively, a user could be granted an option such as --force, which would destroy the previously running pod, i.e. something similar to a destroy & recreate method.
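For reference, the Kubernetes-level analogue of such a --force option is the Recreate deployment strategy, which terminates the old pod first, freeing its CPU/RAM/disk before the replacement is scheduled, at the cost of a brief outage. A minimal sketch is below; the names are placeholders, and this is not an existing Akash flag or the provider's actual manifest.

```yaml
# Sketch of the Kubernetes-level analogue of the proposed --force behaviour.
# Placeholder names; not an option Akash currently generates.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-workload
spec:
  replicas: 1
  strategy:
    type: Recreate        # terminate the old pod before creating the new one,
                          # freeing its resources on the node first (brief downtime)
  selector:
    matchLabels:
      app: example-workload
  template:
    metadata:
      labels:
        app: example-workload
    spec:
      containers:
        - name: app
          image: example/miner:latest    # placeholder image
```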