Improve mysql write performance and stability when offloading #13290
Comments
If it's slow on inserts, I don't think an index would improve that performance. Otherwise, it would be good to know which slow queries you're seeing, and on what version of Argo.
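For illustration (not from the thread): one way to surface slow queries on the MySQL side is to enable the slow query log. Below is a minimal sketch using Go's `database/sql`; the DSN and the one-second threshold are placeholder assumptions.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

func main() {
	// Placeholder DSN; point it at the database Argo offloads to.
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/argo")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Log any statement slower than 1 second (requires sufficient privileges).
	if _, err := db.Exec("SET GLOBAL slow_query_log = 'ON'"); err != nil {
		log.Fatal(err)
	}
	if _, err := db.Exec("SET GLOBAL long_query_time = 1"); err != nil {
		log.Fatal(err)
	}
}
```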
We might be able to do this for…
There's an existing feature request for blob storage: #4162. But both might suffice for status offloading 🤔
We might want to add configurable workers for this; i.e. a knob for how many GC deletions run in parallel (sketched below).
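A rough sketch of what configurable GC workers could look like; the worker count, the job channel, and the delete stand-in are all hypothetical, not Argo's actual implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// gcWorkers is a hypothetical knob; Argo's other GC loops expose similar settings.
const gcWorkers = 4

func main() {
	jobs := make(chan string)
	var wg sync.WaitGroup

	// Fan deletions out across a configurable number of workers so one
	// slow DELETE doesn't stall the whole GC pass.
	for i := 0; i < gcWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for uid := range jobs {
				fmt.Println("deleting offloaded record", uid) // stand-in for the real DELETE
			}
		}()
	}

	for _, uid := range []string{"uid-1", "uid-2", "uid-3"} {
		jobs <- uid
	}
	close(jobs)
	wg.Wait()
}
```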
I added the following index to address workflows that errored: ``alter table argo_workflows add index `argo_workflows_i2` (`clustername`, `uid`, `version`, `updatedat`);``
Hmm, looks like the… Would you like to submit a PR to add them for offloaded Workflows?
Yeah, that might make more sense, since a deletion has to do a lookup. That specific deletion may be removed in #13286
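For context, the lookup-then-delete that such an index would speed up might look roughly like the following; the query shape and helper function are assumptions based on the column names in the DDL above, not Argo's exact SQL:

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// deleteStaleVersions removes offloaded rows for a workflow other than the
// live one. This query shape is an assumption, not Argo's actual SQL.
func deleteStaleVersions(db *sql.DB, clusterName, uid, liveVersion string) error {
	_, err := db.Exec(
		"DELETE FROM argo_workflows WHERE clustername = ? AND uid = ? AND version <> ?",
		clusterName, uid, liveVersion,
	)
	return err
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/argo") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := deleteStaleVersions(db, "default", "example-uid", "v1"); err != nil {
		log.Fatal(err)
	}
}
```

With the proposed (`clustername`, `uid`, `version`, `updatedat`) index, MySQL can satisfy such a WHERE clause with an index range scan instead of a full table scan.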
This would still be useful: your Server and Controller logs should show any slow queries.
When there are lots of slow queries, that delete may cause deadlocks or lock wait timeouts. I think #13286 will help; once that PR is merged, I don't think the secondary index above is necessary. But if we remove that delete code, is there a situation where table records grow continuously because the periodic GC deletes more slowly than new records are inserted?
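One common mitigation for deadlocks and lock wait timeouts on large deletes (a general MySQL technique, not something proposed in the thread) is to delete in small batches, so each statement holds only a handful of row locks. A minimal sketch; the cutoff column and batch size are illustrative assumptions:

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql" // MySQL driver
)

// gcOldRecords deletes expired offloaded rows in small batches so each
// statement locks few rows, avoiding long lock waits. The cutoff column
// and batch size are illustrative, not Argo's actual GC logic.
func gcOldRecords(db *sql.DB, cutoff time.Time) error {
	for {
		res, err := db.Exec(
			"DELETE FROM argo_workflows WHERE updatedat < ? LIMIT 500",
			cutoff,
		)
		if err != nil {
			return err
		}
		n, err := res.RowsAffected()
		if err != nil {
			return err
		}
		if n == 0 {
			return nil // nothing left to delete
		}
	}
}

func main() {
	db, err := sql.Open("mysql", "user:pass@tcp(127.0.0.1:3306)/argo") // placeholder DSN
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	if err := gcOldRecords(db, time.Now().Add(-24*time.Hour)); err != nil {
		log.Fatal(err)
	}
}
```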
@agilgur5 Anton, I opened a pull request that adds some code to compress nodes for both offloading and archiving. Can you help review it when you have time?
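For readers unfamiliar with the idea: the nodes JSON is highly repetitive, so compressing it before writing to the database typically shrinks rows severalfold. A minimal standard-library sketch, not the PR's actual code:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
)

// compressNodes gzips the serialized node status before it is stored.
// This is an illustration of the technique, not the PR's implementation.
func compressNodes(nodesJSON []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(nodesJSON); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	raw := []byte(`{"node-1":{"phase":"Succeeded"},"node-2":{"phase":"Succeeded"}}`)
	compressed, err := compressNodes(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw=%dB compressed=%dB\n", len(raw), len(compressed))
}
```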
@imliuda when providing logs, please use text instead of images, as text is much more accessible. Otherwise that's a great data point; I'm surprised…
Mmm, the index would still make the periodic GC deletes faster.
Technically, I think that's always been possible. I mentioned above that we could add… Oh, this is for offloading, though. We might be able to add something similar, but I'm not actually familiar with the code for offloading's periodic GC, or whether it works the same way as the rest of the Controller's GC (which generally has a configurable number of workers).
For this proposal, I see it was mentioned… My company is facing similar issues to those mentioned here: we run super large-scale batch workflows that consume more than 50% of etcd under load. Having a performant alternative to etcd as Argo's stateful backing store would greatly benefit us. We also have an organizational aversion to relational databases, so I'm wondering whether we've considered things like S3 or DynamoDB for offloading? Both are extremely performant over a network and might be easier to manage than a relational DB.
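To make the blob-storage idea concrete, offloading a status document to S3 could look roughly like this; the bucket name and key scheme are hypothetical (sketch uses aws-sdk-go-v2):

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

// offloadStatus writes one workflow's node status to S3, keyed by
// cluster/uid/version. The key scheme is a hypothetical example of how
// blob storage could replace the relational offload table.
func offloadStatus(ctx context.Context, client *s3.Client, clusterName, uid, version string, status []byte) error {
	key := fmt.Sprintf("offload/%s/%s/%s.json", clusterName, uid, version)
	_, err := client.PutObject(ctx, &s3.PutObjectInput{
		Bucket: aws.String("argo-offload"), // placeholder bucket
		Key:    aws.String(key),
		Body:   bytes.NewReader(status),
	})
	return err
}

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}
	if err := offloadStatus(ctx, s3.NewFromConfig(cfg), "default", "example-uid", "v1", []byte(`{}`)); err != nil {
		log.Fatal(err)
	}
}
```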
Summary
We enabled node status offloading and workflow archiving, and we have observed some performance and stability issues.
Use Cases
When running a large cluster, kube-apiserver may become the performance bottleneck, so we need offloading and archiving to reduce its load. Currently, offloading needs some optimisation and improvement to be stable in a production environment.
Some thoughts