Client Mount Rate #186
Comments
The nnf-sos code creates each ClientMount resource in a separate goroutine, so the creates should already be done in parallel at some level. There might be something in the k8s client library that's serializing things underneath our controller, though.
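For reference, here is a minimal sketch of that fan-out pattern, assuming a controller-runtime `client.Client` and a pre-built slice of ClientMount objects. The function name `createAll` and the structure are illustrative, not the actual nnf-sos code:

```go
// Package mountrate sketches issuing many Create requests concurrently.
package mountrate

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// createAll issues one Create per object from its own goroutine. Even with
// this fan-out, the shared client's rate limiter (QPS/burst) can still
// serialize the requests on the wire.
func createAll(ctx context.Context, c client.Client, mounts []client.Object) []error {
	errs := make([]error, len(mounts))
	var wg sync.WaitGroup

	for i, m := range mounts {
		wg.Add(1)
		go func(i int, m client.Object) {
			defer wg.Done()
			errs[i] = c.Create(ctx, m)
		}(i, m)
	}

	wg.Wait()
	return errs
}
```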
I'll see what we can do here. We might be able to open multiple client connections to the API server, send the create requests from multiple worker nodes, or something else. 20-25 creates/second is too slow.
I think the first issue to solve here is client-side throttling. The controllers are configured with QPS and burst settings, and that's why we're only seeing 20-25 creates per second; I see the same speed on our internal system. I bumped QPS from 20 (the default) to 500 and burst from 30 (the default) to 1000, which gave me 300 creates per second when creating 300 ClientMounts. I'll put out a change that exposes environment variables for these values so we can tune them.
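As a rough sketch of where those limits live: the `rest.Config` used to build the controller-runtime manager carries the `QPS` and `Burst` fields. The environment variable names below (`NNF_REST_CLIENT_QPS`, `NNF_REST_CLIENT_BURST`) are placeholders for illustration, not necessarily the names nnf-sos uses:

```go
package main

import (
	"os"
	"strconv"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// controller-runtime's defaults here are QPS=20, Burst=30, matching the
	// numbers mentioned above.
	cfg := ctrl.GetConfigOrDie()

	// Allow tuning via environment variables so large ClientMount fan-outs
	// aren't throttled to ~20-25 creates/second. Variable names are
	// hypothetical.
	if v, err := strconv.ParseFloat(os.Getenv("NNF_REST_CLIENT_QPS"), 32); err == nil {
		cfg.QPS = float32(v)
	}
	if v, err := strconv.Atoi(os.Getenv("NNF_REST_CLIENT_BURST")); err == nil {
		cfg.Burst = v
	}

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{})
	if err != nil {
		panic(err)
	}
	_ = mgr // controllers would be registered here before calling mgr.Start(...)
}
```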
The environment variables are available in master now: NearNodeFlash/nnf-sos@7cd399d
@behlendorf Do the environment variables solve this issue?
@ajfloeder we'll need to retest this. I don't believe we've done any similar scale testing since this was merged.
When performing an allocation involving a large number of compute nodes, the workflow can spend the majority of its time in the "Setup" phase mounting clients. Based on the contents of the nnf-controller-manager logs, the mounts appear to be requested sequentially, and according to the timing information in the log this happens at a rate of 20-25 mounts/second.
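For a rough sense of scale, at 20-25 mounts/second a 1,000-node allocation would spend roughly 40-50 seconds in Setup just issuing the mounts, and the time grows linearly with node count.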
Could this be sped up by issuing the requests asynchronously? The kube-apiserver is probably not the limiting factor and could likely handle the increased load.