Nomad UI painfully slow when job counts goes from hundreds to thousands #14787
Comments
Hi @djenriquez, thanks for raising this — we'll take a look and update this once we have more info.
Hey @djenriquez! Nice to meet you. We're super grateful that you raised this issue, and it looks like the Nomad community at large is also noticing this problem.

We're noticing that the issue may be the result of JavaScript Promises on the […]. For the […]. But for the […].

We're very excited to work with you to find the right solution, and we welcome any and all feedback about how you're searching and filtering for jobs (along with any feedback about the Nomad UI). We're in the process of planning a lot of great new features for the UI, and we're eager to solve any big challenges or even small "papercuts" that you're experiencing.

I'll be heading out on vacation soon, but I'll try my best to be responsive today and tomorrow on this issue and revisit it when I return. Looking forward to hearing from you!

Life is so rich,
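The specifics of the suspected Promise work were lost from the comment above, but as a general illustration of why per-job Promise fan-out scales poorly, here is a hypothetical sketch (not actual Nomad UI code; the endpoints and names here are assumptions) that fires one request per job and waits on all of them before rendering anything:

```typescript
// Hypothetical illustration (not Nomad UI code): resolving one Promise per
// job means thousands of concurrent requests, and rendering waits on the
// slowest of them. With ~100 jobs this is tolerable; with thousands it is not.
const NOMAD_ADDR = "http://localhost:4646"; // assumed local agent address

async function loadAllJobSummaries(): Promise<unknown[]> {
  const jobs: Array<{ ID: string }> = await (
    await fetch(`${NOMAD_ADDR}/v1/jobs`)
  ).json();

  // One fetch per job: O(n) requests, all awaited before anything renders.
  return Promise.all(
    jobs.map((job) =>
      fetch(`${NOMAD_ADDR}/v1/job/${encodeURIComponent(job.ID)}/summary`)
        .then((res) => res.json()),
    ),
  );
}
```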
Hi @ChaiWithJai, thanks so much for providing these commits. I'll go ahead and see how I might be able to plug this into our current system and verify its results. It will likely be next week before I can provide results, however.
Hey @djenriquez! I'm back in the office and wanted to circle back with you. Were you able to try these commits out?
Hi @ChaiWithJai, I realize I dropped the ball on checking back on this issue. Are we able to reconvene?
Greetings! Is there any update on the fix? The UI slows to a halt whenever there are more than a thousand jobs (including dead jobs) in the cluster.
Looks like there's a PR: #14989; looking to test this out against […]
Dropping a note to say that this is something we intend to prioritize soon; see #14989 (comment) for a little more context.
Hi there, is there an update on the fix yet, or an expected version for the fix? Thanks!
@jhyx2022 Serendipitous timing! We've been developing a new endpoint to complement /jobs that should make things a lot snappier. You can follow along with a few of the issues: […]

These should have the effect of a more limited initial pull of jobs on the main index in the UI. There'll still be the ability to paginate, search, and filter your list down, but those functions will no longer be front-end dependent.
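For a rough idea of what a server-side paginated and filtered jobs query could look like, here is a minimal sketch that assumes the new endpoint follows Nomad's existing list-endpoint conventions (per_page, next_token, and filter query parameters, with the next-page token returned in the X-Nomad-NextToken header); /v1/jobs is used as a stand-in path, and the details may differ from the final API:

```typescript
// Sketch of a paginated, server-side-filtered jobs query. The endpoint path
// and parameter support are assumptions about the new API, not a confirmed
// shape — check the Nomad API docs for the released version.
const NOMAD_ADDR = "http://localhost:4646"; // assumed local agent address

async function fetchJobsPage(nextToken?: string): Promise<void> {
  const params = new URLSearchParams({
    per_page: "50",                // fetch a page instead of every job
    filter: 'Status == "running"', // server-side filter expression
  });
  if (nextToken) params.set("next_token", nextToken);

  const res = await fetch(`${NOMAD_ADDR}/v1/jobs?${params}`);
  const jobs: Array<{ ID: string; Status: string }> = await res.json();
  console.log(`fetched ${jobs.length} jobs`);

  // The token for the next page comes back in a response header.
  const next = res.headers.get("X-Nomad-NextToken");
  if (next) await fetchJobsPage(next);
}

fetchJobsPage().catch(console.error);
```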
Great news, appreciate the update!
Thanks to everyone for your patience on this issue. Pleased to say that #20452 is now merged and will be released in the upcoming Nomad 1.8. Among other things, it handles pagination for the jobs index and doesn't overload itself with child jobs that eat up memory at the index level. I hope this makes the overall experience of using the web UI much smoother!
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Output from nomad version:
Nomad v1.3.3 (428b2cd8014c48ee9eae23f02712b7219da16d30)
Operating system and Environment details
Amazon Linux release 2 (Karoo)
Issue
We have a particular use case where Nomad is used to orchestrate full sandboxes for our developers in our development environment. These sandboxes represent our complete stack of services, which means ~100 jobs, including periodic batch jobs.
The higher the total job count, the slower the Nomad UI becomes. Initially, we thought this might be an issue with the Nomad servers themselves handling the sheer amount of work, but that is not the case: at one point, Nomad's core handled over 10,000 jobs with roughly 50,000 allocations just fine. RPC calls through its API were responsive, and the metrics we track showed no struggle whatsoever.
However, the UI was a different story: it would sit on the Nomad loader graphic for a period of time that seemed to grow linearly with the number of jobs being run. Interestingly, the API requests the UI made to the Nomad servers were responsive according to Chrome dev tools, providing supporting evidence that the backend is not the issue.
Also, when looking at the waterfall chart in Chrome dev tools, we see a call to /v1/namespaces?index=1 that is eventually canceled by the browser. We're not sure whether this request is misleading, but the page renders once that request pops up in the network analyzer, so it seems there is some blocking call at that part of the flow.
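For context, the index query parameter makes Nomad treat a request as a blocking (long-polling) query that returns only once the underlying data changes, so a pending request that is later canceled is expected long-poll behavior rather than necessarily the slow path itself. A minimal sketch of that pattern, with an assumed local agent address and illustrative names:

```typescript
// Illustrative sketch of Nomad's blocking-query pattern: passing ?index=N
// long-polls until the server-side modify index advances past N (or a
// timeout), so a request like /v1/namespaces?index=1 can sit "pending" in
// dev tools for a long time before being canceled.
const NOMAD_ADDR = "http://localhost:4646"; // assumed local agent address

async function watchNamespaces(): Promise<void> {
  let index = 1;
  while (true) {
    const controller = new AbortController();
    // Cancel the long-poll ourselves after 30s, similar to what the
    // browser/UI does on navigation.
    const timer = setTimeout(() => controller.abort(), 30_000);
    try {
      const res = await fetch(`${NOMAD_ADDR}/v1/namespaces?index=${index}`, {
        signal: controller.signal,
      });
      // The server reports the current modify index in a response header.
      index = Number(res.headers.get("X-Nomad-Index") ?? index);
      console.log("namespaces changed, index is now", index);
    } catch {
      // An aborted long-poll surfaces here, mirroring the "canceled"
      // request seen in the waterfall chart; retry with the same index.
    } finally {
      clearTimeout(timer);
    }
  }
}

watchNamespaces().catch(console.error);
```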
Reproduction steps
Spin up at least 1,000 jobs with roughly 3,000 allocations, then navigate to the UI.
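One way to generate that much load for reproduction is sketched below; the JSON job shape is a deliberately minimal assumption (a trivial raw_exec batch job) rather than a vetted spec, and the agent address and job names are placeholders:

```typescript
// Hypothetical load generator: registers many trivial batch jobs against a
// dev agent so the UI has thousands of jobs to render. Adjust Datacenters,
// Driver, and the job count for your own cluster.
const NOMAD_ADDR = "http://localhost:4646"; // assumed local agent address

async function registerJob(i: number): Promise<void> {
  const job = {
    Job: {
      ID: `ui-load-test-${i}`,
      Name: `ui-load-test-${i}`,
      Datacenters: ["dc1"],
      Type: "batch",
      TaskGroups: [
        {
          Name: "group",
          Tasks: [
            {
              Name: "sleep",
              Driver: "raw_exec",
              Config: { command: "sleep", args: ["3600"] },
            },
          ],
        },
      ],
    },
  };

  const res = await fetch(`${NOMAD_ADDR}/v1/jobs`, {
    method: "PUT",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  });
  if (!res.ok) throw new Error(`register failed: ${res.status}`);
}

async function main() {
  for (let i = 0; i < 1000; i++) {
    await registerJob(i); // sequential to keep the dev agent comfortable
  }
}

main().catch(console.error);
```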
Expected Result
UI load time grows proportionately with the API response time for requests made to the Nomad server.
Actual Result
UI load time degrades as more jobs and allocations run on the Nomad cluster, even though the API remains responsive.
We're open to scheduling a remote session if that makes it easier to see the issue.