Nomad Client generates lots of IOPS when idle, saturating HDD #9047
Comments
Update: just stumbled on the logic in
Hi @ashtuchkin, just wanted to make sure you knew we saw this one and someone will look into it.
I appreciate it, thank you!
I've added a PR that fixes this issue in our environment. We'll use my fork with this change for now to unblock the tests, but I'd be happy to go back to using upstream when this issue is fixed (whether by merging the PR or not).
Fixes #9047, see problem details there. As a solution, we use BoltDB's 'Batch' mode, which combines multiple parallel writes into a small number of transactions. See https://github.com/boltdb/bolt#batch-read-write-transactions for more information.
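For reference, a minimal sketch of the difference the PR relies on, assuming the boltdb/bolt API linked above (the bucket and key names here are made up): db.Update() commits, and therefore fsyncs, on every call, while db.Batch() coalesces concurrent callers into a shared transaction so many writers can share one fsync.

```go
package main

import (
	"log"

	bolt "github.com/boltdb/bolt"
)

func main() {
	db, err := bolt.Open("state.db", 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// db.Batch queues this function and may run it in the same
	// read-write transaction as other concurrent Batch callers,
	// amortizing the commit/fsync across them. db.Update would
	// commit (and fsync) immediately for this call alone.
	err = db.Batch(func(tx *bolt.Tx) error {
		b, err := tx.CreateBucketIfNotExists([]byte("allocations")) // made-up bucket name
		if err != nil {
			return err
		}
		return b.Put([]byte("alloc-id"), []byte("serialized state")) // made-up key/value
	})
	if err != nil {
		log.Fatal(err)
	}
}
```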
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.12.1 (14a6893)
Operating system and Environment details
Ubuntu 16.04, Linux kernel 4.15.0-70
3 on-premises servers, each with 64 GB RAM, 32 CPU cores, and a 1 TB HDD.
Issue
I'm testing batch job scheduling on my small cluster and noticed that after running 1-2 thousand allocations, the Nomad Clients slow down to a crawl: 5-10 minutes to start small jobs, even if no other jobs are running. Restarting the clients doesn't help. The problem disappears only after I issue a nomad system gc command.

After a short investigation, I noticed that almost all the time is spent by the Nomad Client continuously writing to its state file (<data root>/client/state.db), fully saturating disk IOPS, which slows down everything else on the host. Note, this is with an idle cluster - no jobs are running at that point.

A deeper investigation (Go profiling) took me to the Client.saveState() function. It's called every 60 seconds by Client.periodicSnapshot() and calls PersistState() for all allocations tracked by the client. Each PersistState() internally issues a BoltDB transaction that unconditionally writes the current state of the allocation to the state file, ending with a mandatory fsync(). This results in a very high number of IOPS, especially on an HDD (which unfortunately I have to use), saturating disk access for the whole host.
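To make that write pattern concrete, here's a minimal sketch of what's described above (illustrative only, not Nomad's actual code; the bucket name, persistAllocState helper, and allocIDs callback are made up): one read-write transaction, and therefore one commit-time fsync, per allocation on every 60-second snapshot.

```go
package sketch

import (
	"time"

	bolt "github.com/boltdb/bolt"
)

// periodicSnapshot illustrates the pattern: every tick, each allocation
// gets its own read-write transaction, and BoltDB fsyncs on every commit,
// so an idle client with N allocations still does on the order of N
// fsyncs per minute.
func periodicSnapshot(db *bolt.DB, allocIDs func() []string) {
	ticker := time.NewTicker(60 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		for _, id := range allocIDs() {
			_ = db.Update(func(tx *bolt.Tx) error {
				return persistAllocState(tx, id)
			})
		}
	}
}

// persistAllocState stands in for the unconditional per-allocation write.
func persistAllocState(tx *bolt.Tx, id string) error {
	b, err := tx.CreateBucketIfNotExists([]byte("allocations")) // made-up bucket name
	if err != nil {
		return err
	}
	return b.Put([]byte(id), []byte("serialized allocation state"))
}
```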
As an example, an average HDD tops out at maybe ~120 IOPS. If we assume 2 I/O operations per PersistState() transaction, then with the 60-second snapshot interval the disk would be at 100% at roughly 120 × 60 / 2 ≈ 3600 allocations. In my tests it's even less than that - around 2000. Again, this is in idle state - nothing is actually running on the cluster at that time.

In my use case each batch job can have 1000+ allocations, easily surpassing the numbers above. After this happens, the cluster becomes unresponsive and it's pretty hard to fix. The only way I found is issuing a nomad system gc command.

So I have a couple of questions:
Note, I think garbage collection of allocations would help the situation somewhat, but if jobs are executed in an unpredictable manner, it would be very hard to tune the gc threshold period while still keeping the logs around for a bit to help debug things.

Separately, a minor thing: I noticed there's an attempt at parallelizing the persistence in the Client.saveState() function by running a goroutine per allocation. AFAIK it doesn't help at all, because write transactions in BoltDB are serialized using locks and each goroutine has to wait its turn anyway, making the whole thing sequential. Not a big issue, just muddies the code a little.
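For what it's worth, a minimal sketch of that point (placeholder names, not the actual saveState() code): even with a goroutine per allocation, BoltDB allows only one read-write transaction at a time, so each db.Update() call blocks on the writer lock and the writes still happen one after another.

```go
package sketch

import (
	"sync"

	bolt "github.com/boltdb/bolt"
)

// saveStateParallel shows why a goroutine per allocation doesn't help:
// BoltDB permits a single read-write transaction at a time, so the
// Update calls below execute sequentially regardless of the goroutines.
func saveStateParallel(db *bolt.DB, allocIDs []string) {
	var wg sync.WaitGroup
	for _, id := range allocIDs {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			_ = db.Update(func(tx *bolt.Tx) error { // serialized internally
				b, err := tx.CreateBucketIfNotExists([]byte("allocations")) // made-up bucket name
				if err != nil {
					return err
				}
				return b.Put([]byte(id), []byte("state")) // placeholder payload
			})
		}(id)
	}
	wg.Wait()
}
```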
echo "hello world"
batch job with job.group.count=3000.Job file
Any simple job should do. I'm using docker with something like this:
Nomad Client logs (if appropriate)
I have some logs and pprof traces, but not sure how helpful they would be. Let me know if you need them.