Temporary system-wide lockups with ZSTD when writing to dataset. #13409
Comments
@jmb6 Someone other than me might know whether your CPU can make use of all the hardware acceleration available for compression. In the meantime, could you post some metrics (top, iotop, etc.) from the system while it is writing to the dataset, particularly when the freezes happen? |
Yeah, zstd is not ideal for interactive responsiveness at higher {compression, recordsize} levels. A few people have floated proposals to make this less bad, none of them have been merged yet. (I'd be curious if mine was of any use to you, for instance, but that only helps if the data is incompressible - compressible still costs you uninterrupted CPU time.) As far as CPU-specific optimizations, other than some strategic compilation hints in the code, the only thing it can currently use is the BMI2 instructions, which Haswell just barely would support. (Newer zstd revisions can do one or two things with other instructions in some strategic places, but when I benchmarked them for ZFS, it didn't seem to be much if any of a gain. I can go wire up my Haswell system later and see if that's any different, though.) |
@rincebrain Thanks for the efforts on that. It looks like it was quite a bit of work with all that testing, and it is definitely something I'd want to try, especially on a "desktop" system with zstd. It would be really great if @jmb6 could try your patch and report back : ) |
Even if it works for incompressible data, actually compressible data isn't going to be any better. Unfortunately, I don't have a great answer for that at the moment, though I've some ideas. @adamdmoss has a patch one could try for it - I believe #11709, though I haven't tested it recently. |
Sorry for the delay. Update: I just did a system update (ZFS didn't update though, neither did the kernel), and the problem seems to be less severe, but still there. Or at least I'm having trouble reproducing the problem to the severity I experienced it when I opened this issue. I can still get it to freeze for a moment if I execute a
(h)top shows that either iotop shows disk write activity is between 40-90mbps during the write (no other processes are writing on the system) and jumps up to 100-200mbps when the freezes happen.
I am using the default level though (zstd-3). I am using a recordsize of 1M but reducing it to 128K didn't seem to have much of an effect, I still got the freezes. It was my impression though that zstd was supposed to be really good at realtime compression. I just tested gzip, the freezing is worse with gzip. That's to be expected I guess, so it indicates it does indeed seem to be a CPU usage issue. But I would have thought that the kernel would preempt ZFS, so even so I'm not sure why there would be freezes even with high CPU usage. I was also able to get a very short freeze with lz4 and a well timed So it seems that the problem isn't actually zstd-specific, I just only noticed it first with zstd. Anyway I did a benchmark with the
|
Shockingly, if you force the system to stop everything it's doing and flush IO, you get increased latency while it does that. The problem, I believe, is that the writers doing the compression basically don't get preempted while running, so larger blocks or more expensive compression make it markedly more exciting. The patch from @adamdmoss adds forced reschedules, IIRC, but that only makes it less bad rather than fixing it. I have some experiments I need to get back to about having different binning of IO thread priority (beyond what we already have) so that you don't end up stalling everyone with this, but I haven't done so yet. |
It once did, but I removed those parts. I learned to relax and love the PREEMPT kernels; haven't looked back since. |
Heh, FWIW I run with local patches which do similar things, coarsely. It's a good idea IMHO for other reasons, but only mildly helps responsiveness, the problem being that although it may make a compression task less likely to be scheduled when other stuff is going on, once it's scheduled you'll still get that kerTHUNK of uninterruptability. |
I'm not very familiar with Linux internals, but can't the kernel preempt itself at almost any point, as it can with userspace processes?
I'm running a PREEMPT kernel too. |
I believe the common modes are "more or less voluntary-only preemption" and "no, really, we might preempt you for any reason at any time outside preempt_disable blocks". |
Is there a reason why ZFS compression code runs in the former mode? |
I should be more clear - I believe those are the two modes for the kernel, not ones you can swap at runtime. The compression code wouldn't randomly schedule a preemption because A) it's mostly stock code that has no idea it would need to care, and B) you would wreck performance at the high end by unconditionally rescheduling every op. I had a few ideas in writing this reply though, maybe I'll try them... |
So if I understand correctly, the preemption mode is a compile time option. My kernel has preemption at any point enabled:
So why is the ZFS compressor not being preempted on my system? Is the compressor wrapped in a preempt_disable block or is this a bug? |
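One rough way to see which preemption flavour a kernel was built with is the build string reported by uname -v, which on Linux usually carries a PREEMPT, PREEMPT_DYNAMIC or PREEMPT_RT tag when the corresponding option is enabled. Below is a minimal Python sketch that assumes that tagging convention; the function name preempt_flavor and the "none-or-voluntary" label are made up for illustration. On a PREEMPT_DYNAMIC kernel the tag alone does not say which mode is actually active.

```python
import platform
import re


def preempt_flavor(version: str) -> str:
    """Classify a `uname -v` style build string by its preemption tag."""
    # The optional suffix is tried first, so PREEMPT_DYNAMIC and PREEMPT_RT
    # are reported in full rather than as plain PREEMPT.
    match = re.search(r"\bPREEMPT(_RT|_DYNAMIC)?\b", version)
    if match is None:
        # Assumption: no tag means a kernel without full preemption.
        return "none-or-voluntary"
    return match.group(0)


if __name__ == "__main__":
    print(preempt_flavor(platform.uname().version))
```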
It's baffling to me why you're not getting good results with a PREEMPT kernel. Unless it's a Zen / 5.17 thing, though I doubt it. I think 5.17 has runtime-switchable preemption support via kernel self-patching, and I guess it's possible that it's not currently switched on for you, but I can't find any clear documentation for this. Alternatively, stock ZFS may run compression tasks at high priority (I'm looking at the code but I can't immediately remember if higher numbers are higher or lower priority 🤷) and the scheduler is just deciding not to interrupt it even though it could; you could try setting the spl module parameter spl_taskq_thread_priority to 0 to leave all task queue priorities as default. That could be interesting... |
Lower numbers win on Linux, I believe. Note that that tunable won't adjust existing thread priorities. |
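Since the tunable does not touch threads that already exist, it probably needs to be in place when the spl module loads. Here is a hedged Python sketch that reads the current value from the usual sysfs location for module parameters and prints a modprobe.d-style options line. The helper names are hypothetical, and the file you would put the line in (for example /etc/modprobe.d/spl.conf) is an assumption about your distribution.

```python
from pathlib import Path

# Usual sysfs location for module parameters; only present while spl is loaded.
PARAM = Path("/sys/module/spl/parameters/spl_taskq_thread_priority")


def read_param(path: Path = PARAM) -> int:
    """Return the current integer value of the module parameter."""
    return int(path.read_text().strip())


def modprobe_line(value: int) -> str:
    """Build an options line that applies the value at module load time."""
    return f"options spl spl_taskq_thread_priority={value}"


if __name__ == "__main__":
    if PARAM.exists():
        print("current:", read_param())
    # Assumed destination: a file such as /etc/modprobe.d/spl.conf
    print(modprobe_line(0))
```

If the module is loaded from an initramfs, the initramfs may also need regenerating for the option to take effect at boot.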
I tried it out and it looks like it completely solved it. Even if I set the compression mode to gzip or zstd-9 there are no lockups at all; it is completely smooth. So yeah, it looks like the kernel was deciding not to preempt the thread. I checked what the priority was before I set spl_taskq_thread_priority: z_wr_iss_h had a niceness of -20, which is the highest priority, and z_wr_iss was at -19. Could leaving that spl parameter set to 0 on a desktop system cause other problems? I also tried the stock non-zen kernel, but that didn't solve it. Only setting that parameter solves it. |
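To repeat the niceness check before and after changing the parameter, one option is to read /proc/<pid>/stat for each task and pick out the z_wr_iss threads. This is a minimal Python sketch, assuming the field layout documented in proc(5), where nice is field 19; the function names are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple


def parse_stat(line: str) -> Tuple[str, int]:
    """Extract (comm, nice) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of on whitespace.
    head, _, tail = line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = tail.split()
    # tail starts at field 3 (state); nice is field 19, i.e. index 16 here.
    return comm, int(fields[16])


def zfs_write_threads(proc: Path = Path("/proc")) -> List[Tuple[str, int]]:
    """List (comm, nice) for tasks whose name starts with z_wr_iss."""
    found = []
    for stat in proc.glob("[0-9]*/stat"):
        try:
            comm, nice = parse_stat(stat.read_text())
        except (OSError, IndexError, ValueError):
            continue  # task exited, or the line was not in the expected shape
        if comm.startswith("z_wr_iss"):
            found.append((comm, nice))
    return found


if __name__ == "__main__":
    for comm, nice in sorted(zfs_write_threads()):
        print(f"{comm}\tnice={nice}")
```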
System information
ZFS installed with the zfs-linux-zen package from the archzfs repository.
RAM: 20GB no ECC
CPU: i5-4440
No hypervisor.
zfs.zfs_arc_max=4294967296 zfs.zfs_arc_min=1073741824 zfs.l2arc_trim_ahead=1
Describe the problem you're observing
When compression=zstd is on, writing to the dataset causes the whole system, including cursor, to freeze for a short moment inconsistently about every 5 or so seconds while writing to the dataset.
Describe how to reproduce the problem
Create a dataset with compression=zstd recordsize=1M atime=off xattr=sa and copy a large amount of data to it.
The attributes after compression=zstd are what are also set on the dataset, I'm not sure if they contribute to the issue.
The system should temporarily lock up multiple times until the write finishes. Executing a sync immediately after also helps to cause a temporary lock up.
Include any warning/errors/backtraces from the system logs
dmesg shows no problems or anything relating to ZFS.
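For anyone trying to reproduce this, the dataset properties listed in the reproduction steps can be turned into a zfs create command line. The Python sketch below only builds and prints the command without running it; the dataset name tank/zstd-test and the helper name zfs_create_argv are placeholders, not taken from the report.

```python
import shlex
from typing import Dict, List


def zfs_create_argv(dataset: str, props: Dict[str, str]) -> List[str]:
    """Build the argument list for `zfs create` with one -o per property."""
    argv = ["zfs", "create"]
    for key, value in props.items():
        argv += ["-o", f"{key}={value}"]
    argv.append(dataset)
    return argv


# Properties as given in the report; the dataset name is a placeholder.
PROPS = {"compression": "zstd", "recordsize": "1M", "atime": "off", "xattr": "sa"}

if __name__ == "__main__":
    print(shlex.join(zfs_create_argv("tank/zstd-test", PROPS)))
```

Copying a large amount of data into the resulting dataset and then running sync should match the steps described in the report.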