Pool Import/Export extremely slow #12693
Comments
Knowing where and which task is hung could be informative. That it only takes a few minutes on a readonly import is informative, but not exhaustive. What does …
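A minimal sketch of how the hung task's stack might be captured on Linux (assuming root access and that sysrq is enabled; the `pgrep` pattern is just an example):

```sh
# Dump the kernel stacks of all uninterruptible (D-state) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 100

# Or look at one specific process, e.g. a stuck `zpool import`
cat /proc/"$(pgrep -of 'zpool import')"/stack
```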
Took this when it got stuck. sdn3, sdl3, and sdm3 are the missing cache devices.
Interesting, after it successfully imported, a simple …
Here is the complete log: …
Dmesg showing the stack traces for the hung tasks: …
One more thing: I tried kicking off a manual …
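A hedged sketch of how the diagnostics referenced above can be collected (the actual logs attached in the thread are not reproduced here):

```sh
# ZFS internal debug log (Linux; on FreeBSD the equivalent is reportedly
# available via `sysctl kstat.zfs.misc.dbgmsg`)
cat /proc/spl/kstat/zfs/dbgmsg

# Recent pool events, verbose
zpool events -v tank3

# Kernel hung-task warnings with their stack traces
dmesg | grep -B 2 -A 30 'blocked for more than'
```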
It may take a bit, but does … It seems like it's trying to process a whole lot of async_destroy work that was queued up, and your pool isn't handling the load well. You could make it stop making any progress by twiddling the …
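The tunable being referred to is elided above; later in the thread `zfs_free_min_time_ms` is mentioned. Under that assumption, a sketch of how the async-destroy backlog and the related knob can be inspected:

```sh
# Space still queued for asynchronous freeing (non-zero while destroys are being processed)
zpool get freeing tank3

# Whether the async_destroy feature is enabled/active on the pool
zpool get feature@async_destroy tank3

# Per-txg minimum time (ms) spent processing frees (Linux module parameter)
cat /sys/module/zfs/parameters/zfs_free_min_time_ms
```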
I've attached my … It does look like it might be 'freeing'-related, since …
I mean, yeah, it was logging things about freeing things in dbgmsg.
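A minimal sketch of watching the free-related dbgmsg traffic while the pool works through the backlog (assuming the Linux procfs path; if the log is empty, the `zfs_dbgmsg_enable` module parameter may need to be set to 1):

```sh
# Show the most recent free/destroy-related debug messages
grep -i -E 'free|destroy' /proc/spl/kstat/zfs/dbgmsg | tail -n 20
```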
Okay, the pool seems to have recovered after chewing through the frees. The only thing I can think of that would have rapidly generated a bunch of space that needed to be freed would have been when I was experimenting with some VM scripts and thus was quickly creating and destroying lots of non-sparse 100GB zvols. However, only ~5GB would have actually been used by them.
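A hypothetical reconstruction of the zvol churn described above (dataset names are made up); each destroy of a non-sparse zvol hands its reserved space to asynchronous freeing:

```sh
# Create and destroy fully-reserved (non-sparse) 100G zvols in quick succession
for i in 1 2 3; do
    zfs create -V 100G tank3/vmdisk$i   # non-sparse: space is reserved up front
    # ... run the VM experiment against /dev/zvol/tank3/vmdisk$i ...
    zfs destroy tank3/vmdisk$i          # freeing continues asynchronously after the destroy returns
done
```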
Ouch, the system that this one replicates to also seems to be having the issue now. Same symptoms, though queue size seems to at least be 10 this time rather than 1 (it is a single disk). I decided to try the suggestion mentioned in #12693 of setting zfs_free_min_time_ms to zero - this doesn't seem to have done anything.
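For reference, a sketch of how that tunable is typically changed at runtime (the FreeBSD sysctl name is an assumption based on the usual vfs.zfs mapping):

```sh
# Linux: remove the per-txg minimum time spent on frees
echo 0 > /sys/module/zfs/parameters/zfs_free_min_time_ms

# FreeBSD (assumed equivalent sysctl)
sysctl vfs.zfs.free_min_time_ms=0
```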
That's curious - it definitely helped pretty immediately for me, but I also already had the pool imported, and was blocking on commands when trying to initially issue the destroys. That said, there are a number of things a pool can think it has to block and do synchronously on import - a block device flamegraph might be informative? What does dbgmsg say on the affected system?
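A hedged sketch of one way such a flamegraph might be produced, assuming `perf` and Brendan Gregg's FlameGraph scripts are installed; for tasks blocked on I/O, an off-CPU profile (e.g. bcc's offcputime) may be more revealing:

```sh
# Sample kernel and user stacks system-wide for 30 seconds while the import is stuck
perf record -a -g -- sleep 30

# Fold the samples and render an SVG (stackcollapse-perf.pl / flamegraph.pl from the FlameGraph repo)
perf script | stackcollapse-perf.pl | flamegraph.pl > import-stuck.svg
```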
Sadly, I already rebooted it so I don't have the dbgmsg (same for zpool events), but I did see the same "hung task" warnings in dmesg.
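Since both dbgmsg and the event log are lost on reboot, a small sketch of saving them to files first (output paths are just examples):

```sh
# Preserve volatile diagnostics before rebooting
cat /proc/spl/kstat/zfs/dbgmsg > /root/dbgmsg.txt
zpool events -v > /root/zpool-events.txt
dmesg > /root/dmesg.txt
```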
Well, if it does the same thing on import each time, it's simple enough to recreate, I'd imagine, when you're okay with doing so. If it doesn't, that just raises further questions. I'd be curious to know what …
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
The zfs-2.1.6 release includes some significant work to improve pool import times (PR #12789). This may resolve the problem you were seeing.
System information
I have tried this with both Debian 11 and a FreeBSD 13 LiveCD. Debian is installed on the system, while the FreeBSD LiveCD is an attempt at fixing it.
Describe the problem you're observing
When I `zpool import tank3` (no `-F` or other recovery options specified), it takes about an hour. At first, there is a reasonable amount of disk activity (~100 MB/s per drive). After a while, it drops to very low numbers (75-200 KB/s), though there is still some activity. Judging by the suspiciously low queue depths and low `b%`, perhaps it's trying to do one operation at a time, synchronously? Sometimes the speed goes up (note the normal-looking queue lengths and high `b%`), but it falls back down again. Finally, after about an hour, I see write activity for the drive, and the import finishes.
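A minimal sketch of how the queue depths described above can be watched (pool and tool names follow this report; the Linux `iostat` comes from sysstat):

```sh
# While the import is still running, only the block-device view is available;
# on FreeBSD, `iostat -x 1` shows the qlen and %b columns referenced above.
iostat -x 1

# Once the pool is imported, per-vdev queue statistics can be watched with:
zpool iostat -q -v tank3 1
```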
Then I try to `zpool export tank3`, thinking that's the end of it, and we see the same thing - a qlen of 1 on a single drive, 0 on the others, and <200 KB/s. It eventually gets past that point and goes back to normal IO rates, but then tells me that it can't export because the pool is busy.
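A hedged sketch of checking whether something still holds the pool open before retrying the export (the mountpoint is an assumption):

```sh
# Anything with files open under the pool's datasets will make the export fail as "busy"
fuser -vm /tank3

# Retry once background activity (e.g. outstanding frees) has settled
zpool export tank3
```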
The pool is a 2x4TB mirror + a 2x8TB mirror. There were also some cache drives, but I removed those to try to isolate the problem. As mentioned, I tried it both with the Debian that is actually installed on the server and with a FreeBSD live CD, to rule out software issues. I also tried switching from the LSI SAS card I was using to onboard SATA, with the same results. I tried importing with only one drive from each mirror present, but I got the same results, thus ruling out a single faulty drive. Also, if I `dd` from the drives while the import is still going, they transfer at a decent rate (~40 MB/s), so it doesn't seem to be a drive issue.
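A hypothetical sketch of the raw-read check mentioned above (the device name is an example, not taken from this system):

```sh
# Read 1 GiB directly from one member disk while the import is running
dd if=/dev/sda of=/dev/null bs=1M count=1024 status=progress
```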
If I do `zpool import readonly=on tank3`, it imports the pool after just a few minutes. That's still longer than a normal pool would take, but at least the data is recoverable.
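For clarity, the read-only import is usually spelled with `-o` (a sketch of the form presumably used here):

```sh
zpool import -o readonly=on tank3
```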
Describe how to reproduce the problem

Unsure of how I got to this point - I had an unclean shutdown due to a power outage; however, even before that, I had noticed the pool getting a little slow.

Include any warning/errors/backtraces from the system logs

On FreeBSD, there is nothing suspect in `dmesg` nor in `/var/log/messages`. On Linux, the behavior is mostly the same, though I also get "hung task" warnings in `dmesg`.