" zfs: allocating allocated segment" with logbias=throughput #541
Comments
Indeed, thanks for reporting this. We'll either disable logbias=throughput for now, or if the issue ends up being fairly straightforward, just fix it.
This bug recently began happening for me on a computer running Ubuntu 11.10 and ZoL 0.6.0-rc7. Any action that loads … The root pool is more than a year old and is usually above 90% utilization. The panic doesn't happen instantly on a new pool like it does on the older installation. Running …
Google turned up quite a few very similar issues for both FreeBSD and Illumos, but it seems at least one issue is still lurking: http://unix.derkeiler.com/Mailing-Lists/FreeBSD/current/2009-02/msg00346.html
I've seen a variety of bug reports now which suggest there is a lingering issue (in all the ZFS implementations) where the space map can be updated incorrectly when a file system is filled to capacity.
The …
I'm no longer able to reproduce this issue on a file-backed zpool with a recent ZFS version.
Never mind, I reproduced the issue. It's still a problem.
We need to wait for pending I/O to finish before trying to clear the dirty record's override state. This problem occurs when multiple threads call dbuf_dirty() for the same dbuf object while it is currently being synced via dmu_sync() but hasn't reached dmu_sync_done() yet:

zfs: allocating allocated segment(offset=323629568 size=78336)
SPLError: 2283:0:(spl-err.c:48:vpanic()) SPL PANIC

Issue openzfs#541
I've been able to reproduce this issue reliably using the following program (which pretty much does what aptitude does):
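The test program itself isn't preserved in this thread. As a rough illustration only, a minimal user-space sketch of the kind of workload described (mmap a file on a ZFS dataset, dirty the mapping, msync it, and repeat, which is roughly what aptitude does with its package cache) might look like the following; the path and sizes are placeholders:

```c
/* Hypothetical reproducer sketch; the original test program is not shown
 * in this thread.  It mmap()s a file on a ZFS dataset, dirties the
 * mapping, and msync()s it in a loop. */
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#define FILE_SIZE (16 * 1024 * 1024)   /* placeholder size */

int main(void)
{
	int fd = open("/tank/test/cachefile", O_RDWR | O_CREAT, 0644);
	if (fd < 0)
		return 1;
	if (ftruncate(fd, FILE_SIZE) != 0)
		return 1;

	char *map = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		return 1;

	for (int pass = 0; ; pass++) {
		/* Dirty every page of the mapping ... */
		for (size_t off = 0; off < FILE_SIZE; off += 4096)
			map[off] = (char)pass;
		/* ... then force writeback, so the next round of
		 * dirtying races against the in-flight sync. */
		msync(map, FILE_SIZE, MS_SYNC);
	}
}
```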
And here's the stacktrace I get:
By adding some printk()s I've been able to backtrace this problem to two space_map_add() calls:
So apparently we're trying to add the same byte range to the free space map twice. Those two zios come from spa_free_sync_cb():
And even more tracing:
So something in dbuf_dirty()/dbuf_unoverride() is trying to free the same dva twice. At that point I ran out of ideas and turned on --enable-debug which made things a lot easier:
It looks like dbuf_dirty() is being called for a dbuf object that is currently being synced to disk by dmu_sync() but hasn't quite finished yet (i.e. dmu_sync_done()). The patch at 86d68d3 appears to fix the problem for me. However, I'm not entirely sure whether dropping db->db_mtx at that spot is safe. I think it is, but I might need a second opinion on that. :)
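For context, the idea behind that patch, as described in the commit message above, is to make dbuf_dirty() wait until the in-flight dmu_sync() has finished before touching the override state. A minimal user-space analogy of that wait pattern, using plain pthreads and entirely hypothetical names (this is not the ZFS code), could look like this:

```c
/* User-space analogy of "wait for the in-flight sync before re-dirtying".
 * All names here are hypothetical; this is not the ZFS implementation. */
#include <pthread.h>
#include <stdbool.h>

struct buf_state {
	pthread_mutex_t lock;      /* plays the role of db->db_mtx */
	pthread_cond_t  sync_done; /* signalled when the sync finishes */
	bool            sync_in_progress;
};

/* Called by a writer that wants to dirty the buffer again. */
void buf_dirty(struct buf_state *b)
{
	pthread_mutex_lock(&b->lock);
	/* Instead of clobbering the override state while a sync is still
	 * in flight, block until the syncing thread is done with it. */
	while (b->sync_in_progress)
		pthread_cond_wait(&b->sync_done, &b->lock);
	/* ... now it is safe to update the dirty record ... */
	pthread_mutex_unlock(&b->lock);
}

/* Called when the in-flight sync completes (analogy of dmu_sync_done()). */
void buf_sync_done(struct buf_state *b)
{
	pthread_mutex_lock(&b->lock);
	b->sync_in_progress = false;
	pthread_cond_broadcast(&b->sync_done);
	pthread_mutex_unlock(&b->lock);
}
```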
@gunnarbeutner Nice job running this one to ground! But I think the right fix is to handle this up in zfs_putpage() by preventing the race in the first place. This is how the issue is handled in Illumos, and it was only a problem on Linux because I accidentally dropped these locks when rewriting the function for the Linux VFS. Can you please review and test the patch d1532291ad77b6658d01b7df76030e8b8c8d51af? Without the patch applied I was able to hit the issue with your test case; with it applied I can't reproduce the crash anymore.
The patch looks good. I've been running my test program in a loop for about an hour now and haven't seen any crashes. aptitude appears to be working as well.
In my automated testing I've noticed the patch hurts mmap'ed I/O performance more than I'd like. We'll need to think about this a little more before merging it; we may want to move the locking up into the broader zpl_writepages() function.
OK, I was able to investigate this further. The patch actually doesn't significantly impact performance, but what it does do is expose an unrelated condition variable issue which results in a hung task. Before I merge this change I want to run down this second issue.
Gunnar Beutner did all the hard work on this one by correctly identifying that this issue is a race between dmu_sync() and dbuf_dirty(). Now in all cases the caller is responsible for preventing this race by making sure the zfs_range_lock() is held when dirtying a buffer which may be referenced in a log record. The mmap case, which relies on zfs_putpage(), was not taking the range lock. This code was accidentally dropped when the function was rewritten for the Linux VFS.

This patch adds the required range locking to zfs_putpage(). It also adds the missing ZFS_ENTER()/ZFS_EXIT() macros, which aren't strictly required due to the VFS holding a reference. However, this makes the code more consistent with the upstream code and there's no harm in being extra careful here.

Original-patch-by: Gunnar Beutner <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#541
It turns out the condition variable hang was accidentally introduced by the previous version of this patch, which created a lock inversion between the ZFS range lock and the VM's page lock. The updated patch which I just merged doesn't contain this flaw because the range lock is now acquired after the page lock is dropped.
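To illustrate the ordering issue with hypothetical names (plain pthread mutexes standing in for the VM page lock and the ZFS range lock; this is not the actual kernel code), the difference between the two versions is roughly:

```c
/* Lock-ordering sketch with plain pthread mutexes.  Hypothetical names:
 * "page_lock" stands in for the VM page lock and "range_lock" for the
 * ZFS range lock. */
#include <pthread.h>

static pthread_mutex_t page_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t range_lock = PTHREAD_MUTEX_INITIALIZER;

/* Broken ordering (the first version of the patch): the range lock is
 * taken while the page lock is still held, so a thread that takes the
 * two locks in the opposite order elsewhere can deadlock against this. */
void putpage_inverted(void)
{
	pthread_mutex_lock(&page_lock);
	pthread_mutex_lock(&range_lock);   /* lock inversion risk here */
	/* ... dirty the buffer ... */
	pthread_mutex_unlock(&range_lock);
	pthread_mutex_unlock(&page_lock);
}

/* Corrected ordering (the merged patch): the page lock is dropped before
 * the range lock is acquired, so this path never holds both at once. */
void putpage_fixed(void)
{
	pthread_mutex_lock(&page_lock);
	/* ... stage the page for writeback ... */
	pthread_mutex_unlock(&page_lock);

	pthread_mutex_lock(&range_lock);
	/* ... take the range lock, then dirty the buffer ... */
	pthread_mutex_unlock(&range_lock);
}
```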
Gunnar Beutner did all the hard work on this one by correctly identifying that this issue is a race between dmu_sync() and dbuf_dirty(). Now in all cases the caller is responsible for preventing this race by making sure the zfs_range_lock() is held when dirtying a buffer which may be referenced in a log record. The mmap case, which relies on zfs_putpage(), was not taking the range lock. This code was accidentally dropped when the function was rewritten for the Linux VFS.

This patch adds the required range locking to zfs_putpage(). It also adds the missing ZFS_ENTER()/ZFS_EXIT() macros, which aren't strictly required due to the VFS holding a reference. However, this makes the code more consistent with the upstream code and there's no harm in being extra careful here.

Original-patch-by: Gunnar Beutner <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#541
Steps to reproduce:
… and I/O on every pool halts.
After reboot, zdb -b reports leaked space.
Resetting logbias to latency solves the problem.
zfs version: ff998d8
I've been able to reproduce the issue on two different machines, on 4 different pools.
I think it would be better to temporarily (until the problem is properly fixed) not honour the logbias=throughput setting, since the integrity of the space maps is at risk and the machine needs to be rebooted after the panic.