Large KMem Allocation When Using ZVOL on RAIDZ2 atop an LSI 1078 (MegaRAID 8888ELP) #3684
This is starting to look pretty grave. It happens on master, on DeHackEd's bleeding edge 2 branch, and on all of our builds since the start of July. Any chance this could be caused by the pool consisting of a set of single-disk RAID0 volumes comprising the RAIDZ2? The MegaRAID 8888ELP backing this setup doesn't actually do JBOD as far as I can tell (and LSI's download sites being gone isn't helping matters; at least SMC has some firmware). I've run this with and without @rdolbeau's SSE calculation patch in the stacks, with the same result.

The pool layout is a RAIDZ2 at ashift=9 of 10 2T Constellation ES drives on the aforementioned controller, with each disk presented as a RAID0 volume. The ZVOLs in use have been destroyed and recreated at every ZFS rebuild (patch-stack/version deployment). I'm going to try an older build, but I am somewhat limited in the number of ZFS deployments I can do here, since the system runs off a thumb drive which will eventually die from the IO abuse of building DKMS modules (we've seen them last more than a year when they do quarterly deployments, and much less under these conditions).

The system in question is backing an OpenStack Glance storage and Horizon node atop a VM. It kills the Fuel deployment in an ugly way: the kernel stack trace is registered in dmesg on the physical host, but the VM actually being provisioned for services stalls (after the OS image is pushed) without killing the deployment process. The disks are all new and have been run through smartctl long tests in Xyratex chassis (our primary storage systems, with direct SAS/SATA access in JBOD mode), so I don't think they're the culprit.

Anyone want to weigh in? I'd love to hear someone tell me this controller is garbage and I should have my head examined for even trying this through RAID0 abstractions, but we've done this on other systems where we couldn't get clients to purchase a real HBA off the bat, and it's never been this bad before.
It gets more interesting by the minute: the pool now refuses to import altogether, throwing the following in dmesg and hanging zpool import:
The "unable to handle kernel paging request" bit seems interesting... |
@sempervictus did you determine whether this was in fact related to #3651?
Testing that host is a bit of a problem right now: I've had to switch the system back to running the native RAID6 for the time being, and it's booting directly from PXE through Fuel to test OpenStack. I've built out an identical patch stack sans #3651 and am presently testing it on a system which only has mirrored VDEVs (a span of 5 mirrors). However, under significant IO load, it has shown none of the symptoms described above.
I think I can reproduce this as well: 4.0.9 kernel with zfs/spl master as of 03/09/2015 (d/m/y). Oddly, also running OpenStack... RAIDZ3 on individual LUKS disks.
When support for large blocks was added, DMU_MAX_ACCESS was increased to allow for blocks of up to 16M to fit in a transaction handle. This had the side effect of increasing the max_hw_sectors_kb for volumes, which are scaled off DMU_MAX_ACCESS, to 64M from 10M. This is an issue for volumes which by default use an 8K block size because it results in dmu_buf_hold_array_by_dnode() allocating a large array for the dbufs. The solution is to restore the maximum size to ~10M; this patch specifically changes it to 16M, which is close enough.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#3684
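To make the arithmetic concrete, here is a rough sketch of how the per-request dbuf array scales with the access limit and the volume block size. The constants are assumptions that mirror the values quoted in the commit message, not the actual ZFS definitions, so treat it as an illustration rather than the real implementation:

```c
#include <stdio.h>

/* Back-of-the-envelope sketch only: the constants mirror the values
 * quoted in the commit message above, not the real ZFS definitions. */
#define MIB(x) ((size_t)(x) << 20)
#define KIB(x) ((size_t)(x) << 10)

static void report(const char *label, size_t max_access, size_t volblocksize)
{
	/* dmu_buf_hold_array_by_dnode() needs one dbuf pointer per block
	 * covered by a request, so the array grows with this ratio. */
	size_t dbufs = max_access / volblocksize;

	printf("%-26s %4zu dbufs -> %3zu KiB of pointers\n",
	       label, dbufs, dbufs * sizeof(void *) / KIB(1));
}

int main(void)
{
	size_t volblocksize = KIB(8);  /* default zvol block size */

	report("old limit (~10M):", MIB(10), volblocksize);        /* ~10 KiB */
	report("large-block limit (64M):", MIB(64), volblocksize); /*  64 KiB */
	report("patched limit (16M):", MIB(16), volblocksize);     /*  16 KiB */
	return 0;
}
```

If those numbers are roughly right, a 64 KiB allocation on every sizeable zvol request would line up with the large kmem allocation notices reported in dmesg above, while capping the limit at 16M brings the array back down to around 16 KiB.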
@sempervictus @prometheanfire could you please verify that the patch in #3710 resolves the issue?
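For anyone doing that verification, one observable side effect the commit message points at is the block-queue limit the kernel exposes for each zvol. Below is a minimal sketch that reads it; the zd0 device name is an assumption (zvols may appear under a different zd index on your system), and reading the same sysfs file from a shell works just as well:

```c
#include <stdio.h>

int main(void)
{
	/* Path assumes the zvol under test shows up as zd0; adjust as needed. */
	const char *path = "/sys/block/zd0/queue/max_hw_sectors_kb";
	char buf[32];
	FILE *f = fopen(path, "r");

	if (f == NULL) {
		perror(path);
		return 1;
	}
	if (fgets(buf, sizeof(buf), f) != NULL) {
		/* Unpatched large-block builds reportedly scaled this up to the
		 * 64M range; the fix should bring it back toward 16M. */
		printf("max_hw_sectors_kb: %s", buf);
	}
	fclose(f);
	return 0;
}
```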
Done.
When support for large blocks was added DMU_MAX_ACCESS was increased to allow for blocks of up to 16M to fit in a transaction handle. This had the side effect of increasing the max_hw_sectors_kb for volumes, which are scaled off DMU_MAX_ACCESS, to 64M from 10M. This is an issue for volumes which by default use an 8K block size because it results in dmu_buf_hold_array_by_dnode() allocating a 64K array for the dbufs. The solution is to restore the maximum size to ~10M. This patch specifically changes it to 16M which is close enough.

Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#3684
While doing a bit of testing on a new storage system, I seem to be able to consistently reproduce a large allocation notice in dmesg:
The patch stack in question starts with DeHackEd's bleedingedge2 and consists of:
I figure this may be of interest to anyone running a similar stack or maintaining the included changes. The system seems to run just fine, though I have observed the ARC dip to almost zero since it's on a test pool with no data (it rebounds fine once data is added).
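For reference, a minimal sketch of the kind of sustained large sequential write that might surface the notice on an affected build. The /dev/zd0 device name, the 16M request size, and the 1G total written are assumptions for illustration only, and it should only ever be pointed at a disposable test zvol:

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/zd0";  /* assumed scratch zvol; adjust */
	const size_t len = 16 << 20;   /* 16M per write, purely illustrative */
	char *buf = malloc(len);
	int fd = open(dev, O_WRONLY);

	if (buf == NULL || fd < 0) {
		perror(dev);
		return 1;
	}
	memset(buf, 0xab, len);

	/* A burst of large sequential writes; watch dmesg for large kmem
	 * allocation notices while this runs. */
	for (int i = 0; i < 64; i++) {
		if (pwrite(fd, buf, len, (off_t)i * (off_t)len) < 0) {
			perror("pwrite");
			break;
		}
	}
	close(fd);
	free(buf);
	return 0;
}
```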