-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
issues #2217, #3681 - set of commits dealing with zvol_*minor*() processing #3830
Conversation
Hi @bprotopopov any update on this pull request ? I can help testing it if you rebase it on current master. |
@odoucet keep in mind, that there might be some issues lurking in these patches in their current state ( sempervictus had those in his patchstack and data integrity or corruption (?) issues) - so unless the buildbot TESTs runs pass I wouldn't use those in production or even testing |
Sure, I will rebase asap. Will go through the discussion to see the source of potential data integrity concerns (in openzfs/spl#514 I presume). |
I would be very interested in seeing an example of such failures. I've been using these patches for a while and did not observe any issues to date. Typos courtesy of my iPhone On Jan 22, 2016, at 10:13 AM, kernelOfTruth aka. kOT, Gentoo user <[email protected]mailto:[email protected]> wrote: @odoucethttps://github.com/odoucet keep in mind, that there might be some issues lurking in these patches in their current state ( sempervictus had those in his patchstack and data integrity or corruption (?) issues) - so unless the buildbot TESTs runs pass I wouldn't use those in production or even testing Reply to this email directly or view it on GitHubhttps://github.com//pull/3830#issuecomment-173946387. |
@bprotopopov there's an error message referencing one of these commits from @sempervictus at openzfs/spl#514 (comment) , openzfs/spl#514 (comment) particularly openzfs/spl#514 (comment) it might be an incompatibility issue with ABD2 patch stack - so when not combining it with the ABD2 patch stack on top of zfs master assumedly safe ? (unless the buildbots state otherwise) Sorry for making it sound like these patches eat data for breakfast ;) No blame was originally intended |
Not a problem, @kernelOfTruth :) So, this patch deals with interfacing with Linux udev - creating, removing, renaming device links for zvols and snapshots. Originally, this was done synchronously, but that approach had issues, e.g. deadlocks. Since Linux udev is already asynchronous, I thought it would make sense to propagate that asynchrony to the zvol*minor() operations, and in doing so, to simplify resolution of the locking issues. If there are issues with say not being able to dispatch some operations, then this could result at worst in missing or dangling device links for zvols or snapshots, but could not result in zvol's data integrity issues, to my knowledge. The error message you are referring to is intended to indicate that there was an operation "create minor" dispatched for a zvol in a pool, but when the processing commenced, the pool was already gone (perhaps exported or destroyed). Yet looking at the error message, I can see that the pool name in the argument seems wrong, perhaps corrupted - pool names cannot contain '@' characters to my knowledge. If there is a way to reproduce this issue, I would like to look into it further to see what is going on. Best regards. |
@bprotopopov thanks improving this. I'd like to spend some time going over it and it would be helpful if you could rebase it against master. |
bf7f92e
to
d4365f2
Compare
Rebased against tip of the master, ran quick tests (ztest). |
@behlendorf, I noticed some failures in zconfig.sh reported above, so I tried to run it, and based on the error messages I get, I conclude that the test makes some assumptions about where the build artifacts are. In my case, they are not there (I simply built/installed the RPMs - make rpm-utils rpm-kmod, yum localinstall *). Can you please explain how to build spl/zfs so that zconfig.sh knows where to find the zfs/spl modules to load ? |
Ah, never mind, @behlendorf, if I run
the test finds the modules. |
I have Centos 6.5 handy, it runs the test cleanly:
|
@bprotopopov all the tests are run by first installing the zfs packages in to the image so it's just like a real user system. Then grabbing and running a given test script from the zfsonlinux/zfs-buildbot repository, which is shown at the top of the test output in buildbot. This way you should be able to easily reproduce a failure on your system. In this case wget the
You can run some of these scripts in tree but be aware they often depend on having udev rules being installed to create devices as such. In this case it looks like maybe a slightly change in behavior is causing the issue. |
Ran the test on CentOs 6.7 - runs clean.
Not sure why it does not pass in the automated runs. |
Hi, @behlendorf
Accidentally left test 10 enabled - passed as well. I am running the tests on a VM. Could this be an issue? |
@bprotopopov the buildbot runs all the tests in VMs. Let me pull the patch and see if I can reproduce it locally. |
I was able to reproduce the issue under CentOS 7.1 It appears that despite
I'm not sure why you weren't able to reproduce this, nor have I carefully looked at the patch to determine why this happens. |
@behlendorf, yes, that seems strange. I'll look through the script to see what could be going on. |
@behlendorf, I believe I understand what might be happening here. My changes result in asynchronous scanning for zvols/snapshots to be presented with device links. Meaning, asynchronous w.r.t. completion of zpool import, for instance. So, even if one does Looking at the test_3(), when the snapdev is set first time, there are a few commands run before the test uses the links, and it works, but the second time, the test tries to use device links right after import, and that may fail if snapdev_set() dispatch is delayed. The two reasons I did this are as follows. First, I wanted to make all the minor-related operations asynchronous and to cleanly deal with the deadlock between zvol_open() and zfs_rename() (as well as other similar deadlocks). The zvol_*minor() and zvol_set_snapdev() call dmu_objset_find() which takes all sorts of locks in all sorts of orders. Second, doing such things asynchronously speeds up operations like import quite significantly if the pool has large DSL namespace and logs of zvols/snapshot all over the place. On the other hand, this does seem to have an unexpected effect on the guarantees provided by udevadm settle. [Those guarantees of course are conditional on synchronous generation of udev events by some previously run commands, as illustrated by the failure in question.] To address this, I can think of a way to implement an analog of udev trigger/settle for the zvol_*minor() and zvol_set_snapdev() operations, e.g. 'zpool settle [poolname]' that could be run before udevadm trigger/settle. The former will guarantee that all the previously posted udev event generation requests have been processed and the events generated, while the latter will guarantee that all the rules for those were processed. What do you think? |
@bprotopopov I think we definitely want to leave this as an asynchronous operation for all the reasons you mentioned. Anyone using Linux these days should already be familiar with the fact that block devices can, and do, appear asynchronously when they're good and ready. If that's all that's going on here I think it would be reasonable to update the zconfig test case it account for this possible behavior. However, I don't think that's all that going on here. It's been almost two hours now and the VM where I reproduced the issue still hasn't created those devices. On top of that it's fairly clear the kernel never registered them because they don't appear under /sys/block/. It's feels more like snapdev was somehow set to hidden when the devices should have been registered. Sorry, I should have included this in my original comment. As an aside if your need to you can modify scripts/zconfig.sh in your patch and it'll test using that version. |
Yes, two hours is definitely a problem. Is there a way I can get the dmesg or system logs ? Is the snapdev property still set on the zvol ? |
The console link in the buildbot results will have the dmesg output for that test. I'll try and take a closer look in to what's happening tomorrow since I was able to reproduce it. |
@behlendorf, I was able to reproduce the issue after I have inserted a 1 sec delay in the worker thread function, effectively delaying processing of the async requests. I am looking into the issue. |
@behlendorf, this is what I came up with so far. |
2e4fc08
to
418852a
Compare
Added Copyright lines in the comments. |
I can see build issues, but I guess this is not related to this specific commit. With some work already commited on master, can we expect this PR to be commited as well ? Do you have any deadlock remaining ? |
@odoucet yes that's the plan, hopefully after one last round of review. @bprotopopov can you do a quick rebase on to master and drop the one patch which has already been merged. |
418852a
to
54718f5
Compare
@behlendorf, done, the only conflict in the comments. I can also squash the 54718f5 with d2b67a3 if that is preferred |
I think it does make sense to squash. I only kept the commits separate for the review purposes. |
54718f5
to
e72196f
Compare
@behlendorf I have rebased, done squashing, dropped the extra patch. |
44ace55
to
b933792
Compare
Pushing to prompt the test cycle. |
@behlendorf, which branch in zfsonlinux/zfs should I be rebasing against ? Is it master or a release branch ? |
You need to rebase on master branch. |
OK, thanks, @odoucet. |
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
…n() and dsl_pool_sync()...zvol_rename_minors() Remove trylock of spa_namespace_lock as it is no longer needed when zvol minor operations are performed in a separate context with no prior locking state; the spa_namespace_lock is no longer held when bdev->bd_mutex or zfs_state_lock might be taken in the code paths originating from the zvol minor operation callbacks.
@bprotopopov thanks for rebasing these patch. I made another pass looking over the latest patches and they look good to me. I'm going to run them through some additional manual testing but assuming I don't find anything and you don't want to make any additional change I believe they're ready to merge. |
@bprotopopov everything LGTM. I'll get these updated patches merged, thanks for all your hard work on this! |
It was nice working with you on this, @behlendorf ! |
Two issues are being addressed: improved performance of dmu_objset_find() scans called from zvol_minor() operations, and a deadlock resulting from lock order inversion resulting from zvol_minor() work occurring in the context of various callers with complex locking states (multiple locks taken in specific order).
The first issue is addressed by performing dmu_objset_find() scan in two stages:
The second issue is resolved by performing zvol_minor() operations asynchronously (taskq) in a context that is free from any pre-existing locking state. This allows for much simplified deadlock analysis and (hopefully) deadlock-free implementation. The implementation includes two commits - commit 3586fa9 with the bulk of the asynchronous work support, and commit bf7f92e that removes the trylock() of the spa_namespace_lock that is no longer needed.