Linux v5.13 broke zvol_first_open lock retry logic, so open() on a zvol dev node now sometimes fails with errno=ERESTARTSYS in userspace (particularly noticeable in udev) #12301
I've now also tried the latest zfs git master version (2.1.99-335_g4694131a0), still on kernel 5.13, and the problem still occurs.
I've created a branch that patches `zvol_id` to retry on ERESTARTSYS. It's based upon some other cleanup patches I submitted in PR #12302. https://github.com/jgottula/zfs/tree/zvol_id-retry-on-ERESTARTSYS
I was just looking back over the internal Linux kernel errno.h header, and figured it would be a good idea to highlight some of what's written in the comments there. https://github.com/torvalds/linux/blob/v5.13/include/linux/errno.h#L8-L14

```c
/*
 * These should never be seen by user programs. To return one of ERESTART*
 * codes, signal_pending() MUST be set. Note that ptrace can observe these
 * at syscall exit tracing, but they will never be left for the debugged user
 * process to see.
 */
#define ERESTARTSYS 512
```

After re-reading this, I'm thinking that I don't remember anything like `signal_pending()` being involved in our case. Perhaps in Linux 5.12 (or thereabouts) something changed about how the restart logic is handled.
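The "should never be seen by user programs" rule is easy to confirm from userspace: 512 simply has no entry in the exported errno tables. A quick sketch (Python here for brevity, though `zvol_id` itself is C):

```python
import errno
import os

# 512 is the kernel-internal ERESTARTSYS value quoted above; it is
# deliberately absent from the errno values exported to userspace.
ERESTARTSYS = 512

print(ERESTARTSYS in errno.errorcode)   # False: no userspace name exists for it
print(os.strerror(ERESTARTSYS))         # glibc has no message, e.g. "Unknown error 512"
```

So when a udev-spawned process sees `errno == 512`, it isn't looking at a real userspace error at all; it's looking at kernel plumbing that leaked out.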
@behlendorf I think I've found the smoking gun here. Back in one of the issues I referenced in the top post (#612, from way back in 2012), discussion centered on how to solve a lock inversion deadlock that manifested in cases like trying to use zvols as vdevs and other interesting situations. (One also imagines that that very same lock contention situation could arise in an occasional pseudorandom manner, when multiple processes try to do approximately a million zvol-related system calls all at the same time... such as when systemd-udevd spawns an army of workers at the exact moment that a zvol dev node pops into existence, for example.) The solution to the problem at the time, which was implemented in 65d5608, was to take advantage of a retry mechanism provided by the kernel's block-device open path. (The commit message for 65d5608 describes this in more detail.)
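The drop-and-retry dance that 65d5608 leaned on can be modeled in a few lines. This is a toy sketch only (Python; `blkdev_get_sim` and `zvol_open_sim` are made-up names, not real kernel or OpenZFS functions):

```python
import threading

ERESTARTSYS = 512
bd_mutex = threading.Lock()             # stand-in for bdev->bd_mutex
spa_namespace_lock = threading.Lock()   # stand-in for the contended ZFS-side lock

def zvol_open_sim():
    # If the inner lock is unavailable, bail out with -ERESTARTSYS instead of
    # blocking while bd_mutex is held -- that's the inversion avoidance.
    if not spa_namespace_lock.acquire(blocking=False):
        return -ERESTARTSYS
    spa_namespace_lock.release()
    return 0

def blkdev_get_sim(max_retries=100):
    # Pre-5.13 kernels looped roughly like this: drop bd_mutex between
    # attempts so whoever holds the other lock can make progress.
    # 5.13+ kernels removed the loop and just hand the error back.
    for attempt in range(1, max_retries + 1):
        with bd_mutex:                  # held across the open callback
            err = zvol_open_sim()
        if err != -ERESTARTSYS:
            return err, attempt
    return -ERESTARTSYS, max_retries

print(blkdev_get_sim())  # (0, 1): succeeds on the first try when uncontended
```

The point of the model: the retry loop lives in the *caller*, which is exactly why ripping it out of the kernel left `zvol_open`'s `-ERESTARTSYS` with nowhere to go.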
Okay, so here's the problem: the `-ERESTARTSYS` retry handling that `zvol_open` depends on is gone from the kernel's `blkdev_get` path. From what I can gather from code comments and commit messages, it seems that the only reason that handling existed in the first place was for the md driver.

So, with v5.9-rc1, we start seeing some refactoring of the restart logic in the block-device open path. Next, with v5.11-rc1, we see much more substantial refactoring on the md side. Finally, with v5.13-rc1, we arrive at this commit, literally with title "block: remove the -ERESTARTSYS handling in blkdev_get_by_dev" and description "Now that md has been cleaned up we can get rid of this hack", whose code changes consist solely of just ripping the retry logic out entirely.

So this clearly indicates that for Linux v5.13 and above, the `-ERESTARTSYS` retry mechanism that `zvol_open` relies upon simply no longer exists. And that actually fits nicely with the symptoms of this issue.

I took a brief look at the Linux stable tree, and it doesn't look like that third patch has been backported to anything pre-v5.13 at all. So that's nice at least.
I'm going to go back and look through some of my system logs and see if I can confirm whether I was in fact seeing udev errors symptomatic of this problem back when I was on kernel v5.12.2. I may even roll back my kernel and zfs packages to v5.12.2 and do a quick boot with that, to see if I can reproduce the behavior with those existing builds that I still have lying around.

(2021-07-05: I had a convenient chance to try this today, as I managed to OOM and hardlock my kernel this afternoon by being an idiot, giving me an "opportunity" to "reboot". Reverted kernel back to 5.12.2; cannot reproduce. So it's pretty much 100% guaranteed to be a v5.13+ only thing. Which frankly fits what the code was showing pretty well anyway. I did check back on pre-5.13 boots and confirmed that I was having udev symlink issues at that time; but they must have been due to something unrelated.)

In any case, the finding here appears to be that #612 has effectively been un-fixed by recent kernels (definitely v5.13+, and maybe even v5.11+, depending on what I find); and that some new way of resolving that lock ordering situation needs to be found.
@jgottula good find, yes that would entirely explain it. We'll have to do some investigation and see if there's some other way we can restructure the locking to avoid the original deadlock.
Given that some of the original lock-order scenarios involved zpool-on-zvol setups, I wonder if this means that those have been rendered non-working once again. (On either an "always" or perhaps "sporadic" basis... I haven't bothered to look in-depth at the logic there to know where it would land on the never~sometimes~always spectrum.) I should also check if there are any existing tests in the test suite that do the using-zvol-as-a-vdev thing, and (if such do exist) whether they still pass. And if there aren't any tests for that, then there probably ought to be.
This branch is still around in case anyone wants to mess around with it. Rebased onto master just now.
Here are some more commit references, for convenience's sake. This is the commit from way back in Jan 2009 that fixed the original problem in md; and then these commits (one, two) make up the v5.13-era fix to it.
This configuration has actually always been a little bit dodgy on all the platforms, and really is discouraged. The test suite does use this kind of configuration in a few places (since it was convenient), but the newest kernel we've been testing with is 5.12.13, so we may just not have seen this yet. I'd expect this to manifest itself as a test case failure now, and not a deadlock, since ZFS will be returning an error rather than retrying.

Thanks for the handy references. If you're so inclined to dig into this more, let me know. Unfortunately, it might be a while before anyone has the time to really dig into it.
I found another thing that's likely to be rather relevant here. The commit that ripped the `-ERESTARTSYS` handling out was part of a larger patch series.

Here's the original 15-part patch set: move bd_mutex to the gendisk. Patch 01 seems to have been dropped, as its changes never showed up in Linus's master branch at any point. (And that patch didn't make it into v2 of the patch set either; see below.) Patches 02 thru 07 were merged to Linus's tree during the v5.13 merge window and made it into the v5.13 release. Patches 08 thru 15 were held back for later, however. The patches that were merged here were all on the block side of things.

So anyway, later on, we have a v2 patch set basically just containing the not-merged-in-v5.13 patches that were previously numbered 08 thru 15, rebased for v5.14: move bd_mutex to the gendisk v2. All eight of the patches in this set were merged into Linus's tree during the (still ongoing) v5.14 merge window. And among other tidbits, it does implement the titular change: `bd_mutex` is replaced with a lock on the gendisk.

(There was also sort of a "pre-v2" patch set, move bd_mutex to the gendisk (resend), that was posted to the linux-block list between v5.13-rc1 and v5.13-rc2. But that one didn't go anywhere itself.)

I don't know the precise reasoning that led to the series being split across two kernel releases. I imagine, though, that things would be less complicated for us if only they'd been nice and left the `-ERESTARTSYS` handling alone. Instead, we now (potentially) need one fix for v5.13 and another for v5.14+. (But: I haven't yet looked into the minutiae to figure out the exact implications of the change from `bd_mutex` to the new per-gendisk lock.)
Easy reproduction method

Wow. Turns out it's really easy to make this bug manifest even in a non-udev scenario. Basically just trying to open any number of not-already-open zvols at "essentially the same time" (without any particularly amazing precision) will trigger a lot of these `ERESTARTSYS` failures.

This command is a super easy way to get a system with a bunch of zvols to vomit a load of errors every single time. At least on a multi-core machine with full preempt and 1000hz tick, in any case. (This uses GNU parallel, with the job count set to 10x as many jobs as there are CPU threads, to spin up lots of open attempts all at once.)
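That kind of mass-open stress can be sketched as follows. This is a hedged reconstruction only: Python's `multiprocessing` stands in for GNU parallel, and the paths, function names, and job counts are illustrative, not the original command:

```python
import glob
import multiprocessing
import os

def try_open(path):
    """Open and immediately close one node; report any leaked errno."""
    try:
        os.close(os.open(path, os.O_RDONLY))
        return None
    except OSError as e:
        return (path, e.errno)   # errno 512 here is the ERESTARTSYS leak

def stress(paths, jobs):
    # Many workers all hit open() at "essentially the same time".
    with multiprocessing.Pool(jobs) as pool:
        return [r for r in pool.map(try_open, paths) if r is not None]

if __name__ == "__main__":
    # On an affected 5.13+ box, point this at the real zvol nodes;
    # /dev/null stands in here so the sketch runs anywhere.
    paths = glob.glob("/dev/zd*") or ["/dev/null"] * 64
    jobs = min(64, 10 * (os.cpu_count() or 1))   # "10x CPU threads", capped for the demo
    print("failures:", stress(paths, jobs))
```

On a healthy path the failure list is empty; on an affected system with many not-yet-open zvols, entries with errno 512 show up.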
Somewhat useless statistics

Here are some failure statistics for the reproduction command above.
Got an easy opportunity to double-check kernel v5.12 today; I'd had vague suspicions that this problem had possibly reached back into v5.12 or v5.11 one way or another. I've updated the comments and title here to reflect that this is definitely a v5.13+ problem only.
In my case, "multiple zpools" causes the wrong /dev/ symlinks during the bootup udev action. My Fedora 34 box has two zpools. After booting, buggy symlinks appear. On the other hand, everything goes well if the 2nd zpool is removed.
Any updates on this? It seems that the problem also exists in 5.14.8. Thanks.
We've got a good handle on what the core issue here is, as explained above in this #12301 comment. I'm still investigating how best to deal with it.
Since I compile my own kernels, I've personally been working around the problem by applying a small kernel patch that reverts a8ed1a0607 (the commit that removed the `-ERESTARTSYS` handling). Obviously it's not an actual solution per se; and it's not helpful to anyone who isn't compiling their own kernels; but in any case it's been a workable stopgap measure for me while this issue has been pending. (I'd noticed that some of my "blast SSD contents to a zvol for backup" scripts were occasionally hitting this error.)
Transitioning from FreeBSD, I found myself with this same problem. It sounds like @behlendorf is on it; let me know if any additional information is needed; I simply wanted to provide a datapoint with the versions below.

```
$ uname -a && sudo zfs version
Linux ubuntu 5.13.0-20-generic #20-Ubuntu SMP Fri Oct 15 14:21:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
zfs-2.0.6-1ubuntu2
zfs-kmod-2.0.6-1ubuntu2
```

And interestingly, after a bit of poking around, I'm seeing the same `ERESTARTSYS` behavior described above.
This test case may fail on 5.13 and newer Linux kernels if the /dev/zvol/ device is not created by udev. Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#12301
Due to a possible lock inversion, the zvol open call path on Linux needs to be able to retry in the case where the spa_namespace_lock cannot be acquired. For Linux 5.12 and older kernels this was accomplished by returning -ERESTARTSYS from zvol_open() to request that blkdev_get() drop the bdev->bd_mutex lock, reacquire it, then call the open callback again. However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing retry logic, but for 5.13 and newer kernels we retry internally in zvol_open(). This should always succeed except in the case where a pool's vdevs are layered on zvols, in which case it may fail. To handle this case vdev_disk_open() has been updated to retry when opening a device when -ERESTARTSYS is returned.

Signed-off-by: Brian Behlendorf <[email protected]>
Issue openzfs#12301
This test case may fail on 5.13 and newer Linux kernels if the /dev/zvol/ device is not created by udev. Reviewed-by: Rich Ercolani <[email protected]> Reviewed-by: John Kennedy <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Issue openzfs#12301 Closes openzfs#12738
@tonyhutter I've been mulling this over for a while now, and unfortunately I don't really see a significantly cleaner way to handle this. I've gone ahead and opened PR #12759 with a variant of the change you suggested above.
I think I am affected by this as well; I see the same error when trying to reset.
@behlendorf & @tonyhutter: Won't this get worse when we have namespace delegation to a ton of containers brute-forcing their way through the acquisition of that spa_namespace_lock?
@sempervictus it's definitely not the ideal solution; that is going to require more extensive changes to how the locking is handled.
My problem was fixed with PR #12759, applied on top of 2.1.1.
Due to a possible lock inversion, the zvol open call path on Linux needs to be able to retry in the case where the spa_namespace_lock cannot be acquired. For Linux 5.12 and older kernels this was accomplished by returning -ERESTARTSYS from zvol_open() to request that blkdev_get() drop the bdev->bd_mutex lock, reacquire it, then call the open callback again. However, as of the 5.13 kernel this behavior was removed.

Therefore, for 5.12 and older kernels we preserved the existing retry logic, but for 5.13 and newer kernels we retry internally in zvol_open(). This should always succeed except in the case where a pool's vdevs are layered on zvols, in which case it may fail. To handle this case vdev_disk_open() has been updated to retry when opening a device when -ERESTARTSYS is returned.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Issue #12301
Closes #12759
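The two-pronged strategy the commit message describes -- retry internally in zvol_open() on 5.13+, and let vdev_disk_open() retry on -ERESTARTSYS for the zvol-backed-vdev corner case -- can be sketched like this. All names here are illustrative Python stand-ins, not the actual OpenZFS source:

```python
import threading
import time

ERESTARTSYS = 512
spa_namespace_lock = threading.Lock()   # stand-in for the real lock

def zvol_open_sim(timeout_s=0.05, backoff_s=0.005):
    # 5.13+ strategy: the kernel no longer retries for us, so spin here
    # until the lock is free. The wait is bounded because it can be futile
    # when the lock holder is itself blocked on *this* open -- which is
    # exactly the pool-on-zvol case.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if spa_namespace_lock.acquire(blocking=False):
            spa_namespace_lock.release()
            return 0
        time.sleep(backoff_s)
    return -ERESTARTSYS

def vdev_disk_open_sim(attempts=3):
    # One level up: retry the whole open when it reports -ERESTARTSYS,
    # mirroring the vdev_disk_open() change described above.
    err = -ERESTARTSYS
    for _ in range(attempts):
        err = zvol_open_sim()
        if err != -ERESTARTSYS:
            break
    return err

print(vdev_disk_open_sim())  # 0 when the namespace lock is uncontended
```

The design trade-off is visible even in the toy: the inner loop handles the common udev-contention case, while the outer retry exists only for the discouraged zvol-as-vdev layering.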
This should be fixed, now that #12759 is merged. Thanks to @tonyhutter and @behlendorf for coming up with the fix, and to those who helped test it. I still have some additional testing of that PR pending; specifically, to make sure that things also behave for anyone who applied the "undo the commit in 5.13 that removed the `-ERESTARTSYS` handling" kernel workaround.
Shall I close this issue and leave it to you folks to create new issues for these additional/follow-on concerns/ideas/future-improvements? Or should I just keep this one open?
Let's go ahead and close this out now that the fix has been merged. We can open new issues for any remaining followup work.
Description:
- This reverts commit a8ed1a0.
- It breaks ZFS sometimes: openzfs/zfs#12301 (comment)
Bug: N/A
Change-Id: I1a17247218d65a1cd1d6822c37075de922301bfd
Signed-off-by: Thomas Lamprecht <[email protected]>
Signed-off-by: Ionut Nechita <[email protected]>
System information
The problem
At system startup right after pool import, or at any other time when a large number of zvol dev nodes pop into existence (e.g. when setting `snapdev=visible` on a volume dataset with many snapshots), a small random smattering of the `/dev/zvol/*` symlinks that are expected to be created are missing. This is just the immediately user-noticeable symptom, though.

Adding the following udev rule to heighten the log level for zvol devices helps diagnose what's going on. You can throw this in as `/etc/udev/rules.d/00-debug.rules` and load it up at runtime via `udevadm control -R` if desired.

The rule in `/usr/lib/udev/rules.d/60-zvol.rules` for creating the `/dev/zvol/*` symlinks includes a `PROGRAM` token which causes udev (systemd-udevd) to spawn an instance of `/usr/lib/udev/zvol_id` per `/dev/zd*` dev node. Many of these are run in parallel at once.

The debug logging rule shown above reveals that what's happening is that, occasionally and somewhat randomly, one of these `zvol_id` processes will not be able to successfully perform the `open("/dev/zdN", O_RDONLY)` syscall to open the dev node it was asked to identify. The syscall fails, and the value in `errno` after the failure is 512.

So... 512 isn't a valid userspace error value at all. I was fortunate to find that 512 corresponds to `ERESTARTSYS` in Linux kernel land. As far as I know, this is a Thing That Should Never Leak To Userspace. Yet, it has, for whatever reason. (Maybe something about the signal configuration in udev processes is unusual and allows for this? No idea.)

Anyway. My naive understanding is that if a syscall handler (such as `zvol_open`) returns `-ERESTARTSYS`, then the Linux kernel is supposed to automatically make things "just work" such that the thread doesn't exit back to userspace but instead makes a U-turn and auto-retries the syscall again. (Which, in this case, would serve the purpose of effecting a retry on some heavily contended locks.) And so in principle, glibc shouldn't have to do any magic; and the program itself shouldn't have to do some sort of `EINTR` loop. And, at least if things work as intended, no `ERESTARTSYS` 512 values should be seen on the userspace side. (Yet, here we are.)

It should also be noted that other udev-spawned programs (e.g. `udisks-part-id`) also occasionally encountered the exact same syscall error with the same `ERESTARTSYS` error; often well-correlated with the failure in `zvol_id`, presumably due to both programs running at nearly the same time and both facing similar lock contention.

I did hack around with the `zvol_id` program, and experimented with putting the `open` syscall into a loop that would retry on failure `if (errno == 512)`. I put number-of-tries printouts to stderr in that loop, and I found that the `open` call would eventually succeed; but generally after something on the order of 20-400 retries. (Probably would fare better with some kind of sleep call in between, but ehhh.)

I do half-wonder if the reorganization that happened in 5df7e9d has anything to do with this issue: prior to that commit, `zvol_first_open` would directly `return (-SET_ERROR(ERESTARTSYS));`, whereas after the commit, `zvol_first_open` (in one file) does `return (SET_ERROR(EINTR));` while `zvol_open` (in a different file) does some `-EINTR` to `-ERESTARTSYS` stuff. Doesn't seem like the whole story by any means, but possibly a complicating factor.

I also wonder if there was some internal change to how the Linux kernel does its `ERESTARTSYS` magic, perhaps in 5.12. I haven't been able to actually track down where that logic is located at the moment; I think it used to be in one particular file, but has since been moved to some other file; and my on-the-fly kretprobes knowledge apparently isn't good enough to figure out how to just print out a damn stack trace whenever `zvol_first_open` returns `-EINTR`.
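The retry-loop experiment described above can be sketched roughly like this (Python rather than the C of `zvol_id`; `open_with_retry` is a made-up name, and the sleep is the "probably would fare better with some kind of sleep call" idea):

```python
import os
import time

ERESTARTSYS = 512   # the kernel-internal value that was leaking out as errno

def open_with_retry(path, max_tries=400, delay_s=0.001):
    """Retry open() while errno == 512, counting the attempts."""
    for attempt in range(1, max_tries + 1):
        try:
            return os.open(path, os.O_RDONLY), attempt
        except OSError as e:
            if e.errno != ERESTARTSYS:
                raise               # a real error; don't mask it
            time.sleep(delay_s)
    raise OSError(ERESTARTSYS, "still failing after %d tries" % max_tries)

fd, attempts = open_with_retry("/dev/null")   # healthy path: succeeds on attempt 1
os.close(fd)
print(attempts)
```

On an affected zvol node this loop was observed to succeed only after roughly 20-400 attempts; on anything healthy it returns on the first try.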
.Other issues and PRs of similarity or relevance
- zpool command hangs when attempting to create a new pool on a zvol #612: Ancient issue concerning deadlocks when using zvols as vdevs (for whatever pathological reason someone would want to do such a thing); resolved by 65d5608, which originally introduced the `-ERESTARTSYS` mechanism to `zvol_first_open`.
- Zpool import crashes the single CPU machines. #6283: Issue concerning deadlocks on 1-CPU systems due to endless `-ERESTARTSYS` retries without any possibility for task switching to take place and resolve the lock situation; fixed by adding a `schedule()` call in the `-ERESTARTSYS` case.
- zpool create with zvols can hang when racing with udev #7773: Issue concerning lock contention and `-ERESTARTSYS` in `zvol_open` (never resolved, closed as stale).

Information that might kinda help reproduce the problem maybe perhaps
I don't know if I have any specific reliable way to reproduce the problem.

The best I can say is that my pool has a lot of zvols (dozens to hundreds, particularly when `snapdev=visible`), and that I see this pretty consistently whenever a bunch of `/dev/zd*` nodes come into existence at the same time and make systemd-udevd spin up a crapton of workers that all attempt to access some large number of those dev nodes at the same time (undoubtedly running into some degree of lock contention).

I don't do any exotic things like attempting to build nested zfs contraptions involving using zvols as vdevs. (I generally regard myself as a sane person.)

I use a custom kernel that has, among other things, `CONFIG_PREEMPT=y` and `CONFIG_HZ=1000`. The involuntary preemption thing could be relevant; not entirely sure, to be honest.

This has been happening on an SMP system with multiple cores (AMD FX-8350).

If it's of any interest, I'm currently running systemd/udev version 249rc2. I believe this was happening on 248.3 as well, and possibly prior to that. I'm 95% sure I was also seeing this behavior back on kernel 5.12.2 (at which time I was using zfs 2babd20).

Oh yeah, and I guess that little udev rule file I posted earlier that enables debug log_level for `/dev/zd*` dev nodes is probably a handy thing to have around if you're trying to reproduce this.

Logs of interest
Nothing of particular note shows up in `/proc/spl/kstat/zfs/dbgmsg` (with `zfs_flags=1`).

What I've posted below, however, is some interesting udev `log_level=debug` output from the journal, isolated out for one particular zvol device that had difficulty when I switched `snapdev=hidden` to `snapdev=visible` on a particular volume.

In this case I was using an unmodified `zvol_id` binary that didn't have any printouts added to indicate e.g. what `errno` was; but you can see it from the `udisks-part-id` invocation that also failed on the `open` syscall.

In case you read the log above and are now thinking "wtf are those symlinks?!"
In the log sequence posted above, it's hard not to notice that another idiotic thing is going on. Because there are some goofs in the udev rules file and in `zvol_id`, the nonzero exit status of 512 from `zvol_id` is modulo'd into 0, which of course is then treated as a "success", meaning that the contents of stdout are not ignored, but are consumed normally (as if they were the name of the `/dev/zvol/*` symlink to create). And then `zvol_id`, for reasons I don't particularly understand, only ever prints stuff (including its error messages) to stdout; combine that with the wonders of space-separated list syntax, and you wind up with udev deciding that the rules are requesting that some rather bizarre symlinks be created.

(I plan to address this in a separate PR that I have pretty much all lined up and ready to go already. The `60-zvol.rules` file is extremely ancient; like, it's literally a decade old. And it uses a fairly impressive number of deprecated syntax quirks for a mere one-liner file. Most of the code for `zvol_id` is about a decade old as well, and does dumb stuff like printing error messages to stdout. So yeah, I'll be taking care of all of this particular dumbness myself shortly. That PR will fix the bizarre symlinks symptom that's been going on; but it doesn't do anything to address the core mystery of "why the hell are these `zvol_open` calls failing, and in such a way that `errno=ERESTARTSYS` is getting to userspace?")

(This is now up as PR #12302.)
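The "modulo'd into 0" part is just how process exit statuses work: the kernel keeps only the low 8 bits of the status, so exiting with 512 collapses to 0 ("success"). A quick demonstration:

```python
import os

# Child exits with status 512 (e.g. an errno value leaked straight into the
# exit code, as zvol_id was doing); the parent then observes only
# 512 % 256 == 0, i.e. apparent success.
pid = os.fork()
if pid == 0:
    os._exit(512)
_, status = os.waitpid(pid, 0)
print(os.WEXITSTATUS(status))   # 0 -- the failure is invisible to udev-style exit checks
```

This is why udev happily consumed `zvol_id`'s error-message stdout as if it were a valid symlink name.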
Stuff I can do to help
If this issue turns out to be a giant pain to reproduce, I'm happy to do stuff like flipping on `zfs_flags` bits, dumping various logs, adding extra debug printouts in particular places, and creating/building/running custom one-off builds of the zfs module that I use on my personal system with my actual data with reckless abandon; basically whatever helps track this down. 😄