NAS-115860 / 22.12 / fix importing zpools on SCALE #8972
Conversation
…according to PEP8" This reverts commit 8091d14.
If you think it's good, I don't mind. I was just not seeing a reason to let a list/deque grow to huge lengths when we are only interested in the last 2 events at most.
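For illustration, a bounded deque gives exactly this behavior: it keeps only the most recent entries and never grows past its cap. A minimal sketch (the event names here are hypothetical, not taken from the middleware code):

```python
from collections import deque

# A deque with maxlen=2 silently discards the oldest entry on append,
# so memory use stays constant no matter how many events arrive.
events = deque(maxlen=2)
for event in ("add", "change", "remove", "add"):
    events.append(event)

print(list(events))  # ['remove', 'add'] -- only the last 2 are kept
```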
That's a valid point, let me make sure.
retest this please
Also, why is this a script outside of middleware?
Discovered and witnessed on an internal M50 (no expansion shelf) as well as an R50b with an ES60 expansion shelf (in the field). The simplified version is that `/dev/disk/by-partuuid` symlinks aren't being created by the time the raw devices are being populated inside `/dev/`. Since we're trying to import the zpools specifying `/dev/disk/by-partuuid` AND `/dev/`, zpool imports the disks via gptid, but if it can't find one, it chooses a random raw device. The device letters for raw devices aren't guaranteed between reboots and often change, so when the zpool is imported, certain devices are "missing" and other drives are put in their place. This is painful because the zpool is now imported and in an unhealthy state.

This adds a simple helper script that gets called as a prerequisite to the `ExecStart` entries in the `ix-zfs.service` file (a rough sketch of the idea is below). Testing this on the M50 fixed the problem: the zpool imported with gptids and produced a healthy pool.

NOTE: This is still "flawed" and not something that I want to do, but our hand is forced. The "proper" solution is for OpenZFS to actually integrate with systemd and use the proper mechanism(s) to automagically handle this.
The `ix-zfs.service` has a time limit of 15 mins, while we allow udev "events" to be received for a maximum of 10 mins. This worked on the R50b and the M50, so I'm leaving those time limits for now.

BONUS: Adding `After=systemd-udevd-settle.service` is what I initially thought would be the "solution", but that emits a DeprecationWarning from systemd. Furthermore, the documentation uses big scary verbiage about how this is absolutely the wrong approach, so I've refrained from doing that.