NAS-115860 / 22.12 / fix importing zpools on SCALE #8972

yocalebo · 2022-05-17T17:21:02Z

Discovered and witnessed on an internal M50 (no expansion shelf) as well as an R50b with an ES60 expansion shelf (in the field). The simplified version is that the /dev/disk/by-partuuid symlinks have not yet been created by the time the raw devices are populated in /dev/.

Since we import the zpools specifying both /dev/disk/by-partuuid AND /dev/, zpool imports the disks via gptid, but if it can't find one, it chooses a random raw device. The device letters for raw devices aren't guaranteed to be stable between reboots and often change, so when the zpool is imported, certain devices are "missing" and other drives are put in their place.

This is painful because the zpool is now imported and in an unhealthy state. This PR adds a simple helper script that gets called as a prerequisite to the ExecStart entries in the ix-zfs.service file. Testing this on the M50 fixed the problem: the zpool imported with gptids and produced a healthy pool.
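To make the naming problem concrete, here is a minimal sketch (not part of this PR) that lists which raw device each by-partuuid symlink currently points to. The function name `map_partuuid_links` is hypothetical, and returning an empty mapping for a missing directory is an assumption chosen to mirror the race described above, where the symlinks may not exist yet.

```python
import os


def map_partuuid_links(link_dir="/dev/disk/by-partuuid"):
    """Return {symlink name: resolved device path} for every symlink in link_dir.

    An empty dict means the directory does not exist (yet) or holds no symlinks,
    which is the situation that makes an import fall back to raw device names.
    """
    if not os.path.isdir(link_dir):
        return {}
    mapping = {}
    for entry in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, entry)
        if os.path.islink(path):
            # realpath follows the symlink to the raw device node (e.g. /dev/sda1)
            mapping[entry] = os.path.realpath(path)
    return mapping
```

Running this on two consecutive boots would show the same partuuid keys mapping to possibly different raw device paths, which is why importing by partuuid is the stable choice.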

NOTE: This is still "flawed" and not something that I want to do but our hand is forced. The "proper" solution is for openzfs to actually integrate with systemd and use the proper mechanism(s) to automagically handle this.

The ix-zfs.service has a time limit of 15 minutes, while we allow udev "events" to be received for a maximum of 10 minutes. This worked on the R50b and the M50, so I'm leaving those time limits for now.
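The actual zpoolhelper.py is not shown in this thread, so the following is only a sketch of the general "wait until events stop arriving, but never longer than a cap" idea. The function name `wait_until_quiet`, the `poll_event` callable, and the 5-second quiet period are all assumptions for illustration; only the 10-minute cap comes from the description above.

```python
import time


def wait_until_quiet(poll_event, quiet_period=5.0, max_wait=600.0, clock=time.monotonic):
    """Block until no event arrives for quiet_period seconds, or max_wait elapses.

    poll_event(timeout) is a hypothetical callable that returns an event, or
    None if nothing arrived within `timeout` seconds.
    Returns True if the event stream went quiet, False if max_wait was reached.
    """
    deadline = clock() + max_wait
    last_event_at = clock()
    while True:
        now = clock()
        if now >= deadline:
            return False  # hard cap reached (10 minutes by default)
        if now - last_event_at >= quiet_period:
            return True  # nothing new for a full quiet period
        # Never wait past the end of the quiet period or the overall deadline.
        remaining = min(quiet_period - (now - last_event_at), deadline - now)
        if poll_event(remaining) is not None:
            last_event_at = clock()
```

Keeping the cap below the service's own 15-minute limit leaves time for the import itself to run after the wait returns.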

BONUS: adding After=systemd-udev-settle.service is what I initially thought would be the "solution", but that emits a deprecation warning from systemd. Furthermore, the documentation uses big scary verbiage about how this is absolutely the wrong approach, so I've refrained from doing that.

bugclerk · 2022-05-17T17:21:15Z

Jira URL: https://jira.ixsystems.com/browse/NAS-115860

debian/debian/ix-zfs.service

src/middlewared/middlewared/scripts/zpoolhelper.py

sonicaj

If you think it's good, I don't mind. I just didn't see a reason to let a list/deque grow to a huge length when we are only interested in the last 2 events at most.

yocalebo · 2022-05-17T19:33:31Z

> If you think it's good, I don't mind. I just didn't see a reason to let a list/deque grow to a huge length when we are only interested in the last 2 events at most.

That's a valid point, let me make sure .append() doesn't block once the deque is full.
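For reference, a small sketch of the behaviour in question, using the standard library's `collections.deque` with `maxlen=2`. The event strings are made up for illustration. Appending to a full bounded deque does not block; it silently discards the item at the opposite end.

```python
from collections import deque

# Keep only the two most recent events.
recent_events = deque(maxlen=2)

for event in ("add sda", "add sdb", "add sdc"):
    # append() returns immediately; once the deque is full,
    # the oldest item is dropped from the left.
    recent_events.append(event)

print(list(recent_events))  # ['add sdb', 'add sdc']
```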

yocalebo · 2022-05-17T19:57:02Z

retest this please

themylogin · 2022-05-26T07:26:25Z

@yocalebo @sonicaj why did we remove flake8-import-order?

themylogin · 2022-05-26T07:27:20Z

Also why is this a script outside of middleware?

yocalebo added 2 commits May 17, 2022 13:09

add helper script to be called by ix-zfs.service

b6db95c

call helper script in ix-zfs.service

ea57b31

yocalebo added the backport-22.02.2 label May 17, 2022

yocalebo requested a review from a team May 17, 2022 17:21

bugclerk changed the title ~~fix importing zpools on SCALE~~ NAS-115860 / 22.12 / fix importing zpools on SCALE May 17, 2022

Revert "Employ flake8-import-order to ensure correct import orders …

38d1734

…according to PEP8" This reverts commit 8091d14.

sonicaj reviewed May 17, 2022

debian/debian/ix-zfs.service (outdated)

moarrrr complicatedness

85275de

sonicaj reviewed May 17, 2022

src/middlewared/middlewared/scripts/zpoolhelper.py (resolved)

sonicaj approved these changes May 17, 2022

limit deque size to 2 items

6554f9c

yocalebo force-pushed the NAS-115860 branch from d1d9ff8 to 6554f9c Compare May 18, 2022 11:18

yocalebo merged commit 9c2d6be into master May 18, 2022

yocalebo deleted the NAS-115860 branch May 18, 2022 11:19

bugclerk mentioned this pull request May 18, 2022

NAS-115860 / 22.02.2 / fix importing zpools on SCALE (by yocalebo) #8977

Merged

bugclerk added the backported label May 18, 2022
