-
Notifications
You must be signed in to change notification settings - Fork 45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sparse loop device provisioning with systemd units #2936
Conversation
Hello gdemonet,My role is to assist you with the merge of this Status report is not available. |
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list: |
# Prepare volume at startup (required for loop devices persistency). | ||
startup_states: sls | ||
sls_list: | ||
- /metalk8s/volumes/provisioned |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️
Branches have divergedThis pull request's source branch To avoid any integration risks, please re-synchronize them using one of the
Note: If you choose to rebase, you may have to ask me to rebuild |
@@ -0,0 +1,61 @@ | |||
#!/usr/bin/env python |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should specified 2 or 3 (seeing the import future I guess it's 2, but better be explicit).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And somewhere make sure it's installed on the system. Sure, we can assume it is (we're running salt-minion
after all), but still...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sticking with Python 2 for now, not exactly sure where we should check if it's installed though. In the service template? Or through Salt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Salt. This is where maybe at some point these files shouldn't be dropped as-is by Salt, but packaged properly, which then allows to add a dependency on some Python version.
partition_name = os.path.basename(os.path.realpath(path_by_partuuid)) | ||
device_name = os.path.basename( | ||
os.path.realpath("/sys/class/block/{}/..".format(partition_name)) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if using regex to extract loop[0-9]+
isn't simpler here, but maybe that way is more reliable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A regex would make assumptions about the naming, which is not guaranteed. Here all we assume is how sysfs is structured, which is kernel API (I believe...).
We could assert the final path we calculate is an actual block device (from stat
), and a loop device (also from stat
, though not sure how to figure out the major of loop
except by parsing /proc/devices
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Easy using awk
: awk '$2=="loop" {print $1}' /proc/devices
, and likely 7
on all systems, not dynamically registered, but still, hard-coding isn't great).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do prefer the regex as well. It's not like we're trying to make this logic work for "any kind of device partition", we know it is supposed to be a loop device, so if we don't find loop[0-9]+
, it's a good idea to bail out (if some other OS we support decides to break this behaviour, our code will explicitly tell us).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do agree we want to assert it's a loop-device. However, we should not do so based on some name, but based on the major of the block-device matching the major assigned to the loop
block-device driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a check to verify that:
- the device path exists
- it's a block device
- its major is the "loop" major (set to 7, hardcoded, sorry @NicolasT)
64594db
to
d1fd5e2
Compare
|
||
|
||
SPARSE_FILES_DIR = '/var/lib/metalk8s/storage/sparse/' | ||
LOOP_MAJOR = 7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could cite Documentation/admin-guide/devices.txt from linux source tree as reference.
e.g.:
# See Documentation/admin-guide/devices.txt in linux source tree
LOOP_MAJOR = 7
def is_loop_device(path): | ||
device_stat = os.stat(path) | ||
return stat.S_ISBLK(device_stat.st_mode) \ | ||
and (_major(device_stat.st_rdev) == LOOP_MAJOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use os.major(device_stat.st_rdev)
instead of coding your own _major
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some issues
help='Cleanup MetalK8s sparse loop volumes', | ||
) | ||
|
||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is required, why use a flag argument instead of a positional one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, remnants of the temporary implementation where I had an action
(provision
or cleanup
) as positional. Anyway, will change.
) | ||
|
||
parser.add_argument( | ||
'-i', '--uid', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not uid
, but uuid
(though again, not sure this should be a flag).
parser.add_argument( | ||
'-i', '--uid', | ||
required=True, | ||
help='UID of the MetalK8s Volume object', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UUID
import sys | ||
|
||
|
||
SPARSE_FILES_DIR = '/var/lib/metalk8s/storage/sparse/' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can drop the last /
(see below).
|
||
def check_sparse_file(volume_id): | ||
"""Check if the sparse file exists for this volume ID.""" | ||
sparse_file_path = SPARSE_FILES_DIR + volume_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.path.join
iso +
device_name = os.path.basename( | ||
os.path.realpath("/sys/class/block/{}/..".format(partition_name)) | ||
) | ||
device_path = '/dev/{}'.format(match.group(0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be missing something, but where does match
come from? May need to add this file to the linted Python files in buildchain?
def is_loop_device(path): | ||
device_stat = os.stat(path) | ||
return stat.S_ISBLK(device_stat.st_mode) \ | ||
and (_major(device_stat.st_rdev) == LOOP_MAJOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not os.major
?
[Service] | ||
Type=oneshot | ||
ExecStart=/bin/flock --exclusive --wait 10 /var/lock/metalk8s-sparse-volume.lock \ | ||
-c "/sbin/losetup --find --partscan /var/lib/metalk8s/storage/sparse/%i" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could not have -c "..."
, and use the sudo
-style invocation, no?
Type=oneshot | ||
ExecStart=/bin/flock --exclusive --wait 10 /var/lock/metalk8s-sparse-volume.lock \ | ||
-c "/sbin/losetup --find --partscan /var/lib/metalk8s/storage/sparse/%i" | ||
ExecStop=/usr/local/bin/libexec/metalk8s-sparse-volume-cleanup --uid "%i" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/usr/local/bin/libexec
ain't right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops.
{%- if to_remove is not none %} | ||
{%- if 'sparseLoopDevice' in to_remove.spec %} | ||
Disable systemd provisioning of loop device for {{ volume_name }}: | ||
service.disabled: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this also 'stop' the service?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I didn't realize it wasn't stopped, because I left the cleanup logic in the module 😕
b59a16f
to
b22f504
Compare
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again: |
SPARSE_FILES_DIR = '/var/lib/metalk8s/storage/sparse' | ||
|
||
# https://github.com/torvalds/linux/blob/master/Documentation/admin-guide/devices.txt#L191 | ||
LOOP_MAJOR = 7 | ||
|
||
# https://github.com/torvalds/linux/blob/master/include/uapi/linux/loop.h#L110 | ||
LOOP_CLR_FD = 0x4C01 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since those constants are always used by a single function, maybe it's better to declare them near their usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I just store them at the top of my modules, out of habit :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah, bike-shedding: I like them on top, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, missed your comment @NicolasT so already did the change. I think both sides have good arguments, but the stakes are too low for us to have this debate. So I won't change it anymore 😜
) | ||
|
||
if not is_loop_device(device_path): | ||
die( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: instead of having a die
function, maybe raising an exception (+ global try/catch around main()
) would be clearer (more idiomatic at least).
This (global try/catch) would also handle the errors you forgot/ignored on purpose (failure of os.stat
and os.open
for instance).
Not sure it's worth it though, it's your call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good suggestion, and not too much work, I'll handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did something really basic, should do the trick 🙂
@@ -651,7 +651,8 @@ def _get_parts(self) -> Iterator[str]: | |||
Path('salt/metalk8s/volumes/init.sls'), | |||
Path('salt/metalk8s/volumes/prepared/init.sls'), | |||
Path('salt/metalk8s/volumes/prepared/installed.sls'), | |||
Path('salt/metalk8s/volumes/provisioned/init.sls'), | |||
Path('salt/metalk8s/volumes/prepared/files/[email protected]'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About commit message: really good one 👍
On
to include the fix from higher versions of
losetup
Maybe add a link to the fix and/or mention in which version it was fixed (v2.25 IIRC), maybe when we migrate to CentOS 8 we can get rid of this workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
e61b37d
to
6567a22
Compare
ConflictThere is a conflict between your branch Please resolve the conflict on the feature branch ( $ git fetch
$ git checkout origin/bugfix/2726-sparse-file-systemd
$ git merge origin/development/2.7
$ # <intense conflict resolution>
$ git commit
$ git push origin HEAD:bugfix/2726-sparse-file-systemd |
The previous approach was relying entirely on Salt (through our custom `metalk8s_volumes` execution/state module) to manage provisioning and cleanup of loop devices (on sparse files). The issues arise when (re)booting a node: - Salt minion has to execute a startup state, which may come in when kube-apiserver is not (yet) available - `losetup --find` calls, with the version available for CentOS 7, are not atomic, so we may lose some devices if we have too many (re-running the state manually is the current workaround) To fix these, we introduce two main changes: - management of loop devices is delegated to systemd; using a unit template file, we define a service per volume (passing the Volume's UUID as a parameter), which will provision and attach the device on start, and detach it on stop - `losetup --find` invocations from these units is made sequential by using an exclusive lockfile (this is not perfect, but avoids the need to re-implement the command ourselves to include the fix from losetup 2.25 and higher: see util-linux/util-linux@f7e21185) Note that we cannot simply use `losetup --detach` in the unit template, because sparse volumes may not always be discovered from the same path: - either the volume is formatted, and we can find it in /dev/disk/by-uuid/ - or it's kept raw, so we only have the UUID in the partition table, and we can discover the device through /dev/disk/by-partuuid/ Fixes: #2726
We adjust the `lint-python` tox environment to: - handle Python 2 and Python 3 files (we only run mypy on the latter) - always run both pylint and mypy, and report exit codes for both in case of failure See: #2726
a5fcb70
to
f6d45d9
Compare
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
The following reviewers are expecting changes from the author, or must review again: |
|
||
# Needed by the sparse volume cleanup script | ||
Ensure Python 2 is available: | ||
test.fail_without_changes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I would've used a pkg_installed('python2')
or something along those lines here ;-) But sure, works for this particular case I guess.
Waiting for approvalThe following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list: |
/approve |
In the queueThe changeset has received all authorizations and has been added to the The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once IMPORTANT Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a The following options are set: approve |
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue None. Goodbye gdemonet. |
Component: salt, storage
Context:
The previous approach was relying entirely on Salt (through our custom
metalk8s_volumes
execution/state module) to manage provisioning andcleanup of loop devices (on sparse files).
The issues arise when (re)booting a node:
kube-apiserver is not (yet) available
losetup --find
calls, with the version available for CentOS 7, arenot atomic, so we may lose some devices if we have too many (re-running
the state manually is the current workaround)
Summary:
To fix these, we introduce two main changes:
template file, we define a service per volume (passing the Volume's UUID
as a parameter), which will provision and attach the device on start,
and detach it on stop
losetup --find
invocations from these units is made sequential byusing an exclusive lockfile (this is not perfect, but avoids the need to
re-implement the command ourselves to include the fix from higher
versions of
losetup
)Note that we cannot simply use
losetup --detach
in the unit template,because sparse volumes may not always be discovered from the same path:
/dev/disk/by-uuid/
we can discover the device through /dev/disk/by-partuuid/
Acceptance criteria:
losetup -l
afterCloses: #2726