Volume preparation using Salt formulas #1513
Conversation
Hello ebaneck, my role is to assist you with the merge of this pull request.
Status report is not available.
Waiting for approval
The following approvals are needed before I can proceed with the merge:
Peer approvals must include at least 1 approval from the following list:
Some formatting issues, some stray stuff. Please mark comments about stuff remaining to implement as TODO
so we can clearly differentiate which comments are indicative and which are for ongoing items.
Force-pushed from 497b71e to b784072
Force-pushed from b784072 to c783531
Current state:
Good stuff, some comments.
Force-pushed from c53cc81 to 3dd1c8a
We now handle the reboot for …
This new patchset started to address some of the comments, but a new patchset is coming soon to address the remaining ones: no need to re-review now (you can wait for the next patchset).
Force-pushed from 3dd1c8a to e0bedd3
Force-pushed from e0bedd3 to 7c61b1b
Ok, new version ready for review. This time everything was addressed. There are two new commits that may require your attention:
About ①, I'm wondering if we could use parallel states, but to do so we would need to move the installation of the packages out of this state (where could we put them?).
Force-pushed from 232cb52 to 42410e6
This commit adds a minion startup state to ensure that all loop devices are always created when a minion reboots. Note that now we need to always specify a saltenv, thus we update the bootstrap accordingly. Closes: #1474
Since the implemented behavior differs from the usual one, this may be surprising to people reading the code. Let's use an explicit method instead. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
We don't need `--show` because we aren't using the output, and we don't need `--partscan` because we aren't expecting (and don't want to handle) partitions here. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
Formatting being a destructive operation which can brick an OS and/or cause data loss, we want to be really careful here. This commit adds a check to prevent formatting an already formatted device; that way, if someone misspells `/dev/sdb` as `/dev/sda` they won't destroy their node :] Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
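As an illustration of that safeguard, here is a minimal sketch shelling out to `blkid` (the function name is hypothetical, and a later commit in this series discusses why relying on `blkid`'s exit code 2 is only "kinda safe"):

```python
import subprocess

def ensure_safe_to_format(path):
    """Abort if `path` already carries a signature (filesystem, partition table, ...).

    Hypothetical helper: low-level probing (`blkid --probe`) also reports
    partition tables, so mistyping `/dev/sdb` as `/dev/sda` won't wipe a
    system disk.
    """
    process = subprocess.Popen(['blkid', '--probe', path],
                               stdout=subprocess.PIPE)
    out, _ = process.communicate()
    if process.returncode == 2:
        # blkid found nothing on the device: formatting is allowed.
        return
    if process.returncode != 0:
        raise RuntimeError('blkid failed on {}'.format(path))
    raise RuntimeError('{} already contains data ({}): refusing to format'
                       .format(path, out.decode('utf-8').strip()))
```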
- Don't use `mkfs` (it's deprecated); use the filesystem-specific utilities (such as `mkfs.ext4`, `mkfs.xfs`, …) instead.
- As a consequence, check that the given FS type is supported.
- For `sparseLoopDevice`, format the sparse file instead of the associated loop device (less racy: we're sure that we are formatting what we want to) => replace the `block_device` property with `path`.
Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
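A minimal sketch of the resulting formatting helper, assuming a hard-coded whitelist of filesystems (the function name and the exact set of supported filesystems are illustrative, not the repository's actual API):

```python
import subprocess

# Illustrative whitelist; the real set of supported filesystems may differ.
SUPPORTED_FS_TYPES = frozenset(('ext4', 'xfs'))

def format_path(path, fs_type, *options):
    """Format `path` with the filesystem-specific utility (mkfs.ext4, mkfs.xfs, ...)."""
    if fs_type not in SUPPORTED_FS_TYPES:
        raise ValueError('unsupported filesystem: {}'.format(fs_type))
    command = ['mkfs.{}'.format(fs_type)]  # never the deprecated `mkfs` wrapper
    command.extend(options)
    command.append(path)
    subprocess.check_call(command)
```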
When creating a sparse loop device, the formatting needs to happen **before** associating the sparse file with a loop device (otherwise `udev` won't create the symlink under `/dev/disk/by-uuid`). Since this step is now the last one, let's rename `initialization` to `finalization`. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
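To make the new ordering concrete, here is a sketch of the three steps using plain `truncate`/`mkfs.ext4`/`losetup` calls (the choice of ext4 and the helper name are assumptions made for the example):

```python
import subprocess

def provision_sparse_volume(sparse_file, size):
    """Create, format, then attach a sparse file (in that order)."""
    # 1. Create the sparse file.
    subprocess.check_call(['truncate', '--size', str(size), sparse_file])
    # 2. Format the *file* before any loop device exists, so that udev sees
    #    the filesystem signature when the loop device appears and creates
    #    the /dev/disk/by-uuid symlink. `-F` is required because the target
    #    is a regular file, not a block device.
    subprocess.check_call(['mkfs.ext4', '-F', sparse_file])
    # 3. Finalization: associate the file with the first free loop device
    #    (no `--show`, no `--partscan`, as per the earlier commit).
    subprocess.check_call(['losetup', '--find', sparse_file])
```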
This command is not reliable: it cannot get the FS type from a sparse file (fair enough), but in such a case it doesn't even fail reliably, so you can't even know that something went wrong… Calling `salt-call disk.fstype /var/lib/metalk8s/storage/sparse/example` returns "/dev/sda1 ext4 41152736 7754388 31284864 20% /"… (because it falls back on `df` when `lsblk` fails, and that's what `df` returns…) Even on an actual block device, it returns garbage if it's not formatted: `salt-call disk.fstype /dev/sdb` returns "devtmpfs devtmpfs 1932084 0 1932084 0% /dev". Let's use `blkid` instead… Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
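A sketch of the replacement, shelling out to `blkid` (a later commit switches to a direct ctypes binding precisely because the meaning of the exit codes is not crystal clear):

```python
import subprocess

def get_fs_type(path):
    """Return the filesystem type of `path`, or None if no filesystem is detected."""
    process = subprocess.Popen(
        ['blkid', '--output', 'value', '--match-tag', 'TYPE', path],
        stdout=subprocess.PIPE,
    )
    out, _ = process.communicate()
    if process.returncode == 2:
        # blkid uses exit code 2 for "no information available".
        return None
    if process.returncode != 0:
        raise RuntimeError('blkid failed on {}'.format(path))
    return out.decode('utf-8').strip() or None
```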
Since Salt states have a mandatory `name` parameter, let's use it to pass the volume name instead of adding an extra parameter. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
If we want to be really sure that a device has been formatted by us, we shouldn't check the FS type but the UUID. In fact, relying on the FS type for `is_formatted` can be wrong:
- a Volume V is created with the StorageClass SC, which has `fsType` set to `xfs`;
- we format the device with XFS;
- later, the StorageClass SC is deleted and someone reuses the name to create a new StorageClass SC which has `fsType` set to `ext4`;
- the server reboots and we reprovision all the volumes declared on the node; when the pillar is updated it stores the volume V and embeds the content of the (new) StorageClass SC;
- when we check whether the device FS type (xfs) matches the expected one (from the StorageClass, now `ext4`), we get a mismatch and think the device is not formatted;
- we then call `format`; here the safeguard detects that the device is already formatted, so we bail out and avoid data loss, but we have put the Volume V in a Failed state when everything was fine: that's not good…
So yeah, we really should rely on the UUID instead of the FS type. Note that this only holds for `is_formatted`: the safeguard in `format` still needs to check whether we already have something on the device before formatting, to avoid data loss. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
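In other words, the check boils down to a UUID comparison. A minimal sketch (the argument shapes are assumptions; the probing itself is sketched under the next commit):

```python
def is_formatted(volume_uuid, device_uuid):
    """Return True if the device carries the UUID we formatted it with.

    `device_uuid` is whatever the blkid probe reports for the device (None
    when no filesystem is present); `volume_uuid` is the UUID recorded for
    the Volume. Comparing FS types instead would break as soon as the
    StorageClass is recreated with a different `fsType` (see the scenario
    above).
    """
    if device_uuid is None:
        return False
    return device_uuid.lower() == volume_uuid.lower()
```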
This commit adds a new Python module to call directly into the blkid library (using ctypes) instead of relying on the `blkid` command.

Why do we do that? Because we want to be as sure as possible that when `blkid` returns no information it means "there is no info" and not "maybe there is some info but I couldn't get it" (because if we mix the two cases, we may actually format something, thinking "oh, there is nothing, davai", whereas the reality is "there was data but I haven't seen it, because reasons…").

Relying on `return code 2 == no data` seems kinda safe, but it's not 100% clear from the man page. Now, we could check the source code (and I did), but that would only be valid for the version we looked at. So let's directly attack the library, making sure that any failed API call results in an exception: if we end up with "no data", then that truly means "everything went well, but there was nothing there". Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
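A heavily trimmed sketch of what such a ctypes binding can look like (the real module in the repository is more complete; the Python-side names are illustrative, while the libblkid entry points are the standard util-linux ones):

```python
import ctypes
import ctypes.util


class BlkidError(Exception):
    """Raised when any libblkid call fails: no silent 'no data' fallback."""


_blkid = ctypes.CDLL(ctypes.util.find_library('blkid') or 'libblkid.so.1')
_blkid.blkid_new_probe_from_filename.restype = ctypes.c_void_p
_blkid.blkid_new_probe_from_filename.argtypes = [ctypes.c_char_p]
_blkid.blkid_do_safeprobe.argtypes = [ctypes.c_void_p]
_blkid.blkid_probe_lookup_value.argtypes = [
    ctypes.c_void_p,
    ctypes.c_char_p,
    ctypes.POINTER(ctypes.c_char_p),
    ctypes.POINTER(ctypes.c_size_t),
]
_blkid.blkid_free_probe.argtypes = [ctypes.c_void_p]
_blkid.blkid_free_probe.restype = None


def probe(path):
    """Return a dict of blkid tags (TYPE, UUID, ...) for `path`.

    An empty dict is returned only when libblkid positively reports
    "nothing found"; every other outcome raises, so "no data" really
    means "no data".
    """
    pr = _blkid.blkid_new_probe_from_filename(path.encode('utf-8'))
    if not pr:
        raise BlkidError('cannot open {}'.format(path))
    try:
        ret = _blkid.blkid_do_safeprobe(pr)
        if ret == 1:
            return {}  # probing succeeded, nothing was detected
        if ret != 0:
            raise BlkidError('safeprobe failed on {}: {}'.format(path, ret))
        tags = {}
        for tag in (b'TYPE', b'UUID'):
            value = ctypes.c_char_p()
            length = ctypes.c_size_t()
            if _blkid.blkid_probe_lookup_value(
                    pr, tag, ctypes.byref(value), ctypes.byref(length)) == 0:
                tags[tag.decode('ascii')] = value.value.decode('utf-8')
        return tags
    finally:
        _blkid.blkid_free_probe(pr)
```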
Call the real Salt state to prepare the volume (`metalk8s.volumes`) instead of calling `test.rand_sleep`. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
Don't hardcode the saltenv; instead, compute it from the node's MetalK8s version. We use the node's version instead of the cluster one because that's also the one used by the Salt minion. Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
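For the record, the computation itself is a one-liner. A sketch (the label key used to read the node's version is an assumption for this example):

```python
def node_saltenv(node):
    """Derive the saltenv (e.g. `metalk8s-2.4.0-dev`) from the node's version."""
    # Hypothetical label key; the important part is that the version comes
    # from the node object, not from the cluster-wide version.
    version = node['metadata']['labels']['metalk8s.scality.com/version']
    return 'metalk8s-{}'.format(version)
```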
Until now, we used to rely on the value of `success` to know whether a job completed successfully or not. Unfortunately, that's not good enough: you can get `success: true` while nothing was actually run… See saltstack/salt#4002.

This is a problem because if we think that the `PrepareVolume` step was a success, we will then try to get the size of the device with `disk.dump`; but if `PrepareVolume` didn't run, then the device doesn't exist and `disk.dump` will fail => we wrongly move the Volume into a Failed state. And we easily encounter this issue by creating several volumes at once/within a short time interval.

Try to spawn two state executions at the same time:

    salt-call state.sls metalk8s.volumes saltenv=metalk8s-2.4.0-dev pillar='{"volume": "foo"}' &
    salt-call state.sls metalk8s.volumes saltenv=metalk8s-2.4.0-dev pillar='{"volume": "bar"}'

You'll get:

    local:
        Data failed to compile:
    ----------
        The function "state.sls" is running as PID 32544 and was started at 2019, Aug 14 21:45:18.338129 with jid 20190814214518338129

And when it's done by the operator through the API, here is what we get for the second job:

    "bootstrap": {
      "return": [
        "The function \"state.sls\" is running as PID 22874 and was started at 2019, Aug 14 20:47:46.622817 with jid 20190814204746622817"
      ],
      "retcode": 1,
      "success": true,
      "out": "highstate"
    }

`success` is true, but `retcode` is non-zero… So, to be 100% sure that the spawned job succeeded, we need to check the return code.

Why not ignore `success` and only rely on `retcode`? Because we want to tell the difference between:
- the job ran but failed (`success == false`): move the Volume into the Failed state;
- the job failed to run (`success == true` but `retcode != 0`), probably because one was already running: we can retry later.

Note that it seems possible to run states in parallel[1], but our state may install packages and that seems to be a dangerous thing to do in parallel, so let's avoid it… (But maybe we should move the deps install out of this state, that would probably make sense…)

[1]: https://docs.saltstack.com/en/latest/ref/states/parallel.html

Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
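The resulting decision logic is small. A sketch operating on the job payload shown above (field names taken from that payload; the classification labels are illustrative):

```python
def classify_job_result(job):
    """Classify a spawned `state.sls` job from its `success`/`retcode` fields."""
    success = job.get('success', False)
    retcode = job.get('retcode', 1)
    if success and retcode == 0:
        return 'succeeded'   # safe to proceed (e.g. read the device size)
    if success and retcode != 0:
        return 'retry'       # the job never ran (another state was running)
    return 'failed'          # the job ran and failed: move the Volume to Failed
```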
Because I mainly test the operator outside of the cluster (and thus running as admin:admin) I forgot to update those permissions when I moved from `test.ping` to real Salt states. We also need `@jobs` to poll the state of async jobs. We really need some automated testing here ^^" Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
Force-pushed from 42410e6 to 77ef94a
Retrospectively, using the complete state `metalk8s.volume.prepared`, while working, isn't a great idea. Using a dedicated state that only does the "provisioning" has several benefits:
- it's faster (we don't need to check for dependencies, check if disks are formatted, …);
- it's safer (we're sure that we aren't gonna trigger costly formatting or things like that);
- it can be run in parallel (thus, it won't block the execution of other states while it's running).
Refs: #1474 Signed-off-by: Sylvain Laperche <[email protected]>
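For illustration, here is how spawning that dedicated state could look with Salt's Python client (the actual operator goes through the Salt API, and the state name below is hypothetical):

```python
import salt.client

# Hypothetical name for the dedicated, provisioning-only state.
PROVISION_STATE = 'metalk8s.volumes.provisioned'

def spawn_provisioning(minion_id, volume, saltenv):
    """Asynchronously run the lightweight provisioning state for one volume."""
    client = salt.client.LocalClient()
    # `cmd_async` returns a job ID (0 if publishing failed); the caller then
    # polls the job and classifies the result as sketched earlier.
    return client.cmd_async(
        minion_id,
        'state.sls',
        arg=[PROVISION_STATE],
        kwarg={'saltenv': saltenv, 'pillar': {'volume': volume}},
    )
```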
Force-pushed from 77ef94a to 3855fe1
I think this can be +1'd?
Not Author
I'm afraid I cannot do that, @slaperche-scality: only the author of the pull request, @Ebaneck, can use this command. Please delete the corresponding comment so I can move on. The following options are set: wait
/approve
In the queue
The changeset has received all authorizations and has been added to the queue. The changeset will be merged in:
The following branches will NOT be impacted:
There is no action required on your side. You will be notified here once the merge is complete.
IMPORTANT: Please do not attempt to modify this pull request.
If you need this pull request to be removed from the queue, please contact a …
The following options are set: approve
I have successfully merged the changeset of this pull request
The following branches have NOT changed:
Please check the status of the associated issue: None. Goodbye ebaneck.
Component:
'salt', 'storage', 'kubernetes'
Context:
see: #1474
Summary:
A combination of Salt states and custom Salt modules should be able to take 0 or 1 device as argument and prepare the specified volume; if no argument is passed, we should prepare all volumes not yet prepared within the pillars (see the sketch below).
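A minimal sketch of that "0 or 1 device" selection logic (the pillar layout and the `prepared` flag are assumptions made for this example):

```python
def volumes_to_prepare(pillar, volume=None):
    """Select the volumes to prepare: the named one, or every pending volume."""
    volumes = pillar.get('metalk8s', {}).get('volumes', {})
    if volume is not None:
        # A single volume was requested: prepare only that one.
        return {volume: volumes[volume]}
    # No argument: prepare every volume not yet prepared.
    return {
        name: info
        for name, info in volumes.items()
        if not info.get('prepared', False)
    }
```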
Acceptance criteria:
Closes: #1474