
Volume preparation using Salt formulas #1513

Merged

Conversation

Ebaneck
Contributor

@Ebaneck Ebaneck commented Aug 9, 2019

Component:

'salt', 'storage', 'kubernetes'

Context:

see: #1474

Summary:

A combination of Salt states and custom Salt modules should be able to take zero or one device as argument and prepare the specified volume; if no argument is passed, we should prepare all volumes from the pillar that are not yet prepared (see the sketch below).
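
To make the intended behaviour concrete, here is a minimal, self-contained Python sketch of that dispatch logic. All names are hypothetical and the pillar is passed explicitly so the example runs on its own; the real MetalK8s module lives under salt/_modules and uses Salt's injected __pillar__/__salt__ instead.

    # Illustrative sketch only, not the code from this PR.
    def _prepare_one(name, spec):
        """Placeholder for the real preparation logic (loop device setup, mkfs, ...)."""
        print("preparing volume {!r} of type {}".format(name, spec.get("type")))

    def prepare(pillar_volumes, volume=None):
        """Prepare one named volume, or every not-yet-prepared volume from the pillar."""
        if volume is not None:
            targets = {volume: pillar_volumes[volume]}
        else:
            targets = {
                name: spec for name, spec in pillar_volumes.items()
                if not spec.get("prepared", False)
            }
        for name, spec in targets.items():
            _prepare_one(name, spec)

    if __name__ == "__main__":
        volumes = {
            "vol-sparse": {"type": "sparseLoopDevice", "prepared": False},
            "vol-block": {"type": "rawBlockDevice", "prepared": True},
        }
        prepare(volumes)                # prepares only vol-sparse
        prepare(volumes, "vol-block")   # prepares vol-block explicitly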

Acceptance criteria:

  • Be able to successfully prepare loop devices and block devices as metalk8s volumes.
  • Ensure that prepared volumes are always available even after a restart.
  • Ensure that prepared volumes have a valid file system
  • Be able to write persistent data on all prepared volumes

Closes: #1474

@bert-e
Contributor

bert-e commented Aug 9, 2019

Hello ebaneck,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e
Contributor

bert-e commented Aug 9, 2019

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

Contributor

@jbertran jbertran left a comment

Some formatting issues, some stray stuff. Please mark comments about stuff remaining to implement as TODO so we can clearly differentiate which comments are indicative and which are for ongoing items.

Review threads (resolved) on: salt/metalk8s/volume/init.sls, salt/metalk8s/volume/files/loopdevice.service.j2, salt/_states/metalk8s_volume.py, salt/metalk8s/volume/prepare/init.sls, salt/_modules/metalk8s_volume.py
@Ebaneck Ebaneck changed the title WIP: Volume preparation using salt formulars WIP: Volume preparation using salt formulas Aug 9, 2019
@Ebaneck Ebaneck force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from 497b71e to b784072 Compare August 12, 2019 07:52
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from b784072 to c783531 Compare August 13, 2019 09:48
@slaperche-scality
Contributor

slaperche-scality commented Aug 13, 2019

Current state:

  • Prepare a sparseLoopDevice
  • Prepare a rawBlockDevice
  • Calling the state with no arguments prepares all the volumes of the node
  • Handle reboot for sparseLoopDevice

Contributor

@NicolasT NicolasT left a comment

Good stuff, some comments.

Review threads (resolved) on: salt/_modules/metalk8s_volumes.py, salt/metalk8s/salt/minion/files/minion-99-metalk8s.conf.j2, salt/metalk8s/volumes/prepared/init.sls, salt/metalk8s/volumes/prepared/installed.sls
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from c53cc81 to 3dd1c8a Compare August 13, 2019 15:33
@slaperche-scality
Contributor

We now handle the reboot for sparseLoopDevice, so the PR is no longer a draft.

This new patchset started to address some of the comments, but a new patchset is coming soon to address the remaining ones: no need to re-review now (you can wait for the next patchset).

@slaperche-scality slaperche-scality marked this pull request as ready for review August 13, 2019 15:35
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from 3dd1c8a to e0bedd3 Compare August 13, 2019 22:53
@NicolasT NicolasT changed the title WIP: Volume preparation using salt formulas Volume preparation using Salt formulas Aug 14, 2019
Review threads (resolved) on: salt/_modules/metalk8s_volumes.py, salt/_states/metalk8s_volumes.py, salt/metalk8s/volumes/prepared/init.sls, storage-operator/pkg/salt/client.go
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from e0bedd3 to 7c61b1b Compare August 16, 2019 07:53
@slaperche-scality
Contributor

Ok, new version ready for review.

This time everything was addressed.

There are two new commits that may require your attention:

  1. ba91f00: this one handles the issue of "we can't run two states in parallel"
  2. be6d91d: as discussed on Slack, this one tries to call into libblkid directly instead of calling the shell command

About ①, I'm wondering if we could use parallel states, but to do so we would need to move the installation of the packages out of this state (where could we put them?).

@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from 232cb52 to 42410e6 Compare August 19, 2019 09:29
Ebaneck and others added 14 commits August 19, 2019 13:29
This commit adds a minion startup state to ensure that
all loop devices are always created when a minion reboots.

Note that we now always need to specify a saltenv, so we update the
bootstrap accordingly.

Closes: #1474
Since the implemented behavior differs from the usual one, this may be
surprising to people reading the code.
Let's use an explicit method instead.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
We don't need `--show` because we aren't using the output and we don't
need `--partscan` because we aren't expecting (and don't want to handle)
partitions here.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
Formatting being a destructive operation which can brick an OS and/or
cause data loss, we want to be really careful here.

This commit adds a check to prevent formatting an already formatted
device; that way, if someone misspells `/dev/sdb` as `/dev/sda`, they
won't destroy their node :]

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
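
As an illustration of that safeguard (hypothetical names, not the actual MetalK8s code), one can simply refuse to format anything that already carries a signature; the `blkid` return-code convention used here is the one discussed a few commits below.

    # Illustrative sketch of the safeguard, assuming util-linux `blkid` is installed.
    import subprocess

    def has_existing_signature(device):
        """Return True if low-level probing finds any filesystem/partition signature."""
        result = subprocess.run(
            ["blkid", "--probe", device],
            stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        )
        # 0 => a signature was found; 2 is commonly "nothing found" (see the
        # libblkid commit below for a stricter check).
        return result.returncode == 0

    def safe_format(device, fs_type="ext4"):
        if has_existing_signature(device):
            raise RuntimeError(
                "{} already contains data, refusing to format it".format(device)
            )
        subprocess.run(["mkfs." + fs_type, device], check=True)
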
- don't use `mkfs` (it's deprecated), use the filesystem-specific utilities
  (such as `mkfs.ext4`, `mkfs.xfs`, …) instead
- as a consequence, check that the given FS type is supported
- for `sparseLoopDevice`, format the sparse file instead of the
  associated loop device (less racy: we're sure we are formatting
  what we want to) => replace the `block_device` property with `path`.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
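
A hedged sketch of how such an FS-type check could look (hypothetical helper and whitelist, not the PR's implementation):

    import shutil

    SUPPORTED_FS = ("ext4", "xfs")   # assumed whitelist for this sketch

    def mkfs_command(fs_type, path):
        """Build the filesystem-specific mkfs command line, or raise."""
        if fs_type not in SUPPORTED_FS:
            raise ValueError("unsupported filesystem type: {}".format(fs_type))
        executable = "mkfs." + fs_type
        if shutil.which(executable) is None:
            raise RuntimeError("{} is not installed".format(executable))
        return [executable, path]

    # e.g. mkfs_command("xfs", "/dev/sdb") -> ["mkfs.xfs", "/dev/sdb"]
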
When creating a sparse loop device, the formatting needs to happen
**before** associating the sparse file to a loop device (otherwise `udev`
won't create the symlink under `/dev/disk/by-uuid`).

Since this step is now the last one, let's rename `initialization` to
`finalization`.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
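
The ordering described above, as a minimal sketch (illustrative shell-outs from Python; the path reuses the example from these commits and the size is a placeholder):

    import subprocess

    sparse_file = "/var/lib/metalk8s/storage/sparse/example"   # placeholder

    # 1. create the sparse file
    subprocess.run(["truncate", "--size", "10G", sparse_file], check=True)
    # 2. format the *file* first, so that when the loop device is attached
    #    udev sees the filesystem and creates /dev/disk/by-uuid/<uuid>
    #    (-F: proceed even though the target is a regular file)
    subprocess.run(["mkfs.ext4", "-F", sparse_file], check=True)
    # 3. only then associate it with a loop device (no --show, no --partscan)
    subprocess.run(["losetup", "--find", sparse_file], check=True)
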
This command is not reliable: it cannot get the FS type from a sparse
file (fair enough), but in such a case it doesn't even fail reliably, so
you can't even know that something went wrong…

Calling `salt-call disk.fstype /var/lib/metalk8s/storage/sparse/example`
returns "/dev/sda1      ext4  41152736 7754388  31284864  20% /"…
(because it falls back on `df` when `lsblk` fails and that's what `df`
returns…)

Even on an actual block device, it returns garbage if it's not
formatted:
`salt-call disk.fstype /dev/sdb`
returns "devtmpfs       devtmpfs   1932084     0   1932084   0% /dev"

Let's use `blkid` instead…

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
Since Salt states have a mandatory name parameter, let's use it to pass
the volume name instead of adding an extra parameter.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
If we want to be really sure that a device has been formatted by us, we
shouldn't check the FS type but the UUID.

In fact, relying on the FS type for `is_formatted` can be wrong:

- a Volume V is created with the StorageClass SC, which has `fsType` set
  to `xfs`
- we format the device with XFS
- later, the StorageClass SC is deleted and someone reuses the name to
  create a new StorageClass SC which has `fsType` set to `ext4`
- the server reboots and we reprovision all the volumes declared on the
  node; when the pillar is updated, it stores the volume V and embeds the
  content of the (new) StorageClass SC
- when we check whether the device FS type (xfs) matches the expected one
  (from the StorageClass, now `ext4`), we get a mismatch and think the
  device is not formatted
- we then call `format`; here the safeguard detects that the device is
  already formatted, so we bail out and avoid data loss, but we have put
  the Volume V in a Failed state when everything was fine: that's not
  good…

So yeah, we really should rely on the UUID instead of the FS type.

Note that this only holds for `is_formatted`: the safeguard in `format`
still needs to check whether we already have something on the device
before formatting, to avoid data loss.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
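
In code, the check described above boils down to comparing UUIDs rather than FS types; a tiny hypothetical sketch (the device UUID would come from something like the libblkid wrapper in the next commit, and the UUIDs below are placeholders):

    def is_formatted(expected_uuid, device_uuid):
        """The device counts as formatted by us iff its filesystem UUID matches
        the one recorded for the Volume, regardless of the fsType currently
        declared by the (possibly recreated) StorageClass."""
        return (
            device_uuid is not None
            and device_uuid.lower() == expected_uuid.lower()
        )

    # An xfs device we formatted earlier still reports as formatted even if
    # the StorageClass now says ext4.
    assert is_formatted("0e9f1b0a-0000-4000-8000-000000000000",
                        "0E9F1B0A-0000-4000-8000-000000000000")
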
This commit adds a new Python module to call directly into the blkid
library (using ctypes) instead of relying on the `blkid` command.

Why do we do that?
==================

Because we want to be as sure as possible that when `blkid` returns no
information it means "there is no info" and not "maybe there is some
info but I couldn't get it" (because if we mix the two cases, we may
actually format something because we thought "oh, there is nothing,
go ahead" whereas the reality is "there was data but I didn't see it
for some reason…").

Relying on `return code 2 == no data` seems kinda safe, but it's not
100% clear from the man page. Now, we could check the source code (and I
did), but that would only be valid for the version we looked at.

So let's call directly into the library, making sure that any failing API
call results in an exception; thus, if we end up with "no data", that
truly means "everything went well, but there was nothing there".

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
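
A minimal ctypes sketch of that approach, assuming libblkid's documented blkid_new_probe_from_filename / blkid_do_safeprobe / blkid_probe_lookup_value entry points (the module added by this PR is more complete; this only shows the "error => exception, None => genuinely nothing there" contract):

    import ctypes
    import ctypes.util

    _lib = ctypes.CDLL(ctypes.util.find_library("blkid") or "libblkid.so.1")
    _lib.blkid_new_probe_from_filename.restype = ctypes.c_void_p
    _lib.blkid_new_probe_from_filename.argtypes = [ctypes.c_char_p]
    _lib.blkid_do_safeprobe.argtypes = [ctypes.c_void_p]
    _lib.blkid_probe_lookup_value.argtypes = [
        ctypes.c_void_p,
        ctypes.c_char_p,
        ctypes.POINTER(ctypes.c_char_p),
        ctypes.POINTER(ctypes.c_size_t),
    ]
    _lib.blkid_free_probe.argtypes = [ctypes.c_void_p]
    _lib.blkid_free_probe.restype = None

    def probe_uuid(device):
        """Return the filesystem UUID of `device`, or None if nothing is there.

        Any probing error raises, so a None return really means "probed fine,
        no signature found" and never "couldn't look".
        """
        probe = _lib.blkid_new_probe_from_filename(device.encode())
        if not probe:
            raise OSError("cannot open {} for probing".format(device))
        try:
            ret = _lib.blkid_do_safeprobe(probe)
            if ret == 1:          # documented as "nothing was detected"
                return None
            if ret != 0:          # ambivalent result or error
                raise OSError("blkid_do_safeprobe failed on {}".format(device))
            value = ctypes.c_char_p()
            size = ctypes.c_size_t()
            rc = _lib.blkid_probe_lookup_value(
                probe, b"UUID", ctypes.byref(value), ctypes.byref(size)
            )
            if rc != 0:
                return None       # something is there, but it carries no UUID
            return value.value.decode()
        finally:
            _lib.blkid_free_probe(probe)
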
Call the real Salt state to prepare the volume (`metalk8s.volumes`)
instead of calling `test.rand_sleep`.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
Don't hardcode saltenv, instead compute it from the node's MetalK8s
version.
We use the node's version instead of the cluster one because that's also
the one used by the Salt minion.

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
Until now, we used to rely on the value of `success` to know if a job
was completed successfully or not.
Unfortunately, that's not good enough: you can get `success: true` while
nothing was actually run…
See saltstack/salt#4002

This is a problem because if we think that the `PrepareVolume` step was
a success, we will then try to get the size of the device with
`disk.dump`, but if `PrepareVolume` didn't run then the device doesn't
exist and `disk.dump` will fail => we wrongly move the Volume into a
failed state.

And we easily encounter this issue by creating several volumes at
once / within a short time interval.

Try to spawn two state executions at the same time:

    salt-call state.sls metalk8s.volumes saltenv=metalk8s-2.4.0-dev pillar='{"volume": "foo"}'&
    salt-call state.sls metalk8s.volumes saltenv=metalk8s-2.4.0-dev pillar='{"volume": "bar"}'

You'll get:
    local:
        Data failed to compile:
    ----------
        The function "state.sls" is running as PID 32544 and was started at 2019, Aug 14 21:45:18.338129 with jid 20190814214518338129

And when it's done by the operator through the API, here is what we get
for the second job:

    "bootstrap": {
      "return": [
        "The function \"state.sls\" is running as PID 22874 and was started at 2019, Aug 14 20:47:46.622817 with jid 20190814204746622817"
      ],
      "retcode": 1,
      "success": true,
      "out": "highstate"
    }

`success` is true, but `retcode` is non-zero…

So, to be 100% sure that the spawned job succeeded we need to check the
return code.

Why not ignore `success` and rely only on `retcode`?
Because we want to tell the difference between:
- the job ran but failed (success == false): move the Volume into Failed
  state
- the job failed to run (success == true but retcode != 0), probably
  because one was already running: we can retry later

Note that it seems possible to run states in parallel[1], but our state
may install packages, and that seems to be a dangerous thing to do in
parallel, so let's avoid it…
(But maybe we should move the dependency installation out of this state;
that would probably make sense…)

[1]: https://docs.saltstack.com/en/latest/ref/states/parallel.html

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
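
The operator-side check itself lives in Go (storage-operator/pkg/salt/client.go); purely to illustrate the decision table above, here is the same logic rendered in Python:

    def classify_job_result(job_return):
        """Decide what to do with a Salt job return such as the JSON above."""
        success = job_return.get("success", False)
        retcode = job_return.get("retcode", 1)
        if success and retcode == 0:
            return "succeeded"      # the state actually ran and passed
        if success and retcode != 0:
            return "retry-later"    # e.g. another state.sls run was already in progress
        return "failed"             # the state ran and failed: Volume -> Failed

    # Using the sample payload quoted in this commit message:
    sample = {"retcode": 1, "success": True, "out": "highstate"}
    assert classify_job_result(sample) == "retry-later"
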
Because I mainly test the operator outside of the cluster (and thus
running as admin:admin) I forgot to update those permissions when I
moved from `test.ping` to real Salt states.

We also need `@jobs` to poll the state of async jobs.

We really need some automated testing here ^^"

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from 42410e6 to 77ef94a Compare August 19, 2019 11:52
In retrospect, using the complete state `metalk8s.volume.prepared`,
while it works, isn't a great idea.

Using a dedicated state that only does the "provisioning" has several
benefits:
- it's faster (we don't need to check for dependencies, check if disks
  are formatted, …)
- it's safer (we're sure that we aren't going to trigger costly formatting
  or things like that)
- it can be run in parallel (thus, it won't block the execution of other
  states while it's running).

Refs: #1474
Signed-off-by: Sylvain Laperche <[email protected]>
@slaperche-scality slaperche-scality force-pushed the feature/1474-prepare-volume-with-Salt-formulars branch from 77ef94a to 3855fe1 Compare August 19, 2019 14:04
@NicolasT
Contributor

I think this can be +1'd?

@bert-e
Contributor

bert-e commented Aug 20, 2019

Not Author

I'm afraid I cannot do that, @slaperche-scality:

Only the author of the pull request @Ebaneck can use this command.

Please delete the corresponding comment so I can move on.

The following options are set: wait

@Ebaneck
Contributor Author

Ebaneck commented Aug 20, 2019

/approve

@scality scality deleted a comment from Ebaneck Aug 20, 2019
@bert-e
Contributor

bert-e commented Aug 20, 2019

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.4

The following branches will NOT be impacted:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e
Contributor

bert-e commented Aug 20, 2019

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/2.4

The following branches have NOT changed:

  • development/1.0
  • development/1.1
  • development/1.2
  • development/1.3
  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3

Please check the status of the associated issue None.

Goodbye ebaneck.

@bert-e bert-e merged commit 3855fe1 into development/2.4 Aug 20, 2019
@bert-e bert-e deleted the feature/1474-prepare-volume-with-Salt-formulars branch August 20, 2019 12:47