Upgrading to 2.7 does not re-enable loop device provisioning #2982

Closed
gdemonet opened this issue Dec 14, 2020 · 0 comments · Fixed by #2984
Assignees
Labels
  • complexity:easy (Something that requires less than a day to fix)
  • kind:bug (Something isn't working)
  • release:blocker (An issue that blocks a release until resolved)
  • topic:lifecycle (Issues related to upgrade or downgrade of MetalK8s)
  • topic:salt (Everything related to SaltStack in our product)
  • topic:storage (Issues related to storage)

Comments

@gdemonet
Contributor

Component: salt, upgrade

What happened:

When implementing a solution for #2726 in #2936, we failed to account for upgrade scenarios: SparseLoopDevice Volumes that were already provisioned before the upgrade do not get a systemd unit created.
As a result, after an upgrade, the loop device of any such Volume is not provisioned again when the node reboots.

What was expected: Seamless upgrade, loop devices still provisioned on node startup.

Steps to reproduce: From a MetalK8s 2.6 cluster with existing SparseLoopDevice Volumes, upgrade to 2.7 and reboot.

Resolution proposal (optional):

  • During upgrade, ensure the systemd unit files are created for existing Volumes
  • During downgrade, remove the unit files (and re-provision the devices using the metalk8s_volumes module)
@gdemonet added the kind:bug, topic:storage, topic:lifecycle, complexity:easy, release:blocker, and topic:salt labels Dec 14, 2020
@gdemonet added this to the MetalK8s 2.7.0 milestone Dec 14, 2020
@gdemonet self-assigned this Dec 14, 2020
gdemonet added a commit that referenced this issue Dec 15, 2020
This will simplify the introduction of other states depending on
`.installed`, into which we move the systemd unit template and cleanup
script creation.

See: #2982
gdemonet added a commit that referenced this issue Dec 15, 2020
Since the management of loop devices changed between 2.6 and 2.7
(introduction of systemd units for this purpose), we want to ensure
there will not be duplicated devices pointing to the same sparse file.
To do this, we introduce a "cleanup" formula, which can operate in two
modes, 'upgrade' and 'downgrade' (controlled via pillar).
For upgrade, we manage this cleanup in the `deploy_node` orchestrate,
during the drain.
For downgrade, we cannot change the `deploy_node` orchestrate, so we
manually drain and cleanup from the `downgrade` orchestrate.

See: #2982
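To illustrate the "duplicated devices pointing to the same sparse file" condition mentioned in the commit message above, here is a minimal sketch that groups loop devices by backing file. It is not the cleanup formula itself: the function name and the sample paths are hypothetical, and it only assumes the input looks like the text printed by `losetup --list --noheadings --output NAME,BACK-FILE`.

```python
from collections import defaultdict


def find_duplicate_backing_files(losetup_output):
    """Map each backing file to its loop devices, keeping only duplicates.

    Expects text in the shape printed by
    `losetup --list --noheadings --output NAME,BACK-FILE`.
    """
    devices_by_file = defaultdict(list)
    for line in losetup_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank lines and devices without a backing file
        device, backing_file = parts
        devices_by_file[backing_file.strip()].append(device)
    # Only backing files attached to more than one loop device are a problem.
    return {
        path: devices
        for path, devices in devices_by_file.items()
        if len(devices) > 1
    }


# Hypothetical sample: volume-a is attached twice, volume-b once.
sample = """\
/dev/loop0 /var/lib/example/volume-a
/dev/loop1 /var/lib/example/volume-b
/dev/loop2 /var/lib/example/volume-a
"""
duplicates = find_duplicate_backing_files(sample)
print(duplicates)
```

A cleanup step could use such a mapping to decide which devices to detach before the new management mechanism takes over; which device to keep is a policy choice this sketch does not make.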
gdemonet added a commit that referenced this issue Dec 15, 2020
This will ensure all loop devices are managed as expected after an
upgrade.
Note that this means invalid Volumes which were not prepared yet will
break upgrade.

Fixes: #2982
gdemonet added a commit that referenced this issue Dec 16, 2020
The `volume_utils` module interacts with a C library, and some values
are `c_char_p` which translates to bytes in Python 3 (instead of string
in Python 2).
This change breaks the blkid probe, which in turn makes the
`metalk8s_volumes.is_prepared` execution function always return False.
Adding `.decode('ascii')` to the relevant places fixes this issue.

See: bddea0b
See: #2982
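The bytes-versus-string behavior described in the commit message above can be reproduced with `ctypes` alone. The sketch below does not call the real `volume_utils` module or libblkid; the `"ext4"` value and both helper functions are hypothetical stand-ins for a comparison made on a `c_char_p` result.

```python
import ctypes

# In Python 3, the .value of a c_char_p is bytes, not str.
raw = ctypes.c_char_p(b"ext4").value


def is_ext4_buggy(value):
    # bytes never compare equal to str in Python 3, so this is always False.
    return value == "ext4"


def is_ext4_fixed(value):
    # Decode first so the comparison is str against str.
    return value.decode("ascii") == "ext4"


print(type(raw).__name__, is_ext4_buggy(raw), is_ext4_fixed(raw))
# prints: bytes False True
```

The comparison fails silently rather than raising an error, which is consistent with a check such as `is_prepared` always returning False instead of crashing.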
@bert-e bert-e closed this as completed in e9f5df1 Jan 12, 2021
gdemonet added a commit that referenced this issue Jan 17, 2021
The `target_is_2_7` variable was defined within an `if` block and used
outside of it, breaking the Jinja rendering.

See: #2982
See: 0355a8a