
OCPBUGS-37711: fix: ensure wipefs / dmsetup remove use CombinedOutput to wait for both STDERR and STDOUT before continuing #687

Merged

Conversation

jakobmoellerdev
Contributor

@jakobmoellerdev jakobmoellerdev commented Aug 7, 2024

This fix is a set of corrections to our wiping behavior that prevent a race between different command invocations.

For the given output of
lsblk -o Kname,name
(vdb was created with one partition (type 8E), after which a VG and an LV thin pool were created on it):
vdb   vdb
vdb1  `-vdb1
dm-0    |-some--other--vg-some--lv_tmeta
dm-2    | `-some--other--vg-some--lv
dm-1    `-some--other--vg-some--lv_tdata
dm-2      `-some--other--vg-some--lv

the commands will now be

dmsetup remove --force /dev/dm-2
dmsetup remove --force /dev/dm-0
dmsetup remove --force /dev/dm-2 //no-op
dmsetup remove --force /dev/dm-1
wipefs --all --force /dev/vdb // will do ioctl reload and cause partition table refresh

Previously, we encountered an issue because we used the RunCommandAsHostInto and StartCommandWithOutputAsHost methods. These methods assume that STDERR is not needed to determine program completion, only STDOUT (which holds for lvm2). If a program like wipefs or dmsetup writes to STDERR after the STDOUT pipe closes, that output is never observed. As a result, wipefs calls could sometimes "double up" and race each other, causing unpredictable wiping behavior.
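
For illustration, here is a minimal, self-contained Go sketch of the stdout-only pattern (generic os/exec code, not the operator's actual RunCommandAsHostInto / StartCommandWithOutputAsHost implementations); treating EOF on the STDOUT pipe as completion is what opens the race window:

```go
package wiping

import (
	"bufio"
	"fmt"
	"os/exec"
)

// wipeStreaming illustrates the problematic stdout-only pattern: it reads
// STDOUT until EOF and then returns, even though the process may still be
// writing to STDERR. Whatever the caller does next (e.g. the following
// dmsetup or wipefs invocation) can race with that still-finishing command.
func wipeStreaming(device string) error {
	cmd := exec.Command("wipefs", "--all", "--force", device)
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		return err
	}
	if err := cmd.Start(); err != nil {
		return err
	}
	scanner := bufio.NewScanner(stdout)
	for scanner.Scan() {
		fmt.Println(scanner.Text()) // stream STDOUT line by line
	}
	// EOF on STDOUT is treated as completion here; STDERR and the actual
	// process exit are never waited on, which is the root of the race.
	return scanner.Err()
}
```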

By switching to CombinedOutputCommandAsHost we wait for both STDOUT and STDERR to be fully consumed before moving on; the streaming-parse optimizations are unnecessary for dmsetup and wipefs.
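
Again as a sketch using plain os/exec rather than the operator's CombinedOutputCommandAsHost wrapper: (*exec.Cmd).CombinedOutput blocks until the process exits and both output streams are fully collected, which removes the overlap between consecutive wipe commands:

```go
package wiping

import (
	"fmt"
	"os/exec"
)

// wipeCombined waits for the command to exit and for both STDOUT and STDERR
// to be fully collected before returning, so the next wipe command cannot
// start while the previous one is still running or flushing output.
func wipeCombined(device string) error {
	out, err := exec.Command("wipefs", "--all", "--force", device).CombinedOutput()
	if err != nil {
		return fmt.Errorf("wipefs %s failed: %w: %s", device, err, string(out))
	}
	return nil
}
```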

Signed-off-by: Jakob Möller <[email protected]>
@openshift-ci-robot openshift-ci-robot added jira/severity-low Referenced Jira bug's severity is low for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Aug 7, 2024
@openshift-ci-robot

@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-37711, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @radeore


@openshift-ci-robot

openshift-ci-robot commented Aug 7, 2024

@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-37711, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @radeore

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci bot requested a review from radeore August 7, 2024 13:01
@openshift-ci openshift-ci bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 7, 2024
@openshift-ci openshift-ci bot requested review from jeff-roche and jerpeter1 August 7, 2024 13:01
Contributor

openshift-ci bot commented Aug 7, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jakobmoellerdev

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2024
@openshift-ci-robot

openshift-ci-robot commented Aug 7, 2024

@jakobmoellerdev: This pull request references Jira Issue OCPBUGS-37711, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.17.0) matches configured target version for branch (4.17.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @radeore


@codecov-commenter

Codecov Report

Attention: Patch coverage is 38.29787% with 29 lines in your changes missing coverage. Please review.

Project coverage is 70.50%. Comparing base (0c7a38e) to head (d80e311).

Files Patch % Lines
internal/controllers/vgmanager/exec/exec.go 0.00% 16 Missing ⚠️
...ternal/controllers/lvmcluster/resource/csi_node.go 0.00% 4 Missing ⚠️
internal/controllers/vgmanager/wipe_devices.go 60.00% 2 Missing and 2 partials ⚠️
...ernal/controllers/vgmanager/exec/test/mock_exec.go 50.00% 3 Missing ⚠️
internal/controllers/vgmanager/dmsetup/dmsetup.go 80.00% 1 Missing ⚠️
internal/controllers/vgmanager/wipefs/wipefs.go 83.33% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #687      +/-   ##
==========================================
- Coverage   71.06%   70.50%   -0.57%     
==========================================
  Files          47       47              
  Lines        3225     3245      +20     
==========================================
- Hits         2292     2288       -4     
- Misses        766      788      +22     
- Partials      167      169       +2     
Files Coverage Δ
internal/controllers/vgmanager/dmsetup/dmsetup.go 81.25% <80.00%> (+13.06%) ⬆️
internal/controllers/vgmanager/wipefs/wipefs.go 85.71% <83.33%> (ø)
...ernal/controllers/vgmanager/exec/test/mock_exec.go 55.55% <50.00%> (-27.78%) ⬇️
...ternal/controllers/lvmcluster/resource/csi_node.go 68.51% <0.00%> (-4.04%) ⬇️
internal/controllers/vgmanager/wipe_devices.go 76.31% <60.00%> (-3.40%) ⬇️
internal/controllers/vgmanager/exec/exec.go 0.00% <0.00%> (ø)

... and 4 files with indirect coverage changes

Contributor

openshift-ci bot commented Aug 7, 2024

@jakobmoellerdev: all tests passed!


@suleymanakbas91
Contributor

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Aug 7, 2024
@openshift-merge-bot openshift-merge-bot bot merged commit 1357036 into openshift:main Aug 7, 2024
9 checks passed
@openshift-ci-robot

openshift-ci-robot commented Aug 7, 2024

@jakobmoellerdev: Jira Issue OCPBUGS-37711: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-37711 has been moved to the MODIFIED state.

