Several Switchover fixes #503

Merged
merged 4 commits into from
May 2, 2023

tiagolobocastro
Contributor

test(switchover/bdd): add robustness tests

Add tests to help exercise some corner cases seen during bug study.
These are not very precise, as some of the corner cases are difficult to re-create,
but they are a decent starting point and can be used to help
recreate these issues in a more automated fashion.

Signed-off-by: Tiago Castro <[email protected]>

fix(switchover): adds several corner case fixes

Replace path is not idempotent when the connection was established without moving the path.
gRPC connection timeouts are too short, which leads to moving the target when it is not necessary.
The failed-paths report does not differentiate between errors at all.
The failed-paths report returns only one error; since we report in batches, there should be one error per path.
When there are no resources to republish the volume, the HA cluster retries in a loop with no delay or backoff.
On error, shutdown targets are not cleaned up.
On switchover failure, failed paths are not cleared in the cluster agent.

Signed-off-by: Tiago Castro <[email protected]>
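
For readers following the list in this commit message, here is a minimal Python sketch of two of the ideas: returning one result per path when a batch of failed paths is reported, and retrying a republish with a capped exponential backoff instead of a tight loop. It is an illustration only, not the code in this PR (the control plane is written in Rust). The names `NoResources`, `PathResult`, `report_failed_paths`, `backoff_delays` and `republish_with_backoff`, as well as the default delay values, are assumptions made up for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Optional


class NoResources(Exception):
    """Hypothetical error: nothing available to republish the volume right now."""


@dataclass
class PathResult:
    """Outcome for one reported path (hypothetical type, not the project's)."""
    nqn: str
    error: Optional[str] = None  # None means the path was handled successfully


def report_failed_paths(nqns: Iterable[str],
                        switchover: Callable[[str], None]) -> List[PathResult]:
    """Handle every path in the batch and keep one result per path."""
    results = []
    for nqn in nqns:
        try:
            switchover(nqn)
            results.append(PathResult(nqn))
        except Exception as exc:
            # Keep the error kind so callers can tell failures apart.
            results.append(PathResult(nqn, error=f"{type(exc).__name__}: {exc}"))
    return results


def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   cap: float = 30.0, attempts: int = 5) -> Iterator[float]:
    """Yield `attempts` delays that grow geometrically and are capped at `cap`."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def republish_with_backoff(republish: Callable[[], None],
                           sleep: Callable[[float], None] = time.sleep,
                           **backoff: float) -> None:
    """Call `republish` up to `attempts` times, sleeping between failed tries."""
    delays = list(backoff_delays(**backoff))
    for attempt, delay in enumerate(delays):
        try:
            republish()
            return
        except NoResources:
            if attempt == len(delays) - 1:
                raise  # out of attempts: surface the last error
            sleep(delay)
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting. Only `NoResources` is retried here, so any other error propagates immediately.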

chore(python/bdd): pin to version 6

Newer versions (not sure which exactly) seem to generate the test cases without a name.
TODO: determine the best version to use.

Signed-off-by: Tiago Castro <[email protected]>

fix(ha-node): add subsystem sync for platform none

On the deployer, the udev code we have for listening to disk events does not seem to work.
For now, assume it is incompatible and add a subsystem sync just for platform none.

Signed-off-by: Tiago Castro <[email protected]>

Member

Abhinandan-Purkait left a comment

LGTM. Few nits.

control-plane/stor-port/src/transport_api/mod.rs (outdated, resolved)
tests/bdd/common/nvme.py (resolved)
tests/bdd/features/ha/robustness.feature (outdated, resolved)
tests/bdd/features/ha/robustness.feature (outdated, resolved)
tests/bdd/requirements.txt (resolved)
@tiagolobocastro
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request May 2, 2023
503: Several Switchover fixes r=tiagolobocastro a=tiagolobocastro

Co-authored-by: Tiago Castro <[email protected]>
@bors

bors bot commented May 2, 2023

Build failed:

@tiagolobocastro
Contributor Author

bors merge

@bors

bors bot commented May 2, 2023

Build succeeded!

@bors bors bot merged commit 58211c1 into develop May 2, 2023
@bors bors bot deleted the ha-fixes branch May 2, 2023 12:02