
Backport fixes to release/2.7 #890

Merged: 7 commits merged into release/2.7 on Nov 26, 2024

Conversation

tiagolobocastro
Contributor

chore(bors): merge pull request #887

887: Fix regression for pool creation timeout retry r=tiagolobocastro a=tiagolobocastro

    test: use tmp in project workspace

    Use a tmp folder from the workspace, allowing us to clean up things like
    LVM volumes much more easily, since we can simply purge it.

    Signed-off-by: Tiago Castro <[email protected]>

---

    test(pool): create on very large or very slow disks

    Uses LVM Lvols as backend devices for the pool.
    We suspend these before pool creation, allowing us to simulate slow
    pool creation.
    This test ensures that the pool creation completes by itself, and also
    that a client can complete it by calling create again.

    Signed-off-by: Tiago Castro <[email protected]>
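The test scenario above (a create call that times out against a suspended backing device, then completes after the device is resumed) can be sketched with an in-process stand-in. This is purely illustrative: `FakePoolService` and its members are hypothetical names, not the project's actual test code, and a `threading.Event` stands in for the suspended LVM Lvol.

```python
import threading

class FakePoolService:
    """Stand-in for the data-plane: pool creation blocks while the
    backing device is 'suspended', mimicking a very slow disk."""

    def __init__(self):
        self.resumed = threading.Event()       # set() plays the role of resuming the device
        self.pool_created = threading.Event()

    def create_pool(self, timeout: float) -> bool:
        # Kick off the background creation if it hasn't finished yet.
        if not self.pool_created.is_set():
            threading.Thread(target=self._create_in_background, daemon=True).start()
        # The gRPC-style call itself only waits up to `timeout`.
        return self.pool_created.wait(timeout)

    def _create_in_background(self):
        self.resumed.wait()        # blocked until the device is resumed
        self.pool_created.set()

svc = FakePoolService()
assert svc.create_pool(timeout=0.1) is False   # first call "times out"
svc.resumed.set()                              # resume the suspended device
assert svc.create_pool(timeout=1.0) is True    # retry now completes
```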

---

    fix: allow pool creation to complete asynchronously

    When the initial create gRPC times out, the data-plane may still be creating
    the pool in the background, which can happen for very large pools.
    Rather than assume failure, we allow this to complete in the background, up
    to a large (arbitrary) amount of time. If the pool creation completes
    within that window, we retry the creation flow.
    We don't simply use very large timeouts because the gRPC operations are
    currently sequential, mostly for historical reasons.
    Now that the data-plane is allowing concurrent calls, we should also allow
    this on the control-plane.
    TODO: allow concurrent node operations

    Signed-off-by: Tiago Castro <[email protected]>
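The retry flow described above (don't treat a gRPC deadline as failure; wait, bounded, for the background creation, then retry) can be sketched as follows. The actual control-plane is Rust; the function and the `BACKGROUND_CAP_SECS` bound here are hypothetical illustrations of the idea, not the real API.

```python
import time

class DeadlineExceeded(Exception):
    """Stands in for a gRPC DEADLINE_EXCEEDED status."""

BACKGROUND_CAP_SECS = 5.0   # the "large arbitrary amount of time"

def create_pool_with_retry(call_create, pool_exists, poll=0.05):
    """On timeout, the data-plane may still be creating the pool in the
    background. Wait (bounded) for it to appear, then retry the flow."""
    try:
        return call_create()
    except DeadlineExceeded:
        deadline = time.monotonic() + BACKGROUND_CAP_SECS
        while time.monotonic() < deadline:
            if pool_exists():          # background creation finished
                return call_create()   # retry the (idempotent) creation flow
            time.sleep(poll)
        raise                          # still not there: give up

# Simulated usage: first call times out, background creation finishes,
# the retry succeeds.
state = {"calls": 0}
def call_create():
    state["calls"] += 1
    if state["calls"] == 1:
        raise DeadlineExceeded()
    return "pool-online"

assert create_pool_with_retry(call_create, lambda: True) == "pool-online"
```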

---

    fix: check for correct not found error code

    A previous fix ended up not working correctly because it was merged
    incorrectly, somehow!

    Signed-off-by: Tiago Castro <[email protected]>

---

    chore: update terraform node prep

    Pull the Release key from a recent k8s version since the old keys are no
    longer valid.
    This will have to be updated from time to time.

Co-authored-by: Tiago Castro <[email protected]>

fix(resize): atomically check for the required size

Ensures races don't lead to volume resize failures.

Signed-off-by: Tiago Castro <[email protected]>
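The fix above is the classic "make check-then-act atomic" pattern. A minimal sketch, with hypothetical names (the real code is Rust and the real state lives in the control-plane's registry): the size check and the update happen under one lock, so a concurrent or duplicate resize request that finds the volume already at the required size succeeds as a no-op instead of failing.

```python
import threading

class Volume:
    """Sketch: check the required size and apply the resize atomically."""

    def __init__(self, size: int):
        self._size = size
        self._lock = threading.Lock()

    def resize(self, required_size: int) -> int:
        with self._lock:                  # check and update as one step
            if required_size == self._size:
                return self._size         # already the required size: no-op
            if required_size < self._size:
                raise ValueError("shrinking is not supported")
            self._size = required_size
            return self._size

v = Volume(10)
assert v.resize(20) == 20
assert v.resize(20) == 20   # racing/duplicate request: succeeds as a no-op
```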

test(bdd/thin): fix racy thin prov test

Add a retry that waits for the condition to be met.

Signed-off-by: Tiago Castro <[email protected]>
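The fix above replaces a racy one-shot assertion with a poll-until-true helper, a common pattern in BDD tests that assert on eventually-consistent state. A minimal sketch (the helper name is hypothetical):

```python
import time

def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns truthy or the timeout elapses.
    Returns True on success, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Example: a condition that only becomes true after a few polls.
state = {"n": 0}
def eventually_three():
    state["n"] += 1
    return state["n"] >= 3

assert wait_for(eventually_three, timeout=1.0) is True
assert wait_for(lambda: False, timeout=0.2) is False
```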

feat(topology): remove internal labels when displaying topology

Signed-off-by: sinhaashish <[email protected]>
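Filtering internal bookkeeping labels out of user-facing output boils down to a prefix filter. A sketch, where the prefix and function name are assumptions for illustration, not the control-plane's actual convention:

```python
# Hypothetical internal-label prefix, for illustration only.
INTERNAL_PREFIX = "openebs.io/internal-"

def display_labels(labels: dict) -> dict:
    """Drop internal bookkeeping labels before showing topology to users."""
    return {k: v for k, v in labels.items() if not k.startswith(INTERNAL_PREFIX)}

labels = {
    "zone": "us-east-1a",
    "openebs.io/internal-created-by": "operator",
}
assert display_labels(labels) == {"zone": "us-east-1a"}
```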

fix(fsfreeze): improve the error message when the volume is not staged

Signed-off-by: Abhinandan Purkait <[email protected]>

fix(deployer): increase the max number of allowed connection attempts to the io-engine

Signed-off-by: sinhaashish <[email protected]>

fix(topology): hasTopologyKey overwrites affinityTopologyLabels

Signed-off-by: sinhaashish <[email protected]>
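The bug above is a merge-order problem: wildcard `hasTopologyKey` entries clobbered concrete `affinityTopologyLabels` values when both named the same key. A sketch of the corrected merge, with a hypothetical function shape (the real logic lives in the Rust control-plane):

```python
def merge_topology(affinity_labels: dict, topology_keys: list) -> dict:
    """Combine affinityTopologyLabels with hasTopologyKey entries.
    Only add a wildcard entry for a key if it is not already pinned
    to a concrete value by an affinity label, so keys never clobber values."""
    merged = dict(affinity_labels)
    for key in topology_keys:
        merged.setdefault(key, "")   # "" = match-any placeholder, never overwrite
    return merged

merged = merge_topology({"rack": "r1"}, ["rack", "zone"])
assert merged == {"rack": "r1", "zone": ""}   # "rack" keeps its pinned value
```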

@tiagolobocastro
Contributor Author

bors try

bors-openebs-mayastor bot pushed a commit that referenced this pull request Nov 26, 2024
@bors-openebs-mayastor

try

Build succeeded:

@tiagolobocastro
Contributor Author

bors merge

bors-openebs-mayastor bot pushed a commit that referenced this pull request Nov 26, 2024
@bors-openebs-mayastor

Build failed:

@tiagolobocastro
Contributor Author

bors merge

@bors-openebs-mayastor

Build succeeded:

@bors-openebs-mayastor bors-openebs-mayastor bot merged commit b928b51 into release/2.7 Nov 26, 2024
4 checks passed
@bors-openebs-mayastor bors-openebs-mayastor bot deleted the cherry-pick branch November 26, 2024 18:46