deps: bump to race-free c/image and c/storage along with test to verify concurrent/parallel builds #13404

Merged

Conversation

flouthoc
Collaborator

@flouthoc flouthoc commented Mar 2, 2022

Commits explained in bullet points:

  • Bump c/storage to main/d06b0f
  • Bump c/image to main/9a9cd9

The new names API in c/storage and c/image ensures that name-modification operations are race-free.

  • Adds a test to verify that parallel/concurrent builds via podman don't race against each other.

Read more about the problem statement here: containers/buildah#3794 and #13339
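
For readers wondering what 'race-free' means here: below is a minimal Go sketch (not the actual podman/buildah code) contrasting the older read-modify-write naming pattern with the AddNames/RemoveNames calls introduced by containers/storage#1153. Only Names, SetNames, AddNames, and RemoveNames are taken from the c/storage Store interface; the helper function names and signatures are illustrative assumptions.

package nameexample

import (
	"github.com/containers/storage"
)

// tagImageOld mimics the pre-#1153 pattern: read the full name list,
// append the new tag, and write the whole list back with SetNames.
// Two concurrent builds can interleave between Names() and SetNames(),
// so one build's freshly added tag can silently disappear.
func tagImageOld(store storage.Store, imageID, newTag string) error {
	names, err := store.Names(imageID) // read ...
	if err != nil {
		return err
	}
	return store.SetNames(imageID, append(names, newTag)) // ... modify-write (racy)
}

// tagImageNew uses the race-free API this PR pulls in: AddNames takes only
// the delta and applies it under c/storage's own locking, so parallel
// `podman build -t ...` invocations cannot clobber each other's tags.
func tagImageNew(store storage.Store, imageID, newTag string) error {
	return store.AddNames(imageID, []string{newTag})
}

// untagImage is the matching race-free removal.
func untagImage(store storage.Store, imageID, tag string) error {
	return store.RemoveNames(imageID, []string{tag})
}

The bats test added further down exercises this difference end to end: thirty parallel builds each tag an image, and the final rmi succeeds only if no tag was lost.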

flouthoc added 2 commits March 2, 2022 18:15
Bump c/storage to main/d06b0f so that podman can use the new race-free
`AddNames` and `RemoveNames` API

Signed-off-by: Aditya R <[email protected]>
Bump c/image to upstream main/9a9cd9 so that podman can use the new
race-free code.

Signed-off-by: Aditya R <[email protected]>
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 2, 2022
@flouthoc flouthoc force-pushed the bump-to-race-free-deps branch from fe7137e to d9b98cf on March 3, 2022 05:53
@flouthoc flouthoc force-pushed the bump-to-race-free-deps branch 2 times, most recently from 093791e to 009fa0e on March 3, 2022 09:07
Member

@giuseppe giuseppe left a comment


LGTM

@edsantiago
Member

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2022
Member

@edsantiago edsantiago left a comment


This does not test what you think it tests. There is no parallelization here anywhere that I can see.

@edsantiago edsantiago changed the title deps: bump to race-free c/image and c/storage along with test to verify concurrent/parallel builds [DO NOT MERGE] deps: bump to race-free c/image and c/storage along with test to verify concurrent/parallel builds Mar 3, 2022
@edsantiago
Member

I'm rewriting the test. I will post my changes once I'm done. This must not merge.

@edsantiago
Member

edsantiago commented Mar 3, 2022

This is super-dangerous: it runs the risk of hanging forever, but I can find no way around that.

I tried to grok the bug but can't, so I don't know whether it's important to build lots of different images or whether you want them all the same. If you want the latter, just change echo hi$i in the Containerfile to echo hi or something like that.
UPDATED: indeed, with just RUN echo hi (no $i), my test fails pretty consistently on main.

@test "podman parallel build should not race" {
    # Start thirty builds in parallel
    local count=30
    local -a pids
    for i in $(seq --format '%02g' 1 $count); do
        local d=$PODMAN_TMPDIR/build-test$i
        mkdir $d
        cat >$d/Containerfile <<EOF
FROM $IMAGE
RUN echo hi
EOF
        $PODMAN build -t i$i $d &>/dev/null &
    done

    # DANGER DANGER DANGER: this has the potential to hang forever,
    # should any individual build fail. I can find no way to add a timeout.
    wait

    # Now delete each image. If an image wasn't built, rmi will fail
    for i in $(seq --format '%02g' 1 $count); do
        run_podman rmi i$i
    done
}

@giuseppe
Member

giuseppe commented Mar 3, 2022

    # DANGER DANGER DANGER: this has the potential to hang forever,
    # should any individual build fail. I can find no way to add a timeout.

Could we add the timeout on the podman build command itself, e.g. timeout -s KILL 60 podman build ...?

@flouthoc
Collaborator Author

flouthoc commented Mar 3, 2022

@edsantiago @giuseppe Instead of wait, could we use a sleep here, so we don't hang forever?

@edsantiago
Member

edsantiago commented Mar 3, 2022

@giuseppe excellent idea, and it seems to work! Thank you! Here's the new code:

@test "podman parallel build should not race" {
    # Start thirty builds in parallel
    local count=30
    local -a pids
    for i in $(seq --format '%02g' 1 $count); do
        local d=$PODMAN_TMPDIR/build-test$i
        mkdir $d
        cat >$d/Containerfile <<EOF
FROM $IMAGE
RUN echo hi
EOF
        timeout --foreground -v --kill=10 60 $PODMAN build -t i$i $d &>/dev/null &
    done

    wait

    # Now delete all built images. If an image wasn't built, rmi will fail
    run_podman rmi $(seq --format 'i%02g' 1 $count)
}

(note the optimization in the rmi)

@flouthoc we do not use sleep if at all possible. It is evil: it forces a constant delay no matter how long the builds take (forcing CI to take longer and longer, which makes the world a worse place). It is also not a guarantee: it is possible for some circumstance to make the builds take very long one day, and the sleep would finish first, and we would have an impossible-to-debug flake.

@flouthoc flouthoc force-pushed the bump-to-race-free-deps branch 2 times, most recently from 2b7c5ca to 7879b77 on March 3, 2022 13:45
@flouthoc
Collaborator Author

flouthoc commented Mar 3, 2022

@edsantiago Thanks a lot for patching the test; I have verified it 👍 It does not fail locally, let's see how it goes in CI.

@flouthoc flouthoc requested a review from edsantiago March 3, 2022 13:46
@edsantiago
Member

Update: here's a cleaner, less sloppy version of the test. Please use this one; it is less misleading (i.e., there is no reason to create multiple Containerfiles), and the comments are better.

@test "podman parallel build should not race" {
    # Run thirty parallel builds using the same Containerfile
    cat >$PODMAN_TMPDIR/Containerfile <<EOF
FROM $IMAGE
RUN echo hi
EOF

    local count=30
    for i in $(seq --format '%02g' 1 $count); do
        timeout --foreground -v --kill=10 60 \
                $PODMAN build -t i$i $PODMAN_TMPDIR &>/dev/null &
    done

    # Wait for all background builds to complete. Note that this succeeds
    # even if some of the individual builds fail! Our actual test is below.
    wait

    # Now delete all built images. If any image wasn't built, rmi will fail
    # and test will fail.
    run_podman rmi $(seq --format 'i%02g' 1 $count)
}

@edsantiago edsantiago changed the title [DO NOT MERGE] deps: bump to race-free c/image and c/storage along with test to verify concurrent/parallel builds deps: bump to race-free c/image and c/storage along with test to verify concurrent/parallel builds Mar 3, 2022
@flouthoc flouthoc force-pushed the bump-to-race-free-deps branch from 7879b77 to 063c312 on March 3, 2022 13:50
@openshift-ci
Contributor

openshift-ci bot commented Mar 3, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, flouthoc, giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [edsantiago,flouthoc,giuseppe]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Member

Oops. Looks like you need skip_if_remote

Invoking parallel/concurrent builds from podman used to race against
each other; this behaviour was fixed in
containers/storage#1153 and containers/image#1480

The test verifies that this bug is fixed by the new race-free API.
For more details about this issue, see bz 2055487.

More details here: containers/buildah#3794 and containers#13339

Co-authored-by: Ed Santiago <[email protected]>
Signed-off-by: Aditya R <[email protected]>
@flouthoc flouthoc force-pushed the bump-to-race-free-deps branch from 063c312 to 63f92d0 on March 3, 2022 15:34
@flouthoc flouthoc removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 3, 2022
@edsantiago
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Mar 3, 2022
@openshift-merge-robot openshift-merge-robot merged commit 3cfb70f into containers:main Mar 3, 2022
@flouthoc
Collaborator Author

flouthoc commented Mar 3, 2022

FYI @TomSweeneyRedHat

@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023