
Progress bars sometimes not showing any progress #1013

Closed
vrothberg opened this issue Aug 10, 2020 · 13 comments · Fixed by #1332

Comments

@vrothberg
Member

I have noticed this in the past weeks a couple of times:

Getting image source signatures
Copying blob df20fa9351a1 [--------------------------------------] 0.0b / 0.0b
Copying config 0f5f445df8 done  
Writing manifest to image destination
Storing signatures

Sometimes, the progress bars do not show any progress but complete with [--------------------------------------] 0.0b / 0.0b. I did not test much, but I suspect it only happens for small blobs and may be a timing issue (i.e., the copy completes before the bar can be rendered again).

@mtrmac
Collaborator

mtrmac commented Aug 10, 2020

Thanks for filing this, I saw at least one instance of that recently as well.

If I had to guess, the goroutine started by go b.refreshTillShutdown() doesn’t manage to run before the context is canceled and the bar operator terminates, so the refresh request is abandoned.
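For illustration, a minimal sketch of that suspected race (hypothetical reproducer, not from the thread; it assumes the mpb v7 API and that a tiny blob completes before the render goroutine gets a turn):

package main

import (
	"context"

	"github.com/vbauerster/mpb/v7"
)

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	pool := mpb.NewWithContext(ctx, mpb.WithWidth(40))

	// A tiny (here: empty) blob: the bar is created and marked
	// complete essentially instantly.
	bar := pool.AddBar(0)
	bar.SetTotal(0, true)

	// If the context is canceled before the render goroutine runs,
	// the completed bar may never be drawn.
	cancel()
	pool.Wait()
}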

@vrothberg
Member Author

I played a bit in Podman, and it seems that we are killing the bars before they are finished (see https://github.com/containers/image/blob/master/copy/copy.go#L959). Maybe we need to cancel() only in error cases?
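One way to express that idea, sketched against the newProgressPool shape quoted later in this thread (the error-aware cleanup signature is hypothetical, not the actual change in commit ee5014c):

import (
	"context"
	"io"

	"github.com/vbauerster/mpb/v7"
)

// Hypothetical sketch: the cleanup takes the copy's final error, so we
// cancel() only when something failed and otherwise wait for every bar
// to finish rendering.
func newProgressPool(ctx context.Context, out io.Writer) (*mpb.Progress, func(err error)) {
	ctx, cancel := context.WithCancel(ctx)
	pool := mpb.NewWithContext(ctx, mpb.WithWidth(40), mpb.WithOutput(out))
	return pool, func(err error) {
		if err != nil {
			// On failure, tear down immediately; blocking on bars
			// that will never complete would hang the caller.
			cancel()
		}
		pool.Wait() // on success, every bar renders its final state first
		cancel()    // release the derived context in all cases
	}
}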

vrothberg added a commit to vrothberg/image that referenced this issue Aug 17, 2020
Wait for all progress bars to finish before cleaning up.  In case of
errors, clean up immediately.  Waiting for all bars closes a race
condition where a bar might not be rendered before being cleaned up.

Fixes: containers#1013
Signed-off-by: Valentin Rothberg <[email protected]>
@vrothberg
Member Author

Closing. This has been fixed in commit ee5014c.

@giuseppe
Member

I am afraid this is not fixed yet.

Is there any side effect if we do something like the following (to give the progress bars some time to finish rendering)?

diff --git a/copy/copy.go b/copy/copy.go
index b4ff8aa..f4bac99 100644
--- a/copy/copy.go
+++ b/copy/copy.go
@@ -1067,7 +1067,10 @@ func (c *copier) newProgressPool(ctx context.Context) (*mpb.Progress, func()) {
        ctx, cancel := context.WithCancel(ctx)
        pool := mpb.NewWithContext(ctx, mpb.WithWidth(40), mpb.WithOutput(c.progressOutput))
        return pool, func() {
-               cancel()
+               go func() {
+                       time.Sleep(time.Second * 5)
+                       cancel()
+               }()
                pool.Wait()
        }
 }

@mtrmac
Collaborator

mtrmac commented Jul 29, 2021

That’s a concerning approach.

There should be a right way to use the progress bar API, with no artifacts. Do we know exactly how the situation breaks? If not, I’d rather have ugly progress bars and a bug report than nice progress bars that (seem to) work for an unknown reason.

Especially because a time.Sleep like this is likely enough to push the cancellation past the lifetime of short-lived processes like skopeo copy, where we are likely to notice problems, but it could blow up in unnoticed ways in long-running processes.

giuseppe added a commit to giuseppe/mpb that referenced this issue Jul 29, 2021
fix a race issue where a cancelled context might cause the progress
bar to never be rendered.

More details here: containers/image#1013

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

vbauerster/mpb#92 seems to solve the issue for me. Let's see if that works.

@vrothberg
Member Author

Interesting. Since commit ee5014c, I haven't been able to reproduce it locally anymore.

giuseppe added a commit to giuseppe/mpb that referenced this issue Jul 31, 2021
fix a race issue where a cancelled context might cause the progress
bar to never be rendered.

More details here: containers/image#1013

Signed-off-by: Giuseppe Scrivano <[email protected]>
@giuseppe
Member

The proposed PR for mpb was not accepted. @mtrmac, @vrothberg, would you be fine with something like:

diff --git a/copy/copy.go b/copy/copy.go
index b4ff8aa..bd2f703 100644
--- a/copy/copy.go
+++ b/copy/copy.go
@@ -124,6 +124,7 @@ type copier struct {
        ociEncryptConfig      *encconfig.EncryptConfig
        maxParallelDownloads  uint
        downloadForeignLayers bool
+       needsCancelProgress   bool
 }
 
 // imageCopier tracks state specific to a single image (possibly an item of a manifest list)
@@ -1067,7 +1068,9 @@ func (c *copier) newProgressPool(ctx context.Context) (*mpb.Progress, func()) {
        ctx, cancel := context.WithCancel(ctx)
        pool := mpb.NewWithContext(ctx, mpb.WithWidth(40), mpb.WithOutput(c.progressOutput))
        return pool, func() {
-               cancel()
+               if c.needsCancelProgress {
+                       cancel()
+               }
                pool.Wait()
        }
 }
@@ -1160,6 +1163,7 @@ func (c *copier) copyConfig(ctx context.Context, src types.Image) error {
                        bar := c.createProgressBar(progressPool, false, srcInfo, "config", "done")
                        destInfo, err := c.copyBlobFromStream(ctx, bytes.NewReader(configBlob), srcInfo, nil, false, true, false, bar, -1, false)
                        if err != nil {
+                               c.needsCancelProgress = true
                                return types.BlobInfo{}, err
                        }
                        bar.SetTotal(int64(len(configBlob)), true)
@@ -1326,6 +1330,7 @@ func (ic *imageCopier) copyLayer(ctx context.Context, srcInfo types.BlobInfo, to
 
        blobInfo, diffIDChan, err := ic.copyLayerFromStream(ctx, srcStream, types.BlobInfo{Digest: srcInfo.Digest, Size: srcBlobSize, MediaType: srcInfo.MediaType, Annotations: srcInfo.Annotations}, diffIDIsNeeded, toEncrypt, bar, layerIdex, emptyLayer)
        if err != nil {
+               ic.c.needsCancelProgress = true
                return types.BlobInfo{}, "", err
        }
 
@@ -1333,9 +1338,11 @@ func (ic *imageCopier) copyLayer(ctx context.Context, srcInfo types.BlobInfo, to
        if diffIDIsNeeded {
                select {
                case <-ctx.Done():
+                       ic.c.needsCancelProgress = true
                        return types.BlobInfo{}, "", ctx.Err()
                case diffIDResult := <-diffIDChan:
                        if diffIDResult.err != nil {
+                               ic.c.needsCancelProgress = true
                                return types.BlobInfo{}, "", errors.Wrap(diffIDResult.err, "computing layer DiffID")
                        }
                        logrus.Debugf("Computed DiffID %s for layer %s", diffIDResult.digest, srcInfo.Digest)

@mtrmac
Collaborator

mtrmac commented Jul 31, 2021

I don’t know. What is the problem, and why is this a fix? I can’t tell whether the proposal is random tinkering or the result of deeply understanding the situation, and without documentation of the essential parts of the fix, we aren’t likely to maintain it correctly in the future.

From my current understanding, this does not look right, if only because the lifetimes of an mpb.Progress and a copier are so different.

(The MPB PR has had more conversations since the above comment.)

@mtrmac
Collaborator

mtrmac commented Jul 31, 2021

See vbauerster/mpb#93 for more discussion; would the

bar := p.AddBar(…)
defer func() {
  if !bar.Completed() {
    bar.Abort(false)
  }
}()

approach work?

@giuseppe
Member

would the

bar := p.AddBar(…)
defer func() {
  if !bar.Completed() {
    bar.Abort(false)
  }
}()

approach work?

Yes, I think this works well, and it is much cleaner than my proposed patch above. Could you open a PR?

@mtrmac
Collaborator

mtrmac commented Aug 2, 2021

Per vbauerster/mpb#93 , let’s wait until the next release, and then:

  • Drop the context from the Progress container; have p.Wait block until all progress bars finish
  • To ensure they do, defer bar.Abort() (in a scope that terminates before the p.Wait); see the sketch below
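A self-contained sketch of that plan (assumptions: the mpb v7 API, that Abort on an already-completed bar is a no-op per the vbauerster/mpb#93 discussion, and copyOneBlob as an illustrative stand-in for the containers/image copy loop):

package main

import (
	"github.com/vbauerster/mpb/v7"
	"github.com/vbauerster/mpb/v7/decor"
)

func copyOneBlob(pool *mpb.Progress, size int64) {
	bar := pool.AddBar(size, mpb.PrependDecorators(decor.Name("copying")))
	// Abort in a scope that ends before pool.Wait; assuming Abort is a
	// no-op on a completed bar, Wait can never block forever.
	defer bar.Abort(false)

	// ... copy the blob here, driving bar.IncrBy / bar.SetTotal ...
	bar.SetTotal(size, true) // mark complete so the final state is rendered
}

func main() {
	// No per-pool context: Wait blocks until every bar completes or
	// aborts, so each bar gets to render its final state.
	pool := mpb.New(mpb.WithWidth(40))
	copyOneBlob(pool, 1024)
	pool.Wait()
}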

@vrothberg
Member Author

SGTM
Great work, @mtrmac and @giuseppe!
