Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

podman events --since="nonsense" is nondeterministic #23165

Closed
edsantiago opened this issue Jul 2, 2024 · 3 comments · Fixed by #23170
Closed

podman events --since="nonsense" is nondeterministic #23165

edsantiago opened this issue Jul 2, 2024 · 3 comments · Fixed by #23170
Assignees
Labels
flakes Flakes from Continuous Integration locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.

Comments

@edsantiago
Copy link
Member

It will sometimes succeed!

$ t0=$SECONDS;i=0;while :;do i=$((i+1));bin/podman events --since="the dawn of time...ish" && break;done;printf "%d seconds, %d tries\n" $((SECONDS-$t0)) $i
...
Error: failed to parse event filters: unable to convert since time of the dawn of time...ish: unable to interpret time value
74 seconds, 3674 tries

And sometimes that will continue looping forever (or five minutes, which is ~equivalent). I have no idea what makes it "succeed". And, given a loop exit, no idea what to do to investigate.

@edsantiago edsantiago added the flakes Flakes from Continuous Integration label Jul 2, 2024
@edsantiago
Copy link
Member Author

A few more failures:

1361 seconds, 67355 tries
...
602 seconds, 29632 tries  (using --since=nonsense)

Still no idea what triggers this.

@edsantiago
Copy link
Member Author

Very weird. I thought the problem might be

podman/pkg/util/utils.go

Lines 1112 to 1141 in c279ce0

// ParseInputTime takes the users input and to determine if it is valid and
// returns a time format and error. The input is compared to known time formats
// or a duration which implies no-duration
func ParseInputTime(inputTime string, since bool) (time.Time, error) {
timeFormats := []string{time.RFC3339Nano, time.RFC3339, "2006-01-02T15:04:05", "2006-01-02T15:04:05.999999999",
"2006-01-02Z07:00", "2006-01-02"}
// iterate the supported time formats
for _, tf := range timeFormats {
t, err := time.Parse(tf, inputTime)
if err == nil {
return t, nil
}
}
unixTimestamp, err := strconv.ParseFloat(inputTime, 64)
if err == nil {
iPart, fPart := math.Modf(unixTimestamp)
return time.Unix(int64(iPart), int64(fPart*1_000_000_000)).UTC(), nil
}
// input might be a duration
duration, err := time.ParseDuration(inputTime)
if err != nil {
return time.Time{}, errors.New("unable to interpret time value")
}
if since {
return time.Now().Add(-duration), nil
}
return time.Now().Add(duration), nil
}
returning a false non-error value somehow, so I instrumented it with logrus on every return... and nope, it's actually returning from line 1135, "unable to interpret". I can only conclude that whoever is calling filters.go:generateEventFilters() is not handling the error. And with that I'm outta here for the night.

@Luap99
Copy link
Member

Luap99 commented Jul 3, 2024

The issue is with how channels and select work I would assume, i.e. see c46884a

I guess that didn't really fix the main cause as the default: just skips the error if the channel does not yet contain the error.

@Luap99 Luap99 self-assigned this Jul 3, 2024
Luap99 added a commit to Luap99/libpod that referenced this issue Jul 3, 2024
The events code makes use of two channels, one for the events and one
for the resulting error. Then in the main file we have a loop reading
from both channels that should exit on first error it gets.

However in case the event channel is closed before the error channel
cotains the error it could caused an early exit as it looked like all
events were done. Commit c46884a fixed that somewhat by checking for
an error in the error channel before exiting. This however was still
racy as it added a default case in the select which means the channel
check is non blocking. Thus the error was not yet send into the channel.

To fix this we should make it a blocking read to wait for the error in
the channel. Also the err != nil check can be removed as we either
return err or nil anyway.

And as last step make sure the error channel is closed, that prevents us
from blocking forever in case the main select already processed the nil
error.

Fixes containers#23165

Signed-off-by: Paul Holzinger <[email protected]>
@stale-locking-app stale-locking-app bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Oct 4, 2024
@stale-locking-app stale-locking-app bot locked as resolved and limited conversation to collaborators Oct 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
flakes Flakes from Continuous Integration locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants