Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix several podman events issues #15717

Merged
merged 9 commits into from
Sep 12, 2022
Merged

Conversation

Luap99
Copy link
Member

@Luap99 Luap99 commented Sep 9, 2022

fix hang with podman events file logger

podman --events-backend file events --stream=false should never hang. The
problem is that our tail library will wait for the file to be created
which makes sense when we do not run with --stream=false. To fix this we
can just always create the file when the logger is initialized. This
would also help to report errors early on in case the file is not
accessible.

event backend none: return an error when reading events

podman --events-backend none events should return with an error since it
will never be able to actually list events.

event backend journald: fix problem with empty journal

Currently podman events will just fail with Error: failed to get journal cursor: failed to get cursor: cannot assign requested address when the
journal contains zero podman events.

The problem is that we are using the journal accessors wrong. There is no
need to call GetCursor() and compare them manually. The Next() return an
integer which tells if it moved to the next or not. This means the we can
remove GetCursor() which would fail when there is no entry.

This also includes another bug fix. Previously the logic called Next()
twice for the first entry which caused us to miss the first entry.

To reproduce this issue you can run the following commands:

sudo journalctl --rotate
sudo journalctl --vacuum-time=1s

Note that this will delete the full journal.

Now run podman events and it fails but with this patch it works.
Now generate a single event, i.e. podman pull alpine, and run
podman events --until 1s.

I am not sure how to get a reliable test into CI, I really do not want
to delete the journal and developer or CI systems.

libpod: runtime newEventer() cleanup

There is no reason to create a new eventer every time. The libpod runtime
already has one attached which should be used instead.

Fixes #15688

Does this PR introduce a user-facing change?

podman --events-backend file events --stream=false will now no longer hang when the event file does not exists.
podman --events-backend none events now returns an error because it can never list events.
podman --events-backend journald will now work with an empty journal and no longer skip the first event.

@openshift-ci openshift-ci bot added release-note approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Sep 9, 2022
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, @Luap99!

Changes LGTM

return nil, err
}
_ = fd.Close()
return &EventLogFile{options: options}, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mini nit: return ..., fd.Close()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add this but honestly do we want to fail basically all podman commands for this?

jsonStr, err := event.ToJSONString()
if err != nil {
return err
for {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here look unrelated to the commit.

Copy link
Member Author

@Luap99 Luap99 Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are not, for loop on event channel is blocking. Therefore we can never read the error channel.

return fmt.Errorf("failed to move journal cursor to next entry: %w", err)
}
newCursor, err := j.GetCursor()
entry, err := getNextEntry(ctx, j, options.Stream, untilTime)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: bug fix + refactor in the same commit is hard to review.

@edsantiago
Copy link
Member

Since you need to resubmit anyway... could I beg you to include something like this?

diff --git a/docs/source/markdown/podman.1.md b/docs/source/markdown/podman.1.md
index 3b3974dcc..e992377cf 100644
--- a/docs/source/markdown/podman.1.md
+++ b/docs/source/markdown/podman.1.md
@@ -43,8 +43,8 @@ Remote connections use local containers.conf for default.                                                                                    
 #### **--events-backend**=*type*
 
 Backend to use for storing events. Allowed values are **file**, **journald**, and
-**none**. When *file* is specified, the events are stored under a subdirectory
-of the *tmpdir* location (see **--tmpdir** below).
+**none**. When *file* is specified, the events are stored
+in \$XDG\_RUNTIME\_DIR/libpod/tmp/events/events.log.
 
 #### **--help**, **-h**

This would address my comment about debug logging. Basically, I wasted a lot of time yesterday looking for a (nonexistent) log or lock file "somewhere under a tmpdir".

@Luap99
Copy link
Member Author

Luap99 commented Sep 9, 2022

Since you need to resubmit anyway... could I beg you to include something like this?

diff --git a/docs/source/markdown/podman.1.md b/docs/source/markdown/podman.1.md
index 3b3974dcc..e992377cf 100644
--- a/docs/source/markdown/podman.1.md
+++ b/docs/source/markdown/podman.1.md
@@ -43,8 +43,8 @@ Remote connections use local containers.conf for default.                                                                                    
 #### **--events-backend**=*type*
 
 Backend to use for storing events. Allowed values are **file**, **journald**, and
-**none**. When *file* is specified, the events are stored under a subdirectory
-of the *tmpdir* location (see **--tmpdir** below).
+**none**. When *file* is specified, the events are stored
+in \$XDG\_RUNTIME\_DIR/libpod/tmp/events/events.log.
 
 #### **--help**, **-h**

This would address my comment about debug logging. Basically, I wasted a lot of time yesterday looking for a (nonexistent) log or lock file "somewhere under a tmpdir".

sure

As it looks the test found another bug. They use the default log file location even when --root and --runroot and --tmpdir are set.

@edsantiago
Copy link
Member

Because the diff isn't obvious: test is expecting untag+remove as last two events, is getting untag+untag instead.

Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Luap99
Copy link
Member Author

Luap99 commented Sep 12, 2022

@containers/podman-maintainers PTAL

Luap99 and others added 9 commits September 12, 2022 18:05
podman --events-backend file events --stream=false should never hang. The
problem is that our tail library will wait for the file to be created
which makes sense when we do not run with --stream=false. To fix this we
can just always create the file when the logger is initialized. This
would also help to report errors early on in case the file is not
accessible.

Fixes part one from containers#15688

Signed-off-by: Paul Holzinger <[email protected]>
podman --events-backend none events should return with an error since it
will never be able to actually list events.

Fixes part three of containers#15688

Signed-off-by: Paul Holzinger <[email protected]>
Currently podman events will just fail with `Error: failed to get journal
cursor: failed to get cursor: cannot assign requested address` when the
journal contains zero podman events.

The problem is that we are using the journal accessors wrong. There is no
need to call GetCursor() and compare them manually. The Next() return an
integer which tells if it moved to the next or not. This means the we can
remove GetCursor() which would fail when there is no entry.

This also includes another bug fix. Previously the logic called Next()
twice for the first entry which caused us to miss the first entry.

To reproduce this issue you can run the following commands:
```
sudo journalctl --rotate
sudo journalctl --vacuum-time=1s
```
Note that this will delete the full journal.

Now run podman events and it fails but with this patch it works.
Now generate a single event, i.e. podman pull alpine, and run
podman events --until 1s.

I am not sure how to get a reliable test into CI, I really do not want
to delete the journal and developer or CI systems.

Fixes second part of containers#15688

Signed-off-by: Paul Holzinger <[email protected]>
There is no reason to create a new eventer every time. The libpod runtime
already has one attached which should be used instead.

Signed-off-by: Paul Holzinger <[email protected]>
Refactored secrets API in common for stability purposes. Move podman to
said API.

[NO NEW TESTS NEEDED]

Signed-off-by: Ashley Cui <[email protected]>
List the default paths to the event log file and the tmpdir option.

Signed-off-by: Paul Holzinger <[email protected]>
The current code only sets EventsLogFilePath when the tmp is overwritten
from the db. We should always set the default when no path was set in
containers.conf.

Signed-off-by: Paul Holzinger <[email protected]>
In order to display all events we have to read until the event channel
is closed.

Signed-off-by: Paul Holzinger <[email protected]>
The --format changes caused a duplicated newline.
PR containers#15678 should have a test for this.

Signed-off-by: Paul Holzinger <[email protected]>
Copy link
Member

@edsantiago edsantiago left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trivial nit in man page, but am blocking because, could you check your commit list please? I think you accidentally adopted 72e715a

@@ -158,7 +158,7 @@ On remote clients, including Mac and Windows (excluding WSL2) machines, logging

#### **--tmpdir**

Path to the tmp directory, for libpod runtime content.
Path to the tmp directory, for libpod runtime content. Defaults to `$XDG\_RUNTIME\_DIR/libpod/tmp` as rootless and `run/libpod/tmp` as rootful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be /run (missing slash)?

@Luap99
Copy link
Member Author

Luap99 commented Sep 12, 2022

Trivial nit in man page, but am blocking because, could you check your commit list please? I think you accidentally adopted 72e715a

No that commit is required to vendor in new c/common (breaking changes)

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Sep 12, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, Luap99

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@edsantiago
Copy link
Member

No that commit is required to vendor in new c/common (breaking changes)

Got it. For benefit of future spelunkers, that was originally #15723, which was closed in favor of getting absorbed here.

@rhatdan
Copy link
Member

rhatdan commented Sep 12, 2022

/lgtm
/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2022
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 12, 2022
@rhatdan
Copy link
Member

rhatdan commented Sep 12, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 12, 2022
@openshift-merge-robot openshift-merge-robot merged commit 6e54594 into containers:main Sep 12, 2022
@Luap99 Luap99 deleted the events branch September 12, 2022 18:24
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 20, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

podman events missbehaving when there are zero events
6 participants