Container sigkill after stop timeout not logged as error #11854

pboguslawski · 2021-10-04T18:44:07Z

/kind bug

Description

If podman stop kills container after timeout, no info about timeout is logged in journal/stderr; killing container after timeout should be treated as dangerous because of data loss possibility so message

podman/libpod/oci_conmon_linux.go

Line 438 in 800e91d

    
           logrus.Infof("Timed out stopping container %s, resorting to SIGKILL: %v", ctr.ID(), err)

priority should be higher than info (at least warn) and systemd unit files generated with podman generate should produce such warnings to journal/syslog in case of killing container by podman with sigkill.

Using podman stop --log-level=info is not an option because it produces too much noise.

Steps to reproduce the issue:

Run any container that need >1s to stop.
Stop it using podman stop --time=1

Describe the results you received:

No traces in journal/syslog about container killing.

Describe the results you expected:

Journal/syslog should contain message about stop timeout and container killing with level warn (or maybe even error).

The text was updated successfully, but these errors were encountered:

rhatdan · 2021-10-04T18:57:48Z

Interested in opening a PR to fix?

edsantiago · 2021-10-05T13:05:12Z

I agree that falling back to SIGKILL is noteworthy, but I'm not convinced that podman stop is the right place for this. The careful developer makes sure to check container exit status:

$ podman run --name foo -d alpine sleep 100
7e6c79489feac65fa34353146047214e05014c30bf4efa048b09192c29c5513c
$ podman stop --time=1 foo
foo
$ podman inspect --format '{{.State.ExitCode}}' foo
137

In an automated environment, a warning would be of limited use: nobody would see it. In an automated environment, it really is kind of important for the developer to check container exit status.

When playing on the command line, I can see a "timed out" warning being mildly helpful -- but in this environment the user will (presumably) also notice the timeout delay itself.

@pboguslawski in what scenario do you think this kind of warning would actually help someone?

pboguslawski · 2021-10-05T13:11:05Z

in what scenario do you think this kind of warning would actually help someone?

The issue is about using systemd unit generated with podman generate as system service without any other automated environment. In this scenario no problem traces in journal/syslog. Also some careful developers still like to use syslog watchers... :)

rhatdan · 2021-10-05T13:26:05Z

You could change the --log-level info command on the podman stop command to see this output.

mheon · 2021-10-05T13:33:47Z

I don't know if this is worthy of printing by default - podman stop on a container running a binary that ignores signals (sleep is a good example) will always trigger it, so my feeling is that we'll be printing this warning a lot, there's not much the user can do to resolve it (maybe some binaries are like systemd and require a specific stop signal but that seems to be the minority), and it will increase logspam making it more difficult to locate serious problems.

pboguslawski · 2021-10-05T13:37:03Z

You could change the --log-level info command on the podman stop command to see this output.

Adding --log-level info to systemd service unit file fills journal with too much noise; this particular message level should be warn or error (if warn then podman should generate unit files with podman stop --log-level warn to feed journal with it in case of timeout).

I don't know if this is worthy of printing by default

IHMO: its definitely more important message than podmans messages with

container start
container stop
container cleanup
container died
container attach

that are sent to syslog by default.

Would be also nice to see similar message in syslog with for example

container killed (stop timeout)

edsantiago · 2021-10-05T13:45:51Z

@pboguslawski thanks, that's a good point (your use case in systemd). I think your proposal is promising: the timeout really should IMHO be more important than info (although absolutely not error-level); and by automatically setting log-level in the unit file, the warning will be visible to log scrapers but not simple schmoes playing with sleep containers on the command line.

rhatdan · 2021-10-05T21:26:52Z

Podman defaults to log-level Warn, the question would be do we raise it to Warn, so it is in everyones face when it happens. I am ok with it, but if users start to complain, then we might want to revisit it.
@pboguslawski Please open a PR.

mheon · 2021-10-05T21:29:27Z

If we actually do this, be prepared for the integration tests to break in a lot of places - which might not be the worst thing, considering we usually shouldn't be waiting for that timeout anyways.

edsantiago · 2021-10-05T21:58:58Z

Podman defaults to log-level Warn

Man page says "error", which is why I was OK with raising the severity of this message.

If the man page is wrong, can someone please fix it? (I would offer, but I'm OOTO tomorrow with no net access)
- Also if the man page is wrong, then I withdraw my comments above. I don't think this would be a useful warning for regular users. And int tests will be unaffected, because they don't really check any output anywhere, but sys tests may need a bit of massaging.

mheon · 2021-10-06T19:09:15Z

Manpages are wrong, then - we swapped to Warn around 9 months ago, AFAIK.

Note: the Warning message will not come to podman-remote. It would be difficult to plumb, and not really worth the effort. Fixes: containers#11854 Signed-off-by: Daniel J Walsh <[email protected]>

openshift-ci bot added the kind/bug Categorizes issue or PR as related to a bug. label Oct 4, 2021

rhatdan mentioned this issue Oct 6, 2021

Warn if podman stop timeout expires that sigkill was sent #11880

Merged

openshift-merge-robot closed this as completed in #11880 Oct 10, 2021

github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 21, 2023

github-actions bot locked as resolved and limited conversation to collaborators Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Container sigkill after stop timeout not logged as error #11854

Container sigkill after stop timeout not logged as error #11854

pboguslawski commented Oct 4, 2021

rhatdan commented Oct 4, 2021

edsantiago commented Oct 5, 2021

pboguslawski commented Oct 5, 2021 •

edited

Loading

rhatdan commented Oct 5, 2021

mheon commented Oct 5, 2021

pboguslawski commented Oct 5, 2021 •

edited

Loading

edsantiago commented Oct 5, 2021

rhatdan commented Oct 5, 2021

mheon commented Oct 5, 2021

edsantiago commented Oct 5, 2021

mheon commented Oct 6, 2021

Container sigkill after stop timeout not logged as error #11854

Container sigkill after stop timeout not logged as error #11854

Comments

pboguslawski commented Oct 4, 2021

rhatdan commented Oct 4, 2021

edsantiago commented Oct 5, 2021

pboguslawski commented Oct 5, 2021 • edited Loading

rhatdan commented Oct 5, 2021

mheon commented Oct 5, 2021

pboguslawski commented Oct 5, 2021 • edited Loading

edsantiago commented Oct 5, 2021

rhatdan commented Oct 5, 2021

mheon commented Oct 5, 2021

edsantiago commented Oct 5, 2021

mheon commented Oct 6, 2021

pboguslawski commented Oct 5, 2021 •

edited

Loading

pboguslawski commented Oct 5, 2021 •

edited

Loading