Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libct/cg/sd: reconnect and retry on dbus connection error #2923

Merged
merged 5 commits into from
Apr 28, 2021

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Apr 27, 2021

PR #2203 started to reuse the same dbus connection. While this improves the performance,
In case the dbus daemon is ever restarted, the connection is no longer valid and every operation
fails. This is a minor concern for short-lived runc, but much more of a issue in case there is
a long-running daemon (e.g. cri-o) is using runc's libcontainer, as the connection is never
retried and the only remedy is to restart the daemon.

The solution to the above is to check the errors returned for dbus: connection closed by user
error, and try to re-connect on that. This is what PR #2862 does.

This is a carry of #2862, implementing the idea of retry-in-place (first described
at #2862 (comment) and #2862 (comment)) on top of what it does.

For more info, see commit messages as well as #2862.

Fixes:

Changelog entry

  • libcontainer/cgroups/systemd: reconnect and retry in case dbus connection is closed (after dbus restart)

@kolyshkin
Copy link
Contributor Author

@wzshiming PTAL

@kolyshkin kolyshkin mentioned this pull request Apr 27, 2021
@kolyshkin kolyshkin changed the title libct/cg/sd: reconnection and retry on dbus connection error libct/cg/sd: reconnect and retry on dbus connection error Apr 27, 2021
@kolyshkin kolyshkin changed the title libct/cg/sd: reconnect and retry on dbus connection error libct/cg/sd: reconnect and retry on dbus connection error (regression in rc91) Apr 27, 2021
@kolyshkin kolyshkin requested review from cyphar and AkihiroSuda and removed request for cyphar April 27, 2021 22:10
@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Apr 27, 2021
mrunalp
mrunalp previously approved these changes Apr 27, 2021
@kolyshkin kolyshkin changed the title libct/cg/sd: reconnect and retry on dbus connection error (regression in rc91) libct/cg/sd: reconnect and retry on dbus connection error Apr 27, 2021
@kolyshkin
Copy link
Contributor Author

I was wrong in my initial analysis that this is a regression caused by PR #2203. In fact it's not, it has always been working that way (see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1634092 from 2018).

wzshiming and others added 5 commits April 27, 2021 16:02
[@kolyshkin: documentation nits]

Signed-off-by: Shiming Zhang <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
Generalize isUnitExists as isDbusError, and use errors.As while at it
(which can handle wrapped errors as well).

Signed-off-by: Kir Kolyshkin <[email protected]>
[@kolyshkin: doc nits, use dbus.ErrClosed and isDbusError]

Signed-off-by: Shiming Zhang <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
Signed-off-by: Shiming Zhang <[email protected]>
Signed-off-by: Kir Kolyshkin <[email protected]>
Instead of reconnecting to dbus after some failed operations, and
returning an error (so a caller has to retry), reconnect AND retry
in place for all such operations.

This should fix issues caused by a stale dbus connection after e.g.
a dbus daemon restart.

Signed-off-by: Kir Kolyshkin <[email protected]>
Comment on lines +31 to +36
d.RLock()
if conn := d.conn; conn != nil {
d.RUnlock()
return conn, nil
}
d.RUnlock()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be

d.RLock()
conn := d.conn
d.RUnlock()
if conn != nil { ... }

But it doesn't really matter.

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants