Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SD-NOTIFY proxy in conmon #8508

Closed
wants to merge 1 commit into from
Closed

Conversation

rhatdan
Copy link
Member

@rhatdan rhatdan commented Nov 29, 2020

This leverages conmon's ability to proxy the SD-NOTIFY socket.
This prevents locking caused by OCI runtime blocking, waiting for
SD-NOTIFY messages, and instead passes the messages directly up
to the host.

Signed-off-by: Joseph Gooch [email protected]
Signed-off-by: Daniel J Walsh [email protected]

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 29, 2020
@rhatdan
Copy link
Member Author

rhatdan commented Nov 29, 2020

Replaces: #7607

@rhatdan
Copy link
Member Author

rhatdan commented Nov 29, 2020

@goochjj I will take this one over and try to finish it up.

Thanks for the PR.

@rhatdan
Copy link
Member Author

rhatdan commented Nov 30, 2020

@haircommander Is this problem caused by older conmon still?

@haircommander
Copy link
Collaborator

yeah it looks like it

@rhatdan
Copy link
Member Author

rhatdan commented Nov 30, 2020

@cevich PTAL

libpod/container_internal_linux.go Show resolved Hide resolved
Comment on lines +1451 to +1651
// Bind notify proxy
if c.config.SdNotifyMode == define.SdNotifyModeContainer {
notifyDir := filepath.Join(c.bundlePath(), "notify")
logrus.Debugf("checking notify %q dir", notifyDir)
if err := os.MkdirAll(notifyDir, 0755); err != nil {
if !os.IsExist(err) {
return errors.Wrapf(err, "unable to create notify %q dir", notifyDir)
}
}
if err := label.Relabel(notifyDir, c.MountLabel(), true); err != nil {
return errors.Wrapf(err, "relabel failed %q", notifyDir)
}
logrus.Debugf("add bindmount notify %q dir", notifyDir)
if _, ok := c.state.BindMounts["/run/notify"]; !ok {
c.state.BindMounts["/run/notify"] = notifyDir
}
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly here, we need the OCI runtime to take care of it. If we pass down the notify socket directly to the container, there is a race condition between the notification being sent and systemd reading it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@giuseppe We're proxying. PODMAN is now in charge of notification. NOT the OCI runtime. That's the point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PODMAN creates /run/notify/
PODMAN tells OCI runtime to bind-mount /run/notify to the container's setup folder
CONMON forwards /run/notify/notify.sock ( in container, by way of bind mount ) to --sdnotify-socket on host

OCI runtime is NOT taking care of it.

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@openshift-ci-robot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cevich
Copy link
Member

cevich commented Dec 1, 2020

@cevich PTAL

Not sure what is needed from me. I peeked at the testing results and they all appear related to the changes here, no?

@mheon
Copy link
Member

mheon commented Dec 1, 2020

We may need new VM images with a newer Conmon.

@rhatdan
Copy link
Member Author

rhatdan commented Dec 1, 2020

Yes @cevich new vms need latest conmon.

@cevich
Copy link
Member

cevich commented Dec 2, 2020

Yes @cevich new vms need latest conmon.

And newer Ubuntu ☹️ Can be part of WIP #8312 (from containers/automation_images#34)?

@cevich
Copy link
Member

cevich commented Dec 2, 2020

Note, "No" is acceptable: in that case I'll roll up new images separate from my Ubuntu work, and they can be brought in via this PR.

@rhatdan
Copy link
Member Author

rhatdan commented Dec 2, 2020

@cevich depends on when I can get new Ubuntu. A couple of days, I am fine with it. weeks, I have a problem.

@cevich
Copy link
Member

cevich commented Dec 2, 2020

Sounds fair, if we get into next week and I'm still stuck, I'll make a new PR to just refresh the current images. There is a risk it will fail with Ubuntu 19, as we're on borrowed-time WRT the package repositories (19 is EOL) remaining available.

Otherwise/also, I've just finished building fresh and am trying those out right now in #8312...

@cevich
Copy link
Member

cevich commented Dec 4, 2020

...yeah, we've got problems with the new VMs that will probably take a while to solve. But maybe we have some luck here, b/c IIRC brent built new VMs to incorporate docker compose. It's not yet in master, @rhatdan try this see if it works:

diff --git a/.cirrus.yml b/.cirrus.yml
index 0fa51be63..be4c0e459 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -30,7 +30,7 @@ env:
     PRIOR_UBUNTU_NAME: "ubuntu-19"

     # Google-cloud VM Images
-    IMAGE_SUFFIX: "c5402398833246208"
+    IMAGE_SUFFIX: "c4704091098054656"
     FEDORA_CACHE_IMAGE_NAME: "fedora-${IMAGE_SUFFIX}"
     PRIOR_FEDORA_CACHE_IMAGE_NAME: "prior-fedora-${IMAGE_SUFFIX}"
     UBUNTU_CACHE_IMAGE_NAME: "ubuntu-${IMAGE_SUFFIX}"

@cevich
Copy link
Member

cevich commented Dec 4, 2020

Oh! I can check what version of conmon is in ubuntu-c4704091098054656...looks like 2.0.20~1. Is that new enough?

@rhatdan
Copy link
Member Author

rhatdan commented Dec 4, 2020

I think we need 2.0.21

@cevich
Copy link
Member

cevich commented Dec 7, 2020

I think we need 2.0.21

okay. Lokesh has a new OBS repo with libseccomp 2.5.1 and other updated packages for us. Building images now...

@cevich
Copy link
Member

cevich commented Dec 7, 2020

...testing new images.

@cevich
Copy link
Member

cevich commented Dec 8, 2020

Update: I'm down to ~2-5 failing tests on the new images. These failures seem to be caused by "too old" package versions. We're asking a lot of Lokesh to maintain all these special Ubuntu repos. in the first place, so I'm afraid to ask for updates to the (now defunct) 19.10 repo, in addition to 20.04 and 20.10. At the same time, I don't want to give you a steaming pile to work with, so I'm doing by best to get all the tests passing (PR #8312) first.

@cevich
Copy link
Member

cevich commented Dec 10, 2020

Update: Getting very close with the new images, though I've had to add skips to workaround a few broken tests/situations.

@cevich
Copy link
Member

cevich commented Dec 11, 2020

Update: The suite of podman CI tests should be passing now, so this PR is waiting on review and merge of #8312

@rhatdan
Copy link
Member Author

rhatdan commented Jan 14, 2021

Not going to make the 3.0 release. Will have to wait until 3.1

@github-actions
Copy link

A friendly reminder that this PR had no activity for 30 days.

This leverages conmon's ability to proxy the SD-NOTIFY socket.
This prevents locking caused by OCI runtime blocking, waiting for
SD-NOTIFY messages, and instead passes the messages directly up
to the host.

[NO TESTS NEEDED] existing tests cover this use case.

Signed-off-by: Joseph Gooch <[email protected]>
Signed-off-by: Daniel J Walsh <[email protected]>
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Jul 16, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe, rhatdan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mheon
Copy link
Member

mheon commented Aug 9, 2021

What's the status here? I see we have a commit changing the default type of generate systemd units over to notify in the v3.3 branch - are we still using the OCI runtime to provide sdnotify in that case, or is conmon being used? If we're using the OCI runtime, I think we need to revert for 3.3, until this PR can merge.

@github-actions github-actions bot removed the stale-pr label Aug 10, 2021
@rhatdan
Copy link
Member Author

rhatdan commented Aug 10, 2021

I have no idea of the state. I grabbed this PR to try to push it over the edge a while back. but I have no idea what is going on at this state. If @mheon or someone else wants to finish this, I am open to them taking it.

@flyingflo
Copy link

What is missing besides of a bunch of failing tests? These seem to fail on many pull requests, however. 🤷‍♂️ Don't know why..

@@ -1254,6 +1244,12 @@ func (r *ConmonOCIRuntime) sharedConmonArgs(ctr *Container, cuuid, bundlePath, p
args = append(args, rFlags...)
}

if ctr.config.SdNotifyMode == define.SdNotifyModeContainer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we also have to unset NOTIFY_SOCKET to prevent runc and crun from blocking? Not sure if I understood it correctly though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we don't want to pass these on to the OCI Runtimes, yes, since we don't want them handling it.

@vrothberg
Copy link
Member

@rhatdan suggested that I take over this PR to kick it over the finish line, closing.

@vrothberg vrothberg closed this Aug 17, 2021
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 22, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants