Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: enable test_wait_next_exit #17854

Conversation

matejvasek
Copy link
Contributor

@matejvasek matejvasek commented Mar 20, 2023

Does this PR introduce a user-facing change?

None

@openshift-ci openshift-ci bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None release-note-none and removed do-not-merge/release-note-label-needed Enforce release-note requirement, even if just None labels Mar 20, 2023
@rhatdan rhatdan marked this pull request as ready for review March 20, 2023 20:28
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2023
@rhatdan
Copy link
Member

rhatdan commented Mar 21, 2023

Seems like it is still broken in CI.

@vrothberg
Copy link
Member

@matejvasek did you find time to analyze what's up?

@matejvasek
Copy link
Contributor Author

@vrothberg this test should be enabled once updated SELinux policy package is available in testing OS.

The fix PR: containers/container-selinux#196
It should be available in SELinux policy package v2.193.0.

I find it odd it is not present, it's been quite a time since fix.

@matejvasek
Copy link
Contributor Author

@rhatdan @vrothberg the test runs on fedora-37 I would expect that it has container-selinux v2.193.0+.

@matejvasek
Copy link
Contributor Author

It has container-selinux-2.201.0-1.fc37-noarch so I have no idea why this is failing.

@matejvasek
Copy link
Contributor Author

matejvasek commented Mar 28, 2023

I can reproduce the issue locally with container-selinux-2.199. This is weird the write to tmpfs should have been fixed.

Mar 28 21:34:37 bellatrix podman[1509235]: 2023-03-28 21:34:37.176807527 +0200 CEST m=+4.987029737 container init 24c08a537ad9a5116e89a1a0e516c7cf352120e8ff5ced34b0054c1c98d383be (image=quay.io/libpod/alpine:latest, name=top)
Mar 28 21:34:37 bellatrix podman[1509235]: 2023-03-28 21:34:37.180976599 +0200 CEST m=+4.991198809 container start 24c08a537ad9a5116e89a1a0e516c7cf352120e8ff5ced34b0054c1c98d383be (image=quay.io/libpod/alpine:latest, name=top)
Mar 28 21:34:37 bellatrix systemd-journald[16005]: Failed to stat passed file, ignoring: Permission denied
Mar 28 21:34:37 bellatrix audit[16005]: AVC avc:  denied  { getattr } for  pid=16005 comm="systemd-journal" path=2F6465762F73686D2F6A6F75726E616C2E585858585831323932393430333935202864656C6574656429 dev="tmpfs" ino=457645 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
Mar 28 21:34:37 bellatrix audit[16005]: SYSCALL arch=c000003e syscall=262 success=no exit=-13 a0=3b a1=7ff9879b7093 a2=7ffdcca48f10 a3=1000 items=1 ppid=1 pid=16005 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=system_u:system_r:syslogd_t:s0 key=(null)
Mar 28 21:34:37 bellatrix audit: CWD cwd="/"
Mar 28 21:34:37 bellatrix audit: PATH item=0 name="" inode=457645 dev=00:17 mode=0100600 ouid=1000 ogid=1000 rdev=00:00 obj=unconfined_u:object_r:container_runtime_tmpfs_t:s0 nametype=NORMAL cap_fp=0 cap_fi=0 cap_fe=0 cap_fver=0 cap_frootid=0
Mar 28 21:34:37 bellatrix audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
Mar 28 21:34:37 bellatrix top[1509384]: [83B blob data]
Mar 28 21:34:37 bellatrix top[1509384]: CPU:   8% usr   4% sys   0% nic  84% idle   0% io   1% irq   1% sirq
Mar 28 21:34:37 bellatrix top[1509384]: Load average: 3.29 1.78 1.54 1/4850 1
Mar 28 21:34:37 bellatrix top[1509384]:   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND

@matejvasek
Copy link
Contributor Author

@rhatdan ^^^ any idea what the problem could be?
btw pid 16005 is systemd-journald

@matejvasek
Copy link
Contributor Author

matejvasek commented Mar 28, 2023

Before the error was different (different syscall was denied):
OLD ERROR:

dfbd.scope - libcrun container.
Mar 28 21:41:13 bellatrix audit[16005]: AVC avc:  denied  { read write } for  pid=16005 comm="systemd-journal" path=2F6465762F73686D2F6A6F75726E616C2E585858585833373730313431383132202864656C6574656429 dev="tmpfs" ino=459577 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
Mar 28 21:41:13 bellatrix audit[16005]: SYSCALL arch=c000003e syscall=47 success=yes exit=0 a0=5 a1=7ffdcca48de0 a2=40000040 a3=7ffdccb67080 items=0 ppid=1 pid=16005 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=system_u:system_r:syslogd_t:s0 key=(null)
Mar 28 21:41:13 bellatrix audit: SOCKADDR saddr=0100003934356431
Mar 28 21:41:13 bellatrix audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
Mar 28 21:41:13 bellatrix systemd-journald[16005]: Got message with truncated control data (too many fds sent?), ignoring.
Mar 28 21:41:13 bellatrix audit[16005]: AVC avc:  denied  { read write } for  pid=16005 comm="systemd-journal" path=2F6465762F73686D2F6A6F75726E616C2E585858585833373833353031303335202864656C6574656429 dev="tmpfs" ino=459578 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
Mar 28 21:41:13 bellatrix audit[16005]: SYSCALL arch=c000003e syscall=47 success=yes exit=0 a0=5 a1=7ffdcca48de0 a2=40000040 a3=7ffdccb67080 items=0 ppid=1 pid=16005 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=system_u:system_r:syslogd_t:s0 key=(null)
Mar 28 21:41:13 bellatrix audit: SOCKADDR saddr=0100003934356431
Mar 28 21:41:13 bellatrix audit: PROCTITLE proctitle="/usr/lib/systemd/systemd-journald"
Mar 28 21:41:13 bellatrix systemd-journald[16005]: Got message with truncated control data (too many fds sent?), ignoring.
Mar 28 21:41:13 bellatrix audit[16005]: AVC avc:  denied  { read write } for  pid=16005 comm="systemd-journal" path=2F6465762F73686D2F6A6F75726E616C2E585858585832353039323238323333202864656C6574656429 dev="tmpfs" ino=459579 scontext=system_u:system_r:syslogd_t:s0 tcontext=unconfined_u:object_r:container_runtime_tmpfs_t:s0 tclass=file permissive=0
Mar 28 21:41:13 bellatrix systemd-journald[16005]: Got message with truncated control data (too many fds sent?), ignoring.
Mar 28 21:41:13 bellatrix audit[16005]: SYSCALL arch=c000003e syscall=47 success=yes exit=0 a0=5 a1=7ffdcca48de0 a2=40000040 a3=7ffdccb67080 items=0 ppid=1 pid=16005 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="systemd-journal" exe="/usr/lib/systemd/systemd-journald" subj=system_u:system_r:syslogd_t:s0 key=(null)

@matejvasek
Copy link
Contributor Author

@rhatdan Maybe some other global SELinux policies changed?

@matejvasek
Copy link
Contributor Author

cc @edsantiago

@edsantiago
Copy link
Member

Whoosh. Way over my head, sorry.

@rhatdan
Copy link
Member

rhatdan commented Mar 28, 2023

Can you get this to fail outside of the CI/CD system?

@rhatdan
Copy link
Member

rhatdan commented Mar 28, 2023

What do I need to do to test this locally?

@matejvasek
Copy link
Contributor Author

@rhatdan

# in podman project
python3 -m venv venv
source ./venv/bin/activate
pip3 install -r test/python/requirements.txt
export PODMAN=/path/to/podman
python -m unittest test.python.docker.compat.test_containers.TestContainers.test_wait_next_exit

@matejvasek
Copy link
Contributor Author

Sometimes the testing podman service hangs:

subprocess.CalledProcessError: Command 'podman system service' returned non-zero exit status 125.

in such a case kill it manually.

@matejvasek
Copy link
Contributor Author

@rhatdan or just run a container with an image with 200k long labels an look into syslog.

@matejvasek
Copy link
Contributor Author

@rhatdan try this:
podman run --rm quay.io/mvasek/long-label:latest and then check journalctl -f.

@matejvasek
Copy link
Contributor Author

@rhatdan have you managed to reproduce the issue?

@matejvasek
Copy link
Contributor Author

@rhatdan I don't know if it helps but: the AVC denied error happens when len(labels) around 104k-105k characters. For instance 100k label does work all right.

@matejvasek
Copy link
Contributor Author

Before containers/container-selinux#196 the limit was just few 10k IIRC.

@matejvasek
Copy link
Contributor Author

The good news for me is the image we use has labels just very little bit above 100k so it working for my use-case.
Still I with there were some reserve hence the test uses 250k labels.

@rhatdan
Copy link
Member

rhatdan commented Mar 31, 2023

I really have no idea what is going on here.

@matejvasek
Copy link
Contributor Author

Something more from investigation:

Larges message that can be sent to log is 212940 bytes.

What is most likely happening:

  • Try send log entry by ssize_t send(int sockfd, const void *buf, size_t len, int flags), if succeeded we are done.
  • If the above fails due to EMSGSIZE try ssize_t sendmsg(int sockfd, const struct msghdr *msg, int flags),
    the msghdr struct contains contains file-descriptor to temp file created by podman in /dev/shm.
    The call succeeds, however systemd-journald cannot access the temp file and it gets AVC.

The SELinux labels of the temp file unconfined_u:object_r:container_runtime_tmpfs_t:s0.
The SELinux labels of the systemd-journald daemon system_u:system_r:syslogd_t:s0.

@matejvasek
Copy link
Contributor Author

@rhatdan @edsantiago I added some new findings above.

@matejvasek
Copy link
Contributor Author

simple reproducer:

var err error
msgLen := 212_940 + 1
data := make([]byte, msgLen)
for i := range data {
	data[i] = '0'
}
err = journal.Send(string(data[:msgLen]), journal.PriInfo, nil)
if err != nil {
	t.Fatal(err)
}

don't forget to set chcon on the testing binary.

@matejvasek matejvasek force-pushed the enable-test-test_wait_next_exit branch from d87e3ab to 1f58b64 Compare April 3, 2023 15:07
@rhatdan
Copy link
Member

rhatdan commented Apr 3, 2023

container-selinux-2.209.0 should fix this.

@edsantiago
Copy link
Member

There's a new CI VM build in progress, expected for next week maybe. f37 includes container-selinux-209 but unfortunately f36 does not

@matejvasek
Copy link
Contributor Author

The Docker-py Compat. test runs on F37, support for F36 ends mid May so I think it's fine.

@matejvasek matejvasek force-pushed the enable-test-test_wait_next_exit branch 4 times, most recently from 6f6c80b to eb241b6 Compare April 11, 2023 21:54
@edsantiago
Copy link
Member

@matejvasek re-pushing isn't going to do any good if you don't rebase. The magic you need is in .cirrus.yml and was merged in #18124.

@matejvasek matejvasek force-pushed the enable-test-test_wait_next_exit branch from eb241b6 to d286564 Compare April 12, 2023 12:50
@matejvasek
Copy link
Contributor Author

@rhatdan @edsantiago Thanks. It should be all right now.

@edsantiago
Copy link
Member

I think this is the test in question?. It passed (all tests pass), and shows container-selinux-2.209 present, so I guess this is good to go?

/approve

@rhatdan PTAL.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Apr 12, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: edsantiago, matejvasek

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 12, 2023
@rhatdan
Copy link
Member

rhatdan commented Apr 12, 2023

/lgtm
Thanks @matejvasek

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 12, 2023
@openshift-merge-robot openshift-merge-robot merged commit c8eb151 into containers:main Apr 12, 2023
@github-actions github-actions bot added the locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. label Sep 1, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 1, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. locked - please file new issue/PR Assist humans wanting to comment on an old issue or PR with locked comments. release-note-none
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants