Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ROX-18838: Remove docker socket mountpoint from collector container. #1277

Merged
merged 4 commits into from
Aug 14, 2023

Conversation

ovalenti
Copy link
Contributor

@ovalenti ovalenti commented Aug 8, 2023

Description

The current definition of the Collector daemonset contains a volume to access to the Docker socket. This is not used anymore and could cause security concern.

This PR removes the mountpoint parameters in the CI tests to check that Collector is not affected by this change.

Checklist

  • Investigated and inspected CI test results

@openshift-ci
Copy link

openshift-ci bot commented Aug 8, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Kernel Method Without Collector Time (secs) With Collector Time (secs) Baseline median (secs) Collector median (secs) PValue
rhel.rhel-8 ebpf 208.56 218.682 204.92 213.7 🟢
ubuntu-os.ubuntu-2004-lts ebpf 237.06 233.751 236.1 234.97 🟢
ubuntu-os.ubuntu-2204-lts ebpf 247.845 246.557 222.56 229.05 🟢

@ovalenti ovalenti marked this pull request as ready for review August 9, 2023 15:17
@ovalenti ovalenti requested a review from a team as a code owner August 9, 2023 15:17
@Molter73
Copy link
Collaborator

Molter73 commented Aug 10, 2023

Looks like RHCOS is showing the same behavior I was getting on my Fedora, I wonder if there's something about the format for the cgroups in that platform that is not supported by Falco and we were falling back to using the container engine for getting that data somehow 🤔

It'd be great if we could somehow log what container runtime is being used, I suspect crun is not supported by Falco and is what I'm using on my environment.

  ociRuntime:
    name: crun
    package: crun-1.8.6-1.fc38.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.8.6
      commit: 73f759f4a39769f60990e7d225f561b4f4f06bcf
      rundir: /run/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL

@Molter73
Copy link
Collaborator

Ok, I've been doing some debugging and I've found why this is failing on systems with podman.

Usually, the container ID is detected by the container engine in falco, some of the engines are run by default, but podman has this little jewel: https://github.com/stackrox/falcosecurity-libs/blob/653e24c25d10c1a7ac16455ec41d0c6bc5bfd219/userspace/libsinsp/container_engine/docker/podman.cpp#L177-L182

That piece of code is quite weird, but the basic idea is, "if the podman socket doesn't exists, don't try to match podman type cgroups", causing Falco to not be able to parse container IDs on systems running podman. We will need to figure out a way to either:

  • Mock the collector filesystem so it thinks it has podman always.
  • Have some compile time check that allows us to bypass that check.

@ovalenti
Copy link
Contributor Author

Good catch @Molter73 !

That piece of code is quite weird, but the basic idea is, "if the podman socket doesn't exists, don't try to match podman type cgroups"

I was searching for the reason behind this particular exception, and it would be a performance trick: falcosecurity/libs#296

@ovalenti
Copy link
Contributor Author

Also, none of the end-user runtime environments we support uses podman, nor docker... so I wonder whether the problem is in the code, ... or the test environment 😬

@Molter73
Copy link
Collaborator

Good catch @Molter73 !

That piece of code is quite weird, but the basic idea is, "if the podman socket doesn't exists, don't try to match podman type cgroups"

I was searching for the reason behind this particular exception, and it would be a performance trick: falcosecurity/libs#296

Yeah, I assumed something of the sort, since it has to look for the socket in multiple users.

Also, none of the end-user runtime environments we support uses podman, nor docker... so I wonder whether the problem is in the code, ... or the test environment 😬

For the time being, I think we can keep the podman mount to trick the integration tests into passing, but we'll need to look for a better solution. I'll talk to folks upstream, cause there is a mask that you can set at runtime to skip any engines you don't use, so I think it makes sense to move the check out of the libraries and into Falco itself.

The sad part in all this is that we will still have the code for querying container APIs, even if they fail and there won't be an easy way to decouple them.

…podman

We do not make use of the runtime socket, but its absence disables the runtime
cgroup parsing... We bind it to a dummy file, which should do the trick.
Copy link
Collaborator

@Molter73 Molter73 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ovalenti
Copy link
Contributor Author

TIL: yet an other difference between podman and docker
containers/podman#6234 (comment)

@ovalenti ovalenti requested a review from Molter73 August 14, 2023 12:12
@ovalenti ovalenti merged commit dff3274 into master Aug 14, 2023
@ovalenti ovalenti deleted the ovalenti/ROX-18838-discard_docker_socket branch August 14, 2023 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants