
[BUG] New os user/group doesn't reflect in Cloudbeat #235

Closed

uri-weisman opened this issue Jun 22, 2022 · 19 comments · Fixed by elastic/elastic-agent#1382 or elastic/kibana#143492

Labels: 8.6 candidate · bug · csp: GA · Team:Cloud Security · verified
uri-weisman (Contributor) commented Jun 22, 2022

Describe the bug
A remediation that requires creating a new OS user/group is not reflected to the user, even though it took effect.

Preconditions
A containerized Cloudbeat running in a Kind cluster.

To Reproduce

  1. Verify CIS rule 1.1.12 is failing the evaluation.
  2. Apply the remediation:
     - SSH to the host (Kind cluster node)
     - groupadd etcd
     - useradd -g etcd etcd
     - chown etcd:etcd /var/lib/etcd/
  3. Wait a cycle for new results to be presented.
  4. CIS rule 1.1.12 is still failing the evaluation.

Expected behavior
CIS rule 1.1.12 should pass, since the correct remediation was applied.

Investigation conclusions

  1. group and user info can be found in /etc/group & /etc/passwd respectively.
  2. In order to propagate this data to cloudbeat we mount those files to the elastic-agent.
  3. In the case of a missing user/group we use the useradd/groupadd commands.
  4. Those commands edit the mentioned files and modify their inodes.
  5. When you mount an individual file into a container, what you are mounting is the inode of the file on the FS (discussion).
    As a result, the files inside the container are not synced.
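
This is easy to observe on any Linux host; a minimal sketch (the demo user/group names are placeholders):

stat -c 'inode: %i' /etc/passwd   # note the current inode
groupadd demo
useradd -g demo demo              # useradd rewrites /etc/passwd
stat -c 'inode: %i' /etc/passwd   # the inode has changed

A container that mounted /etc/passwd as a single file still refers to the old inode, so it never sees the new user.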

Proposed solution
Mounting an entire directory would solve the issue, because the directory's pointer to the inode gets updated.
Having said that, we don't want to mount the entire /etc directory; instead, we can create a new folder to mount and symlink the mentioned files into it.
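
On the host, the proposed setup could look roughly like this (a sketch; the directory name is a placeholder, and the symlink targets would still need to resolve inside the container):

mkdir /etc/cloudbeat-sync                      # hypothetical staging directory
ln -s /etc/passwd /etc/cloudbeat-sync/passwd
ln -s /etc/group /etc/cloudbeat-sync/group
# mount /etc/cloudbeat-sync (the directory, not the individual files) into the container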

Screenshots

The useradd command modifies the /etc/passwd file's inode:
(screenshot: Screen Shot 2022-06-22 at 17 31 43)

Related issue:

uri-weisman added the bug label Jun 22, 2022
uri-weisman changed the title from "[BUG] New user wouldn" to "[BUG] New os user/group doesn't does not infiltrate Cloudbeat" Jun 22, 2022
uri-weisman added the 8.4 candidate and Team:Cloud Security labels Jun 22, 2022
oren-zohar changed the title from "[BUG] New os user/group doesn't does not infiltrate Cloudbeat" to "[BUG] New os user/group doesn't infiltrate Cloudbeat" Jun 23, 2022
tinnytintin10 commented:

I have prioritized this as a 0. @tehilashn, the definition of done should be that remediations show up as passing, and that a test is developed for this.

oren-zohar changed the title from "[BUG] New os user/group doesn't infiltrate Cloudbeat" to "[BUG] New os user/group doesn't reflect in Cloudbeat" Jul 26, 2022
yashtewari (Contributor) commented:
Very thorough, Uri! This helped me while working on

I managed to make those tests work by using directory mounts and some other hacks. The whole experiment with mounted files is also neatly described in this blog. I found that it's possible to run the useradd and groupadd commands from the container by providing a prefix:

useradd etcd -P /hostfs
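
groupadd accepts the same --prefix flag, so both halves of the remediation could in principle be run from inside the container (a sketch, assuming the host filesystem is mounted read-write at /hostfs):

groupadd etcd -P /hostfs
useradd -g etcd etcd -P /hostfs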

So this is for containerized cloudbeat, but what about Elastic Agent? We use the same mount format in those manifests, and I've confirmed that it mostly doesn't work when the file is changed on the host. A newly created Agent pod comes online with the latest file contents/inode. Interestingly, in some cases it seems to work the first time the file is changed after pod creation: the pod goes into a crash loop for a few seconds, and comes back with the latest file contents/inode. This only seems to happen once, and it seems important to understand why.

> Having said that, we don't want to mount the entire etc directory, therefore, we can create a new folder to mount and symlink the mentioned files.

Do you mean making changes to the customer's host? I think this goes one step beyond anything we're doing currently, but I wonder if any other use case of Agent/beats does this. A host-agnostic script that runs on Elastic Agent startup could do the trick. @DaveSys911?

yashtewari (Contributor) commented Aug 2, 2022

Some clarification: for files that are mounted individually (i.e. not as part of a directory tree), our file system fetcher works just fine for updates that are made in place. The problem only occurs with tools like useradd (and many other Linux utilities, editors, etc.) that internally use the following flow:

  1. make a copy of the file
  2. make changes to the copy
  3. replace the original file with the copy

This can be verified, for example, with the passwd file. From within the host node, append something to it:

cat >> /etc/passwd << EOF
See you on the other side
EOF

The change will be reflected in the mounted file in the container. The same problem seen with the mounted file would also occur with a mounted directory if we replaced it with a new directory of the same name. But making changes within a directory subtree is analogous to making in-place changes to a mounted file as described above, so it works.
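
The copy-and-replace flow can be reproduced the same way, and the inode numbers make the difference visible (a minimal sketch, run on the host):

stat -c %i /etc/passwd            # note the inode
cp /etc/passwd /etc/passwd.new    # the useradd-style flow: copy...
mv /etc/passwd.new /etc/passwd    # ...then replace the original
stat -c %i /etc/passwd            # a new inode; the container's mount still points at the old one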

It follows that if we don't mount the directory, or don't somehow re-mount the mounted file, changes made by useradd won't propagate to the Agent. I tried some things that don't work:

  1. A symbolic link (ln -s) as suggested by @uri-weisman is just a file with a path that points to another file. If we don't already have access to what we want on the container, we can't point to it either.
  2. A hard link (ln) links to the inode, and suffers from the same limitations as the mounted file.
  3. Instead of just the file, I made a volume out of the whole directory /etc, and then restricted the volumeMounts to just the relevant files using subPath or subPathExpr, but to no avail.

The options I can see:

  1. Depend on Elastic Agent restarts, and everything gets mounted again with the latest file contents/inodes. We could inform customers that if they use methods that replace the file, an Agent restart is required for changes to be reflected on their dashboard. This makes it a product decision: @tehilashn @tinnytintin10
  2. Somehow force un-mount -> mount on a running Agent.
  3. Mount the host's entire /etc directory? We would have to prove to @ph and team that exposing all host system configuration to the Agent is OK, which it probably isn't.
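
The behavior that option 3 relies on can be demonstrated outside Kubernetes with plain Docker, which uses the same bind-mount mechanics (a sketch; container names, image, and paths are arbitrary):

# Directory mount: path lookups happen inside the mounted directory,
# so a replaced file is picked up.
docker run -d --name dirmount -v /etc:/hostfs/etc:ro alpine sleep 1h
useradd dirtest
docker exec dirmount grep dirtest /hostfs/etc/passwd   # found

# Single-file mount: the mount stays pinned to the old inode.
docker run -d --name filemount -v /etc/passwd:/hostfs/passwd:ro alpine sleep 1h
useradd filetest
docker exec filemount grep filetest /hostfs/passwd     # not found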

yashtewari self-assigned this Aug 2, 2022
tehilashn (Collaborator) commented:

Thanks @yashtewari. The ideal behaviour, imo, is for the user not to have to perform any manual operation, so option 2 seems like the best one. The question is: did you think of a way to implement it? What are the unknowns/risks here?

yashtewari (Contributor) commented:

@tehilashn no, it's mostly conceptual. A re-mount would definitely work, but mount operations require superuser permissions, which we can't give to the Agent.

yashtewari (Contributor) commented:

> The pod goes into a crash loop for a few seconds, and comes back with the latest file contents/inode.

This is interesting to me. Pure speculation, but with the mounted file being "deleted" (replaced), maybe we can expect the Agent to eventually restart itself? Looking into this.

amirbenun (Contributor) commented:

I am not sure that an agent restart will solve it.
Maybe if you delete the agent pod and reinstall it, you will get the latest file?

yashtewari (Contributor) commented:

@amirbenun yeah, anything that triggers the file to be mounted again works, because the mount operation resolves the file path on the host; once mounted, it is tied to the inode. That's what I meant by "Agent restart": an Agent pod restart. Delete the pod, delete the DaemonSet, or remove the whole DaemonSet from the cluster and re-apply it. All of these work.

amirbenun (Contributor) commented:

Interesting. Do you have an idea in mind for how to trigger such a restart based on a change?

DaveSys911 (Contributor) commented Aug 3, 2022

@amirbenun We can set up a watcher for inode changes and restart the pod when one is detected (see the sketch below), but there are two issues with this:

  1. If we decide to do this, it should perhaps be done in the Agent context.
  2. I guess we will have an issue with the single-file mounts, since we won't see the inode changes from inside the container? @yashtewari
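
A host-side watcher is one conceivable shape for this; a rough sketch, assuming inotify-tools is available on the node and some out-of-band way to restart the Agent pods exists (the kubectl call and label selector are hypothetical):

# Watch /etc for the rename step of the copy-and-replace flow.
inotifywait -m -e moved_to,create /etc |
while read -r dir event file; do
  case "$file" in
    passwd|group)
      # hypothetical restart trigger; requires kubectl access from the host
      kubectl -n kube-system delete pod -l app=elastic-agent
      ;;
  esac
done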

yashtewari (Contributor) commented:

You're right @DaveSys911: the inode (and consequently the contents of the mounted file) doesn't change from the perspective of the Agent unless we mount the parent directory /etc. So far I don't see a way for a watcher to work unless it lives on the host node.

yashtewari (Contributor) commented:

Looks like the auditbeat DaemonSet is already mounting all of /etc, so maybe it can be OK for us to do that too. @ph, can we do this in the Agent's DaemonSet?

ph commented Aug 3, 2022
Interesting, I didn't know we were mounting all /etc in auditbeat, @blakerouse @ChrsMark WDYT?

ChrsMark (Member) commented Aug 4, 2022

> Interesting, I didn't know we were mounting all /etc in auditbeat, @blakerouse @ChrsMark WDYT?

Well, I guess this is required for Auditbeat to work. But Auditbeat is unique; this is why we only have it for this specific Beat and not for Filebeat, Metricbeat, etc.

In the Agent era, I'm not sure how we can avoid defining a superset of permissions and mounts. Maybe you can consider having it commented out, so that only users who actually require it add it. But it comes down to the user experience we want to provide here. We had similar conversations at elastic/elastic-agent#381 (comment). @gizas feel free to comment here.

tehilashn (Collaborator) commented:

Since this requires upgrading the policies repo and we are too close to the last QA cycle, let's move it to 8.5.

kfirpeled (Contributor) commented Sep 22, 2022

@oren-zohar, @tehilashn, is this going to be part of 8.5? If not, can we move it?

yashtewari (Contributor) commented Sep 30, 2022

Thanks @ChrsMark for linking to the earlier discussion! For the purpose of mounting the /etc directory from K8S nodes to the Agent pods (see PR), it's a good foundation for what we should focus on. I've summarized some points analogous to the ones made earlier, with some paraphrasing of what everyone had said:

@amitkanfer: The change involves extending existing permissions to more objects, all of which are read-only permissions.

As before, in this case too files and directories in /etc are already mounted with read-only permissions to the Agent, and this would be extended to include all of /etc.

@amitkanfer: We should go above and beyond to reduce friction for the end-user.

Without this change, there are cases where the user would have to restart the Agent pods in order to see their changes reflected on their Kibana dashboard, and until they do so, the dashboard indication would be outdated/incorrect.

@ruflin: Read-only permissions are not a big concern, but after the change, the user should still have an easy way to change or remove permissions as needed.

The user can view and edit the files and directories being mounted from the node to the Agent in the DaemonSet manifest that they obtain from Kibana.

@ruflin: In documentation, we should explain the need for these permissions and the consequences of removing them. If they are removed, we should convey why things don't work with good error messages.

I've opened an issue for this.

@tonymeehan: We should document everything contained in the resources being exposed.

/etc is where Linux applications keep configuration that isn't specific to a Linux user.

@tonymeehan: The future of Agent in Kubernetes must involve components like operators.

This is another use-case for features that will enable dynamic least privileges for our integrations.

oren-zohar (Collaborator) commented:

Need to update other manifests before closing.

gurevichdmitry (Collaborator) commented Dec 29, 2022

Verified by checking the automated tests report at https://github.com/elastic/cloudbeat/commits/0d31ac7352706eaadbded5da4f270e8eac701884. Fixed.

gurevichdmitry added the verified label Dec 29, 2022
orouz pushed a commit to orouz/cloudbeat that referenced this issue Sep 6, 2023