[Security Manager Replacement] Strengthen OS core security via systemd configuration #16729

Open
kumargu opened this issue Nov 27, 2024 · 9 comments
Labels
Meta (meta issue, not directly linked to a PR), untriaged

Comments

@kumargu
Contributor

kumargu commented Nov 27, 2024

Please describe the end goal of this project

In the absence of the Security Manager, the security of OpenSearch (OS) core can be strengthened with a stricter systemd unit configuration. Think of this as sandboxing via systemd: it protects the host system from vulnerabilities in core or in untrusted code (plugins). This is not a complete replacement for the Security Manager, but it can replace parts of it, such as controlling outbound network access and access to specific file locations. Some of the newly introduced settings provide more security than the Security Manager offers.

Supporting References

#1687

Issues

#16634

Related component

Other

kumargu added the Meta and untriaged labels on Nov 27, 2024
@kumargu
Contributor Author

kumargu commented Nov 27, 2024

Below is a list of systemd unit settings that could be useful to restrict access to system resources, lock down network access, restrict system calls, and so on. The list is fairly extensive but not complete; for example, the socket and IP address restrictions are not exhaustive. Some settings may also be removed depending on the issues seen during actual integration.

Settings to be added to the existing config:

# Prevent modifications to the control group filesystem
ProtectControlGroups=true

# Prevent explicit loading or unloading of kernel modules
ProtectKernelModules=true

# Prevent altering kernel tunables (sysctl parameters)
ProtectKernelTunables=true

# Restrict access to the filesystem:
# 'strict' mounts the entire file system hierarchy read-only, except /dev, /proc and /sys
ProtectSystem=strict

# Set device access policy to 'closed': only standard pseudo devices (/dev/null, /dev/zero,
# /dev/full, /dev/random, /dev/urandom) and devices listed in DeviceAllow= are accessible
DevicePolicy=closed

# Hide processes owned by other users from /proc, enhancing isolation
ProtectProc=invisible

# 'full' makes /usr, /boot, /efi and /etc read-only (less restrictive than 'strict').
# Note: this conflicts with ProtectSystem=strict above; the later assignment wins, so keep only one
ProtectSystem=full

# Prevent changes to control groups (redundant with earlier setting, can be removed)
ProtectControlGroups=yes

# Prevent changing the execution domain
LockPersonality=yes


# System call filtering
# Restricts which system calls the service's processes can make
# @name refers to a predefined system call set
# a leading ~ denies the listed calls (removes them from the allow-list)
SystemCallFilter=@system-service
SystemCallFilter=~@clock
SystemCallFilter=~@cpu-emulation
SystemCallFilter=~@obsolete
SystemCallFilter=~@reboot
SystemCallFilter=~@swap
SystemCallFilter=~@debug
SystemCallFilter=~@module
SystemCallFilter=~@mount
SystemCallFilter=~@raw-io
SystemCallFilter=~@resources

SystemCallErrorNumber=EPERM

# Capability restrictions
# Remove the ability to block system suspends
CapabilityBoundingSet=~CAP_BLOCK_SUSPEND

# Remove the ability to establish leases on files
CapabilityBoundingSet=~CAP_LEASE

# Remove the ability to configure process accounting
CapabilityBoundingSet=~CAP_SYS_PACCT

# Remove the ability to configure TTY devices
CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG

# Remove the capabilities below (a single leading ~ inverts the whole list):
# - CAP_SYS_ADMIN: Various system administration operations
# - CAP_SYS_PTRACE: Ability to trace processes
# - CAP_NET_ADMIN: Various network-related operations
CapabilityBoundingSet=~CAP_SYS_ADMIN CAP_SYS_PTRACE CAP_NET_ADMIN


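# Address family restrictions
# A single leading ~ turns the whole list into a deny-list. As written this also denies
# AF_INET/AF_INET6, which the TCP ports bound below (tcp:9200, tcp:9201) need, so this
# list needs revisiting during integration
RestrictAddressFamilies=~AF_INET AF_INET6 AF_NETLINK AF_PACKET AF_UNIX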

 
# Filesystem Access
 
ReadWritePaths=/var/log/elasticsearch /var/lib/opensearch 
 
# Namespace restrictions
# The setting takes namespace type identifiers, not CLONE_NEW* flag names:
# cgroup (CLONE_NEWCGROUP): Prevents creation of new cgroup namespaces
# ipc (CLONE_NEWIPC): Prevents creation of new IPC namespaces
# net (CLONE_NEWNET): Prevents creation of new network namespaces
# mnt (CLONE_NEWNS): Prevents creation of new mount namespaces
# pid (CLONE_NEWPID): Prevents creation of new PID namespaces
# user (CLONE_NEWUSER): Prevents creation of new user namespaces
# uts (CLONE_NEWUTS): Prevents creation of new UTS namespaces
RestrictNamespaces=true
RestrictNamespaces=~cgroup ipc net mnt pid user uts

# Memory and execution protection
# Prevent creating writable executable memory mappings
MemoryDenyWriteExecute=true
# Allow only native system calls
SystemCallArchitectures=native
# Service does not share key material with other services
KeyringMode=private
# Prevent changing ABI personality
LockPersonality=true
# Prevent creating SUID/SGID files
RestrictSUIDSGID=true
# Prevent acquiring realtime scheduling
RestrictRealtime=true
# Prevent changes to system hostname
ProtectHostname=true
# Prevent reading/writing kernel logs
ProtectKernelLogs=true
# Prevent tampering with the system clock
ProtectClock=true

# Socket restrictions
SocketBindAllow=tcp:9200
SocketBindAllow=tcp:9201
# Deny all other socket bindings
SocketBindDeny=any


# Optional directives
# Deny access to specific IP addresses (value left empty for now)
IPAddressDeny=
# Use a private network namespace
PrivateNetwork=true
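
One way to add settings like these to the existing unit without editing it in place is a systemd drop-in file. The fragment below is only a sketch: the drop-in file name is a hypothetical choice, the unit name opensearch.service is taken from the discussion in this thread, and only a few of the settings above are shown.

```ini
# Hypothetical drop-in path: /etc/systemd/system/opensearch.service.d/hardening.conf
# Settings in a drop-in are merged into the [Service] section of the packaged unit.
[Service]
# Mount the file system hierarchy read-only for the service
ProtectSystem=strict
# Paths the service may still write to (as listed above)
ReadWritePaths=/var/log/elasticsearch /var/lib/opensearch
# Prevent altering kernel tunables
ProtectKernelTunables=true
# Comments go on their own line; systemd does not treat a trailing "# ..." as a comment
LockPersonality=true
```

After adding or changing a drop-in, systemd has to reload its configuration and the service has to be restarted before the settings take effect.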

@kumargu
Contributor Author

kumargu commented Nov 27, 2024

// todo

Explore the configs below:

# Optional directives
# Deny access to specific IP addresses (value left empty for now)
IPAddressDeny=
# Use a private network namespace
PrivateNetwork=true

@rmuir
Contributor

rmuir commented Nov 27, 2024

Nice to see this work here. Sandboxing things such as the filesystem with systemd fills a big gap left by the security manager, and IMO does it in a much better way.

I'd be curious to see the change in the exposure level reported by systemd-analyze security opensearch.service with your improvements.

@kumargu
Contributor Author

kumargu commented Nov 28, 2024

thanks @rmuir. I will post the results from systemd-analyze security opensearch.service.

@kumargu
Contributor Author

kumargu commented Nov 28, 2024

cc @andrross

@andrross
Member

Thanks @kumargu, I think this approach is super promising.

It does raise some questions around testing and maintaining a properly secured systemd config. We'll need to evaluate our integration testing and release pipeline to ensure we have proper coverage of this, as the evolving code base may sometimes (rarely I hope) require tweaks to this config.

@rmuir
Contributor

rmuir commented Nov 30, 2024

You don't need to have all the SystemCallFilter= entries. Most of what you have listed is already excluded via @system-service.

I'd nuke all the CapabilityBoundingSet entries and just replace them with CapabilityBoundingSet=. I'd also set NoNewPrivileges=true, along the same lines of not allowing escalation.

The same goes for your RestrictNamespaces= entries: they are not needed. It is enough to just set RestrictNamespaces=true.
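
A minimal sketch of what the simplified settings might look like with these suggestions applied, assuming they live in the [Service] section of opensearch.service. It is an illustration of the three suggestions, not a tested config.

```ini
[Service]
# Start from the predefined allow-list; it already leaves out most of the
# special-purpose groups that were denied one by one
SystemCallFilter=@system-service
SystemCallErrorNumber=EPERM

# An empty assignment resets the bounding set to no capabilities at all
CapabilityBoundingSet=
# Do not allow the service or its children to gain new privileges
NoNewPrivileges=true

# Prohibit creation of any kind of namespace
RestrictNamespaces=true
```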

@kumargu
Contributor Author

kumargu commented Nov 30, 2024

Ack @rmuir, I'll make the changes you suggested; all your comments make sense to me. I am going to try out the actual integration tomorrow and post the results.

@kumargu
Contributor Author

kumargu commented Nov 30, 2024

@andrross, thanks for bringing up the testing part. I will think more about it. At the moment, I can think of having a test.service that gets spawned in the test suite and verifies that it doesn't have access to restricted resources, e.g., no read/write access to /var/log/elasticsearch.

I don't think we will be able to get full coverage, but we can cover the most critical ones. And yes, it should be rare that we'd change the configs.
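
As a rough sketch of the test.service idea, a one-shot probe unit could carry the same filesystem hardening and succeed only when a forbidden write is denied. The probe file name and the choice of /etc as the restricted location are assumptions made for illustration; a real test would mirror the paths restricted in opensearch.service.

```ini
# Hypothetical probe unit, e.g. test.service spawned by the test suite
[Unit]
Description=Verify that a hardened unit cannot write to restricted paths

[Service]
Type=oneshot
# Same filesystem hardening as proposed for the real service
ProtectSystem=strict
# The shell negates the exit status: the unit succeeds only if the write fails
ExecStart=/bin/sh -c '! touch /etc/sandbox-probe'
```

The test suite would start this unit and check that it finished successfully; a failed unit would mean the write was allowed.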
