Elasticsearch 8.0.0-SNAPSHOT fails at startup due to volume permissions #2791
We run an init container to change the owner of the data volume to
# chown the data and logs volume to the elasticsearch user
# only done when running as root, other cases should be handled
# with a proper security context
chown_start=$(date +%s)
if [[ $EUID -eq 0 ]]; then
  {{range .ChownToElasticsearch}}
  echo "chowning {{.}} to elasticsearch:elasticsearch"
  chown -v elasticsearch:elasticsearch {{.}}
  {{end}}
fi
In 8.0.0-SNAPSHOT the init container runs with the
|
Related to #2599 |
I think the way we currently deal with volume permissions is not great: we run an init container to
I think this would be better dealt with
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.0.0-SNAPSHOT
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        securityContext:
          fsGroup: 1000
I think the right thing to do is replace our custom init container chown mechanism by a default
However, I'm not exactly sure of the implications on Openshift. This documentation gives more details. |
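To make the proposal concrete, here is a minimal sketch of what "defaulting" the pod security context could look like. It is not the operator's actual code: the function name `with_default_fs_group`, the plain-dict representation of the pod spec, and the rule "leave any user-provided securityContext untouched" are all assumptions made for illustration. The value 1000 is taken from the manifest above.

```python
import copy

# Assumption: same value as the fsGroup in the manifest above.
DEFAULT_FS_GROUP = 1000


def with_default_fs_group(pod_spec, fs_group=DEFAULT_FS_GROUP):
    """Return a copy of pod_spec with securityContext.fsGroup defaulted.

    Hypothetical rule: a securityContext provided by the user (even an
    empty one) is left untouched, so the default can be overridden or
    disabled from the pod template.
    """
    result = copy.deepcopy(pod_spec)
    if "securityContext" not in result:
        result["securityContext"] = {"fsGroup": fs_group}
    return result


# Pod template without any securityContext: the default is applied.
print(with_default_fs_group({"containers": [{"name": "elasticsearch"}]}))

# Pod template with an explicit (empty) securityContext: nothing is added.
print(with_default_fs_group({"securityContext": {}}))
```

Treating an explicit empty securityContext as "do not default" is one possible way to let users on restricted clusters opt out without a separate setting.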
The difference in Elasticsearch behaviour comes from elastic/elasticsearch#50277, where
I've done some tests regarding
First, there is no problem running 8.0.0-SNAPSHOT on Openshift. Openshift changes the default user the container runs as to an arbitrary one for security reasons (in my example: UID
Whereas in "regular" Kubernetes, the
Setting
One solution to this, detailed in Openshift docs, is to not use the default
tl;dr:
Let's see if we can find a common solution here. In any case, changing permissions in the init container does not feel like the right thing to do. Related k8s issue: kubernetes/kubernetes#2630. |
I think it will work as long as the cluster is not secured. If there is a PSP that restricts the range for the fsGroup (which is what I would expect on production clusters), chances are it will fail the same way, I guess. |
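The concern about restricted ranges can be illustrated with a simplified model of how a PodSecurityPolicy `fsGroup` rule validates an explicitly set value. This is only a sketch: `fs_group_allowed` is a hypothetical helper, the ranges below are made up, and the real admission controller does more (for example, defaulting an unset fsGroup under `MustRunAs`).

```python
def fs_group_allowed(fs_group, rule, ranges):
    """Check an explicitly set fsGroup against a PSP-style fsGroup rule.

    rule   -- "RunAsAny", "MayRunAs" or "MustRunAs"
    ranges -- list of (min, max) tuples, both bounds inclusive
    """
    if rule == "RunAsAny":
        return True
    if rule in ("MayRunAs", "MustRunAs"):
        # A value that is set must fall inside one of the allowed ranges.
        return any(low <= fs_group <= high for low, high in ranges)
    raise ValueError(f"unknown fsGroup rule: {rule}")


# Unrestricted cluster: a default of 1000 is accepted.
print(fs_group_allowed(1000, "RunAsAny", []))

# Hypothetical restrictive policy: 1000 is outside the allowed range,
# so a pod with a defaulted fsGroup of 1000 would be rejected.
print(fs_group_allowed(1000, "MustRunAs", [(2000, 3000)]))
```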
You're right @barkbay, this goes beyond the scope of Openshift vs. not Openshift. The question is more: is there a PSP (or SCC on Openshift) or not?
IIUC this doc correctly, setting an
When no PSP/SCC is enforced, we probably need to set
Should we default to one or the other, or try to auto-detect what's best? Users can still override the |
Assuming we want to rely on
Regarding ECK defaults, we have several options.
1. Don't set a default
|
My inclination would be for option 3. It makes things tricky since there is an implicit/non-robust decision being made by the operator, but it gives the easiest quickstart experience with less overhead in the documentation. If that ends up being too complicated, my second choice would be option 2. In short: favor a quickstart experience on an unsecured k8s cluster, and try to redirect other users (including Openshift users) to a dedicated doc page about disabling the securityContext. |
A few things we discussed today with @nkvoll @pebrc @anyasabo. No decision reached yet:
|
I think I'm leaning towards the following:
I'm not sure whether this should apply to all stack versions, or only apply to 8.0+ so we don't break compatibility with existing pre-8.0 clusters. |
I think I agree, Seb. I'm less sure about only implementing it for 8.0+.
It's nice because it is "simple" from a user experience perspective: if an existing user with PSPs upgrades to the version of ECK that includes the automatic fsGroup setting, it only fails for new 8.x clusters, or fails on the first pod during a rolling upgrade to 8.x. So the impact is minimal and should be relatively easy for users to notice.
I think it's less nice because of the complexity involved. We're arguably not doing the "right thing" now in <8.0 by using the init container instead of the native feature that does what we want it to do. There are even more differences between 6.x/7.x and 8.x that aren't really related to Elasticsearch itself and are more related to how Elasticsearch is packaged. Being consistent wherever we can is nice just to minimize mental load both for us and our users (who have to keep a mental map of what behavior we default to across different ES versions).
Downsides of defaulting fsGroups for all versions:
Overall I think I'm okay with making this change for all ES versions, but could still be persuaded otherwise. |
👍 on using the flag @sebgl proposed.
Side-note: I am still a bit worried about the number of flags we add to the binary (17 atm) and still think we should consider a configuration file. But maybe not for the ones that are feature toggles like this one but for the configuration values like cert validity and such. |
If we use a flag, I think we will have to make a choice regarding the operator hub. Either:
|
A few things we discussed with the team:
|
What we agreed on with the team (basically summarizes the discussion above):
|
@sebgl just trying to figure out where we stand with this issue. I think I ticked all the right boxes. |
@pebrc yes! Unassigning myself here since not really working on this at the moment. |
As we are getting to the release of
Based on the above:
I'd like to use the fact that we actually have three values possible for the flag:
The experience for different user groups will be the following:
Tbh, I'd challenge the usefulness of the warning. Because of non-OCP clusters with PSP, we can't warn if users have this flag set to false. This means that the flag is only helpful if an OCP user misconfigures it. |
We've discussed offline and decided to:
|
Exception raised at startup:
I think that's because the Docker image runs with user elasticsearch by default, whereas it was using user root before (even though the elasticsearch process itself runs as user elasticsearch):
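To see why the change of default user matters, here is a simplified model of the POSIX write check on the data directory. It is only a sketch: `can_write` is a hypothetical helper that ignores ACLs and capabilities, and the UID/GID 1000 for the elasticsearch user, the root-owned 0755 volume, and the 2775 mode after group ownership is applied are assumptions for illustration.

```python
def can_write(uid, gids, owner_uid, owner_gid, mode):
    """Simplified POSIX write check for a directory.

    uid  -- effective user ID of the process
    gids -- set of the process's group IDs (primary and supplementary)
    mode -- permission bits of the directory, e.g. 0o755
    """
    if uid == 0:
        return True  # root bypasses the permission bits
    if uid == owner_uid:
        return bool(mode & 0o200)  # owner write bit
    if owner_gid in gids:
        return bool(mode & 0o020)  # group write bit
    return bool(mode & 0o002)      # "other" write bit


ES_UID = 1000  # assumption: UID/GID of the elasticsearch user

# Container started as root: writing (and chowning) the fresh volume works.
print(can_write(0, {0}, owner_uid=0, owner_gid=0, mode=0o755))

# Container started as elasticsearch: the init script skips the chown
# (EUID is not 0) and the root-owned volume is not writable.
print(can_write(ES_UID, {ES_UID}, owner_uid=0, owner_gid=0, mode=0o755))

# Volume group-owned by GID 1000 and group-writable, as a pod-level
# fsGroup would arrange it: the elasticsearch user can write.
print(can_write(ES_UID, {ES_UID}, owner_uid=0, owner_gid=1000, mode=0o2775))
```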