
Make Docker images suitable for ESS/ECE #49926

Closed
pugnascotia opened this issue Dec 6, 2019 · 7 comments · Fixed by #50277
Assignees
Labels
:Core/Infra/Core Core issues without another label v8.0.0-alpha1

Comments

@pugnascotia
Contributor

Cloud currently builds and runs a fully custom Elasticsearch image. The Stack should instead provide an image that Cloud can use directly.

User / Group IDs

Cloud currently expects to be able to set the user and group IDs via environment variables. The entry point script changes the founduser user's UID and GID to those provided in the env vars, and performs a recursive chown on some directories in the container.

The Stack image takes a different approach, in that it follows the OKD image guidelines and uses GID 0 for all files.
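As a rough illustration of the GID 0 approach, the usual pattern is to give a directory tree to the root group and copy the owner's permission bits to the group, so that an arbitrary UID running with GID 0 can still use the files. This is a sketch only: `share_with_group` is a hypothetical helper name, and the Stack Dockerfile may do this differently.

```shell
#!/bin/sh
# Sketch: make a directory tree usable by any UID that runs with the given GID.
# share_with_group is a hypothetical helper, not taken from the Stack image.
share_with_group() {
    dir="$1"
    gid="${2:-0}"   # default to GID 0, as in the OKD guidelines
    # Hand the tree to the group, then give the group the same permission
    # bits as the owning user (g=u copies the user bits to the group bits).
    chgrp -R "$gid" "$dir" && chmod -R g=u "$dir"
}
```

In an image build this would be called once per writable directory, for example `share_with_group /usr/share/elasticsearch/data` (the path is illustrative).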

The difference in approaches may well be due to Cloud's multi-tenancy. I don't know how OKD handles this.

(There is a related open issue, "Pin USER in Dockerfile complying with Docker best practices" (#46166).)

setuid flags

Cloud ensures that there are no files with setuid, in order to mitigate "stackclash" attacks. Basically, they do this:

RUN find / -xdev -perm -4000 -exec chmod u-s {} +

This could be done in the Stack image.
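For experimenting outside a Docker build, the same `find` invocation can be wrapped in a small script and checked against a scratch directory. The function names here are hypothetical, and this is only a sketch of the idea, not what Cloud or the Stack image actually ships.

```shell
#!/bin/sh
# Sketch: clear the setuid bit on everything under a directory tree.
# strip_setuid and list_setuid are hypothetical helper names.
strip_setuid() {
    # -xdev keeps find on a single filesystem; -perm -4000 matches entries
    # that have the setuid bit set; chmod u-s clears that bit.
    find "$1" -xdev -perm -4000 -exec chmod u-s {} +
}

# Print any entries that still carry the setuid bit, for verification.
list_setuid() {
    find "$1" -xdev -perm -4000 -print
}
```

Running `list_setuid /` after the build step should print nothing if the step worked.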

Init process

Cloud runs Elasticsearch via a mini-init process in order to avoid zombie processes. There should be no harm in adopting this in the Stack image.
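To make the signal-handling half of this concrete, here is a shell sketch of a wrapper that starts a child, forwards TERM to it, and returns the child's exit status. It is not the mini-init that Cloud uses (I haven't reproduced that here), and it does not cover reaping orphaned processes, which a real init also has to do. `run_supervised` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: run a command as a child process and forward TERM to it.
# run_supervised is a hypothetical helper; a real init also reaps orphans.
run_supervised() {
    "$@" &
    child=$!
    # On TERM, pass the signal on to the child instead of dying first.
    trap 'kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"
    status=$?
    # A trapped signal interrupts wait with a status above 128, so keep
    # waiting for as long as the child is still alive.
    while [ "$status" -gt 128 ] && kill -0 "$child" 2>/dev/null; do
        wait "$child"
        status=$?
    done
    trap - TERM
    return "$status"
}
```

An entry point would end with something like `run_supervised bin/elasticsearch`, so that stopping the container delivers TERM to Elasticsearch rather than only to the wrapper.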

JVM options

Cloud comments out the heap settings and disables the HeapDumpOnOutOfMemoryError flag in jvm.options. Cloud also passes the relevant heap options on the command line, so commenting out the heap settings is not necessary (this has been the case since 5.0). It does have the advantage that the resulting Elasticsearch command line is tidier. I don't know if that's important to Cloud.

Quota-aware filesystem

Cloud changes the default filesystem provider to an implementation that is aware of user quotas. This is achieved by providing a file to the container that contains the quota settings, which the provider then takes into account when answering capacity questions. The file has a default location, which can be changed via a system property. The filesystem provider is changed via a JVM option, set via ES_JAVA_OPTS.
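A sketch of what that wiring could look like, assuming the JVM option in question is the standard JDK system property `java.nio.file.spi.DefaultFileSystemProvider`. The provider class name, the quota-path property, and the file location below are all made up for illustration; the real names used by Cloud are not given in this issue.

```shell
#!/bin/sh
# Sketch: build the JVM options that select a custom default filesystem provider.
# java.nio.file.spi.DefaultFileSystemProvider is a standard JDK property;
# example.quota.path is a HYPOTHETICAL property name for the quota file.
quota_java_opts() {
    provider_class="$1"   # fully qualified provider class (hypothetical below)
    quota_file="$2"       # file containing the quota settings
    printf '%s %s' \
        "-Djava.nio.file.spi.DefaultFileSystemProvider=$provider_class" \
        "-Dexample.quota.path=$quota_file"
}

# Usage: append to whatever ES_JAVA_OPTS already holds.
ES_JAVA_OPTS="${ES_JAVA_OPTS:-} $(quota_java_opts org.example.QuotaAwareFileSystemProvider /mnt/quota/quota.properties)"
export ES_JAVA_OPTS
```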

Cloud has suggested that the Stack provide "a native and equivalent way to deal with this". There is a Stack Overflow question that suggests that Java lacks support for this at present. It ought to be possible to perform native calls on Linux (and possibly other operating systems) that return quota information, but it's unclear how feasible / time consuming that will be. Cloud can continue using their current solution without modifying the base image.

Plugins

The Cloud Docker build process downloads and adds plugins to the image, in a "plugin-archive" directory. Then, when an allocator runs an Elasticsearch image, it runs the elasticsearch-plugin tool to install the active plugins from the archive directory. User-provided plugins are downloaded and installed dynamically.
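Roughly, the install-from-archive step could look like the sketch below. It assumes each archived plugin is stored as `<name>.zip` directly under the archive directory, which is a guess at the layout, and `install_archived_plugins` is a hypothetical helper. The installer command is passed in so that the loop can be tried without a real Elasticsearch install.

```shell
#!/bin/sh
# Sketch: install the named plugins from a local archive directory.
# Assumes <archive_dir>/<name>.zip; the real Cloud layout may differ.
install_archived_plugins() {
    archive_dir="$1"; shift   # must be an absolute path for the file:// URL
    installer="$1"; shift     # e.g. bin/elasticsearch-plugin
    for name in "$@"; do
        zip="$archive_dir/$name.zip"
        if [ ! -f "$zip" ]; then
            echo "plugin archive not found: $zip" >&2
            return 1
        fi
        # --batch skips the interactive permission confirmation prompt.
        "$installer" install --batch "file://$zip" || return 1
    done
}
```

For example: `install_archived_plugins /plugin-archive bin/elasticsearch-plugin analysis-icu` (the plugin choice and paths are illustrative).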

There's nothing obvious that the Stack image needs to do here.

@pugnascotia pugnascotia added :Core/Infra/Core Core issues without another label v8.0.0 labels Dec 6, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@pugnascotia
Contributor Author

cc @dliappis / @nachogiljaldo for input.

@alpar-t
Contributor

alpar-t commented Dec 13, 2019

@mieciu

@droberts195
Contributor

Cloud runs Elasticsearch via a mini-init process in order to avoid zombie processes. There should be no harm in adopting this in the Stack image.

Please can whoever does this work ping me so we make sure the kill order of the "mini-init" doesn't cause problems for ML. This problem has recently affected Cloud and is being fixed for Cloud, but also the identical problem came up as the root cause of #46262, so I'd like to make sure it's not accidentally introduced in a third place.

@pugnascotia pugnascotia self-assigned this Dec 16, 2019
@pugnascotia
Contributor Author

Talking to @mieciu, we do want the GID 0 approach for the user running ES, which is good. We may need to support UIDs other than 1000 - I'm trying to find out more.

@pugnascotia
Contributor Author

@droberts195 any idea who fixed it in Cloud, or a PR number maybe? Just so I can have a peek.

@droberts195
Contributor

any idea who fixed it in Cloud

It was @mieciu. The PR is in a private repo.

pugnascotia added a commit to pugnascotia/elasticsearch that referenced this issue Jan 7, 2020
Closes elastic#49926 and elastic#46166.

   * Add a mini-init process (copied from Cloud)
   * Don't use root at all when running the container
   * Add an explicit TERM handler
   * Ensure no files in the image have the setuid flag

Also improve dependency tracking in the build.
pugnascotia added a commit that referenced this issue Jan 23, 2020
Closes #49926 and #46166. Rework the Docker image so that it comes with a tiny
init system, to ensure ML processes are correctly cleaned up, and to run ES
as a regular user instead of root.

Also:

   * Ensure no files in the image have the setuid/setgid flag
   * Also improve dependency tracking in the build
   * Remove TAKE_FILE_OWNERSHIP option and its documentation

5 participants