[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

blakerouse · 2021-03-29T14:34:07Z

What does this PR do?

This fixes the Elastic Agent Docker image to allow a proper state to be stored for the image. Previously this was not possible for the state to be placed into a volume which prevent restarts of the container to maintain its previous state.

This changes the Elastic Agent to not version its directory when running under Docker. This is key to allowing the state to be storable and being that an Elastic Agent container cannot be self-upgraded it is acceptable that it is not versioned.

Because of how the Elastic Agent ships with the builds in the download directory the state could not mount over that directory of Elastic Agent would need to re-download those files. This changes the container sub-command to rsync the builds download directory into the state directory before actually running. This allows new downloads to stay in the state directory and allows the running Elastic Agent to use the binaries that shipped with the image.

By default the state of the Docker container goes to /usr/share/elastic-agent/state. This can changed with STATE_PATH environment. Elastic Agent logs now default to output stderr so they can be collected by the container runtime. In the case that LOGS_PATH variable is set then the Elastic Agent will log to $LOGS_PATH/elastic-agent.log instead of stderr.

Other fixes include using the $STATE_PATH/data/tmp instead of /tmp for the unix sockets as the container cannot write to /tmp.

Why is it important?

Critical to allow the state of the Elastic Agent to persist restarts of the Elastic Agent Docker image.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

State is stored in /usr/share/elastic-agent/state.
State path can be changed with STATE_PATH.

How to test this PR locally

$ # Build the docker container in dev-mode
$ cd x-pack/elastic-agent
$ PLATFORMS="+all linux/amd64" mage dev:package
$ # Run the docker container
$ docker run -it -e KIBANA_FLEET_SETUP=1 -e ELASTICSEARCH_HOST=http://host.docker.internal:9200 -e KIBANA_HOST=http://host.docker.internal:5601 -e FLEET_ENROLL=1 -e FLEET_URL=http://host.docker.internal:8220 -e FLEET_INSECURE=1 docker.elastic.co/beats/elastic-agent:8.0.0

Related issues

elasticmachine · 2021-03-29T14:34:59Z

Pinging @elastic/agent (Team:Agent)

blakerouse · 2021-03-29T14:35:05Z

/package

elasticmachine · 2021-03-29T15:39:04Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Build Cause: Pull request #24817 updated
Start Time: 2021-03-31T18:27:25.675+0000
Duration: 51 min 4 sec
Commit: 7d1da80

Test stats 🧪

Test	Results
Failed	0
Passed	6588
Skipped	16
Total	6604

Trends 🧪

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test	Results
Failed	0
Passed	6588
Skipped	16
Total	6604

blakerouse · 2021-03-29T20:54:23Z

/package

simitt

Log path defaults to $STATE_PATH/logs but that can also log to a completely different path by setting LOG_PATH environment.

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

simitt · 2021-03-30T14:26:19Z

x-pack/elastic-agent/pkg/agent/cmd/container.go

+		return fmt.Errorf("syncing download directory to STATE_PATH(%s) failed: %s", pathCfg.State, err)
+	}
+	paths.SetTop(filepath.Join(pathCfg.State, "data"))
+	paths.SetConfig(pathCfg.State)


Can you make this configurable independently of the pathCfg.State?

We could? What is the use case? Don't like to add an ENV unless it is useful.

We have a usecase with an existing directory structure:

/app /app/config /app/logs /app/data

When setting STATE_PATH=/app the agent expects the configuration at /app/.

simitt · 2021-03-30T14:35:30Z

x-pack/elastic-agent/pkg/agent/cmd/container.go

 	var err error
 	var client *kibana.Client
 	executable, err := os.Executable()
 	if err != nil {
 		return err
 	}

+	// set paths early so all action below use the defined paths
+	err = handlePaths(cfg.Agent.Paths)


The APM Server is extracted from the Elastic Agent's download folder. At the moment this is done before the Elastic Agent gets started (https://github.com/elastic/beats/pull/24817/files#diff-1f8e1e2079524931fa8f3cf121667a25baa2b998557136500d5e7596b46e9187R159), so also before you are setting the path variables here. The path variables need to be set as the very first action.

If you prefer I can change the order to make this work with the cloud mode after your PR gets merged.

I think because of me pushing an update I broke the link your provided. Can you provide it again so I can see what you are talking about. Would be happy to fix it in this branch.

certainly; https://github.com/elastic/beats/blob/master/x-pack/elastic-agent/pkg/agent/cmd/container.go#L157

blakerouse · 2021-03-30T16:14:50Z

Log path defaults to $STATE_PATH/logs but that can also log to a completely different path by setting LOG_PATH environment.

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

Yes I question if the LOGS_PATH should just be removed. Being that the Elastic Agent logs to stdout when running in container mode, there is nothing to write to in that directory. I only added for brevity with the --path.logs, which does work when you want it to actually log to a file.

But for a container it should write to stdout, which is how it is setup now. That way a log collector for docker or kubernetes will automatically pick up the logs.

blakerouse · 2021-03-30T16:18:45Z

Providing LOGS_PATH=/myapp/logs when starting the container does not log anything at this directory. All logs are always logged under $STATE_PATH/data/logs.

Adding more to this the Elastic Agent expects that logs exists at $STATE_PATH/data/logs/*. This is because the Elastic Agent itself ships those logs to Elasticsearch (it can be a different one). Adding the ability to change this path and still allow the Elastic Agent to ship them its going to make it even more confusing and error prone I feel.

Why doesn't setting STATE_PATH=/myapp and reading from /myapp/data/logs not work?

* unify ENV for home, logs, config, data * add CONFIG_PATH * set paths and copy downloads before extracting APM Server * restrict path config to ENV

simitt · 2021-03-31T14:00:20Z

x-pack/elastic-agent/CHANGELOG.next.asciidoc

@@ -77,3 +77,4 @@
 - Add TLS support for Fleet Server {pull}24142[24142]
 - Add support for Fleet Server running under Elastic Agent {pull}24220[24220]
 - Add CA support to Elastic Agent docker image {pull}24486[24486]
+- Add STATE_PATH to Elastic Agent docker image {pull}24817[24817]


also add LOGS_PATH and CONFIG_PATH

…nroll to exit when Elastic Agent crashes.

…ndled (elastic#24817) * Adjust paths to allow setting data path when running in container mode. * Fix ability for elastic-agent to have proper state in Docker. * Add changelog entry. * Fix lint. * Fix issue with bootstrapping with Fleet Server. * Fix lint. * additional fixes * unify ENV for home, logs, config, data * add CONFIG_PATH * set paths and copy downloads before extracting APM Server * restrict path config to ENV * Cleanup some issues with cloud mode. * Fix issue with CONFIG_PATH when empty. Improve sub-process agent in enroll to exit when Elastic Agent crashes. * Cleanup the paths to be correct. * Update changelog. * Fix unit tests. Co-authored-by: simitt <[email protected]> (cherry picked from commit 1f1fae5)

…ndled (#24817) (#24879) * Adjust paths to allow setting data path when running in container mode. * Fix ability for elastic-agent to have proper state in Docker. * Add changelog entry. * Fix lint. * Fix issue with bootstrapping with Fleet Server. * Fix lint. * additional fixes * unify ENV for home, logs, config, data * add CONFIG_PATH * set paths and copy downloads before extracting APM Server * restrict path config to ENV * Cleanup some issues with cloud mode. * Fix issue with CONFIG_PATH when empty. Improve sub-process agent in enroll to exit when Elastic Agent crashes. * Cleanup the paths to be correct. * Update changelog. * Fix unit tests. Co-authored-by: simitt <[email protected]> (cherry picked from commit 1f1fae5)

Due to some changes in elastic-agent (elastic/beats#24817), injection of the apm-server binary became ineffective and we have been running system tests with the published artifacts. Artifacts (such as the apm-server) are now unpacked into state/data/install/<artifact>. The state/data/install directory is expected to be owned by the elastic-agent user, so we can no longer bind mount the apm-server binary. Instead, we now create a custom Docker image and copy in the apm-server and apm-server.yml files.

Due to some changes in elastic-agent (elastic/beats#24817), injection of the apm-server binary became ineffective and we have been running system tests with the published artifacts. Artifacts (such as the apm-server) are now unpacked into state/data/install/<artifact>. The state/data/install directory is expected to be owned by the elastic-agent user, so we can no longer bind mount the apm-server binary. Instead, we now create a custom Docker image and copy in the apm-server and apm-server.yml files. (cherry picked from commit 301caed)

Due to some changes in elastic-agent (elastic/beats#24817), injection of the apm-server binary became ineffective and we have been running system tests with the published artifacts. Artifacts (such as the apm-server) are now unpacked into state/data/install/<artifact>. The state/data/install directory is expected to be owned by the elastic-agent user, so we can no longer bind mount the apm-server binary. Instead, we now create a custom Docker image and copy in the apm-server and apm-server.yml files.

Due to some changes in elastic-agent (elastic/beats#24817), injection of the apm-server binary became ineffective and we have been running system tests with the published artifacts. Artifacts (such as the apm-server) are now unpacked into state/data/install/<artifact>. The state/data/install directory is expected to be owned by the elastic-agent user, so we can no longer bind mount the apm-server binary. Instead, we now create a custom Docker image and copy in the apm-server and apm-server.yml files. (cherry picked from commit 301caed) Co-authored-by: Andrew Wilkins <[email protected]>

blakerouse added 2 commits March 25, 2021 12:49

Adjust paths to allow setting data path when running in container mode.

fb22d3f

Fix ability for elastic-agent to have proper state in Docker.

ac47ff7

blakerouse added the Team:Elastic-Agent Label for the Agent team label Mar 29, 2021

blakerouse self-assigned this Mar 29, 2021

botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 29, 2021

Add changelog entry.

5ff72ab

blakerouse marked this pull request as ready for review March 29, 2021 14:34

Fix lint.

d4f5b22

ruflin mentioned this pull request Mar 30, 2021

Fleet-server docs overview elastic/fleet-server#184

Closed

15 tasks

simitt reviewed Mar 30, 2021

View reviewed changes

Fix issue with bootstrapping with Fleet Server.

d0193e3

blakerouse and others added 3 commits March 30, 2021 12:41

Fix lint.

91dae4c

additional fixes

935064a

* unify ENV for home, logs, config, data * add CONFIG_PATH * set paths and copy downloads before extracting APM Server * restrict path config to ENV

Cleanup some issues with cloud mode.

5ce5306

simitt approved these changes Mar 31, 2021

View reviewed changes

blakerouse added 4 commits March 31, 2021 11:05

Fix issue with CONFIG_PATH when empty. Improve sub-process agent in e…

93deacd

…nroll to exit when Elastic Agent crashes.

Cleanup the paths to be correct.

4291897

Update changelog.

d8da230

Fix unit tests.

7d1da80

blakerouse merged commit 1f1fae5 into elastic:master Mar 31, 2021

blakerouse deleted the fix-agent-data-container-path branch March 31, 2021 20:00

blakerouse mentioned this pull request Mar 31, 2021

Cherry-pick #24817 to 7.x: [Elastic Agent] Fix Docker container to allow state to properly be handled #24879

Merged

6 tasks

blakerouse added the v7.13.0 label Mar 31, 2021

mostlyjason mentioned this pull request Apr 15, 2021

Add new topic about Fleet server elastic/observability-docs#449

Closed

8 tasks

pebrc mentioned this pull request Apr 19, 2021

Adjust Elastic Agent controller to make use of new container state handling in 7.13 elastic/cloud-on-k8s#4428

Closed

pebrc mentioned this pull request Apr 27, 2021

[Elastic Agent] fork/exec from within STATE_PATH incompatible with K8s hostPath volumes #25326

Closed

dedemorton mentioned this pull request May 5, 2021

Document how to run Agent and Fleet Server in docker elastic/observability-docs#622

Closed

3 tasks

axw mentioned this pull request Jun 13, 2021

systemtest: fix apm-server binary injection elastic/apm-server#5440

Merged

cmacknz mentioned this pull request Jan 16, 2023

Kubernetes examples and documentation should store state on a host path volume elastic/elastic-agent#2109

Closed

tetianakravchenko mentioned this pull request Apr 25, 2023

Add hostPath to store elastic-agent state elastic/elastic-agent#2550

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

blakerouse commented Mar 29, 2021 •

edited

Loading

elasticmachine commented Mar 29, 2021

blakerouse commented Mar 29, 2021

elasticmachine commented Mar 29, 2021 •

edited by jenkins-beats-ci bot

Loading

Build stats

Test stats 🧪

Trends 🧪

Test stats 🧪

blakerouse commented Mar 29, 2021

simitt left a comment

simitt Mar 30, 2021

blakerouse Mar 30, 2021

simitt Mar 30, 2021

simitt Mar 30, 2021

blakerouse Mar 30, 2021

simitt Mar 30, 2021

blakerouse commented Mar 30, 2021

blakerouse commented Mar 30, 2021

simitt Mar 31, 2021

[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

[Elastic Agent] Fix Docker container to allow state to properly be handled #24817

Conversation

blakerouse commented Mar 29, 2021 • edited Loading

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

elasticmachine commented Mar 29, 2021

blakerouse commented Mar 29, 2021

elasticmachine commented Mar 29, 2021 • edited by jenkins-beats-ci bot Loading

💚 Build Succeeded

Build stats

Test stats 🧪

Trends 🧪

💚 Flaky test report

Test stats 🧪

blakerouse commented Mar 29, 2021

simitt left a comment

Choose a reason for hiding this comment

simitt Mar 30, 2021

Choose a reason for hiding this comment

blakerouse Mar 30, 2021

Choose a reason for hiding this comment

simitt Mar 30, 2021

Choose a reason for hiding this comment

simitt Mar 30, 2021

Choose a reason for hiding this comment

blakerouse Mar 30, 2021

Choose a reason for hiding this comment

simitt Mar 30, 2021

Choose a reason for hiding this comment

blakerouse commented Mar 30, 2021

blakerouse commented Mar 30, 2021

simitt Mar 31, 2021

Choose a reason for hiding this comment

blakerouse commented Mar 29, 2021 •

edited

Loading

elasticmachine commented Mar 29, 2021 •

edited by jenkins-beats-ci bot

Loading