APMServer keystore cannot be initialized in version 7.9.0 #3349

Closed
sebgl opened this issue Jun 30, 2020 · 8 comments
Labels
>bug Something isn't working

Contributor

sebgl commented Jun 30, 2020

https://devops-ci.elastic.co/job/cloud-on-k8s-e2e-tests-snapshot-versions/87/testReport/github/com_elastic_cloud-on-k8s_test_e2e_apm/TestUpdateConfiguration_APM_Pod_should_be_recreated/

=== RUN   TestUpdateConfiguration/APM_Pod_should_be_recreated
Retries (5m0s timeout): ....................................................................................................
    TestUpdateConfiguration/APM_Pod_should_be_recreated: utils.go:84: 
        	Error Trace:	utils.go:84
        	Error:      	Received unexpected error:
        	            	1 APM pod expected, got 2
        	Test:       	TestUpdateConfiguration/APM_Pod_should_be_recreated
{"log.level":"error","@timestamp":"2020-06-30T00:59:52.914Z","message":"stopping early","service.version":"0.0.0-00000000","service.type":"eck","ecs.version":"1.4.0","error":"test failure","error.stack_trace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128\ngithub.com/elastic/cloud-on-k8s/test/e2e/test.StepList.RunSequential\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/test/step.go:43\ngithub.com/elastic/cloud-on-k8s/test/e2e/apm.TestUpdateConfiguration\n\t/go/src/github.com/elastic/cloud-on-k8s/test/e2e/apm/configuration_test.go:197\ntesting.tRunner\n\t/usr/local/go/src/testing/testing.go:991"}
    --- FAIL: TestUpdateConfiguration/APM_Pod_should_be_recreated (300.00s)
sebgl self-assigned this Jun 30, 2020
Contributor Author

sebgl commented Jun 30, 2020

I can reproduce on 7.9.0-SNAPSHOT (it works fine on 7.8.0).
The keystore init container fails with:

+ keystore_initialized_flag=/usr/share/apm-server/data/elastic-internal-init-keystore.ok
+ [[ -f /usr/share/apm-server/data/elastic-internal-init-keystore.ok ]]
+ echo 'Initializing keystore.'
+ /usr/share/apm-server/apm-server keystore create --force
Initializing keystore.
error initializing beat: error loading config file: config file ("apm-server.yml") can only be writable by the owner but the permissions are "-rw-rw----" (to fix the permissions use: 'chmod go-w /usr/share/apm-server/apm-server.yml')
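
For reference, the check behind this error can be approximated as a test on the group and other write bits of the file mode. The sketch below is only an illustration, not the libbeat code: the function name `owner_only_writable` is hypothetical, and the `022` mask is an assumption inferred from the `chmod go-w` hint in the message.

```shell
# Hypothetical helper: succeeds (exit 0) when an octal file mode grants
# write access to the owner only, i.e. no group or other write bit is set.
# Assumption: the 022 mask mirrors the 'chmod go-w' hint in the error above.
owner_only_writable() {
  # The leading 0 makes shell arithmetic read both values as octal.
  [ $(( 0$1 & 022 )) -eq 0 ]
}

# Example with modes relevant to this issue:
#   owner_only_writable 640   # -rw-r----- : succeeds
#   owner_only_writable 660   # -rw-rw---- : fails
```

On a system with GNU coreutils, the mode could be read with `stat -c '%a' apm-server.yml` and passed to the function.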

sebgl changed the title from "Flaky E2E test: TestUpdateConfiguration/APM_Pod_should_be_recreated" to "APMServer keystore cannot be initialized in version 7.9.0" on Jun 30, 2020
sebgl added the >bug (Something isn't working) label and removed the >flaky_test label on Jun 30, 2020
Contributor Author

sebgl commented Jun 30, 2020

Files are owned by root:apm-server in version 7.8.0, where apm-server.yml is not group-writable:

bash-4.2$ ls -la
total 97408
drwxr-x--- 1 root apm-server     4096 Jun 14 17:15 .
drwxr-xr-x 1 root root           4096 Jun 14 17:15 ..
-rw-r----- 1 root apm-server       41 Jun 14 17:14 .build_hash.txt
-rw-r----- 1 root apm-server    13675 Jun 14 17:14 LICENSE.txt
-rw-r----- 1 root apm-server   157358 Jun 14 17:14 NOTICE.txt
-rw-r----- 1 root apm-server      660 Jun 14 17:14 README.md
-rwxr-x--- 1 root apm-server 99282096 Jun 14 17:14 apm-server
-rw-r----- 1 root apm-server    47503 Jun 14 17:14 apm-server.yml
drwxrwx--- 2 root apm-server     4096 Jun 14 17:15 data
-rw-r----- 1 root apm-server   206047 Jun 14 17:14 fields.yml
drwxr-x--- 1 root apm-server     4096 Jun 14 17:14 ingest
drwxrwx--- 2 root apm-server     4096 Jun 14 17:15 logs

and owned by root:root in 7.9.0-SNAPSHOT, where apm-server.yml is group-writable:

bash-4.2$ ls -la
total 98776
drwxrwx--- 1 root root      4096 Jun 28 05:56 .
drwxr-xr-x 1 root root      4096 Jun 28 05:56 ..
-rw-rw---- 1 root root        41 Jun 28 05:55 .build_hash.txt
-rw-rw---- 1 root root     13675 Jun 28 05:55 LICENSE.txt
-rw-rw---- 1 root root    159237 Jun 28 05:55 NOTICE.txt
-rw-rw---- 1 root root       669 Jun 28 05:55 README.md
-rwxrwx--- 1 root root 100681264 Jun 28 05:55 apm-server
-rw-rw---- 1 root root     48413 Jun 28 05:55 apm-server.yml
drwxrwx--- 2 root root      4096 Jun 28 05:56 data
-rw-rw---- 1 root root    206201 Jun 28 05:55 fields.yml
drwxrwx--- 1 root root      4096 Jun 28 05:55 ingest
drwxrwx--- 2 root root      4096 Jun 28 05:56 logs
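
To spot the difference between the two listings quickly, one option is to filter `ls -la` output for regular files whose mode string has a `w` in the group or other position. This is a rough sketch: the function name `group_or_other_writable_files` is hypothetical, and it assumes the nine-column `ls -la` layout shown above. Only `apm-server.yml` matters for the error quoted earlier; the other files are listed for comparison.

```shell
# Hypothetical helper: reads `ls -la` style lines on stdin and prints the
# names of regular files that are writable by group or by others.
group_or_other_writable_files() {
  while read -r mode _links _owner _group _size _month _day _time name; do
    case "$mode" in
      # Character 6 of the mode string is group write, character 9 is
      # other write; a leading '-' restricts the match to regular files.
      -????w*|-???????w*) printf '%s\n' "$name" ;;
    esac
  done
}

# Example: ls -la /usr/share/apm-server | group_or_other_writable_files
```

Run against the 7.8.0 listing this prints nothing, while against the 7.9.0-SNAPSHOT listing it prints every regular file, including apm-server.yml.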

Contributor

barkbay commented Jun 30, 2020

It's odd: this looks like a side effect of elastic/beats#18873, but that PR is not merged yet.

Contributor Author

sebgl commented Jun 30, 2020

Another identified issue in Beats: elastic/beats#18858 (cc @simitt)

@jsoriano
Member

It seems that the 7.x branch of apm-server is using a version of beats that includes elastic/beats#12905, which caused elastic/beats#18858. That change was reverted in elastic/beats#18872 to solve the issue.
Would it be possible to update beats in apm-server to a changeset that includes the revert?


simitt commented Jun 30, 2020

Thanks @jsoriano, I'll merge the update (elastic/apm-server#3924) once CI has passed.

Contributor Author

sebgl commented Jul 3, 2020

It seems fixed now, the latest nightly build didn't fail 🥳
Thanks @simitt @jsoriano.

sebgl closed this as completed Jul 3, 2020

simitt commented Jul 3, 2020

There was another update related to this: I just merged the PR pulling in the libbeat changes. Let me know if issues arise again.
