Query cluster UUID on apm-server startup #6591

Merged
merged 13 commits into from
Nov 15, 2021

Conversation

stuartnelson3
Contributor

@stuartnelson3 stuartnelson3 commented Nov 11, 2021

Motivation/summary

The Stack Monitoring UI in Cloud depends on the beat module of Metricbeat, which requires apm-server to know its cluster_uuid in order for the documents to be indexed. apm-server only obtains its cluster_uuid in a callback that runs when it makes its initial connection to Elasticsearch.

When using data streams, the initial connection isn't made until apm-server attempts to index a document. This means the Stack Monitoring UI is broken for apm-server until a document is indexed, even though nothing may be wrong with apm-server.

Copy the request made in libbeat and execute it as a precondition, so that the cluster_uuid is available as soon as the server is ready.

Checklist

How to test these changes

  1. Start apm-integration with Elasticsearch + Kibana
  2. Install the apm-integration
  3. Start apm-server: apm-server -e -E apm-server.data_streams.enabled=true -E http.enabled=true
  4. curl the /state internal metrics endpoint: curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'
  5. You should see a non-empty value for cluster_uuid in the output

@stuartnelson3 stuartnelson3 requested a review from a team November 11, 2021 11:52
@mergify
Contributor

mergify bot commented Nov 11, 2021

This pull request does not have a backport label. Could you fix it @stuartnelson3? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-7.x is the label to automatically backport to the 7.x branch.
  • backport-7./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Nov 11, 2021
beater/beater.go Outdated
	if elasticsearchRegistry.Get(clusterUUID) != nil {
		elasticsearchRegistry.Remove(clusterUUID)
	}
	clusterUUIDRegVar := monitoring.NewString(elasticsearchRegistry, clusterUUID)
Contributor Author

I looked through the api but it wasn't immediately apparent to me how to update a value already existing in a registry.

Member

There's a Get method on the monitoring.Registry which I think you can use?

Would the libbeat code have been called yet at this point? I'm wondering though if there's any risk that the libbeat code could kick in concurrently and panic.

Contributor Author

> There's a Get method on the monitoring.Registry which I think you can use?

Roger. I have to cast to *monitoring.String, but it appears to work (the return value seems to be the Var interface type, which only exposes a single Visit method).

> Would the libbeat code have been called yet at this point? I'm wondering though if there's any risk that the libbeat code could kick in concurrently and panic.

The registry code is called synchronously, before *beater.Run, so it should always be set up. The Set method on the *monitoring.String could be called at the same time, but it has an internal mutex, so concurrent access should be fine.

Member

Cool, thanks. I see you removed the test guards, looks much safer to me now.

@stuartnelson3 stuartnelson3 added backport-7.16 Automated backport with mergify to the 7.16 branch backport-8.0 Automated backport with mergify labels Nov 11, 2021
@mergify mergify bot removed the backport-skip Skip notification from the automated backport with mergify label Nov 11, 2021
@apmmachine
Contributor

apmmachine commented Nov 11, 2021

💚 Build Succeeded

Build stats

  • Start Time: 2021-11-15T15:21:32.765+0000

  • Duration: 44 min 54 sec

  • Commit: 498c662

Test stats 🧪

Test Results
Failed 0
Passed 6022
Skipped 19
Total 6041

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /hey-apm : Run the hey-apm benchmark.

  • /package : Generate and publish the docker images.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Member

@axw axw left a comment

Generally LGTM, but I'm a bit worried about the timing of the var registration. Is there any possibility that the libbeat code would be called after ours, and cause a panic?

Member

@axw axw left a comment

LGTM!

@stuartnelson3 stuartnelson3 enabled auto-merge (squash) November 11, 2021 13:17
@stuartnelson3
Contributor Author

/test

@mergify
Contributor

mergify bot commented Nov 15, 2021

This pull request is now in conflicts. Could you fix it @stuartnelson3? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b query-cluster-uuid upstream/query-cluster-uuid
git merge upstream/master
git push upstream query-cluster-uuid

don't return a nil error when there was an http status issue
registries are created when the cluster_uuid callback is registered, which doesn't happen under elastic-agent because of the config being delivered later.
@stuartnelson3 stuartnelson3 merged commit ecf0df2 into elastic:master Nov 15, 2021
mergify bot pushed a commit that referenced this pull request Nov 15, 2021
mergify bot pushed a commit that referenced this pull request Nov 15, 2021
(cherry picked from commit ecf0df2)

# Conflicts:
#	changelogs/head.asciidoc
@stuartnelson3 stuartnelson3 deleted the query-cluster-uuid branch November 15, 2021 16:14
marclop pushed a commit that referenced this pull request Nov 16, 2021
stuartnelson3 added a commit that referenced this pull request Nov 16, 2021
(cherry picked from commit ecf0df2)

Co-authored-by: stuart nelson <[email protected]>
@marclop marclop self-assigned this Dec 21, 2021
@marclop
Contributor

marclop commented Dec 21, 2021

Verified with 9bb5e4c:

./apm-server -E output.elasticsearch.username=admin -E output.elasticsearch.password=changeme -E apm-server.kibana.host=http://localhost:5601 -E apm-server.kibana.username=admin -E apm-server.kibana.password=changeme -E apm-server.kibana.enabled=true -E apm-server.expvar.enabled=true -E http.enabled=true
...
$ curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'
{
  "cluster_uuid": "nE_NVzmNRAWVnk5BzWVSJw"
}

Labels
backport-7.16 Automated backport with mergify to the 7.16 branch backport-8.0 Automated backport with mergify test-plan test-plan-ok v8.0.0