-
Notifications
You must be signed in to change notification settings - Fork 525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query cluster UUID on apm-server startup #6591
Query cluster UUID on apm-server startup #6591
Conversation
This pull request does not have a backport label. Could you fix it @stuartnelson3? 🙏
NOTE: |
beater/beater.go
Outdated
if elasticsearchRegistry.Get(clusterUUID) != nil { | ||
elasticsearchRegistry.Remove(clusterUUID) | ||
} | ||
clusterUUIDRegVar := monitoring.NewString(elasticsearchRegistry, clusterUUID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through the api but it wasn't immediately apparent to me how to update a value already existing in a registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a Get
method on the monitoring.Registry
which I think you can use?
Would the libbeat code have been called yet at this point? I'm wondering though if there's any risk that the libbeat code could kick in concurrently and panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a Get method on the monitoring.Registry which I think you can use?
roger, I have to cast to *monitoring.String
, but it appears to work (the return value is Var
interface type, it seems, which only exposes a single Visit
method).
Would the libbeat code have been called yet at this point? I'm wondering though if there's any risk that the libbeat code could kick in concurrently and panic.
The registry code is called synchronously, before *beater.Run
, so it should always be set up. The Set
method on the *monitoring.String
could be called at the same time, but it has an internal mutex, so access should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks. I see you removed the test guards, looks much safer to me now.
💚 Build Succeeded
Expand to view the summary
Build stats
Test stats 🧪
🤖 GitHub commentsTo re-run your PR in the CI, just comment with:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM, but I'm a bit worried about the timing of the var registration. Is there any possibility that the libbeat code would be called after ours, and cause a panic?
beater/beater.go
Outdated
if elasticsearchRegistry.Get(clusterUUID) != nil { | ||
elasticsearchRegistry.Remove(clusterUUID) | ||
} | ||
clusterUUIDRegVar := monitoring.NewString(elasticsearchRegistry, clusterUUID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a Get
method on the monitoring.Registry
which I think you can use?
Would the libbeat code have been called yet at this point? I'm wondering though if there's any risk that the libbeat code could kick in concurrently and panic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
/test |
2 similar comments
/test |
/test |
This pull request is now in conflicts. Could you fix it @stuartnelson3? 🙏
|
don't return a nil error when there was an http status issue
registries are created when the cluster_uuid callback is registered, which doesn't happen under elastic-agent because of the config being delivered later.
(cherry picked from commit ecf0df2)
(cherry picked from commit ecf0df2) # Conflicts: # changelogs/head.asciidoc
(cherry picked from commit ecf0df2) Co-authored-by: stuart nelson <[email protected]>
(cherry picked from commit ecf0df2) Co-authored-by: stuart nelson <[email protected]>
Verified with 9bb5e4c: ./apm-server -E output.elasticsearch.username=admin -E output.elasticsearch.password=changeme -E apm-server.kibana.host=http://localhost:5601 -E apm-server.kibana.username=admin -E apm-server.kibana.password=changeme -E apm-server.kibana.enabled=true -E apm-server.expvar.enabled=true -E http.enabled=true
...
$ curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'
{
"cluster_uuid": "nE_NVzmNRAWVnk5BzWVSJw"
} |
Motivation/summary
stack monitoring ui in cloud depends on the beat module of metricbeat, which requires apm-server to know its
cluster_uuid
in order for the documents to be indexed. The apm-server only gets its cluster_uuid as a callback to making an initial connection with elasticsearch.When using datastreams, the initial connection isn't made until it attempts to index a document, which means that stack monitoring ui is broken for apm-server until a document is indexed, even though nothing may be wrong with the apm-server.
Copy the request made in libbeat and execute it as a precondition to get the cluster_uuid as soon as the server is ready.
Checklist
Documentation has been updatedHow to test these changes
apm-server -e -E apm-server.data_streams.enabled=true -E http.enabled=true
/state
internal metrics endpoint:curl -s http://localhost:5066/state | jq '.outputs.elasticsearch'