Flush aggregator on shutdown #3971
Conversation
Change the publisher to wait for enqueued events to be published when Stop is called, up to the configured ShutdownTimeout.
Remove the context param from Aggregator.Run, and instead add a Stop method which accepts a context. When Stop is called it directs Run to exit after performing one final publication. Stop waits for Run to exit, or for the context to be cancelled, whichever comes first. We call the Stop method with a timeout based on the `apm-server.shutdown_timeout` configuration.
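A minimal sketch of the Run/Stop pattern this describes, in Go. The `Aggregator` type, the `Run`/`Stop` split, and the final-publication-on-stop behaviour come from the description above; the ticker interval, channel fields, and `publish` stub are hypothetical details for illustration only.

```go
package txmetrics

import (
	"context"
	"sync"
	"time"
)

type Aggregator struct {
	interval time.Duration
	stopOnce sync.Once
	stopping chan struct{} // closed by Stop to direct Run to exit
	stopped  chan struct{} // closed by Run after the final publication
}

func NewAggregator(interval time.Duration) *Aggregator {
	return &Aggregator{
		interval: interval,
		stopping: make(chan struct{}),
		stopped:  make(chan struct{}),
	}
}

// Run periodically publishes aggregated metrics until Stop is called,
// performing one final publication before returning.
func (a *Aggregator) Run() error {
	defer close(a.stopped)
	ticker := time.NewTicker(a.interval)
	defer ticker.Stop()
	for {
		select {
		case <-a.stopping:
			return a.publish() // final flush on shutdown
		case <-ticker.C:
			if err := a.publish(); err != nil {
				return err
			}
		}
	}
}

// Stop directs Run to exit, then waits for it to finish or for ctx to be
// cancelled, whichever comes first.
func (a *Aggregator) Stop(ctx context.Context) error {
	a.stopOnce.Do(func() { close(a.stopping) })
	select {
	case <-a.stopped:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func (a *Aggregator) publish() error {
	// Aggregated transaction metrics would be converted to events and
	// sent to the publisher here (elided in this sketch).
	return nil
}
```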
publisher.Stop now accepts a context, which is used to interrupt waiting for published events to be acknowledged. This is implemented by introducing a custom beat.ACKer.
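A sketch of what the context-interruptible acknowledgement wait might look like. The `AddEvent`/`ACKEvents` callback shape mirrors libbeat's `beat.ACKer` hooks, but the type name, fields, and `Wait` method below are hypothetical, not the PR's actual implementation.

```go
package publish

import (
	"context"
	"sync"
)

// waitPublishedAcker counts events in flight so that Stop can wait, with a
// context, until all of them have been acknowledged by the output.
// This sketch assumes a single goroutine calls Wait.
type waitPublishedAcker struct {
	mu     sync.Mutex
	active int64
	empty  chan struct{} // buffered wakeup, signalled when active reaches zero
}

func newWaitPublishedAcker() *waitPublishedAcker {
	return &waitPublishedAcker{empty: make(chan struct{}, 1)}
}

// AddEvent records an event enqueued for publication. (libbeat's callback
// also receives the event itself, elided here.)
func (a *waitPublishedAcker) AddEvent(published bool) {
	if !published {
		return
	}
	a.mu.Lock()
	a.active++
	a.mu.Unlock()
}

// ACKEvents records that n events have been acknowledged by the output.
func (a *waitPublishedAcker) ACKEvents(n int) {
	a.mu.Lock()
	a.active -= int64(n)
	drained := a.active <= 0
	a.mu.Unlock()
	if drained {
		select {
		case a.empty <- struct{}{}:
		default: // a wakeup is already pending
		}
	}
}

// Wait blocks until all added events are acknowledged, or ctx is cancelled.
func (a *waitPublishedAcker) Wait(ctx context.Context) error {
	for {
		a.mu.Lock()
		active := a.active
		a.mu.Unlock()
		if active <= 0 {
			return nil
		}
		select {
		case <-a.empty:
			// re-check the counter; more events may have been added
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```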
Commits (the commit message bodies repeat the description above):

* publish: publish enqueued events on shutdown
* aggregation/txmetrics: graceful shutdown
* tests/system: add test for flushing aggregations
* publish: add context to publisher.Stop
* Move publisher ShutdownTimeout handling to beater
* Update changelog
Successfully tested.
Motivation/summary
Flush aggregated transaction metrics on shutdown, to minimise data loss. This could be important in a rolling upgrade.
There are some significant, and necessary, changes to the publisher to support an interruptible graceful shutdown-with-timeout. The publisher's Stop method now takes a context, and waits until the context is cancelled, or the published events are acknowledged, before returning. This is all necessary because the shutdown timeout, from the user's point of view, should encompass the entire shutdown procedure -- not just closing the pipeline client.
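Putting it together, the beater-side wiring might look roughly like the sketch below. The `gracefulStop` function, the `Stopper` interface, and the logging are assumptions for illustration; only the `Stop(context)` signature and the bound derived from `apm-server.shutdown_timeout` are taken from the description.

```go
package beater

import (
	"context"
	"log"
	"time"
)

// Stopper matches the Stop(context) shape shared by the aggregator and the
// publisher after this change (an assumption for this sketch).
type Stopper interface {
	Stop(ctx context.Context) error
}

// gracefulStop flushes the aggregator and then drains the publisher, bounding
// the entire shutdown procedure by shutdownTimeout
// (apm-server.shutdown_timeout), not just the pipeline client close.
func gracefulStop(aggregator, publisher Stopper, shutdownTimeout time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), shutdownTimeout)
	defer cancel()

	// Flush aggregated metrics first, so the final metricsets are enqueued...
	if err := aggregator.Stop(ctx); err != nil {
		log.Printf("aggregator shutdown: %v", err)
	}
	// ...then wait for all enqueued events to be acknowledged, or time out.
	if err := publisher.Stop(ctx); err != nil {
		log.Printf("publisher shutdown: %v", err)
	}
}
```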
Checklist
I have considered changes for:
- [ ] documentation
- [ ] logging (add log lines, choose appropriate log selector, etc.)
- [ ] metrics and monitoring (create issue for Kibana team to add metrics to visualizations, e.g. Kibana#44001)
- [ ] telemetry
- [ ] Elasticsearch Service (https://cloud.elastic.co)
- [ ] Elastic Cloud Enterprise (https://www.elastic.co/products/ece)
- [ ] Elastic Cloud on Kubernetes (https://www.elastic.co/elastic-cloud-kubernetes)

How to test these changes
Related issues
Closes #3789