Unified Pipeline/output metrics #4663

Merged: 3 commits merged on Jul 13, 2017

@urso commented Jul 13, 2017

This PR adds metrics support to the publisher pipeline and unifies the metrics used by outputs.

The metrics are registered dynamically and can be removed later on (e.g. the pipeline already removes its metrics on close).
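
As a rough illustration of metrics that are registered at runtime and removed again when their owner shuts down, here is a minimal Go sketch. The `Registry` and `Counter` types and their methods are hypothetical and are not the libbeat `monitoring` API.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

// Counter is a hypothetical monotonically increasing metric.
type Counter struct{ v uint64 }

func (c *Counter) Inc()        { atomic.AddUint64(&c.v, 1) }
func (c *Counter) Get() uint64 { return atomic.LoadUint64(&c.v) }

// Registry is a hypothetical, concurrency-safe map of metric names to counters.
type Registry struct {
	mu      sync.Mutex
	metrics map[string]*Counter
}

func NewRegistry() *Registry { return &Registry{metrics: map[string]*Counter{}} }

// Register adds a counter under name at runtime and returns it.
func (r *Registry) Register(name string) *Counter {
	r.mu.Lock()
	defer r.mu.Unlock()
	c := &Counter{}
	r.metrics[name] = c
	return c
}

// Remove unregisters a metric, e.g. when the owning pipeline is closed.
func (r *Registry) Remove(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.metrics, name)
}

// Names returns the currently registered metric names in sorted order.
func (r *Registry) Names() []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	names := make([]string, 0, len(r.metrics))
	for n := range r.metrics {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry()
	total := reg.Register("pipeline.events.total")
	total.Inc()
	fmt.Println(reg.Names(), total.Get())

	// Simulate the pipeline being closed: its metrics disappear.
	reg.Remove("pipeline.events.total")
	fmt.Println(reg.Names())
}
```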

Metrics support is standardised and decoupled from the outputs and the publisher pipeline through an observer (event listener) object that defines a set of common events. On every event reported by an output, the observer updates the corresponding metrics (potentially several of them).
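
The observer approach can be sketched as follows, assuming a hypothetical `Observer` interface with three events (`NewBatch`, `Acked`, `Failed`); the interface actually introduced by this PR may use different events and names. The output code only calls the observer, and the single `metricsObserver` implementation decides which counters each event touches.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Observer is a hypothetical set of common events reported by an output.
// The output only calls these methods; it knows nothing about metrics.
type Observer interface {
	NewBatch(n int) // a batch of n events is about to be sent
	Acked(n int)    // n events were ACKed by the remote end
	Failed(n int)   // n events failed
}

// metricsObserver updates one or more counters per event.
type metricsObserver struct {
	batches, total, acked, failed uint64
	active                        int64 // gauge: sent, awaiting ACK/fail
}

func (m *metricsObserver) NewBatch(n int) {
	atomic.AddUint64(&m.batches, 1)
	atomic.AddUint64(&m.total, uint64(n))
	atomic.AddInt64(&m.active, int64(n))
}

func (m *metricsObserver) Acked(n int) {
	atomic.AddUint64(&m.acked, uint64(n))
	atomic.AddInt64(&m.active, -int64(n))
}

func (m *metricsObserver) Failed(n int) {
	atomic.AddUint64(&m.failed, uint64(n))
	atomic.AddInt64(&m.active, -int64(n))
}

// nilObserver ignores all events, e.g. when metrics are not needed.
type nilObserver struct{}

func (nilObserver) NewBatch(int) {}
func (nilObserver) Acked(int)    {}
func (nilObserver) Failed(int)   {}

// send is a stand-in for an output's publish step.
func send(obs Observer, events []string, ok bool) {
	obs.NewBatch(len(events))
	if ok {
		obs.Acked(len(events))
	} else {
		obs.Failed(len(events))
	}
}

func main() {
	m := &metricsObserver{}
	send(m, []string{"a", "b"}, true)
	send(m, []string{"c"}, false)
	send(nilObserver{}, []string{"d"}, true)
	fmt.Println(m.batches, m.total, m.acked, m.failed, m.active)
}
```

Because outputs depend only on the interface, a no-op implementation such as `nilObserver` can be passed wherever metrics are not wanted.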

The original per-output-type metrics (output.elasticsearch... and so on) have been removed in favor of a standardized set of metrics.

The observer is passed to the outputs and the publisher pipeline, which makes it possible to collect metrics separately for different pipeline instances (the xpack and libbeat namespaces).
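
A sketch of per-instance collection, assuming hypothetical `registry`, `observer`, and `pipeline` types: each pipeline instance receives its own observer bound to a namespace, so the same code reports under `libbeat.pipeline...` for one instance and under `xpack.monitoring.pipeline...` for the other.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical flat store of dotted metric names.
type registry struct {
	mu     sync.Mutex
	values map[string]uint64
}

func (r *registry) add(name string, n uint64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.values[name] += n
}

// observer prefixes every metric with the namespace it was created for.
type observer struct {
	reg *registry
	ns  string
}

func (o observer) published(n int) {
	o.reg.add(o.ns+".pipeline.events.published", uint64(n))
}

// pipeline is a stand-in for a publisher pipeline instance.
type pipeline struct{ obs observer }

func (p pipeline) publish(events []string) { p.obs.published(len(events)) }

func main() {
	reg := &registry{values: map[string]uint64{}}
	beat := pipeline{obs: observer{reg, "libbeat"}}
	monitoring := pipeline{obs: observer{reg, "xpack.monitoring"}}

	beat.publish([]string{"e1", "e2"})
	monitoring.publish([]string{"m1"})
	fmt.Println(reg.values)
}
```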

pipeline metrics:

- pipeline.clients: number of beat.Client instances (internal connections to the pipeline)
- pipeline.events.total: total number of events processed by a client
- pipeline.events.filtered: total number of events removed by processors
- pipeline.events.published: total number of events pushed to the queue/broker
- pipeline.events.failed: total number of events that failed to be pushed to the queue (e.g. on disconnect)
- pipeline.events.dropped: total number of events dropped
- pipeline.events.retry: total number of events retried
- pipeline.queue.acked: total number of events ACKed by the event queue/buffer
- pipeline.events.active: (gauge) number of active events in the pipeline
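
The following sketch keeps a subset of the pipeline counters plus the `active` gauge. The bookkeeping rules are assumptions for illustration, not taken from the PR: an event is counted in `total`, then either in `filtered` (removed by a processor) or in `published`, and it stays active until the queue ACKs it.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pipelineMetrics mirrors a subset of the pipeline metrics listed above.
// The bookkeeping rules are assumptions made for illustration.
type pipelineMetrics struct {
	total, filtered, published, queueAcked uint64
	active                                 int64 // gauge
}

// onEvent records one event handed to the pipeline by a client.
// keep reports whether the processors kept the event.
func (m *pipelineMetrics) onEvent(keep bool) {
	atomic.AddUint64(&m.total, 1)
	if !keep {
		atomic.AddUint64(&m.filtered, 1)
		return
	}
	atomic.AddUint64(&m.published, 1)
	atomic.AddInt64(&m.active, 1)
}

// onQueueACK records n events ACKed by the queue; they are no longer active.
func (m *pipelineMetrics) onQueueACK(n int) {
	atomic.AddUint64(&m.queueAcked, uint64(n))
	atomic.AddInt64(&m.active, -int64(n))
}

func main() {
	var m pipelineMetrics
	for _, keep := range []bool{true, false, true, true} {
		m.onEvent(keep)
	}
	m.onQueueACK(2)
	fmt.Println(m.total, m.filtered, m.published, m.queueAcked, m.active)
}
```

The counters only ever increase, while the gauge moves in both directions, which is why it uses a signed type.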

output metrics:

- output.type: configured output type (logstash, elasticsearch, ...)
- output.events.batches: total number of batches processed by the output
- output.events.total: total number of events processed by the output
- output.events.acked: total number of events ACKed by the output
- output.events.failed: total number of events that failed in the output
- output.events.active: (gauge) number of events sent and waiting for ACK/fail from the output
- output.write.bytes: total number of bytes written by the output
- output.write.errors: total number of I/O errors on write
- output.read.bytes: total number of bytes read by the output
- output.read.errors: total number of I/O errors while waiting for a response in the output

@urso added the review label Jul 13, 2017
@urso mentioned this pull request Jul 13, 2017 (in an issue with 22 tasks)
@ruflin (Contributor) left a comment

This is a great change. It greatly simplifies the internal and external handling of the metrics. It also makes it very easy to filter by a single output type, etc.

Makes me wonder if there are other places where we should do the same ;-)

```diff
 next := math.Float64bits(math.Float64frombits(cur) + delta)
-if atomic.CompareAndSwapUint64(&v.f, cur, next) {
+if v.f.CAS(cur, next) {
```
Contributor

Tricky abbreviation. I only know what it means because of the diff.

Author

oh... to me CAS on atomics is pretty well known/common :)
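
CAS stands for compare-and-swap. The diff above adds `delta` to a float64 kept as raw uint64 bits and, presumably, retries until the swap succeeds. A self-contained sketch of that loop, using the standard library's `sync/atomic` instead of the libbeat `atomic` package the PR switches to, with a hypothetical `Float` type:

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// Float is a hypothetical float64 gauge stored as raw IEEE-754 bits,
// because sync/atomic has no float64 operations.
type Float struct{ bits uint64 }

// Add atomically adds delta using a compare-and-swap retry loop.
func (f *Float) Add(delta float64) {
	for {
		cur := atomic.LoadUint64(&f.bits)
		next := math.Float64bits(math.Float64frombits(cur) + delta)
		// Succeeds only if no other goroutine changed bits since the load;
		// otherwise reload and try again.
		if atomic.CompareAndSwapUint64(&f.bits, cur, next) {
			return
		}
	}
}

// Get returns the current value.
func (f *Float) Get() float64 {
	return math.Float64frombits(atomic.LoadUint64(&f.bits))
}

func main() {
	var f Float
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			f.Add(0.5)
		}()
	}
	wg.Wait()
	fmt.Println(f.Get())
}
```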

@ruflin (Contributor) commented Jul 13, 2017

@urso The wait_shutdown_ok test seems to fail, which worries me a bit, but I think it is unrelated to this PR.

@urso (Author) commented Jul 13, 2017

@ruflin I checked the test; it is completely unrelated to this PR. Although the test fails (because the shutdown timer times out), I verified that the state in the registry still matches expectations.

urso added 3 commits July 13, 2017 13:24
- use libbeat/common/atomic package
- add monitoring.Uint type
- report pipeline metrics on:
  - libbeat.pipeline....
  - xpack.monitoring.pipeline...
@urso force-pushed the pipeline/metrics branch from 598b5c1 to a3adc49 on July 13, 2017 11:25
@ruflin merged commit 573eb67 into elastic:master on Jul 13, 2017
@urso deleted the pipeline/metrics branch on February 19, 2019 18:52

3 participants