[receiver/k8sclusterreceiver] Report entity state periodically to be immune to event losses and reordering #24413

Closed
Tracked by #19741
tigrannajaryan opened this issue Jul 20, 2023 · 0 comments · Fixed by #24434

tigrannajaryan commented Jul 20, 2023

Entity state is represented as log records and delivered via the regular log pipeline and exporter queue. Events in the queue can easily be reordered, and events may also be lost if the queue overflows.

To ensure that an entity's state is eventually delivered correctly to the destination, the design requires that the entity state event be reported periodically even if the entity does not change.

This requires k8sclusterreceiver to perform periodic collection of K8s metadata, in addition to on-change collection.
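To illustrate the failure mode and why periodic reporting corrects it, here is a minimal sketch. It assumes a destination that simply keeps the most recently received state per entity; the `entityEvent` and `destination` types are hypothetical and are not part of the receiver.

```go
package main

import "fmt"

// entityEvent is a hypothetical, simplified stand-in for an entity state log record.
type entityEvent struct {
	EntityID string
	State    string
}

// destination is a hypothetical backend that keeps the most recently
// received state per entity (last arrival wins). This is an assumption
// made for the sketch, not a description of any real backend.
type destination map[string]string

// apply stores each event's state in arrival order.
func (d destination) apply(events ...entityEvent) {
	for _, e := range events {
		d[e.EntityID] = e.State
	}
}

func main() {
	dest := destination{}

	// The pod went Pending -> Running, but the two events arrive swapped.
	dest.apply(
		entityEvent{EntityID: "pod-a", State: "Running"},
		entityEvent{EntityID: "pod-a", State: "Pending"},
	)
	fmt.Println(dest["pod-a"]) // prints "Pending": stale state at the destination

	// The next periodic report carries the current state and corrects it.
	dest.apply(entityEvent{EntityID: "pod-a", State: "Running"})
	fmt.Println(dest["pod-a"]) // prints "Running"
}
```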

Note that it is rare for event reordering to result in incorrect state at the destination. The same entity has to change twice quickly enough for both events to end up in the same batch or in immediately adjacent batches, and then the batches have to be reordered in the exporter queue (or, if the events are in the same batch, the batch has to be split and the parts reordered). This is possible but should be quite rare, so it is acceptable that a correction takes a few minutes when it does happen.

Because of this, I think a reasonable default interval is around 5-10 minutes. This should keep the additional load from repeated reports reasonably small while still correcting any errors quickly enough.

tigrannajaryan changed the title from "Report entity state periodically to be immune to event losses and reordering" to "[receiver/k8sclusterreceiver] Report entity state periodically to be immune to event losses and reordering" on Jul 20, 2023
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector-contrib that referenced this issue Jul 21, 2023
Resolves open-telemetry#24413

This is part 4 of the work to move to entity events-as-log-records in K8s cluster receiver:
open-telemetry#19741

Overall design document:
https://docs.google.com/document/d/1Tg18sIck3Nakxtd3TFFcIjrmRO_0GLMdHXylVqBQmJA/

Note that metadata.GetMetadataUpdate() computes deltas between the old and
new state. If the old and new states are equal, metadata.GetMetadataUpdate()
returns an empty slice. This means periodic polling when no state is changed
will NOT result in any additional metadata update calls. So, we expect no
changes in the behavior of existing metadata listeners, such as signalfxexporter.

In contrast, entity events are always emitted even if no state has
changed. So, this periodic collection will result in entity events being
emitted periodically. This is the desired effect of this PR.

### Testing

I was unable to think of a good automated test for this capability that does
not require a major refactoring of other code and executes reasonably quickly.
For this reason I am not including any tests in this PR. I have tested the
functionality manually and can see the entities periodically collected
according to the config setting.
dmitryax pushed a commit that referenced this issue Jul 24, 2023
Resolves
#24413

This is part of the work to move to entity events-as-log-records in K8s
cluster receiver:
#19741

Overall design document:

https://docs.google.com/document/d/1Tg18sIck3Nakxtd3TFFcIjrmRO_0GLMdHXylVqBQmJA/

Note that metadata.GetMetadataUpdate() computes deltas between the old
and new state. If the old and new states are equal
metadata.GetMetadataUpdate() returns an empty slice. This means periodic
polling when no state is changed will NOT result in any additional
metadata update calls. So, we expect no changes in the behavior of
existing metadata listeners, such as signalfxexporter.

In contrast, entity events are always emitted even if no state has
changed. So, this periodic collection will result in entity events being
emitted periodically. This is the desired effect of this PR.
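
The difference between the two paths can be sketched as follows. This is not the code from metadata.GetMetadataUpdate(); `changedKeys`, `entityEvent`, and `fullStateEvent` are hypothetical names that only illustrate delta computation versus full-state emission.

```go
package main

import (
	"fmt"
	"sort"
)

// changedKeys is a hypothetical delta function: it returns the keys that
// were added, changed, or removed between the old and new state, and an
// empty slice when the two states are equal.
func changedKeys(oldState, newState map[string]string) []string {
	changed := []string{}
	for k, v := range newState {
		if old, ok := oldState[k]; !ok || old != v {
			changed = append(changed, k)
		}
	}
	for k := range oldState {
		if _, ok := newState[k]; !ok {
			changed = append(changed, k)
		}
	}
	sort.Strings(changed) // deterministic order for display
	return changed
}

// entityEvent is a hypothetical full-state event for one entity.
type entityEvent struct {
	EntityID   string
	Attributes map[string]string
}

// fullStateEvent always produces an event, whether or not anything changed.
func fullStateEvent(id string, state map[string]string) entityEvent {
	return entityEvent{EntityID: id, Attributes: state}
}

func main() {
	state := map[string]string{"phase": "Running"}

	// Unchanged state: the delta is empty, so delta-based listeners see nothing.
	fmt.Println(len(changedKeys(state, state))) // prints 0

	// The entity event is still produced on every periodic pass.
	fmt.Println(fullStateEvent("pod-a", state).EntityID) // prints pod-a
}
```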

### Testing

I was unable to think of a good automated test for this capability that
does not require a major refactoring of other code and executes
reasonably quickly. For this reason I am not including any tests in this
PR. I have tested the functionality manually and can see the entities
periodically collected according to the config setting.

If anyone has ideas about how to write good unit tests for this PR I am
open to suggestions.