[receiver/k8sclusterreceiver] Report entity state periodically to be immune to event losses and reordering #24413
tigrannajaryan changed the title from "Report entity state periodically to be immune to event losses and reordering" to "[receiver/k8sclusterreceiver] Report entity state periodically to be immune to event losses and reordering" on Jul 20, 2023.
tigrannajaryan added a commit to tigrannajaryan/opentelemetry-collector-contrib that referenced this issue on Jul 21, 2023:

Resolves open-telemetry#24413

This is part 4 of the work to move to entity events-as-log-records in K8s cluster receiver: open-telemetry#19741

Overall design document: https://docs.google.com/document/d/1Tg18sIck3Nakxtd3TFFcIjrmRO_0GLMdHXylVqBQmJA/

Note that metadata.GetMetadataUpdate() computes deltas between the old and new state. If the old and new states are equal, metadata.GetMetadataUpdate() returns an empty slice. This means periodic polling when no state has changed will NOT result in any additional metadata update calls. So, we expect no changes in the behavior of existing metadata listeners, such as signalfxexporter.

In contrast, entity events are always emitted, even if no state has changed. So, this periodic collection will result in entity events being emitted periodically. This is the desired effect of this PR.

### Testing

I was unable to think of a good automated test for this capability that does not require a major refactoring of other code and executes reasonably quickly. For this reason I am not including any tests in this PR. I have tested the functionality manually and can see the entities periodically collected according to the config setting.
dmitryax pushed a commit that referenced this issue on Jul 24, 2023:

Resolves #24413

This is part of the work to move to entity events-as-log-records in K8s cluster receiver: #19741

Overall design document: https://docs.google.com/document/d/1Tg18sIck3Nakxtd3TFFcIjrmRO_0GLMdHXylVqBQmJA/

Note that metadata.GetMetadataUpdate() computes deltas between the old and new state. If the old and new states are equal, metadata.GetMetadataUpdate() returns an empty slice. This means periodic polling when no state has changed will NOT result in any additional metadata update calls. So, we expect no changes in the behavior of existing metadata listeners, such as signalfxexporter.

In contrast, entity events are always emitted, even if no state has changed. So, this periodic collection will result in entity events being emitted periodically. This is the desired effect of this PR.

### Testing

I was unable to think of a good automated test for this capability that does not require a major refactoring of other code and executes reasonably quickly. For this reason I am not including any tests in this PR. I have tested the functionality manually and can see the entities periodically collected according to the config setting. If anyone has ideas about how to write good unit tests for this PR, I am open to suggestions.
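The different treatment of a periodic poll by the two consumers can be sketched as follows. This is a simplified illustration, not the receiver's code: `EntityState`, `metadataDelta`, and `resync` are hypothetical names, with `metadataDelta` mimicking the delta behavior described above for `metadata.GetMetadataUpdate()`, and each entity's state reduced to a map of string attributes.

```go
package main

import "sort"

// EntityState is a hypothetical, simplified view of cluster state:
// entity ID -> attribute map.
type EntityState map[string]map[string]string

// metadataDelta mimics the described behavior of metadata.GetMetadataUpdate():
// it returns only entities whose attributes differ between old and cur,
// so two equal states yield an empty result.
func metadataDelta(old, cur EntityState) []string {
	var changed []string
	for id, attrs := range cur {
		prev, existed := old[id]
		if !existed || !sameAttrs(prev, attrs) {
			changed = append(changed, id)
		}
	}
	// Entities that disappeared also count as changes.
	for id := range old {
		if _, stillThere := cur[id]; !stillThere {
			changed = append(changed, id)
		}
	}
	sort.Strings(changed)
	return changed
}

// sameAttrs reports whether two attribute maps hold identical key/value pairs.
func sameAttrs(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if bv, ok := b[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// resync models one periodic collection: metadata listeners receive only the
// delta, while an entity event is produced for every current entity whether
// or not it changed.
func resync(old, cur EntityState) (metadataUpdates, entityEvents []string) {
	metadataUpdates = metadataDelta(old, cur)
	for id := range cur {
		entityEvents = append(entityEvents, id)
	}
	sort.Strings(entityEvents)
	return metadataUpdates, entityEvents
}
```

Calling `resync(s, s)` on an unchanged state yields no metadata updates but one entity event per entity, which is the split in behavior the commit message relies on.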
Entity state is represented as log records and delivered via the regular logs pipeline and exporter queue. Events in the queue can easily be reordered, and they can also be lost if the queue overflows.
To ensure that an entity's state is eventually delivered correctly to the destination, the design requires that the entity state event be reported periodically, even if the entity does not change.
This requires k8sclusterreceiver to collect K8s metadata periodically, in addition to collecting it on change.
Note that event reordering rarely results in incorrect state at the destination. For that to happen, the same entity has to change twice in quick enough succession that both events end up in the same or immediately adjacent batches, and then those batches have to be reordered in the exporter queue (or, if both events are in the same batch, the batch has to be split and the parts reordered). This is possible but should be quite rare, so we are OK with it taking a few minutes for the state to be corrected when it does happen.
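The failure mode and its correction can be replayed in a toy model. This rests on assumptions the issue does not state explicitly: the destination applies events in arrival order with last-write-wins semantics, and every event carries the entity's full state. The names `event`, `applyInArrivalOrder`, and `reorderThenResync` are hypothetical.

```go
package main

// event is a hypothetical full-state entity report as seen by the destination.
type event struct {
	EntityID string
	State    string
}

// applyInArrivalOrder models a destination that keeps whatever state arrived
// last for each entity (last-write-wins by arrival, not by change time).
func applyInArrivalOrder(dest map[string]string, batches ...[]event) {
	for _, batch := range batches {
		for _, e := range batch {
			dest[e.EntityID] = e.State
		}
	}
}

// reorderThenResync replays two quick changes whose batches arrive in reverse
// order, followed by one periodic full-state report. It returns the state the
// destination holds after the reordered batches and after the periodic report.
func reorderThenResync() (afterReorder, afterResync string) {
	dest := map[string]string{}
	older := []event{{EntityID: "pod/a", State: "Pending"}}
	newer := []event{{EntityID: "pod/a", State: "Running"}}
	applyInArrivalOrder(dest, newer, older) // queue delivered batches out of order
	afterReorder = dest["pod/a"]

	// The next periodic report re-sends the current state.
	periodic := []event{{EntityID: "pod/a", State: "Running"}}
	applyInArrivalOrder(dest, periodic)
	afterResync = dest["pod/a"]
	return afterReorder, afterResync
}
```

Here `reorderThenResync()` returns `"Pending"` (stale) followed by `"Running"` (corrected): the destination stays wrong only until the next periodic report, which is why the interval bounds how long such an error can persist.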
Because of this, I think a reasonable default interval is around 5-10 minutes. That should keep the extra load from repeated reporting reasonably small while still correcting any errors quickly enough.