fix(mqtt): rework connection and message tracking #10696
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
My mqtt connection would randomly stall, or disconnect (without any logging on the telegraf side).
It turns out there's was a comedy of bugs all adding up to trigger the behavior.
On first glance, it seemed that telegraf would randomly disconnect and reconnect, just to disconnect again. That led to looking at paho's client management and finding that the plugin really should be making a new client after disconnect (specifically since the plugin is not using auto reconnect).
Through code inspection, I noticed #10687. Sadly, fixing this did not resolve the problem. But...
As part of testing #10684 I noticed that the
m.acc
andm.sem
channels were slowly-but-steadily growing larger over time. With the dedup bug fixed, both channels did not drain to zero as expected. It turns out thatselect
has random behavior when both cases are ready. Since one path exits the for-loop, on average this causesm.acc
to not process, and thus it fills up.TL;DR
So all-together, this change both cleans up the client connection semantics, as well as cleaning up metric tracking.
Required for all PRs:
resolves #10687