fix(mqtt): rework connection and message tracking #10696

Merged
merged 2 commits into from
Sep 26, 2022
@kitlaan kitlaan commented Feb 22, 2022

My mqtt connection would randomly stall or disconnect (without any logging on the telegraf side).
It turns out there was a comedy of bugs all adding up to trigger the behavior.

  1. Weird connection/disconnection behavior

At first glance, it seemed that telegraf would randomly disconnect and reconnect, just to disconnect again. That led to looking at paho's client management and finding that the plugin really should be making a new client after each disconnect (specifically since the plugin is not using auto reconnect). From the paho documentation:

Reusing a Client is not completely safe. After calling Disconnect please create a new Client (NewClient()) rather than attempting to reuse the existing one (note that features such as SetAutoReconnect mean this is rarely necessary).
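
The guidance above can be sketched without pulling in paho itself. This is a minimal illustration under stated assumptions, not the plugin's actual code: the `client` interface, `fakeClient`, and `consumer` are hypothetical names, and the `newClient` factory stands in for a call such as paho's `NewClient()`. The point is that the consumer keeps a factory instead of a long-lived client and builds a fresh client on every connect.

```go
package main

import (
	"errors"
	"fmt"
)

// client is a hypothetical stand-in for an MQTT client handle.
type client interface {
	Connect() error
	Disconnect()
}

// fakeClient refuses to connect again once it has been disconnected,
// mimicking the "do not reuse a client" guidance for illustration only.
type fakeClient struct {
	disconnected bool
}

func (c *fakeClient) Connect() error {
	if c.disconnected {
		return errors.New("client reused after Disconnect")
	}
	return nil
}

func (c *fakeClient) Disconnect() { c.disconnected = true }

// consumer holds a factory rather than a single long-lived client.
type consumer struct {
	newClient func() client // assumed factory, e.g. wrapping a library constructor
	current   client
}

// connect always builds a fresh client instead of reusing the old one.
func (c *consumer) connect() error {
	cl := c.newClient()
	if err := cl.Connect(); err != nil {
		return err
	}
	c.current = cl
	return nil
}

// disconnect drops the reference so the old client cannot be reused.
func (c *consumer) disconnect() {
	if c.current != nil {
		c.current.Disconnect()
		c.current = nil
	}
}

func main() {
	c := &consumer{newClient: func() client { return &fakeClient{} }}
	fmt.Println("first connect error:", c.connect())
	c.disconnect()
	fmt.Println("reconnect error:", c.connect())
}
```

With this shape, a reconnect after `disconnect()` never touches a client that has already been disconnected.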

  2. Weird data loss (deadlock?)

Through code inspection, I noticed #10687. Sadly, fixing this did not resolve the problem. But...

  3. Metric tracking

As part of testing #10684, I noticed that the m.acc and m.sem channels were slowly but steadily growing over time. Even with the dedup bug fixed, neither channel drained to zero as expected. It turns out that Go's select chooses pseudo-randomly among the cases that are ready. Since one of those paths exits the for-loop, m.acc is often left unprocessed, and thus it fills up.
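
The select behavior is easy to reproduce in isolation. The sketch below uses only the standard library and hypothetical names (`countSelections`, `drainThenStop`, `work`, `done`); it is not the plugin's code and not necessarily how this PR resolves the issue. The first function shows that when two cases are both ready, each one gets picked some of the time. The second shows one possible way to avoid stranding queued items: when the exit case fires, flush whatever is still pending before returning.

```go
package main

import "fmt"

// countSelections fills two buffered channels so that both select cases
// are ready, then records how often each case is chosen.
func countSelections(rounds int) (fromA, fromB int) {
	a := make(chan int, rounds)
	b := make(chan int, rounds)
	for i := 0; i < rounds; i++ {
		a <- i
		b <- i
	}
	for i := 0; i < rounds; i++ {
		select {
		case <-a:
			fromA++
		case <-b:
			fromB++
		}
	}
	return fromA, fromB
}

// drainThenStop processes work until done fires, then flushes anything
// still queued so the exit path cannot leave items behind.
func drainThenStop(work <-chan int, done <-chan struct{}) int {
	processed := 0
	for {
		select {
		case <-work:
			processed++
		case <-done:
			// Stop was requested: take remaining items without blocking.
			for {
				select {
				case <-work:
					processed++
				default:
					return processed
				}
			}
		}
	}
}

func main() {
	a, b := countSelections(1000)
	fmt.Println("chosen from a:", a, "chosen from b:", b)

	work := make(chan int, 100)
	for i := 0; i < 100; i++ {
		work <- i
	}
	done := make(chan struct{})
	close(done) // both cases are ready from the very first iteration
	fmt.Println("processed:", drainThenStop(work, done))
}
```

Without the inner flush loop, `drainThenStop` would return as soon as select happened to pick `done`, leaving a varying number of items in `work`, which matches the slowly growing channel described above.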

TL;DR

So altogether, this change cleans up both the client connection semantics and the metric tracking.

resolves #10687

@telegraf-tiger telegraf-tiger bot added the fix pr to fix corresponding bug label Feb 22, 2022
@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Sep 21, 2022
@MyaLongmire MyaLongmire merged commit 2b37d7e into influxdata:master Sep 26, 2022
@kitlaan kitlaan deleted the fix/mqtt-connect branch September 26, 2022 17:44
popey pushed a commit that referenced this pull request Oct 3, 2022
Successfully merging this pull request may close these issues.

Deadlock in mqtt_consumer?