Fix reference's leak in fast publish of Qos0 and slow subscriber use case #649

Conversation

@andsel andsel commented Feb 3, 2022

Release notes

Fix a message leak in the handling of QoS0 PUB messages.

What does this PR do?

Handling a PUB message is a multistep process: the payload/message is forwarded to the event queues of the appropriate target sessions, and the registered interceptors are notified. When the front-end processing of the message finishes, it has to release the message appropriately by decrementing its reference count. This PR fixes a missed decrement.
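For illustration, a minimal sketch of the pattern (plain Netty code, not the actual moquette implementation; the class name and payload are made up): whoever finishes processing a reference-counted message is responsible for the final release, otherwise the buffer leaks.

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.util.ReferenceCountUtil;

public class ReleaseAfterProcessingSketch {
    public static void main(String[] args) {
        // A reference-counted payload, as a PUBLISH body is inside the broker.
        ByteBuf payload = Unpooled.buffer().writeBytes("hello".getBytes());
        try {
            // ... forward to the target sessions' event queues, notify interceptors ...
        } finally {
            // Decrement the reference count once front-end processing is done;
            // a missing decrement on the QoS0 path is the kind of leak this PR fixes.
            ReferenceCountUtil.release(payload);
        }
    }
}
```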

Why is it important/What is the impact to the user?

It fixes a bug that happens when a fast publisher sends QoS0 messages to a slow subscriber.

How to test this PR locally

To reproduce the leak, run the only test method in the suite, FastPublisherSlowSubscriberTest.asFastAsItCan.
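For context, a standalone sketch of the scenario that test exercises (hypothetical code using the Eclipse Paho client against a broker on localhost:1883; the class name, topic, and client ids are invented, and the real test in the suite is the reference):

```java
import org.eclipse.paho.client.mqttv3.MqttClient;

public class FastPublisherSlowSubscriberSketch {
    public static void main(String[] args) throws Exception {
        // Slow subscriber: sleeps on every delivery so its session queue fills up.
        MqttClient sub = new MqttClient("tcp://localhost:1883", "slow-sub");
        sub.connect();
        sub.subscribe("leak/test", 0, (topic, msg) -> Thread.sleep(50));

        // Fast publisher: pushes QoS0 messages as fast as it can, forever.
        MqttClient pub = new MqttClient("tcp://localhost:1883", "fast-pub");
        pub.connect();
        byte[] payload = new byte[1024];
        while (true) {
            pub.publish("leak/test", payload, 0, false);
        }
    }
}
```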

hylkevds commented Feb 3, 2022

I'm not seeing any netty leak warnings with the current code without disconnects/reconnects.

As soon as a client disconnects, leaks are expected, since that is the reason issue #608 exists: queues are not cleaned up when sessions are destroyed.
Testing on the #648 branch should already be better, since it cleans up destroyed sessions.

Edit: I was not patient enough, with only 1 publisher/subscriber it takes a while before garbage collection happens :)
Edit2: Rewriting the test into a standalone main() class helps, since it can then run in the profiler, allowing me to manually force garbage collection. 👍
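One way to surface the leak report without attaching a profiler (my own sketch, not necessarily what was done here; Netty only reports a leak once the leaked buffer is actually garbage-collected) is to raise the leak-detection level and request a GC after the load:

```java
import io.netty.util.ResourceLeakDetector;

public class LeakReportSketch {
    public static void main(String[] args) {
        // PARANOID tracks every buffer instead of sampling ~1% of them,
        // so a single leaked message is enough to produce a report.
        ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);

        // ... run the fast-publisher / slow-subscriber load here ...

        // Leak warnings are triggered by garbage collection of leaked buffers,
        // so an explicit GC request surfaces them without waiting for the
        // default heap to fill up.
        System.gc();
    }
}
```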

@@ -318,6 +318,7 @@ public void unsubscribe(List<String> topics, MQTTConnection mqttConnection, int
}

interceptor.notifyTopicPublished(msg, clientID, username);
// ? ReferenceCountUtil.release(msg);
Yes, this is the correct fix. A similar release happens for QoS 1 and 2 messages, but was missing for QoS 0.

andsel commented Feb 5, 2022

Hi @hylkevds, I forgot to push the updated test suite; it now has 2 tests:

  • one with a fixed publish rate, which seldom manifests the problem
  • one that publishes as fast as it can, which fails consistently

The tests are "run forever" tests, so you only see the failure in the console output.

@andsel andsel added the bugfix label Feb 5, 2022
andsel commented Feb 5, 2022

Hi @hylkevds, this PR fixes the leak; it would be great if you could review it.

@andsel andsel changed the title Fix reference's leak in fast publisher slow subscriber use case Fix reference's leak in fast publish of Qos0 and slow subscriber use case Feb 5, 2022
hylkevds commented Feb 5, 2022

Yep, looks good.
The slow one can be reliably triggered into the exception by manually forcing garbage collection. The reason it normally triggers only seldom is that the test is so light on memory that it takes ages to fill up the 500MB of heap that Java starts with by default. If you start publisherAtFixedRate() in debug mode, attach a profiler, and hit the "request garbage collection" button, it will throw the Netty exception.

Testing with really big messages may also speed up the exception, since it will cause Netty to reserve much more memory for those leaked buffers!

The release in PostOffice.receivedPublishQos0() fixes it.

@andsel andsel merged commit 9ed6ca9 into moquette-io:main Feb 5, 2022