[BUG] cosmos sdk: intermittent leaks detected via netty leak detector #13763
Comments
@David-Noble-at-work @xinlian12 - for your information.
@tysonnorris - are you using DIRECT mode or GATEWAY mode?
We are using GATEWAY mode in our app and my sample code is as well. We cannot easily switch to DIRECT mode so I haven't tested that.
… On Aug 4, 2020, at 8:30 PM, Kushagra Thapar ***@***.***> wrote:
@tysonnorris - are you using DIRECT mode or GATEWAY mode ?
We suspect this issue is related to Http Client, and I am wondering if you can also reproduce this on just GATEWAY mode?
So as to verify our speculation.
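For readers following along: in the azure-cosmos 4.x API the connection mode is selected on the client builder. A minimal sketch, with placeholder endpoint and key (not the reporter's actual configuration):

```java
import com.azure.cosmos.CosmosClientBuilder;

public class ClientModeExample {
    // GATEWAY mode routes all requests through the Cosmos DB gateway over
    // HTTPS; DIRECT mode (the default) talks to replicas over TCP.
    // The endpoint and key below are placeholders.
    static CosmosClientBuilder gatewayBuilder() {
        return new CosmosClientBuilder()
                .endpoint("https://<your-account>.documents.azure.com:443/")
                .key("<your-account-key>")
                .gatewayMode();      // or .directMode() to test DIRECT
    }
}
```

Calling `.buildAsyncClient()` on this builder would then create the client the sample uses.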
@tysonnorris - thanks for confirming that, will investigate this soon.
@tysonnorris - I tried running
@kushagraThapar sorry, I should have mentioned that it is NOT consistent, but generally I don't have problems seeing it at least once in 10 runs. Our more complicated test suites always run into it, but not in the same spot.
No worries, let me try the repro then, thanks!
Hi @kushagraThapar - any luck reproducing this issue?
@tysonnorris - not yet, I have tried it multiple times. I am using macOS; which OS are you using?
@tysonnorris - I don't think that is the issue.
Yes, I'm just running
and hit it regularly, but not constantly. Here are the mvn/java versions:
I'm going to try to make a jar/container as well to run elsewhere, in case there is some issue with accessing Cosmos DB over our corp VPN.
@tysonnorris - earlier I was running the code through IntelliJ; I ran it again around 10 times from the command line using
Still can't repro it. Here are my mvn/java versions.
I created a Dockerfile to run repeatedly:
Running this as a pod in a k8s cluster on Azure in the East US region, I get far fewer, but not zero, cases where the LEAK error appears, e.g. running for the last 15 min:
I have the pod set up like
I will try it out today, thanks for the Dockerfile.
FWIW, I ran the Dockerfile for a few hours in 2 clusters, one in the East US Azure region and one in the Southeast Asia Azure region.
@tysonnorris - is the East US one a cross-region workload?
Not sure what you mean - in both cases, I just deployed the Docker image (from the Dockerfile linked above) as a pod in a Kubernetes cluster. One Kubernetes cluster is in East US 2 (sorry, not East US as previously mentioned), and one is in Southeast Asia. In both cases, the Cosmos DB used is configured with write region East US and read regions North Europe, Southeast Asia, and Australia East. The only "cross region" aspect is that the test pod runs in a different region than the Cosmos DB write/read regions.
@tysonnorris - I see, thanks for the information. I will try testing it against an account with different read and write regions; maybe that will help.
Yes - I tried to base the failing sample on the quickstart sample to rule out dependency versions etc. If there are other versions you'd like me to try, I'm happy to do that.
@kushagraThapar any luck reproducing this with the Dockerfile?
@tysonnorris - no luck so far. I tried reproducing the issue on a Windows machine as well, but couldn't reproduce it.
@kushagraThapar Steps to reproduce the behavior:
Normally the exception arrives within 5-10 runs (runtest.sh executes a while loop). Observation:
Thanks!
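The runtest.sh script itself isn't shown in the thread; a loop of roughly the shape described might look like the sketch below (TEST_CMD, MAX_RUNS, and the log-file names are illustrative assumptions, not the reporter's actual script):

```shell
#!/usr/bin/env bash
# Illustrative runtest.sh-style loop: run the sample repeatedly and stop
# as soon as the Netty LEAK error shows up in the output.
# TEST_CMD and MAX_RUNS are assumptions for illustration.
TEST_CMD=${TEST_CMD:-"mvn clean install exec:java@leak"}
MAX_RUNS=${MAX_RUNS:-10}
run=1
while [ "$run" -le "$MAX_RUNS" ]; do
  echo "=== run $run ==="
  $TEST_CMD > "run-$run.log" 2>&1
  if grep -q "LEAK:" "run-$run.log"; then
    echo "LEAK detected on run $run"
    exit 1
  fi
  run=$((run + 1))
done
echo "no LEAK in $MAX_RUNS runs"
```

The thread suggests the error usually appears within the first 5-10 iterations of a loop like this.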
@ac4922 - thanks for the above steps, I will perform them today and provide you an update.
@tysonnorris @ac4922 - As part of debugging this issue, I am not able to use the Do you know what changes I need to make to the
One more thing - not sure if you got the update, but the above issue only happens in a Docker container when the memory is limited to 2 GB. If we increase the memory to a higher number (6 GB), the issue is not reproducible, which is why I am not able to test it on my Mac, which has more memory.
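The 2 GB constraint can be expressed directly in a pod spec; an illustrative manifest (the names and image are assumptions, not the reporter's actual setup):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cosmos-leak-repro             # hypothetical name
spec:
  containers:
    - name: repro
      image: cosmos-leak-repro:latest # hypothetical image built from the Dockerfile above
      resources:
        limits:
          memory: "2Gi"               # the leak reportedly reproduces only with a cap around 2 GB
```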
This has been released in azure-cosmos v4.4.0: https://github.com/Azure/azure-sdk-for-java/blob/master/sdk/cosmos/azure-cosmos/CHANGELOG.md#440-2020-09-12
Describe the bug
We intermittently receive LEAK errors from Netty like
[ERROR] LEAK: ByteBuf.release() was not called before it's garbage-collected.
when using ResourceLeakDetector.setLevel(ResourceLeakDetector.Level.PARANOID);
Exception or Stack Trace
To Reproduce
Steps to reproduce the behavior:
I created a branch of the getting started sample to demonstrate:
https://github.com/tysonnorris/azure-cosmos-java-getting-started/tree/netty_leakdetector
Run it using:
mvn clean install exec:java@leak
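For anyone reproducing this, the sample's PARANOID level can also be set without touching code via Netty's documented io.netty.leakDetection.level system property; a minimal sketch (pure JDK, the class name is illustrative):

```java
// Sets Netty's leak-detection level via its documented system property.
// This must happen before any Netty class is loaded, which is why passing
// -Dio.netty.leakDetection.level=paranoid on the command line is the safer
// route; the programmatic form below is equivalent when run early enough.
public class LeakDetectionConfig {
    public static void main(String[] args) {
        System.setProperty("io.netty.leakDetection.level", "paranoid");
        System.out.println(System.getProperty("io.netty.leakDetection.level"));
    }
}
```

For example, `mvn clean install exec:java@leak -Dio.netty.leakDetection.level=paranoid` should have the same effect, since exec:java runs the sample in the Maven JVM and the -D flag reaches Netty.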
Code Snippet
https://github.com/tysonnorris/azure-cosmos-java-getting-started/blob/netty_leakdetector/src/main/java/com/azure/cosmos/sample/async/AsyncMainLeak.java is a modified version of the AsyncMain.java sample.
Expected behavior
No leaks. We are upgrading from Cosmos SDK 2.x, where we had some leak issues long ago, so we have tests that pass reliably at the PARANOID leak-detection level; those tests don't pass with the Cosmos 4.3 SDK.
Setup (please complete the following information):
com.azure:azure-cosmos:4.3.0