
Potential memory leak of DiagnosticListener #1867

Closed · forik opened this issue May 13, 2020 · 15 comments
forik commented May 13, 2020

Creating a separate issue per suggestion from #1678

Describe your environment. Any aspect of your environment relevant to the problem:

  • SDK version: 2.14.0
  • .NET runtime version (.NET or .NET Core, TargetFramework in the .csproj file): netcoreapp3.1
  • Hosting Info (IIS/Azure WebApps/etc): Azure AppService
  • Platform and OS version: Windows

What is the expected behavior?
The GC is able to reclaim the memory.

What is the actual behavior?
The number of DiagnosticSubscription instances grows over time and never decreases.

Additional context.
After updating from 2.13.1 to 2.14 the situation improved, but the DiagnosticSubscription count still grows constantly. The attached dotnet-gcdump output covers a period of approximately 12 hours.
dumps.zip
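
For reference, heap snapshots like the ones attached can be captured with the dotnet-gcdump global tool (the process id below is a placeholder):

```console
# Install the tool once, then take a GC heap snapshot of the running process.
dotnet tool install --global dotnet-gcdump
dotnet-gcdump collect --process-id <pid>
```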

forik added the bug label May 13, 2020
forik changed the title from "Potential memory leak from DiagnosticListener" to "Potential memory leak of DiagnosticListener" May 13, 2020
rajkumar-rangaraj (Member) commented

@forik I reviewed the trace and can see a possible leak. Could you please help with the following data?

  • Run the query below against your Application Insights data for two time ranges:
    • First: from process start until 1.gcdump was collected
    • Second: between 1.gcdump and 2.gcdump

```
dependencies
| summarize count() by type
```

Based on my initial research, an Azure-based back-end call may be contributing to this issue, and I would like to confirm that from the data.

It would also help if you have a repro for this issue, so that I can recreate it and collect data.

forik (Author) commented May 19, 2020

@rajkumar-rangaraj Below are the data, though not from those exact time ranges.

This one is from one hour after the deploy:
[image: query results]

This one is from two hours after the previous query:
[image: query results]

As for a repro, it is not an easy task, but I'll do my best.

rajkumar-rangaraj (Member) commented

@forik I created a sample project to send and retrieve data from Service Bus, but I could not recreate this issue; in my scenario the IndividualDiagnosticSourceListener object count always stays at 1.

The snapshot below shows the growth of object counts in your app, taken from the GC dump.
[image: object growth from GC dump]

forik (Author) commented May 22, 2020

@rajkumar-rangaraj could you please share the project, so I can add the patterns we use with Service Bus? We are probably doing something wrong.

cijothomas (Contributor) commented

Please sync up and make sure you are both testing the same version of the Service Bus SDK, and the same DiagnosticSource NuGet version as well.
@forik can you temporarily disable every dependency collector except Service Bus and see if the issue still occurs? That would at least help us narrow it down to Service Bus dependency collection (see the sketch below).
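
One way that narrowing could look, as a minimal sketch assuming the ASP.NET Core Application Insights SDK (the DiagnosticSource name below matches the older Microsoft.Azure.ServiceBus package; HTTP and SQL collection are controlled separately):

```csharp
using Microsoft.ApplicationInsights.DependencyCollector;
using Microsoft.Extensions.DependencyInjection;

public static class TelemetryConfig
{
    public static void ConfigureAppInsights(IServiceCollection services)
    {
        services.AddApplicationInsightsTelemetry();

        // Keep only the Service Bus DiagnosticSource subscription so that
        // any remaining growth can be attributed to it.
        services.ConfigureTelemetryModule<DependencyTrackingTelemetryModule>(
            (module, options) =>
            {
                module.IncludeDiagnosticSourceActivities.Clear();
                module.IncludeDiagnosticSourceActivities.Add("Microsoft.Azure.ServiceBus");
            });
    }
}
```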

forik (Author) commented May 26, 2020

@cijothomas @rajkumar-rangaraj I'm using Microsoft.Azure.ServiceBus 4.1.3, and I don't have an explicit dependency on System.Diagnostics.DiagnosticSource, so I guess it comes in as a transitive dependency.
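
For what it's worth, the resolved DiagnosticSource version can be checked from the project directory, since `dotnet list package` can include transitive packages:

```console
# Lists top-level and transitive packages with their resolved versions.
dotnet list package --include-transitive
```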

Rayzbam commented Sep 22, 2020

Any update on this issue? We are encountering the same issue on .NET Core 3.1(.8).

peterbomers commented

We encounter the same issue.
PerfView shows the following from a memory dump:
[image: PerfView snapshot]

Rayzbam commented Sep 28, 2020

@peterbomers Hi Peter,
We just found the issue on our side: it was the number of BlobContainerClient instances. We instantiated one every time we had to get or delete a blob.
Maybe you just need to check your code for a similar pattern (see the sketch below).
PS: BlobContainerClient is not marked as IDisposable or anything like that, so we simply created one each time we needed it. Now I keep a single instance in a singleton for all my use cases.
Hope it helps you.
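
A minimal sketch of the change described above, with illustrative names (the leak mechanism, per this thread, is that each new client instance ends up holding a DiagnosticListener subscription that is never released):

```csharp
using System.Threading.Tasks;
using Azure.Storage.Blobs;

// Leaky pattern: a new client per operation, so subscriptions pile up.
public class LeakyBlobStore
{
    public Task DeleteAsync(string connectionString, string container, string blobName)
    {
        var client = new BlobContainerClient(connectionString, container); // new client every call
        return client.DeleteBlobIfExistsAsync(blobName);
    }
}

// Fixed pattern: one long-lived client, reused for every call.
// Azure SDK clients are documented as thread-safe and intended for reuse.
public class SingletonBlobStore
{
    private readonly BlobContainerClient _client;

    public SingletonBlobStore(string connectionString, string container) =>
        _client = new BlobContainerClient(connectionString, container);

    public Task DeleteAsync(string blobName) => _client.DeleteBlobIfExistsAsync(blobName);
}
```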

peterbomers commented

> (quoting @Rayzbam's comment above)

@Rayzbam thank you for the information.
We're not using a BlobContainerClient, but maybe a different Azure client. I will check it!
peterbomers commented

@Rayzbam
Your suggestion pointed us in the right direction.
It's not only the BlobContainerClient leaking memory; other Azure services suffer from the same problem.
Our health endpoint constructed a CertificateClient and a SecretClient to verify the connection to Key Vault. Each request created two new instances, and both kept a reference to the AzureSdkDiagnosticListenerSubscriber in memory (see the picture below).

We also fixed it with singleton instances, but now I'm a little concerned about all the other Azure services. We use a lot of transient Azure clients :(

[image: heap references to AzureSdkDiagnosticListenerSubscriber]
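
A minimal sketch of the singleton fix described above, assuming the Azure.Security.KeyVault.* packages and standard Microsoft DI (names and the vault URI are illustrative):

```csharp
using System;
using Azure.Identity;
using Azure.Security.KeyVault.Certificates;
using Azure.Security.KeyVault.Secrets;
using Microsoft.Extensions.DependencyInjection;

public static class KeyVaultClientRegistration
{
    public static IServiceCollection AddKeyVaultClients(this IServiceCollection services, Uri vaultUri)
    {
        // Register each client once; every health-check request then reuses
        // the same instance instead of creating (and leaking) a new one.
        services.AddSingleton(new SecretClient(vaultUri, new DefaultAzureCredential()));
        services.AddSingleton(new CertificateClient(vaultUri, new DefaultAzureCredential()));
        return services;
    }
}
```

The Microsoft.Extensions.Azure package's AddAzureClients registration is another way to get singleton lifetimes for these clients.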

Rayzbam commented Sep 30, 2020

@peterbomers I'm glad it helped you!
I checked the Azure source code, and you can find many DiagnosticListener instances created in internal code. It seems pretty hard to work out which Azure services keep a reference to one or more of them.
I can only advise you to run some load tests on your app(s) and analyze the resulting dumps first.
I think BlobContainerClient and CertificateClient are meant to be used as singletons, but I can't find any documentation about it; I think we just have to discover it ourselves.
This happened on a "small" app, so it was pretty quick to replace all the transients; I guess that's not the case for everyone in this situation.

TL;DR: Before using an Azure service, make sure you know how it is meant to be used.

Marusyk commented Jan 14, 2021

Any updates? We are seeing the same issue with DataLake.

pakrym (Contributor) commented Jan 26, 2021

This is indeed a bug. We do recommend keeping clients as singletons (https://devblogs.microsoft.com/azure-sdk/lifetime-management-and-thread-safety-guarantees-of-azure-sdk-net-clients/), but for some instances that is impractical (e.g. BlobClient). Azure/azure-sdk-for-net#18199 is tracking the work to fix the bug.

Because this component is shared between Azure SDKs as source, every individual SDK will need to update to pick up the fix. I suspect for most of them that will happen at the beginning of February.

github-actions commented

This issue is stale because it has been open 300 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.
