
TransactionScope memory usage grows endlessly #1419

Closed
karlra opened this issue Nov 3, 2019 · 18 comments
@karlra

karlra commented Nov 3, 2019

I noticed when doing some performance testing of my code that my usage of TransactionScope resulted in huge memory usage that just grew and never shrank, even after GC. At first I thought it was the SqlClient, but in the end I boiled it down to the TransactionScope class and the TransactionScopeAsyncFlowOption.Enabled option, which is used to support async/await with TransactionScope. I use this option everywhere to allow SQL transactions to flow correctly with async code.
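
For context, the pattern that option enables looks roughly like this (just a sketch, not my real code; the awaited call stands in for the actual SqlClient work):

using System.Threading.Tasks;
using System.Transactions;

class AsyncScopeSketch
{
    static async Task RunAsync()
    {
        //TransactionScopeAsyncFlowOption.Enabled lets the ambient transaction
        //flow across await points; without it the scope can fail when the
        //continuation resumes on a different thread.
        using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            await DoDatabaseWorkAsync(); //placeholder for real SqlClient calls
            scope.Complete();
        }
    }

    static Task DoDatabaseWorkAsync() => Task.Delay(10);
}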

I am not entirely sure whether it's me who is misunderstanding the debugger's output, but running the code below leaves me with multi-gigabyte memory usage according to both the debugger and the task manager. It looks like a memory leak to me, but I'm not an expert in these things, so I thought I'd report it.

There are two versions below: the first one, with TransactionScopeAsyncFlowOption.Enabled, grows its memory usage constantly, and the other doesn't.

using System.Transactions;

namespace TestTrxScope
{
    class Program
    {
        static void Main(string[] args)
        {
            while(true)
            {
                using(var scope = CreateTransactionScope())
                {
                    scope.Complete();
                }
            }
        }

        public static TransactionScope CreateTransactionScope()
        {
            //Memory grows constantly
            return new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);

            //This version doesn't constantly grow in memory usage
            //return new TransactionScope();
        }
    }
}
@karlra
Author

karlra commented Nov 3, 2019

I'm having this issue on .NET Core 2.1 on Windows, but I tried it with .NET Core 3.0 and it happens there as well.

@karlra
Author

karlra commented Nov 3, 2019

Also, running it for a while and then attempting to take a memory snapshot in the debugger crashes the debugger. If you do it very quickly instead, you get a snapshot, but the memory usage of the objects shown doesn't add up to anything close to what is reported in the graph.

@scalablecory
Contributor

Confirmed. Thanks for the simple repro!

@StephenBonikowsky
Member

@karlra Thanks for reporting the issue; we will investigate.

@karlra
Author

karlra commented Nov 20, 2019

Would it be possible to get an update on this? Performance gets exponentially worse as time goes on; after a few million SQL queries we need to restart the service. In a testing scenario with an in-memory backend (we do a lot of statistical testing) it's even worse, with performance down 10-100x. It's very disappointing to see this scheduled for 5.0. As far as I can tell, this class is the only way to use SQL transactions in an asynchronous context, so it must be very important, but it doesn't seem to be treated as such.

@StephenBonikowsky
Member

@dasetser Have you had a chance to look into this yet?

@dasetser

I haven't had a chance to investigate the cause of this yet. Once we understand the root cause we can consider whether this is a candidate for servicing, but that will depend on the details of the fix (how big/risky it ends up being).

@karlra
Author

karlra commented Dec 15, 2019

I am beginning to feel like the cable guy here, but this is really important to me, so I did some further testing:

This bug does not happen when taking the exact same program and targeting .NET Framework 4.6.

Targeting .NET Core 3.1 and running PerfView:

[PerfView screenshot]

Drilling into it, you can see hundreds and hundreds of thousands of these:

[PerfView screenshot]

Sorry, I'm just speculating here as a complete amateur, but is it in any way related to this?

https://twitter.com/maoni0/status/1206010859094306816

@danmoseley
Member

@mateoatr for that.

@StephenBonikowsky StephenBonikowsky transferred this issue from dotnet/corefx Jan 7, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged label Jan 7, 2020
@StephenBonikowsky StephenBonikowsky added this to the 5.0 milestone Jan 7, 2020
@StephenBonikowsky StephenBonikowsky added area-System.Transactions bug and removed untriaged labels Jan 7, 2020
@maryamariyan maryamariyan added the untriaged label Feb 23, 2020
@StephenBonikowsky StephenBonikowsky removed the untriaged label Feb 24, 2020
@reach4thelasers

Hi there, just to chip in: I am presently experiencing this issue with .NET 4.7. My production server is using 98% of its 4GB of RAM, and the server process is also page-faulting into a 2GB swap file. I had over 4GB of leaked Transaction-related objects.

I rebooted my server a week ago and cleaned it out, and it is already back to 1.5GB. Here's a memory profile snapshot.

[memory profiler screenshots]

@HongGit HongGit modified the milestones: 5.0.0, 6.0.0 Jul 13, 2020
@karlra
Author

karlra commented Sep 14, 2020

Now that 5.0 is marked as RC, what is the status of this? The RC post asked for critical bugs, and this one is pretty serious in my opinion.

@mconnew
Member

mconnew commented Nov 25, 2020

Insomnia and curiosity led me to take a look at this issue. There's no memory leak bug in System.Transactions here. I modified the sample code to the following and memory didn't grow:

using System;
using System.Transactions;

class Program
{
    static void Main(string[] args)
    {
        int i = 0;
        while (true)
        {
            using (var scope = CreateTransactionScope())
            {
                scope.Complete();
            }
            i++;
            if (i == 100000)
            {
                GC.Collect(2);
                i = 0;
            }
        }
    }

    public static TransactionScope CreateTransactionScope()
    {
        // Same TransactionScope as in the original repro
        return new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
    }
}
Forcing a Gen2 GC collection every 100,000 iterations left my test process sitting at a stable 30 MB heap. Basically, things are being stored in a ConditionalWeakTable to keep track of the transaction. The entry isn't being removed from the table, but it is dereferenced, so the GC will collect it. There might be a bug around failing to remove the transaction from the table (the code here does look like it attempts to remove it), but because it's a ConditionalWeakTable that won't cause an actual memory leak.

I can't say why the GC isn't doing a Gen2 collection on its own and is just letting the process grow to a large size.
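
To illustrate what I mean about the ConditionalWeakTable not keeping anything alive, here's a small standalone sketch (it has nothing to do with the Transactions code itself): once the key is unreachable, a full collection reclaims the value even though the entry was never removed from the table.

using System;
using System.Runtime.CompilerServices;

class WeakTableDemo
{
    class Key { }
    class Value { }

    static readonly ConditionalWeakTable<Key, Value> Table = new ConditionalWeakTable<Key, Value>();

    static void Main()
    {
        WeakReference valueRef = AddEntry();
        Console.WriteLine(valueRef.IsAlive);   // usually True: no collection has happened yet

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        // The entry was never removed from the table, but because the key is
        // no longer reachable, the entry no longer keeps the value alive.
        Console.WriteLine(valueRef.IsAlive);   // False after a full collection
    }

    static WeakReference AddEntry()
    {
        var key = new Key();
        var value = new Value();
        Table.Add(key, value);
        // key and value go out of scope when this method returns; only the
        // weak table entry and the WeakReference still refer to them.
        return new WeakReference(value);
    }
}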

@karlra
Author

karlra commented Nov 25, 2020

Yes, you are right. I also modified the code to measure the time it takes to run 1m TransactionScopes, and when doing the GC.Collect as you suggested this is the output:

1m took 1,0882223
1m took 1,198557
1m took 1,2591079
1m took 1,4160670999999998
1m took 1,5305853000000003
1m took 1,6643583999999993
1m took 1,6032577000000003
1m took 1,6249842
1m took 1,5893472000000006
1m took 1,6326681
1m took 1,6352399999999978
1m took 1,7689416999999992
1m took 1,7557389000000008
1m took 1,7933212999999988
1m took 1,750688199999999
1m took 1,8079289000000003
1m took 1,7508125999999997
1m took 1,8007067000000028
1m took 1,7506499999999967
1m took 1,7687377000000026
1m took 1,739029600000002
1m took 1,7691805999999985
1m took 1,7244062999999983
1m took 1,7626253000000034
1m took 1,7263935999999944
1m took 1,759808300000003
1m took 1,7292494999999946
1m took 1,7460412000000005
1m took 1,7245514000000028
1m took 1,7523978999999983
1m took 1,7470739999999978
1m took 1,8428781999999941
1m took 1,7403716999999972

So it does become about 75% slower as the number of transactions grows (why?), but not as bad as without the manual GC.

Without the manual GC, this is the output:

1m took 1,0390759
1m took 1,1739017999999999
1m took 1,3381367000000002
1m took 1,4325771000000005
1m took 1,6844288
1m took 1,9273474000000004
1m took 1,9217969999999998
1m took 2,177542599999999
1m took 2,7613198000000008
1m took 2,3517020000000013
1m took 2,8539642
1m took 3,6528396
1m took 3,4112449
1m took 3,0104454999999994
1m took 3,0950014000000046
1m took 4,490170499999998
1m took 5,281133099999998
1m took 5,820333699999999
1m took 4,301628800000003
1m took 3,8935314000000005
1m took 4,181501999999995
1m took 6,046489000000001
1m took 7,229827499999999
1m took 7,903539099999989
1m took 7,684331600000007
......grows forever

So it becomes progressively slower, which I assume might be because this table grows and grows?
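
For reference, the measurement loop is roughly this (a sketch of the kind of loop I used; details may differ):

using System;
using System.Diagnostics;
using System.Transactions;

class Program
{
    static void Main()
    {
        var sw = new Stopwatch();
        while (true)
        {
            sw.Restart();
            for (int i = 0; i < 1_000_000; i++)
            {
                using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
                {
                    scope.Complete();
                }
            }
            sw.Stop();
            Console.WriteLine($"1m took {sw.Elapsed.TotalSeconds}");
            // For the run with the manual collection suggested above, uncomment this line:
            //GC.Collect(2);
        }
    }
}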

@mconnew
Member

mconnew commented Jan 8, 2021

I'm going to close this issue as there's nothing to be fixed here. The only potential change that could be made is to make sure to remove the reference from the ConditionalWeakTable, but that won't make any difference. As the only reference is in the ConditionalWeakTable, that reference isn't keeping the object alive; if it was removed from the table, the unreferenced object would still be using space in the heap. The solution is for the GC to run, and there's nothing in TransactionScope which is affecting that. The slowness is likely caused by growing the arrays that back the table. Once things hit equilibrium (the GC kicks in because of memory pressure), since the table is effectively a dictionary in its structure, there shouldn't be a performance hit for putting things into it. You might want to open a separate issue to investigate why the GC isn't getting triggered.

@mconnew mconnew closed this as completed Jan 8, 2021
@reach4thelasers

Sorry, but there's a complete lack of fundamental computer science knowledge in this last comment. Any computer science graduate knows that abstract data structures like dictionaries are, at a fundamental level, implemented as arrays (sequential blocks of memory).

The statement: "as it's effectively a dictionary in its structure, there shouldn't be a performance hit for putting things in table"

This is only true if the size of the dictionary is defined up front. Should you try to add something to a dictionary that exceeds its pre-defined size, an entirely new array needs to be created under the hood, which requires a sequential copy of the prior array into a new (bigger) array. The insert complexity of adding to a Dictionary is O(1) to O(n) for this reason. When the dictionary is allowed to grow and grow, it pretty much becomes O(n).

Note the private method "Resize" in the dictionary code. https://github.com/microsoft/referencesource/blob/master/mscorlib/system/collections/generic/dictionary.cs
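
To illustrate with Dictionary itself (a standalone sketch, nothing to do with the Transactions code): a dictionary left at its default capacity pays for repeated resize copies as it grows, while one pre-sized up front does not.

using System;
using System.Collections.Generic;
using System.Diagnostics;

class ResizeDemo
{
    static void Main()
    {
        const int n = 10_000_000;

        var sw = Stopwatch.StartNew();
        var growing = new Dictionary<int, int>();       // starts small, resizes repeatedly as it fills
        for (int i = 0; i < n; i++) growing[i] = i;
        Console.WriteLine($"default capacity: {sw.Elapsed}");

        sw.Restart();
        var presized = new Dictionary<int, int>(n);     // capacity reserved up front, no resizes
        for (int i = 0; i < n; i++) presized[i] = i;
        Console.WriteLine($"pre-sized:        {sw.Elapsed}");
    }
}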

Sorry, but this needs to be re-opened and fixed. There should be code to manage and clean up these transactions rather than relying on ignorance about .NET's abstract data structures and on laziness (let the garbage collector deal with it).

Take responsibility for your code!


@mconnew
Member

mconnew commented Jan 15, 2021

@reach4thelasers, if you read my comment a little more closely, I said this (emphasis added):

Once things hit equilibrium (the GC kicks in because of memory pressure), since the table is effectively a dictionary in its structure, there shouldn't be a performance hit for putting things into it.

Personal insults don't compensate for reading comprehension. FYI, a dictionary is generally implemented as an array of linked lists to account for collisions. You can theoretically implement it with open addressing (placing the item in the next available slot), but that relies on the GetHashCode() function caching its value: with a pure array instead of an array of linked lists, if the immediate key doesn't match you need to keep searching the array until you find the matching key, a key with a non-matching hash code, or a null, which means repeatedly querying the stored keys for their hash codes. It also makes deletions more work, because instead of fixing up head and tail references in a linked list you need to move items in the array so you don't leave a null that effectively makes other keys with the correct hash unfindable; otherwise a Contains() or key lookup would devolve to O(n). It likely can anyway, as searching for an entry requires you to keep looking until you find a matching hash, then stop finding matching hashes (because previous values could overflow into your spot), or hit a null. Generally this type of algorithm is reserved for situations where you are more constrained on RAM than CPU, as it's really quite inefficient.

You might not be aware, but the .NET Core implementation has deviated from reference source significantly. It would be more useful to check the correct implementation here, or if you want a navigable version you could use this here. But both of those are irrelevant here, as this isn't a Dictionary, it's a ConditionalWeakTable, so the source code of Dictionary doesn't matter. ConditionalWeakTable uses an array of linked lists, as defined here.

So back to my point about equilibrium. Most inserts into a dictionary are O(1); it's only an insert that forces the backing array to grow that costs O(n). ConditionalWeakTable implements a growth algorithm of doubling its size when full, which is more aggressive than Dictionary, which grows to the next prime number bigger than the current count. It starts with an initial capacity of 8. Because it doubles each time, the number of resizes for n insertions is log(n), and each resize copies only what is currently stored, so the total resize work is a geometric series (8 + 16 + ... + n/2) that sums to roughly n. Amortized over n insertions, that comes to O(1) per insertion, which is quite a bit less than O(n). And that's assuming unbounded insertion. Presuming you are running on a system with finite memory and not on a theoretical Turing machine with an infinite tape, at some point the GC kicks in and space is freed up in the ConditionalWeakTable. As the ConditionalWeakTable doesn't have code to shrink the backing store, at that point all inserts are O(1). This is what I meant by equilibrium. Maybe I should have been clearer that I was presuming a non-theoretical machine with actual hardware, but I thought I was covered by clarifying in parentheses "(GC kicks in because of memory pressure)".
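
If you want to check the arithmetic, here's a standalone sketch that just counts the element copies performed under the doubling strategy (it isn't the ConditionalWeakTable code, only the growth model described above):

using System;

class AmortizedGrowth
{
    static void Main()
    {
        long n = 10_000_000;
        long capacity = 8;        // initial capacity, as described above
        long copies = 0;

        for (long inserted = 0; inserted < n; inserted++)
        {
            if (inserted == capacity)
            {
                copies += capacity;   // a resize copies everything currently stored
                capacity *= 2;
            }
        }

        // copies comes out just under n (8 + 16 + ... + n/2), so the resize
        // cost per insertion is O(1) amortized.
        Console.WriteLine($"{n} insertions required {copies} copies during resizes");
    }
}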

I hope this fulfilled your desire for fundamental computer science knowledge in a comment and maybe refreshed your memory a little on big-O analysis and Dictionary implementations.

@karlra
Author

karlra commented Jan 15, 2021

Thanks for the explanation, @mconnew. But the fact of the matter is that even if the analysis suggests there is nothing wrong here, the GC doesn't run and the RAM usage grows to over 20 GB before my computer grinds to a halt. As far as I understand it, the default recommendation from Microsoft around manual GC is "don't do it".

Where would I open a ticket about the GC failing to collect in this case? I am an app developer who has a problem with a central part of transaction management; I am not well versed in GC internals, and I am guessing the average developer isn't either. The behavior displayed by the system here is clearly not as intended, so I am happy to chase it up further if you can point me in the right direction.

@mconnew
Member

mconnew commented Jan 15, 2021

You are right: except in very narrow cases (near-real-time scenarios where you disable the GC for short periods is the only one I'm aware of), if you have to manually run the GC then something is wrong, and this isn't one of those scenarios, so something is wrong. GC issues can be opened in this repo, as everything has been combined under one repo. I was investigating another memory leak issue for WCF a few weeks ago and the GC was kicking in regularly once the process reached about 80MB, and my laptop sits with about 16GB of free RAM most of the time, so it's definitely not intended behavior to hold off running the GC until it has consumed all your memory. This is why I believe there's some GC misbehavior here somewhere. I know the GC will do background GC, where it does the mark part of mark-and-sweep without a GC pause, so maybe there's something there that is tripping up on the weak references.
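
If it helps when opening that GC issue, a trivial way to see whether Gen2 collections are happening at all while the repro runs is to log the standard GC counters periodically (just a sketch using GC.CollectionCount and GC.GetTotalMemory):

using System;
using System.Transactions;

class GcObservation
{
    static void Main()
    {
        int i = 0;
        while (true)
        {
            using (var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
            {
                scope.Complete();
            }

            if (++i % 1_000_000 == 0)
            {
                // How often each generation has been collected, and the heap
                // size the runtime reports.
                Console.WriteLine(
                    $"gen0={GC.CollectionCount(0)} gen1={GC.CollectionCount(1)} " +
                    $"gen2={GC.CollectionCount(2)} heap={GC.GetTotalMemory(false) / (1024 * 1024)} MB");
            }
        }
    }
}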

@ghost ghost locked as resolved and limited conversation to collaborators Feb 14, 2021