Unmanaged pooling MemoryAllocator #1730
Conversation
Might be worth testing trimming solely via GC.AddMemoryPressure & a Gen2 callback?
@br3aker I'm not sure if AddMemoryPressure / RemoveMemoryPressure contributes to the Gen2 allocation budget or not. It might make sense to try out, but I think the timer is not that expensive, and trimming is configured to run every minute (mimicking ArrayPool.Shared) anyway.
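For reference, here is a minimal sketch of the Gen2-callback technique being discussed (similar in spirit to the runtime's internal helper used for ArrayPool trimming); the names below are illustrative, not ImageSharp APIs:

```csharp
using System;

// Invokes 'callback' roughly once per Gen 2 collection by (ab)using finalization:
// the finalizer runs when a GC collects this object's generation, and the object
// then re-registers itself so a later collection triggers it again.
internal sealed class Gen2GcCallback
{
    private readonly Func<bool> callback; // return false to stop future callbacks

    private Gen2GcCallback(Func<bool> callback) => this.callback = callback;

    public static void Register(Func<bool> callback) => _ = new Gen2GcCallback(callback);

    ~Gen2GcCallback()
    {
        if (!Environment.HasShutdownStarted && this.callback())
        {
            GC.ReRegisterForFinalize(this);
        }
    }
}

// Example usage: trim a pool whenever the callback fires.
// Gen2GcCallback.Register(() => { pool.Trim(); return true; });
```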
Codecov Report
@@            Coverage Diff             @@
##           master    #1730     +/-   ##
=========================================
  Coverage      87%      87%
=========================================
  Files         935      944       +9
  Lines       49300    49753     +453
  Branches     6102     6165      +63
=========================================
+ Hits        43175    43575     +400
- Misses       5115     5157      +42
- Partials     1010     1021      +11
Cannot wait to get stuck into reading this! 🤩
@JimBobSquarePants been thinking a lot & chatting with folks on the C#/lowlevel discord channel, and I'm no longer sure if using unsafe memory is the right thing to do. The problem is that an update to 1.1 may turn minor bugs and performance issues into security errors for users. On the other hand, SkiaSharp merged a similar API without spending a single minute on security concerns: mono/SkiaSharp#1242
@antonfirsov I need to do some reading there and maybe see if we can get someone from the runtime team who worked in that area to comment.
Strictly speaking, the finalizer warning doesn't apply to us since the
First I also thought so, but in fact it can be as simple as:

    var image = new Image<Rgba32>(w, h); // no using
    Span<Rgba32> span = image.GetPixelRowSpan(0); // last use of the object `image`, finalizers may run after this point
    // some relatively long running code here to allow the finalizers to finish
    span[0] = default; // memory corruption
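(For completeness, and not something this PR proposes as the fix: the standard defensive workaround for this specific hazard is `GC.KeepAlive`, which extends the object's lifetime past the last span use. A minimal sketch:)

```csharp
using System;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

static void Workaround()
{
    var image = new Image<Rgba32>(64, 64); // intentionally no 'using'
    Span<Rgba32> span = image.GetPixelRowSpan(0);

    // ... long-running work that could let the GC finalize 'image' ...

    span[0] = default;

    // 'image' is considered live until this call, so its finalizer
    // (and its buffers') cannot run while the span above is still in use.
    GC.KeepAlive(image);
}
```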
I wonder if we could write an analyzer?
That would be awesome, but I'm afraid they would share the concerns around unsafe memory and push back.
Would it work out of the box just by using the library? Note that it won't help existing users doing a package update without recompilation and then running into a potential security issue.
The trick, I think, would be to make the analyzer a dependency of the main library, like Xunit does. Have we made any breaking changes that require recompilation? Maybe we should, just to ensure people rebuild. 👿
There is a safe, breaking way to re-implement span accessors by using delegates, inspired by the comment above:

    public class Image<T>
    {
    -    public Span<T> GetPixelRowSpan(int y);
    -    public bool TryGetSinglePixelSpan(out Span<T> span);
    +    public void ProcessPixels(Action<PixelAccessor<T>> rowProcessor);
    +    public bool TryProcessSinglePixelSpan(SpanAction<T> pixelProcessor);
    }

    + public ref struct PixelAccessor<T>
    + {
    +     Span<T> GetRowSpan(int y);
    + }

The simplest thing would be to go ahead with this breaking change and bump the ImageSharp version number to 2.0. The improvements will justify the change.
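For illustration, calling code against the proposed shape could look roughly like this (a sketch of the proposal above, not a shipped API; note that since `PixelAccessor<T>` is a `ref struct` it cannot be an `Action<>` type argument, so a dedicated delegate type would be needed in practice):

```csharp
using System;
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;

static void FillFirstRow(Image<Rgba32> image, Rgba32 color)
{
    // The row span is only reachable inside the callback, so 'image' is
    // guaranteed to stay alive (and un-finalized) while the span is in use.
    image.ProcessPixels(accessor =>
    {
        Span<Rgba32> row = accessor.GetRowSpan(0);
        row.Fill(color);
    });
}
```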
2.0 was to be my "kill all old target frameworks" release. I want to ship a working V1 of Fonts and Drawing before starting work on it.
That can be 3.0 then. We follow semantic versioning more or less, so there's no point being afraid of major version jumps as breaking changes land, IMO.
But I'm also fine with a hard-breaking 1.1; this is more about PR and communication than anything else. However, renaming the milestones seems like the better thing to do to me; we can even benefit from it.
My only issue with jumping from 2.0 to 3.0 would be that in real terms it would probably occur over a short timespan which, in my opinion, does not reflect well on the quality. 1.1 would be, by far, my next desired target. This is a massive breaking change though, so I'm deeply conflicted. 🙁
I have a question regarding …
Btw, amazing work on all this @antonfirsov! I'll also need to find some time to carefully go through all this like James said and have a proper read, as the whole investigation seems super interesting! 🚀
After careful consideration, I'm up for a V2 release. It's a good opportunity to fix a few things, plus we are already adding a significant amount of fixes/functionality to the release, so let's make a show of it.
Some more detail on the failures on my machine. They are repeatable by running each theory group.
    Message:
      Microsoft.DotNet.RemoteExecutor.RemoteExecutionException : Remote process failed with an unhandled exception.
    Stack Trace:
      Child exception:
        Xunit.Sdk.EqualException: Assert.Equal() Failure
        Expected: 42
        Actual: 224
        UniformUnmanagedPoolMemoryAllocatorTests.AllocateGroupAndForget(UniformUnmanagedMemoryPoolMemoryAllocator allocator, Int32 length, Boolean check) line 295
        UniformUnmanagedPoolMemoryAllocatorTests.<MemoryGroupFinalizer_ReturnsToPool>g__RunTest|12_0(String lengthStr) line 277
      Child process:
        SixLabors.ImageSharp.Tests, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d998eea7b14cab13 SixLabors.ImageSharp.Tests.Memory.Allocators.UniformUnmanagedPoolMemoryAllocatorTests Void <MemoryGroupFinalizer_ReturnsToPool>g__RunTest|12_0(System.String)
      Child arguments:
        600

    Message:
      Microsoft.DotNet.RemoteExecutor.RemoteExecutionException : Remote process failed with an unhandled exception.
    Stack Trace:
      Child exception:
        Xunit.Sdk.EqualException: Assert.Equal() Failure
        Expected: 42
        Actual: 0
        UniformUnmanagedPoolMemoryAllocatorTests.AllocateGroupAndForget(UniformUnmanagedMemoryPoolMemoryAllocator allocator, Int32 length, Boolean check) line 295
        UniformUnmanagedPoolMemoryAllocatorTests.<MemoryGroupFinalizer_ReturnsToPool>g__RunTest|12_0(String lengthStr) line 277
      Child process:
        SixLabors.ImageSharp.Tests, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d998eea7b14cab13 SixLabors.ImageSharp.Tests.Memory.Allocators.UniformUnmanagedPoolMemoryAllocatorTests Void <MemoryGroupFinalizer_ReturnsToPool>g__RunTest|12_0(System.String)
      Child arguments:
        1200

    Message:
      Microsoft.DotNet.RemoteExecutor.RemoteExecutionException : Remote process failed with an unhandled exception.
    Stack Trace:
      Child exception:
        Xunit.Sdk.EqualException: Assert.Equal() Failure
        Expected: 128
        Actual: 0
        NonParallel.<MultiplePoolInstances_TrimPeriodElapsed_AllAreTrimmed>g__RunTest|0_0() line 84
      Child process:
        SixLabors.ImageSharp.Tests, Version=1.0.0.0, Culture=neutral, PublicKeyToken=d998eea7b14cab13 SixLabors.ImageSharp.Tests.Memory.Allocators.UniformUnmanagedMemoryPoolTests+Trim+NonParallel Void <MultiplePoolInstances_TrimPeriodElapsed_AllAreTrimmed>g__RunTest|0_0()

I've discovered that the issue also appears outside of Remote Executor so I can attempt to debug for you. Will let you know how I get on.

Question -
@JimBobSquarePants the naming reflects the old buggy design that depended on finalization order; I'll change it when everything else is fixed. IMemoryOwner/MemoryGuard doesn't have a finalizer anymore, it's the associated lifetime guards that return things to the pools. The parameters of the tests are exercising different cases:
@antonfirsov The failing behavior I'm seeing is the memory pool being trimmed after the first allocation because it correctly thinks it's under high pressure (92% of my memory appears to be in use!!).
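For context, this kind of pressure is observable from managed code; a quick sketch of how such a check can be made (not necessarily the exact check the allocator uses):

```csharp
using System;

GCMemoryInfo info = GC.GetGCMemoryInfo();
double loadPercent = 100.0 * info.MemoryLoadBytes / info.TotalAvailableMemoryBytes;
double thresholdPercent = 100.0 * info.HighMemoryLoadThresholdBytes / info.TotalAvailableMemoryBytes;

// Roughly corresponds to the "92% of my memory appears to be in use" observation above.
Console.WriteLine($"Memory load: {loadPercent:F0}% (high-load threshold: {thresholdPercent:F0}%)");
```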
@JimBobSquarePants thanks, that's super valuable info! I disabled these tests for local runs in 4986c52, that's the best idea I was able to come up with.
@anton I'm happy with this if you are. My debugging suggested that everything was working as expected and I cannot see any issues.
I haven't seen this being done in test code, wanted to avoid complicating tests further. I don't expect them to fail unless someone touches the allocator logic.
Yeah, let's merge this and start dogfooding it; that should detect potential issues faster than sitting further on the PR :)
Updates to use the new and default `ArrayPool`-based memory allocator (see SixLabors/ImageSharp#1730).
UPDATE 2: Ready for review!

- `OutOfMemoryException`. Consider retrying `Marshal.AllocHGlobal` on `OutOfMemoryException` after a short wait. DONE: we are blocking the thread on OOM to retry allocations. 32-bit is 2x slower with 20 threads than 64-bit, but doesn't OOM. The retries alone are not responsible for the 2x slowdown; the 32-bit runtime also seems to work 1.5x slower with 10 threads, when there are no OOMs.
- `PreferContiguousImageBuffers`, remove `MemoryAllocator.MinimumContiguousBlockSizeBytes`.
- `PixelAccessor<T>` and other "Pixel processing breaking changes & API discussion" (#1739) stuff.
- `ArrayPoolMemoryAllocator` … `MemoryAllocator.Default`. -- Need to change `MemoryAllocator.Default`, it should be get-only.

Prerequisites
Description
This PR introduces `UniformUnmanagedMemoryPoolMemoryAllocator` and sets it as the default, to fix #1596.

UniformUnmanagedMemoryPoolMemoryAllocator functional characteristics

The new allocator uses:
- `ArrayPool<byte>.Shared`
- `UniformUnmanagedMemoryPool` to allocate 4 MB blocks of discontiguous unmanaged memory, up to the pool's limit

Pool size
According to my benchmarks, the pool should scale to the maximum desired size to achieve the best throughput. There is no point in placing an artificial pool limit unless there is a physical limitation. I decided to set the maximum pool size to 1/8th of the available physical memory in 64-bit .NET Core processes. This means that on a 16 GB machine the pool can grow as large as 2 GB.
On 32-bit and other (non-testable) platforms the pool limit is 128 MB.
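A minimal sketch of how such a limit can be derived (my own illustration, not necessarily the exact code in this PR), using `GC.GetGCMemoryInfo().TotalAvailableMemoryBytes`, available since .NET Core 3.0:

```csharp
using System;

internal static class PoolSizeHeuristics
{
    // Constants chosen for illustration only.
    private const long OneMegabyte = 1024 * 1024;
    private const long FallbackLimitBytes = 128 * OneMegabyte; // 32-bit / unknown platforms

    public static long GetMaximumPoolSizeBytes()
    {
        if (Environment.Is64BitProcess)
        {
            // TotalAvailableMemoryBytes reflects the physical memory (or container limit)
            // visible to the GC; take 1/8th of it, e.g. 2 GB on a 16 GB machine.
            long totalAvailable = GC.GetGCMemoryInfo().TotalAvailableMemoryBytes;
            return totalAvailable / 8;
        }

        return FallbackLimitBytes;
    }
}
```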
Trimming
The trimming of the pools is triggered by both Gen 2 GC collections and a timer (we need the timer since unmanaged allocations don't trigger GC). On high load we trim the entire pool; on low load we trim 50% of the pool every minute.
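A sketch of how such a trim schedule can be wired up (illustrative only; `PoolTrimmer` and the `trim` callback are made-up names, and in the actual PR this is combined with the Gen 2 GC trigger described above):

```csharp
using System;
using System.Threading;

internal sealed class PoolTrimmer : IDisposable
{
    private readonly Timer timer;
    private readonly Action<double> trim; // callback: fraction of the pool to release

    public PoolTrimmer(Action<double> trim)
    {
        this.trim = trim;
        // Illustrative 1-minute period, mirroring the cadence described above.
        this.timer = new Timer(_ => this.TrimOnce(), null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));
    }

    private void TrimOnce()
    {
        GCMemoryInfo info = GC.GetGCMemoryInfo();
        bool highLoad = info.MemoryLoadBytes >= info.HighMemoryLoadThresholdBytes;

        // High load: drop everything. Low load: release half of the retained blocks.
        this.trim(highLoad ? 1.0 : 0.5);
    }

    public void Dispose() => this.timer.Dispose();
}
```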
Finalizers
With `ArrayPoolMemoryAllocator`, if an image is GC-d without being disposed, buffers are never returned to the pool. This means no hard memory leak, but the pools will eventually be exhausted, because the bucket's running index hits the bucket limit. To avoid this, `MemoryGroup<T>.Owned` and `UniformUnmanagedMemoryPool.FinalizableBuffer<T>` have finalizers returning the `UnmanagedMemoryHandle` to the pool. This can get tricky, since `UnmanagedMemoryHandle` is also finalizable:

ImageSharp/src/ImageSharp/Memory/Allocators/Internals/UnmanagedMemoryHandle.cs (lines 58 to 78 in 1a41aaa)
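To make the interplay concrete, here is a deliberately simplified sketch of the pattern described above (illustrative names and logic; the real ImageSharp types are more involved):

```csharp
using System;
using System.Collections.Concurrent;
using System.Runtime.InteropServices;

// Simplified stand-in for UnmanagedMemoryHandle.
internal sealed class UnmanagedHandle
{
    public IntPtr Pointer { get; private set; }

    public UnmanagedHandle(int byteCount) => this.Pointer = Marshal.AllocHGlobal(byteCount);

    // Safety net: if nothing returns this handle to a pool, at least free the memory.
    ~UnmanagedHandle() => this.Free();

    public void Free()
    {
        if (this.Pointer != IntPtr.Zero)
        {
            Marshal.FreeHGlobal(this.Pointer);
            this.Pointer = IntPtr.Zero;
        }
    }
}

internal sealed class SimplePool
{
    private readonly ConcurrentQueue<UnmanagedHandle> free = new();

    public void Return(UnmanagedHandle handle) => this.free.Enqueue(handle);
}

// Buffer handed out to callers. If the caller forgets to Dispose, the finalizer
// tries to salvage the unmanaged block by returning its handle to the pool.
// The tricky part: finalization order is unspecified, so by the time this runs,
// ~UnmanagedHandle may already have freed the memory - the real implementation
// has to detect and tolerate that (hence the "finalizer tricks" mentioned below).
internal sealed class FinalizableBuffer
{
    private readonly SimplePool pool;
    private readonly UnmanagedHandle handle;

    public FinalizableBuffer(SimplePool pool, UnmanagedHandle handle)
    {
        this.pool = pool;
        this.handle = handle;
    }

    ~FinalizableBuffer() => this.pool.Return(this.handle);
}
```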
I'm moderately concerned about CA2015, but I don't think it applies to us. Dispose will also free the memory used by a span. Touching a span or a pointer to a SkiaSharp image's memory would also be a bug if the image is finalized.

API changes
Resolves #1739
Fixes #1675
Edit: API changes implemented according to #1739.
Benchmarking methodology
To determine these defaults I compared results of `LoadResizeSaveParallelMemoryStress` runs systematically, typically running them a couple of times for a varying parameter while fixing all other parameters. I have a bunch of Excel documents comparing the tables; including all of them would be TLDR, but I can present information on request.
Benchmark results
I was benchmarking on a 10-core (20-thread) i9-10900X with 64 GB RAM. This means I was able to stress a highly parallel workload with very extensive allocation pressure.
Here is the median processing time (seconds) of 40 runs of `LoadResizeSaveParallelMemoryStress` ("Classic" means `ArrayPoolMemoryAllocator`). ImageSharp is about 8% faster with the new default memory allocator.
Results of the `LoadResizeSaveStressBenchmarks` BDN benchmark also show a 7.5% improvement.

VirtualAlloc commit lifetimes graph with ArrayPoolMemoryAllocator
VirtualAlloc commit lifetimes graph with the new allocator, demonstrating the trimming
VirtualAlloc commit lifetimes graph with pool size set to zero
I would be happy to see some expert feedback on this solution, especially for the finalizer tricks.
/cc @Sergio0694 @saucecontrol @br3aker