Heap objects with custom allocator and explicit delete #5633

GSPP · 2016-04-15T16:47:01Z

.NET only supports automatic lifetime for managed objects. The GC cleans up. This is fantastic for productivity. Sometimes, developers need tight control over latency, though. The GC can interfere with that goal.

This has been discussed at length in many places. I believe the team is aware of this issue. Although great strides have been made in improving the GC, this is still an important concern. It is not clear that the GC can ever fully resolve this.

As a workaround we can place data in manually allocated memory and use pointers to access that data. But that data can never be a managed object. I cannot pass that data to other non-aware code. If I want to allocate an unmanaged buffer I cannot pass that buffer as a byte[] to other code. This is terrible for composability.
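A minimal sketch of that workaround as it looks today. `Checksum` is a hypothetical stand-in for "other non-aware code" that only accepts a `byte[]`; the class and method names are illustrative, not from any library.

```csharp
using System;
using System.Runtime.InteropServices;

static class UnmanagedBufferDemo
{
    // Hypothetical stand-in for existing code that only accepts managed arrays.
    public static int Checksum(byte[] data)
    {
        int sum = 0;
        foreach (byte b in data) sum += b;
        return sum;
    }

    public static int Run()
    {
        const int size = 4;
        IntPtr buffer = Marshal.AllocHGlobal(size);   // lives outside the GC heap
        try
        {
            // Fill the unmanaged block with the bytes 1, 2, 3, 4.
            for (int i = 0; i < size; i++)
                Marshal.WriteByte(buffer, i, (byte)(i + 1));

            // The unmanaged block cannot be handed over as a byte[];
            // the data has to be copied into a GC-allocated array first.
            byte[] managedCopy = new byte[size];
            Marshal.Copy(buffer, managedCopy, 0, size);
            return Checksum(managedCopy);
        }
        finally
        {
            Marshal.FreeHGlobal(buffer);              // explicit, deterministic free
        }
    }
}
```

The copy into `managedCopy` is exactly the composability cost described above: the manually managed data has to re-enter the GC heap before array-based code can see it.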

Please implement unsafe managed objects with user controlled lifetime. Like this:

allocated class SomeData {
 public int X;
 //...
}

SomeData someData = Activator.CreateObject<SomeData>(myCustomHeap);

someData.X = 1234;
DoWork(someData);

Activator.DeleteObject(someData, myCustomHeap); //This!

I can ask the runtime to create and destroy objects on a custom allocator that I provide. An allocator is just a custom class:

abstract class Allocator {
 public abstract IntPtr Allocate(IntPtr numberOfBytes);
 public abstract void Deallocate(IntPtr address);
}

Using this API developers can manage memory without involving the GC. They can devise their own lifetime schemes.
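As an illustration of what a user-supplied allocator could look like, here is a sketch of an arena that hands out consecutive slices of one unmanaged block. It only manages raw bytes and runs on today's runtime; placing managed objects into that memory is the part that would need the proposed runtime support. `ArenaAllocator`, `Reset` and `BytesUsed` are hypothetical names, and the 8-byte rounding is an assumption.

```csharp
using System;
using System.Runtime.InteropServices;

// The allocator contract from the proposal, written as valid C#.
abstract class Allocator
{
    public abstract IntPtr Allocate(IntPtr numberOfBytes);
    public abstract void Deallocate(IntPtr address);
}

// Hypothetical arena: carves slices out of a single unmanaged block.
sealed class ArenaAllocator : Allocator, IDisposable
{
    private readonly IntPtr _start;
    private readonly long _capacity;
    private long _used;

    public ArenaAllocator(int capacityInBytes)
    {
        _start = Marshal.AllocHGlobal(capacityInBytes);
        _capacity = capacityInBytes;
    }

    public long BytesUsed => _used;

    public override IntPtr Allocate(IntPtr numberOfBytes)
    {
        long requested = numberOfBytes.ToInt64();
        if (requested < 0)
            throw new ArgumentOutOfRangeException(nameof(numberOfBytes));

        // Round up so every slice starts at a multiple of 8 bytes from the base.
        long size = (requested + 7) & ~7L;
        if (size > _capacity - _used)
            throw new OutOfMemoryException("Arena exhausted.");

        IntPtr result = new IntPtr(_start.ToInt64() + _used);
        _used += size;
        return result;
    }

    // Individual frees are no-ops in an arena.
    public override void Deallocate(IntPtr address) { }

    // Constant-time release of everything allocated so far.
    public void Reset() => _used = 0;

    public void Dispose() => Marshal.FreeHGlobal(_start);
}
```

A caller would create one arena per unit of work, allocate from it freely, and call `Reset()` or `Dispose()` once at the end instead of freeing each block.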

Benefits:

It's possible to avoid the GC
These are totally normal .NET objects that work like any other object (composability)
Deterministic memory consumption (no need to wait for the GC or trigger it)
Finalizer is called deterministically
If there are no finalizers there is no need to even call DeleteObject. The allocator can destroy all objects in constant time (arena allocation).

The usual perils of unsafe memory management apply:

Need to ensure that there are no leaks and no double-frees.
Cannot reference deleted objects.
Memory corruption can result if contract broken.
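Some of these contract violations can be caught during development. The sketch below is a hypothetical debugging wrapper (the name `CheckedAllocator` is invented here) that remembers which blocks are live, so a double free or a leak becomes visible. It cannot detect use of a deleted object.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Hypothetical debugging aid: tracks live blocks to catch contract violations.
sealed class CheckedAllocator
{
    private readonly HashSet<IntPtr> _live = new HashSet<IntPtr>();

    // Anything still counted here at shutdown is a leak.
    public int LiveBlocks => _live.Count;

    public IntPtr Allocate(int numberOfBytes)
    {
        IntPtr address = Marshal.AllocHGlobal(numberOfBytes);
        _live.Add(address);
        return address;
    }

    public void Deallocate(IntPtr address)
    {
        // Remove returns false for an address that is not currently live:
        // either a double free or a pointer this allocator never issued.
        if (!_live.Remove(address))
            throw new InvalidOperationException("Double free or foreign pointer.");
        Marshal.FreeHGlobal(address);
    }
}
```

The bookkeeping costs a hash lookup per call, so a wrapper like this would normally be limited to debug builds.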

This scheme lends itself to arena allocation. A game engine can allocate all per-frame objects in an arena and constant-time delete all of them at frame end. A REST service can arena allocate all data per-request. An XML parser can allocate all temporary buffers (temp strings, etc.) in a per-parse arena.

This proposal achieves very nice integration of unsafe memory management into an otherwise managed application. The idea is that most code is safe and managed but there are performance-critical islands of unmanaged memory that interoperate nicely.

The only CLR change required would be to teach the GC to ignore such custom objects. This could be done through a bit in the object header or based on type. I have left it open whether classes need to be declared as custom-allocated or whether any class can be allocated unsafely.


benaadams · 2016-04-15T16:55:27Z

Destructible Types? dotnet/roslyn#161

GSPP · 2016-04-15T17:02:08Z

That proposal seems to be safe, automatic resource management. My proposal is unsafe and manual all the way. This is about giving maximum control.

Thanks for pointing out the "near duplicate", though. It is useful to contrast the two. @benaadams @stephentoub

jakobbotsch · 2016-04-15T17:09:25Z

The only CLR change required would be to teach the GC to ignore such custom objects. This could be done through a bit in the object header or based on type.

In any case this will be a very minor change, maybe even no change. The GC can already distinguish between its own objects and objects it does not own - it is specified in ECMA-335 that this must be allowed:

class Program
{
    int _value;

    static unsafe void Main(string[] args)
    {
        IntPtr mem = Marshal.AllocHGlobal(4);
        Method(ref *(int*)mem);

        Program p = new Program();
        Method(ref p._value);
    }

    static void Method(ref int value)
    {
        value = 25;
    }
}

Here the GC has to update the managed pointer passed to Method only if it points into an object it owns, which it obviously doesn't in the first case, but does in the second case.

GSPP · 2016-04-15T17:29:51Z

@JanielS I did not even know that ref can do that! Is this supposed to work or a compiler bug? I did not find anything in the spec that specifies what exactly can follow ref. See §5.4 and §7.5.1.

You are right. Here, value could point to anything: Unmanaged memory, stack, heap field, array element. The GC must deal with all of that already.

jakobbotsch · 2016-04-15T17:41:22Z

@GSPP I don't know whether it is supposed to work on the C# side of things, but in the CLI it definitely is. ECMA-335 states:

III.1.1.5.1 Unmanaged pointers
...

Unverified code can pass an unmanaged pointer to a method that expects a managed
pointer. This is safe only if one of the following is true:

The unmanaged pointer refers to memory that is not in memory managed by
the garbage collector.

The unmanaged pointer refers to a field within an object.

The unmanaged pointer refers to an element within an array.

The unmanaged pointer refers to the location where the element following the
last element in an array would be located.

I have personally used this feature in C++/CLI for clean wrapper code that can work with both unmanaged and managed memory (since a pin_ptr pointing to unmanaged memory is specified to work and be ignored by the GC).

GSPP · 2016-04-15T17:58:28Z

Alright. Does this not mean that we can immediately write this allocator system on the current CLR using a tiny C++/CLI library or using ILGenerator to generate tiny helper functions to do this?

This crashes with internal corruption errors, though:

    public static void Main()
    {
        try
        {
            var lib = LoadLibrary("kernel32.dll");
            var x = GetProcAddress(lib, "GetProcAddress");
            Console.WriteLine(x);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
        }
    }

    [StructLayout(LayoutKind.Sequential)]
    class C2 { public int X; }

    [DllImport("kernel32.dll")]
    static extern C2 GetTickCount64();

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr LoadLibrary(string libFilename);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, SetLastError = true, ExactSpelling = true)]
    [return: MarshalAs(UnmanagedType.LPStruct)]
    static extern C2 GetProcAddress(IntPtr hModule, string methodName);

I tried type-punning the code of some library as a C2 instance. I had hoped that this code would at least limp along to be able to access X (it should read as the first 4 bytes of the x86 code of this native function).

The Activator.Allocate function would need to properly format that memory (object header, zero-init of fields, optionally call ctor). I don't think that can be emulated using code.

mikedn · 2016-04-15T18:01:20Z

ECMA-335 states ...

Unmanaged/managed pointers and object references are not the same thing. It's true that managed pointers can point to unmanaged memory, but that doesn't imply that object references can do that too. I suspect that it is more or less technically possible, but I don't think there's anything that allows this in the current ECMA spec.

Please implement unsafe managed objects with user controlled lifetime. Like this: allocated class SomeData

I don't think that this should be a property of the type. Not only does this prevent allocating existing reference types outside of the GC heap, it's also quite useless because you can mostly do this today with value types and unsafe code.
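For reference, a sketch of what "value types and unsafe code" can look like today, mirroring the SomeData example from the opening post. It must be compiled with unsafe code enabled; `UnmanagedStructDemo`, `Run` and this `DoWork` body are illustrative names, not part of the proposal.

```csharp
using System;
using System.Runtime.InteropServices;

// Same shape as the proposal's SomeData, but declared as a value type.
struct SomeData
{
    public int X;
}

static class UnmanagedStructDemo
{
    // Ordinary safe code can work on the data through a managed pointer.
    static void DoWork(ref SomeData data) => data.X *= 2;

    public static unsafe int Run()
    {
        // Allocate the struct outside the GC heap.
        SomeData* someData = (SomeData*)Marshal.AllocHGlobal(sizeof(SomeData));
        try
        {
            someData->X = 1234;
            DoWork(ref *someData);
            return someData->X;
        }
        finally
        {
            // Explicit delete; the GC was never involved.
            Marshal.FreeHGlobal((IntPtr)someData);
        }
    }
}
```

What this cannot do is hand `someData` to code that expects an object reference, which is the gap the proposal is aimed at.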

jakobbotsch · 2016-04-15T18:09:24Z

@mikedn I'm aware of that. What I'm saying is that the ECMA-335 states that unmanaged pointers can be converted to managed pointers. For this to be supported the CLR has to be able to answer the question I quoted from the feature request - whether the GC owns an object at the specified address.

@GSPP Maybe so. I know you can reinterpret objects with a structure with Explicit layout, however that doesn't quite allow you to examine the object representation. It could probably be done with some TypedReference hacking.

jakobbotsch · 2016-04-15T18:51:59Z

As a POC, this seems to work on desktop CLR:

internal class ArenaAllocator : IDisposable
{
    private readonly IntPtr _mem;
    private IntPtr _cur;

    public ArenaAllocator()
    {
        _mem = Marshal.AllocHGlobal(0x100000);
        _cur = _mem;
    }

    public unsafe T Allocate<T>() where T : class
    {
        *(IntPtr*)_cur = typeof(T).TypeHandle.Value;
        IntPtr ptr = _cur;
        TypedReference reference = default(TypedReference);
        ((IntPtr*)&reference)[0] = (IntPtr)(&ptr);
        ((IntPtr*)&reference)[1] = typeof(T).TypeHandle.Value;

        return __refvalue(reference, T);
    }

    public void Dispose()
    {
        Marshal.FreeHGlobal(_mem);
    }
}

It's missing getting the size of T (not sure how -- probably through the type handle somehow), and sync block indices are not handled at all (I think these are negative offsets).

On CoreCLR I don't think TypedReference is implemented, so this way won't work there.

EDIT: And of course it's missing constructor invocation too, and does not handle special classes (string, array types).

GSPP · 2016-04-15T19:31:27Z

I don't think that this should be a property of the type. Not only that this prevents allocating existing reference types outside of the GC heap

I agree with that now. @mikedn

@JanielS That is a really nasty hack :) My next idea for a hack would have been to use ILGenerator to emit T ToRef<T>(IntPtr ptr) { ldarg.0; ret; } where T : class. Also, you'd need a "template" instance of T to copy over the object header! This makes GetType() and lock work. This is fun :) Probably breaks dozens of .NET CLR invariants.

I feel we should not derail this ticket further with meaningless chatter. I'm looking forward to the team responding. I also encourage anyone to post comments for why this would help their code and to +1 the opening post.

Anyone doing games might be interested. The Stack Exchange folks posted about unsafe code tricks they did to make the tag engine perform acceptably. Would this help you, @mgravell? Or was it @mattwarren? Sorry for summoning everyone.

Maoni0 · 2016-04-15T19:40:53Z

You could construct an object perfectly but you can't call new with it... I am not aware of a way to tell new to go to your own allocator.

Regardless this still doesn't integrate. If you assign this to an object field, obj.x = something_I_constructed_that_looks_like_a_managed_object, GC will attempt to trace through it and it will fail. Unless this is passed as a special type that tells GC to ignore its references. But that again doesn't make it seamless.

I am thinking about isolated heaps (that allow GCs on them individually instead of per process) though. I will post something hopefully soon.

mikedn · 2016-04-15T19:41:35Z

For this to be supported the CLR has to be able to answer the question I quoted from the feature request - whether the GC owns an object at the specified address.

Yes, it has to be able to answer that and it does that. But managed pointers are quite restricted, they can live only on the stack. That makes them rather uncommon and so are any potential perf issues associated with answering the question.

EDIT: And of course it's missing constructor invocation too, and does not handle special classes (string, array types).

It also has a good chance of corrupting memory or crashing as soon as you try to store a reference into such an object.

I feel we should not derail this ticket further with meaningless chatter.

The issue is derailed from the beginning like all other similar issues because it fails to take into account various technical realities, existing possibilities and use cases.

SunnyWar · 2016-04-15T19:55:19Z

@mikedn

The issue is derailed from the beginning like all other similar issues because it fails to take into account various technical realities, existing possibilities and use cases.

I've heard things like this before on many projects. It amounts to "we can't do it because we don't do it now" which is self-limiting. Never let historical decisions dictate future possibilities.

mikedn · 2016-04-15T20:00:34Z

I've heard things like this before on many projects. It amounts to "we can't do it because we don't do it now" which is self-limiting. Never let historical decisions dictate future possibilities.

Neah, this only has to do with people getting overly enthusiastic and claiming that a solution for a problem exists when even the problem is not understood, much less the solution.

jakobbotsch · 2016-04-15T20:10:27Z

Neah, this only has to do with people getting overly enthusiastic and claiming that a solution for a problem exists when even the problem is not understood, much less the solution.

I'll definitely admit it wasn't well tested. I didn't do much more than a few allocations and GCs. And I definitely won't argue with @Maoni0 whether it will work or not. 😄
At least it was interesting to me that the object reinterpretation worked with TypedReference. But yes, as @GSPP says, now we're getting off-topic. I'll eagerly await @Maoni0's post.

mgravell · 2016-04-17T07:46:41Z

Just to respond to an explicit mention:

Would this help you, @mgravell?

Not really. In general when I have data with this problem, I have lots
of them, so a block alloc (managed or unmanaged) is more interesting than
individual allocs. In the specific case of tag-engine, we're in the process
of a fundamental v2 overhaul/rewrite, with a view to making it work on GPUs
(with CPU fallback, but not shared code), so any allocation needs to be
done in a very specific way (unmanaged on fixed pages issued by the GPU
driver) for it to be compatible with the fastest data transfers.

But I share and echo the sentiment that the problem needs to be fully
understood and documented before getting excited about specific solutions.

Marc

GSPP · 2016-04-17T13:11:48Z

@mgravell you could still block-allocate those objects. Downside is they now waste 16 bytes on the object header each. Upside is you have normal managed references. No need to pass indexes and arrays around, or pointers. To clarify: This would not involve the GC at all.

You also could have managed objects living in memory shared with the GPU.

I think we could make those objects copyable through memcpy if we disable any managed function based on the object header. That would be locking and the identity hash code, I think. Those operations would throw. Also, this would obviate the need to call "Activator.Allocate". You could manually write the object header after asking the CLR what bits to write. The bits are the same for each object (basically just the type pointer).

mattwarren · 2017-07-31T10:33:50Z

It seems like you may be able to achieve this if/when the work being done in the Snowflake project arrives in CoreCLR, see "Project Snowflake: Non-blocking safe manual memory management in .NET" (July 26, 2017) for more info.

The code sample below is from the paper. It shows the usage of Shield<T>, which implies that the allocation is on a different heap (i.e. no GC) and can be cleaned up when it's safe to do so:

T Find(Predicate<T> match) 
{
    using (Shield<T[]> s_items = _items.Defend())
    {
        for (int i = 0; i < _size; i++) 
        {
            if (match(s_items.Value[i]))
                return s_items.Value[i];
        } 
    }
    return default(T);
}

roterdam · 2018-06-23T19:24:26Z

Regardless this still doesn't integrate. If you assign this to an object field, obj.x = something_I_constructed_that_looks_like_a_managed_object, GC will attempt to trace through it and it will fail. Unless this is passed as a special type that tells GC to ignore its references. But that again doesn't make it seamless.

@Maoni0 can you explain "fail"? I see the code from @jakobbotsch working fine.

GSPP · 2018-06-25T12:14:53Z

@Maoni0 Custom objects could have a bit set in the object header marking them as such to the GC. That would be a cheap way to activate custom GC behavior on a per-instance (not per-type) basis. Would that work?

mjp41 · 2018-06-25T14:08:21Z

@GSPP so there are several pieces of meta-data the GC keeps about objects. If these aren't backed by actual allocations, things can go wrong:

Card Table (used by the write barrier to maintain the set of cross-generation pointers)
Concurrent Mark Array (used by the background mark phase)
Segment lookup (finds details of the part of memory the GC controls)
Brick table (finds the start of an object)

Now, placing a bit in the header requires the operations to know where the header is. This is not always the case for pointers from the stack into the heap. In particular, the write barrier does not know the header of the object it is updating, so it cannot check this bit. That means you are likely to get random segfaults when you try to write to a non-existent card table.

The other data structures can also get touched based on the address, and may or may not exist for the address range you have allocated.
We found it took about a 1000-line addition to gc.cpp just to maintain the relevant other data structures and prevent the GC from tracing our objects incorrectly.

If you didn't want the card table to cover the range you are managing it would be much simpler.
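To make the write-barrier problem concrete, here is a toy model. It is not the CLR's card table: the card size, the layout and every name (`ToyCardTable`, `WriteBarrier`, `IsDirty`) are invented for illustration. The only point it shows is that a barrier which sees just the address of the written slot turns that address straight into a table index.

```csharp
using System;

// Toy model only: sizes and layout are assumptions made for illustration
// and do not reflect the CLR's real card table.
sealed class ToyCardTable
{
    private const int CardShift = 8;          // one card per 256 bytes (assumed)
    private readonly long _heapStart;
    private readonly byte[] _cards;

    public ToyCardTable(long heapStart, long heapSizeInBytes)
    {
        _heapStart = heapStart;
        _cards = new byte[heapSizeInBytes >> CardShift];
    }

    // The barrier sees only the address of the slot being written,
    // not the header of the object that contains the slot.
    public void WriteBarrier(long slotAddress)
    {
        long index = (slotAddress - _heapStart) >> CardShift;
        _cards[index] = 1;                    // native code would not bounds-check this
    }

    public bool IsDirty(long slotAddress) =>
        _cards[(slotAddress - _heapStart) >> CardShift] != 0;
}
```

In this managed model an address outside the covered range produces an IndexOutOfRangeException. In native code the same computation would be an unchecked write to memory that may not be mapped, which matches the random segfaults described above.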

roterdam · 2018-06-25T16:27:56Z

@mjp41 as long as the object is allocated outside of the GC ranges, shouldn't it "just" work? The GC code has to check whether it is within range or not, no? Also, what happens if you do this for an object with no reference fields, like a string? Could that work?

mjp41 · 2018-06-25T17:50:39Z

There are on the order of 30 places that simply follow managed pointers by:

Checking that it is not null
Determining which heap it is on (for ServerGC)
Performing some operation on that object

Some bits do check if it is in the range of the heap, but not all of them. Many bits assume it will be able to find the GC heap/segment.

If you restrict to types that do not contain GC references (blittable types), then you would not need to deal with the card table, so it would be changing just these traversal pieces.

Our prototype put manually managed and GC-managed memory in separate address ranges, which led to a quick, cheap check. Looking in the header on every step of tracing could be expensive, and could affect the performance of code not using this feature.
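The prototype's actual check is not shown in this thread. As a sketch of why a range-based test is cheap, assuming all manually managed memory sits in one contiguous reserved range (the `AddressRange` type is hypothetical):

```csharp
using System;

// Hypothetical layout: manually managed memory lives in one reserved range.
readonly struct AddressRange
{
    private readonly ulong _start;
    private readonly ulong _length;

    public AddressRange(ulong start, ulong length)
    {
        _start = start;
        _length = length;
    }

    // One subtraction and one unsigned comparison. Addresses below _start
    // wrap around to very large values and therefore fail the test too.
    public bool Contains(ulong address) => unchecked(address - _start) < _length;
}
```

The test needs only the address itself, so it works for interior pointers and never has to locate or read an object header.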

ghost · 2022-12-26T18:01:57Z

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

GSPP · 2022-12-28T20:58:02Z

I'll go ahead and keep this issue alive. This seems to be something that people are interested in.

lakani · 2024-05-18T23:50:23Z

Microsoft, what are you waiting for to start on this issue, or even on the outcome of the Snowflake project?

msftgits transferred this issue from dotnet/coreclr Jan 30, 2020

msftgits added this to the Future milestone Jan 30, 2020

ghost added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Dec 26, 2022

ghost removed the no-recent-activity label Dec 28, 2022

ghost removed the backlog-cleanup-candidate An inactive issue that has been marked for automated closure. label Dec 28, 2022
