Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CollectionsMarshal.SetCount API for List<T> #55217

Closed
Tracked by #77391
MichalPetryka opened this issue Jul 6, 2021 · 21 comments · Fixed by #82146
Closed
Tracked by #77391

Add CollectionsMarshal.SetCount API for List<T> #55217

MichalPetryka opened this issue Jul 6, 2021 · 21 comments · Fixed by #82146
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections
Milestone

Comments

@MichalPetryka
Copy link
Contributor

MichalPetryka commented Jul 6, 2021

Background and Motivation

The existing CollectionsMarshal.AsSpan(List) API allows developers to use List for performant code as a Span, however it restricts modifications of it by having no way to change the Count of the List. Exposing an API in CollectionsMarshal for doing that would solve that and help to remove redundant array allocations and AddRange copies when adding data to a List that comes from an API that takes a Span (AddRange doesn't accept Spans so the buffers can't even be stack allocated).

Implementation details:

  • Passing the current count would be an noop
  • List version would be changed unless the count would be the same as the current one
  • Passing a count bigger than the current capacity, the List would be resized
  • Passing a count smaller than the current one would shrink the list and clear the data ONLY if it contains GC references (just like Clear does)
  • Passing a count bigger than the current one would increase the count and it would NOT initialize the exposed data.

Proposed API

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal {
+        public static void SetCount<T>(List<T> list, int count); // alternative could be EnsureCount
    }
}

Example implementation:

public static void SetCount<T>(List<T> list, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException(); // would probably be a throw helper

    if (count == list._size)
        return;

    list._version++; // would need to be made internal

    if (count < list._size)
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            Array.Clear(list._items, count, list._size - count);
        list._size = count;
        return;
    }

    if (count > list.Capacity)
        list.Capacity = count;

    list._size = count;
}

Usage Examples

The API could be used for creating an extension method in place of the missing ReadOnlySpan AddRange:

public static void AddRange<T>(this List<T> list, ReadOnlySpan<T> span)
{
    int oldCount = list.Count;
    CollectionMarshal.SetCount(list, oldCount + span.Length);
    span.CopyTo(CollectionMarshal.AsSpan(list).Slice(oldCount))
}

Random.GetBytes is an example API that fills a Span with data here.

List<byte> list = new(16);
CollectionsMarshal.SetCount(list, 16);
Random.Shared.GetBytes(CollectionsMarshal.AsSpan(list));
// we have a count 16 list filled with random data

This would be as valid, just a bit less performant:

List<byte> list = new();
CollectionsMarshal.SetCount(list, 16);
Random.Shared.GetBytes(CollectionsMarshal.AsSpan(list));
// we have a count 16 list filled with random data

With this the end count would also be 16

List<byte> list = new() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 };
CollectionsMarshal.SetCount(list, 16);
Random.Shared.GetBytes(CollectionsMarshal.AsSpan(list));
// we have a count 16 list filled with random data

With this the end count would also be 16 and we'd have the original content from before Clear after SetLength (not guaranteed, implementation detail)

List<byte> list = new() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 };
list.Clear();
CollectionsMarshal.SetCount(list, 16);
// accessing the list here would give { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16 } since Clear wouldn't zero the data
Random.Shared.GetBytes(CollectionsMarshal.AsSpan(list));
// we have a count 16 list filled with random data

Alternative Designs

Currently the developer is required to do this, which requires an array allocation and a copy coming from AddRange:

List<byte> list = new(16);
list.AddRange(new byte[16]);

The first example usage would look like this:

List<byte> list = new(16);
byte[] array = new byte[16];
RandomNumberGenerator.Fill(array);
list.AddRange(array);

Risks

The API could expose non cleared memory for value types with no GC references (when the List was cleared or if Lists would use GetUninitalizedArray when the initial array was allocated unitialized).

@MichalPetryka MichalPetryka added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jul 6, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Jul 6, 2021
@ghost
Copy link

ghost commented Jul 6, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

The existing CollectionsMarshal.AsSpan(List) API allows developers to use List for performant code as a Span, however it restricts modifications of it by having no way to change the Length of the List. Exposing an API in CollectionsMarshal for doing that would solve that and help to remove redundant array allocations and AddRange copies when adding data to a List that comes from an API that takes a Span (AddRange doesn't accept Spans so the buffers can't even be stack allocated).

Implementation details:

  • Passing the current length would be an noop
  • List version would be changed unless the length would be the same as the current one
  • Passing a length bigger than the current capacity, the List would be resized
  • Passing a length smaller than the current one would shrink the list and clear the data ONLY if it contains GC references (just like Clear does)
  • Passing a length bigger than the current one would increase the length and it would NOT initialize the exposed data.

Proposed API

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal {
+        public static void SetLength<T>(List<T> list, int length);
    }
}

Example implementation:

public static void SetLength<T>(List<T> list, int length)
{
    if (length < 0)
        throw new ArgumentOutOfRangeException(); // would probably be a throw helper

    if (length == list._size)
        return;

    list._version++; // would need to be made internal

    if (length < list._size)
    {
        if (RuntimeHelpers.IsReferenceOrContainsReferences<T>())
            Array.Clear(list._items, length, list._size - length);
        list._size = length;
        return;
    }

    if (length > list.Capacity)
        list.Capacity = length;

    list._size = length;
}

Usage Examples

RandomNumberGenerator.Fill is an example Span API here, more usual case would be using less expensive APIs.

List<byte> list = new(16);
CollectionsMarshal.SetLength(list, 16);
RandomNumberGenerator.Fill(CollectionsMarshal.AsSpan(list));
// we have a length 16 list filled with random data

Alternative Designs

Currently the developer is required to do this, which requires an array allocation and a copy coming from AddRange

List<byte> list = new(16);
byte[] array = new byte[16];
RandomNumberGenerator.Fill(array);
list.AddRange(array);

Risks

The API could expose non cleared memory for value types with no GC references (when the List was cleared or if Lists would use GetUninitalizedArray when the initial array was allocated unitialized).

Author: MichalPetryka
Assignees: -
Labels:

api-suggestion, area-System.Collections, untriaged

Milestone: -

@MichalPetryka MichalPetryka changed the title Add CollectionsMarshal.SetLength API for List<T> Add CollectionsMarshal.SetCount API for List<T> Jul 6, 2021
@GrabYourPitchforks
Copy link
Member

IMO this would be more useful and discoverable as an instance method List<T>.Resize(int newCount), which also zero-inits the buffer if needed. Zero-initing is surprisingly cheap; and it can be elided entirely if T contains gcrefs, as you called out; and it can also be elided if the underlying array ends up being resized as a result of the operation.

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Jul 6, 2021

While an instance Resize that'd zero-init would be more discoverable, I'd rather add both APIs than just it, since probably a lot of the usecases for AsSpan that'd modify don't care about the contents of it, so it'd only prove to be an unnecessary cost.
Would adding both an instance Resize with zero init and not as exposed SetCount without it be an acceptable solution?

Edit: I guess there could also be an optional parameter, something like bool initElements = true

@eiriktsarpalis eiriktsarpalis added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jul 12, 2021
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jul 12, 2021
@f-i-x-7
Copy link

f-i-x-7 commented Aug 3, 2021

May be just expose list's entire array as a span? E.g. following new API should be added (name is draft):

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
+       public static Span<T> AsCapacitySpan<T>(List<T>? list);
    }
}

Implementation is similar to existing AsSpan:

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static Span<T> AsCapacitySpan<T>(List<T>? list)
            => list is null ? default : new Span<T>(list._items, 0, list._items.Length);
    }
}

Usage example with already mentioned RandomNumberGenerator.Fill:

List<byte> list = new(16);
RandomNumberGenerator.Fill(CollectionsMarshal.AsCapacitySpan(list));
// we have a count 16 list filled with random data

@MichalPetryka
Copy link
Contributor Author

MichalPetryka commented Aug 3, 2021

May be just expose list's entire array as a span? E.g. following new API should be added (name is draft):

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static Span<T> AsCapacitySpan<T>(List<T>? list);
    }
}

Implementation is similar to existing AsSpan:

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static Span<T> AsCapacitySpan<T>(List<T>? list)
            => list is null ? default : new Span<T>(list._items, 0, list._items.Length);
    }
}

Usage example with already mentioned RandomNumberGenerator.Fill:

List<byte> list = new(16);
RandomNumberGenerator.Fill(CollectionsMarshal.AsCapacitySpan(list));
// we have a count 16 list filled with random data

This wouldn't really help as an API like this wouldn't modify the Count, meaning that the data written there would be inaccessible without using a span.

It is also possible to somewhat create that already without using reflections with MemoryMarshal.CreateSpan(ref MemoryMarshal.GetReference(CollectionsMarshal.AsSpan(list)), list.Capacity)

@eiriktsarpalis eiriktsarpalis removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 16, 2021
@MichalPetryka
Copy link
Contributor Author

Could I get some movement on the proposal @eiriktsarpalis? It's been sitting as api-needs-work for the past few months, with the only change being it getting added to the work planned for .Net 7, but I'm afraid it won't get there due to not being approved yet.

@stephentoub
Copy link
Member

It's been sitting as api-needs-work for the past few months, with the only change being it getting added to the work planned for .Net 7, but I'm afraid it won't get there due to not being approved yet.

As a meta point aside, it takes work to review API proposals, validate that the concept is something we want included, ensure the design makes sense given the rest of what's exposed, etc. That work all happens prior to an issue being marked as API approved. So the fact that it was added to the cited "planned work" issue is in fact movement.

@krwq
Copy link
Member

krwq commented Mar 23, 2022

@MichalPetryka I'm also not in favor of this API as proposed, I much more like @GrabYourPitchforks's proposal. If you care about perf why do you use List rather than pre-allocated buffer. Perhaps I also simply cannot see end to end scenario where I'd use List with RNG.Fill and care much about perf. Perhaps if you shared E2E scenario it would be more convincing

@MichalPetryka
Copy link
Contributor Author

RNG was used as an example API that takes in a span, if you want a more real usecase the example span AddRange from #1530 (comment) shows a scenario with less overhead. I'd be fine with a general purpose API as proposed by @GrabYourPitchforks, but an "unsafe" version would still be useful, even for example in that proposed AddRange to avoid the overhead of writing the data there twice.

@krwq
Copy link
Member

krwq commented Mar 23, 2022

I'll suggest to change proposal to what @GrabYourPitchforks has suggested and once you have that API perhaps create a second proposal (only if you feel you'll need it - perhaps share a benchmark with potential perf improvement to support that)

@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@jeffhandley jeffhandley added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 9, 2022
@ghost
Copy link

ghost commented Jul 9, 2022

This issue has been marked needs-author-action and may be missing some important information.

@ghost ghost added the no-recent-activity label Jul 23, 2022
@ghost
Copy link

ghost commented Jul 23, 2022

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@MichalPetryka
Copy link
Contributor Author

I've tried benchmarking both a zeroing and a non zeroing version with an AddRange from #1530 adding 50 ints each to an empty list with preallocated memory, but me and 3 other people have gotten wildly different results:
Me:

|     Method |     Mean |    Error |   StdDev |
|----------- |---------:|---------:|---------:|
|    Zeroing | 70.31 ns | 0.411 ns | 0.365 ns |
| NonZeroing | 11.34 ns | 0.269 ns | 0.224 ns |

One person:

|     Method |     Mean |    Error |   StdDev |
|----------- |---------:|---------:|---------:|
|    Zeroing | 21.79 ns | 0.500 ns | 0.545 ns |
| NonZeroing | 14.88 ns | 0.295 ns | 0.340 ns |

Another person:

|     Method |     Mean |    Error |   StdDev |
|----------- |---------:|---------:|---------:|
|    Zeroing | 28.67 ns | 0.207 ns | 0.194 ns |
| NonZeroing | 10.20 ns | 0.012 ns | 0.011 ns |

@tannergooding:

|     Method |      Mean |     Error |    StdDev |
|----------- |----------:|----------:|----------:|
|    Zeroing | 11.600 ns | 0.1619 ns | 0.1514 ns |
| NonZeroing |  6.739 ns | 0.0170 ns | 0.0159 ns |

While the ratios are completely different, they all show a clear win for a non zeroing version, so I'd prefer both to be exposed. (I have absolutely no idea what would cause the ratio to be so wildly different on my machine, but it persists on every single run)

@ghost ghost added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity labels Aug 5, 2022
@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Oct 18, 2022
@layomia layomia self-assigned this Oct 18, 2022
@stephentoub stephentoub removed their assignment Oct 24, 2022
@stephentoub
Copy link
Member

@layomia, I see you assigned this to yourself. Does that mean you're working on this?

@layomia
Copy link
Contributor

layomia commented Oct 27, 2022

@stephentoub yes.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Nov 2, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Dec 17, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 19, 2023
@EgorBo
Copy link
Member

EgorBo commented Feb 11, 2023

I personally am not a big fan of the decision to not zero the non-gc elements, somebody someday will shoot their leg with this:

List<int> a = Enumerable.Range(0, 1024).ToList();

a.SetCount(10);  // doesn't clear upper 1024-10 elements. OK
    ...
a.SetCount(100); // doesn't clear on expansion either. Not OK
                 // I now have garbage in my upper 90 elements

(as in implemented in #80311)
cc @GrabYourPitchforks

@tannergooding
Copy link
Member

tannergooding commented Feb 11, 2023

This is a lowlevel API where the intent of use is for developers to write their own performant APIs that extend List<T>s functionality.

It exists in the CollectionsMarshal class under the System.Runtime.InteropServices namespace, both of which API review have agreed indicate the "unsafeness" of the APIs exposed.

The API can be used "incorrectly" but the intended usage scenario is that after a.SetCount(100); the developer explicitly initialize the n (90 in this case) "newly visible" elements. Thus, for the intended use-case, zeroing is doing needless throwaway work since the API author will be immediately overwriting the elements themselves.

It is no more unsafe than module:SkipLocalsInit or Unsafe.SkipInit and is certainly more visibly unsafe than the former.

Particularly for large T and for large mutations, the zeroing and additional cache updates can quickly add up and negatively impact the performance of the application. Requiring zeroing for all types would hurt the main usage scenario of the API and ultimately result in people coming and asking for yet another API that does allow skipping the zeroing with the API that does zero then being under utilized.

@EgorBo
Copy link
Member

EgorBo commented Feb 11, 2023

module:SkipLocalsInit, Unsafe.SkipInit, GC.AllocateUninitialized all explicitly state that you'll have to deal with possibly junk data. This API doesn't. Anyway, just wanted to highlight my concern.

under the System.Runtime.InteropServices namespace, both of which API review have agreed indicate the "unsafeness"

Honestly, it doesn't read for me as "it's unsafe", but rather "a set of tools to help with interop"

@tannergooding
Copy link
Member

Interop and marshalling is by definition unsafe.

It's one of the many things we'd mark as UnsafeCallersOnly if such a concept existed.

That's why we didn't name this SetCountUnsafe for example, because we believe that it being in CollectionsMarshal implies that.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Feb 13, 2023
MichalPetryka added a commit to MichalPetryka/runtime that referenced this issue Feb 15, 2023
Adds the ability to resize lists, exposed in
CollectionsMarshal due to potentially risky
behaviour caused by the lack of element initialization.

Supersedes dotnet#77794.

Fixes dotnet#55217.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Feb 15, 2023
stephentoub pushed a commit that referenced this issue Apr 25, 2023
* Add CollectionsMarshal.SetCount(list, count)

Adds the ability to resize lists, exposed in
CollectionsMarshal due to potentially risky
behaviour caused by the lack of element initialization.

Supersedes #77794.

Fixes #55217.

* Update XML doc

* Add missing using

* Fix test

* Update CollectionsMarshalTests.cs

* Update CollectionsMarshal.cs

* Update CollectionsMarshalTests.cs

* Update CollectionsMarshalTests.cs
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 25, 2023
@ghost ghost locked as resolved and limited conversation to collaborators May 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections
Projects
None yet