Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial Span<T> access to creation of ImmutableArray<T> #53154

Closed
TheBuzzSaw opened this issue May 24, 2021 · 13 comments
Closed

Add initial Span<T> access to creation of ImmutableArray<T> #53154

TheBuzzSaw opened this issue May 24, 2021 · 13 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity
Milestone

Comments

@TheBuzzSaw
Copy link

Background and Motivation

As .NET increasingly leverages Span<T> across its APIs, it becomes increasingly important that all core data structures fully support it as well. While ImmutableArray<T> does support being accessed as a ReadOnlySpan<T> by way of AsSpan(), there does not seem to be a way to access a Span<T> during the creation. This results in unnecessary copies of (possibly huge) arrays or performance-hostile rapid-fire usage of immutableArrayBuilder.Add(item) (particularly if it's a byte array).

Currently, I use an unsafe workaround to enable all my Span<T> goodness throughout the array-building process.

return Unsafe.As<byte[], ImmutableArray<byte>>(ref blob);

Proposed API

There is already an excellent template for one way to accomplish this: the String.Create(...) method (seen here).

namespace System.Collections.Immutable
{
    public static class ImmutableArray
    {
+        public static ImmutableArray<T> Create<T, TState>(int length, TState state, SpanAction<T, TState> action) => throw null;
    }
}

The TState enables callers to avoid closures/allocations. The SpanAction grants temporary access to the entirety of the buffer, which can now be filled using modern APIs that speak to Span<T>. Examples include the results of a stream.Read(...), a call to Base64.DecodeFromUtf8(...), etc.

Usage Examples

The usage reflects string.Create.

var encodedBytes = File.ReadAllBytes("base64.txt");
var decodedLength = Base64.GetMaxDecodedFromUtf8Length(encodedbytes.Length); // Just assume this lines up.
var decodedBytes = ImmutableArray.Create<byte, byte[]>(
    decodedLength,
    encodedBytes,
    (span, state) =>
    {
        Base64.DecodeFromUtf8(state, span, out _, out _);
    });

Alternative Designs

An unfortunate aspect of the proposal above is that, unlike string.Create(...), the generic types cannot be easily inferred from usage, so they have to be specified up front. I'd be open to another approach that simply exposes the buffer inside of a builder.

namespace System.Collections.Immutable
{
    public struct ImmutableArray<T>
    {
        public sealed class Builder
        {
+            public void SetValues<TState>(TState state, SpanAction<T, TState> action) => throw null;
        }
    }
}

This would also be an acceptable solution.

var encodedBytes = File.ReadAllBytes("base64.txt");
var decodedLength = Base64.GetMaxDecodedFromUtf8Length(encodedbytes.Length); // Just assume this lines up.
var builder = ImmutableArray.CreateBuilder<byte>(decodedLength);
builder.SetValues(
    encodedBytes,
    (span, state) =>
    {
        Base64.DecodeFromUtf8(state, span, out _, out _);
    });
var decodedBytes = builder.MoveToImmutable();

It adds a step, but being able to mutate the data en mass inside a builder could have greater potential. SetValues is not a great name, which is why it is my alternative proposal. :)

Risks

The only risk that really comes to mind is the further encouragement of immutable collections, but it is hopefully seen as an upside for most. I can't think of any neighboring APIs or extensions this would bump into or cause issues with.

@TheBuzzSaw TheBuzzSaw added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 24, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels May 24, 2021
@ghost
Copy link

ghost commented May 24, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and Motivation

As .NET increasingly leverages Span<T> across its APIs, it becomes increasingly important that all core data structures fully support it as well. While ImmutableArray<T> does support being accessed as a ReadOnlySpan<T> by way of AsSpan(), there does not seem to be a way to access a Span<T> during the creation. This results in unnecessary copies of (possibly huge) arrays or performance-hostile rapid-fire usage of immutableArrayBuilder.Add(item) (particularly if it's a byte array).

Currently, I use an unsafe workaround to enable all my Span<T> goodness throughout the array-building process.

return Unsafe.As<byte[], ImmutableArray<byte>>(ref blob);

Proposed API

There is already an excellent template for one way to accomplish this: the String.Create(...) method (seen here).

namespace System.Collections.Immutable
{
    public static class ImmutableArray
    {
+        public static ImmutableArray<T> Create<T, TState>(int length, TState state, SpanAction<T, TState> action) => throw null;
    }
}

The TState enables callers to avoid closures/allocations. The SpanAction grants temporary access to the entirety of the buffer, which can now be filled using modern APIs that speak to Span<T>. Examples include the results of a stream.Read(...), a call to Base64.DecodeFromUtf8(...), etc.

Usage Examples

The usage reflects string.Create.

var encodedBytes = File.ReadAllBytes("base64.txt");
var decodedLength = Base64.GetMaxDecodedFromUtf8Length(encodedbytes.Length); // Just assume this lines up.
var decodedBytes = ImmutableArray.Create<byte, byte[]>(
    decodedLength,
    encodedBytes,
    (span, state) =>
    {
        Base64.DecodeFromUtf8(state, span, out _, out _);
    });

Alternative Designs

An unfortunate aspect of the proposal above is that, unlike string.Create(...), the generic types cannot be easily inferred from usage, so they have to be specified up front. I'd be open to another approach that simply exposes the buffer inside of a builder.

namespace System.Collections.Immutable
{
    public struct ImmutableArray<T>
    {
        public sealed class Builder
        {
+            public void SetValues<TState>(TState state, SpanAction<T, TState> action) => throw null;
        }
    }
}

This would also be an acceptable solution.

var encodedBytes = File.ReadAllBytes("base64.txt");
var decodedLength = Base64.GetMaxDecodedFromUtf8Length(encodedbytes.Length); // Just assume this lines up.
var builder = ImmutableArray.CreateBuilder<byte>(decodedLength);
builder.SetValues(
    encodedBytes,
    (span, state) =>
    {
        Base64.DecodeFromUtf8(state, span, out _, out _);
    });
var decodedBytes = builder.MoveToImmutable();

It adds a step, but being able to mutate the data en mass inside a builder could have greater potential. SetValues is not a great name, which is why it is my alternative proposal. :)

Risks

The only risk that really comes to mind is the further encouragement of immutable collections, but it is hopefully seen as an upside for most. I can't think of any neighboring APIs or extensions this would bump into or cause issues with.

Author: TheBuzzSaw
Assignees: -
Labels:

api-suggestion, area-System.Collections, untriaged

Milestone: -

@svick
Copy link
Contributor

svick commented May 24, 2021

What if this was a static method on ImmutableArray<T>? That way, type inference for state would work, without the extra step of going through a builder: e.g. ImmutableArray<byte>.Create(decodedLength, encodedBytes, ...)

Though it would be inconsistent with all the other static methods, which are on the non-generic ImmutableArray, so this is probably a bad idea.

@TheBuzzSaw
Copy link
Author

@svick I like the idea as it would indeed solve the type-inference issue, but yeah, as you pointed out, it would create an abrupt inconsistency in where all the static methods live. I'm hoping the alternative API I proposed becomes a viable path forward.

@eiriktsarpalis
Copy link
Member

Placing the method on the builder seems like a more reasonable choice, however there is also the alternative of using a CollectionsMarshall.AsSpan-like approach.

Related to #22928.

@eiriktsarpalis eiriktsarpalis added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed untriaged New issue has not been triaged by the area owner labels Jul 21, 2021
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Jul 21, 2021
@TheBuzzSaw
Copy link
Author

@eiriktsarpalis I think that would be useful as well, but it feels like unnecessary ceremony if I already know the exact size of my array. I want this to function like the string.Create(...) method.

@eiriktsarpalis eiriktsarpalis removed the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Nov 16, 2021
@Neme12
Copy link

Neme12 commented May 3, 2022

Placing the method on the builder seems like a more reasonable choice, however there is also the alternative of using a CollectionsMarshall.AsSpan-like approach.

Related to #22928.

I think the reason that CollectionsMarshall.AsSpan is on CollectionsMarshall as opposed to List<T> is that it is unsafe in that you could then add items to the list, which would regrow it and make it use a different underyling array, meaning the span you got from AsSpan doesn't represent the real contents of the list any longer. That doesn't apply to ImmutableArray<T> - conversion to ReadOnlySpan<T> should just work with an implicit conversion. If you're talking about AsSpan that returns a mutable span, I think that might become really dangerous as many APIs and consumers rely on ImmutableArray<T> being really immutable and mutating it after creating would result in the same problems as mutating string instances. I think the Create method is the best approach and it's what string already does.

@Neme12
Copy link

Neme12 commented May 3, 2022

BTW it seems that #22928 didn't actually add an implicit conversion from ImmutableArray<T> to ReadOnlySpan<T>. Why was that? Such a conversion exists from T[] and I thought ImmutableArray<T> was supposed to mimic arrays as much as possible.

@jeffhandley jeffhandley modified the milestones: 7.0.0, Future Jul 9, 2022
@TheBuzzSaw
Copy link
Author

TheBuzzSaw commented Sep 6, 2022

So, now that I see this is marked as "needs work", what direction can it go in? Is there a reasonable way to make this work?

Personally, I would not be against a collections marshal method (a la List<T>) that works on immutable array builders.

@huoyaoyuan
Copy link
Member

Is this still required with #83141 implemented?

The shape of this proposal looks "safer", but they serves for the same purpose.

@eiriktsarpalis
Copy link
Member

@huoyaoyuan I would tend to agree, while a delegate-based API might make sense for strings I don't think it's essential for immutable arrays, particularly given the newly added API. It still necessitates a tiny bit of buffer management on the user's part, but I don't think it's a dealbreaker.

@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Sep 4, 2023
@ghost
Copy link

ghost commented Sep 4, 2023

This issue has been marked needs-author-action and may be missing some important information.

@ghost ghost added the no-recent-activity label Sep 18, 2023
@ghost
Copy link

ghost commented Sep 18, 2023

This issue has been automatically marked no-recent-activity because it has not had any activity for 14 days. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will remove no-recent-activity.

@ghost
Copy link

ghost commented Oct 2, 2023

This issue will now be closed since it had been marked no-recent-activity but received no further activity in the past 14 days. It is still possible to reopen or comment on the issue, but please note that the issue will be locked if it remains inactive for another 30 days.

@ghost ghost closed this as completed Oct 2, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 1, 2023
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Collections needs-author-action An issue or pull request that requires more info or actions from the author. no-recent-activity
Projects
None yet
Development

No branches or pull requests

6 participants