Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Proposal: C# 10 interpolated strings support, part 1 #50601

Closed
stephentoub opened this issue Apr 1, 2021 · 52 comments · Fixed by #51086
Closed

API Proposal: C# 10 interpolated strings support, part 1 #50601

stephentoub opened this issue Apr 1, 2021 · 52 comments · Fixed by #51086
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Milestone

Comments

@stephentoub
Copy link
Member

stephentoub commented Apr 1, 2021

Background and Motivation

C# 6 added support for interpolated strings, enabling code to easily format strings by putting arbitrary C# into the format "holes" in the string. These holes are then evaluated prior to having their result added into the string, e.g.

string name = "Stephen";
int year = 2021;
string result = $"Hello, {name}.  The year is {year}.";

The C# compiler has multiple strategies for generating the code behind such interpolation. If all of the holes are filled with string constants, e.g.

string result = $"Hello, {"Stephen"}.  The year is {"2021"}."

it can choose to simply emit a const string:

string result = "Hello, Stephen.  The year is 2021.";

Or, it might choose to use string.Concat, which it will typically do if all of the components are strings, e.g. converting:

string firstName = "Stephen";
string lastName = "Toub";
return $"Hello, {firstName} {lastName}. How are you?";

into something like:

string text = "Stephen";
string text2 = "Toub";
string result = string.Concat(new string[5]
{
    "Hello, ",
    firstName,
    " ",
    lastName,
    ". How are you?"
});

Or, as the most general support, it may choose to use string.Format, e.g.

string name = "Stephen";
int year = 2021;
string result = $"Hello, {name}.  The year is {year}.";

becomes:

string name = "Stephen";
int year = 2021;
string result = string.Format("Hello, {0}.  The year is {1}.", (object)name, (object)year);

There are various interesting deficiencies to notice here, both from a functionality and from a performance perspective:

  1. The holes in interpolated strings can’t be ref structs or pointers or anything else that can’t be boxed. Thus, you can’t use ReadOnlySpan<char>s, which is desirable if you want to, for example, slice a string.
  2. Every execution ends up needing to reparse the format string in order to find the holes, find the format or alignment associated with each hole, etc. This adds non-trivial overheads to every string.Format call.
  3. Every value type that fills in a hole ends up getting boxed. Some folks then try to work around this by ToString()’ing, under the assumption the implementation is going to call ToString() anyway, but for primitives the implementation can often avoid this. To that point…
  4. ToString on each hole is generally needed in order to create a string that can be appended to the underlying StringBuilder. Primitive types in corelib implement a special interface that enables them to be formatted into a span in order to avoid these intermediate string allocations, but that interface isn’t exposed publicly.
  5. Both string.Concat and string.Format have overloads that accept a few arguments directly and then support more than that by taking a params array. This means that anything more than three or four components results in an array allocation.
  6. Interpolated strings can be target typed to either string or FormattableString, the latter of which is meant to be a generalized mechanism that wraps the format and an object[] of the hole values, but this is similarly expensive in many ways, and also doesn’t support some key patterns desirable for interpolated strings, e.g. being able to short-circuit evaluation of one or more holes if the use demands (such as writing into some fixed-size space that no longer has room for the additional holes, or logging and wanting to avoid all computation for the holes if logging is disabled or below the associated log level).

For C# 10, all of these issues are being addressed with a new pattern-based mechanism for implementing interpolated strings. The pattern involves a builder that exposes methods that can be called to append the individual components, which enables the builder to expose whatever overloads are necessary to efficiently implement and support whatever types are desired. While the compiler still retains the ability to choose which mechanism to use on a case-by-case basis, the original example of:

string name = "Stephen";
int year = 2021;
string result = $"Hello, {name}.  The year is {year}.";

can now be compiled as:

var builder = InterpolatedStringBuilder.Create(23, 2);
builder.AppendLiteral("Hello, ");
builder.AppendFormatted(name);
builder.AppendLiteral(".  The year is ");
builder.AppendFormatted(year);
builder.AppendLiteral(".");
string result = builder.ToStringAndClear();

https://github.com/dotnet/csharplang/blob/main/proposals/improved-interpolated-strings.md#improved-interpolated-strings

Proposed API

This proposal covers the API surface area necessary to support targeting strings. A separate proposal will follow involving additional user-exposed API surface area for more advanced use of this functionality.

The builder must follow a general pattern established by the compiler. This pattern is still evolving, so we may need to tweak this slightly until the C# 10 support is fully baked. I would like to review/approve the general support and then just tweak it as the compiler’s needs dictate (e.g. there’s a discussion about whether Append methods must return bool instead of allowing void as I’ve done here, whether builders can be passed by ref, etc.)

namespace System
{
    public interface ISpanFormattable : IFormattable // currently internal
    {
        bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
    }
}

namespace System.Runtime.CompilerServices
{
    public ref struct InterpolatedStringBuilder
    {
        // Create the builder and extract its string / clean up
        public static InterpolatedStringBuilder Create(int literalLength, int formattedCount);
        public override string ToString(); // for diagnostics
        public string ToStringAndClear();

        // To handle the base string portions of the interpolated string
        public void AppendLiteral(string value);

        // To handle most types, full set of overloads to minimize IL bloat at call sites
        public void AppendFormatted<T>(T value);
        public void AppendFormatted<T>(T value, string? format);
        public void AppendFormatted<T>(T value, int alignment);
        public void AppendFormatted<T>(T value, int alignment, string? format);

        // To allow for ROS to be in holes
        public void AppendFormatted(ReadOnlySpan<char> value);
        public void AppendFormatted(ReadOnlySpan<char> value, int alignment = 0, string? format = null);

        // To handle strings, because they're very common and we can optimize for them specially
        public void AppendFormatted(string? value);
        public void AppendFormatted(string? value, int alignment = 0, string? format = null);

        // Fallback for everything that can't be target typed
        public void AppendFormatted(object? value, int alignment = 0, string? format = null);

        // Optionally: to handle nullable value types, as currently they're super expensive via the generic T path
        // Won't add as long as the JIT can optimize for them soon-ish.
        public void AppendFormatted<T>(T? value) where T : struct;
        public void AppendFormatted<T>(T? value, string? format) where T : struct;
        public void AppendFormatted<T>(T? value, int alignment) where T : struct;
        public void AppendFormatted<T>(T? value, int alignment, string? format) where T : struct;
    }
}

We already implement ISpanFormattable on a bunch of types in Corelib: Byte, DateTime, DateTimeOffset, Decimal, Double, Guid, Half, Int16, Int32, Int64, IntPtr, SByte, Single, TimeSpan, UInt16, UInt32, UInt64, UIntPtr, Version. With the interface exposed publicly, we'll implement it in a few other places, at least the types that already expose the right Append method: BigInteger and IPAddress.

Note that InterpolatedStringBuilder stores an ArrayPool array. We've been hesitant to expose such structs historically when they're meant to be general purpose. Here, though, it's a compiler helper type in System.Runtime.CompilerServices.

Issues for Discussion

  1. Pattern method names. The "AppendLiteral" and "AppendFormatted" names are aspects of the design we can help C# to decide on.
  2. Pointers. The shape as outlined will not support pointers in holes. We could do so by adding overloads that accepted void*; most likely implementation would be to cast to IntPtr and delegate to the other overloads. A developer can also do that themselves, though, in the code they write. So, I think we shouldn’t have them; we can always add them in the future if it turns out to be important.
  3. IFormattable DIM. As an alternative to exposing a new ISpanFormattable, we could add TryFormat as a DIM to the existing IFormattable. However, a consumer would then have no way to know whether they were getting an optimized, non-allocating TryFormat, or whether they were getting the default implementation that would need to ToString and then copy into the destination span. This is problematic for code that assumes TryFormat is non-allocating, e.g. where TryFormat might be called in a loop that grows the buffer repeatedly until it’s big enough: using a default implementation could result in a new ToString each time, whereas if the code is forced to just use IFormattable.ToString, it can get the resulting string once and resize to a guaranteed known-long-enough size.
  4. AppendFormatted overload sets. We could tweak what types we have overloads for and whether we have dedicated alignment/format overloads for them.
  5. Special-casing ISpanFormattable in code rather than signature. We can’t have two overloads both generic on T, one that constrains to ISpanFormattable and one that doesn’t. Instead, the implementation of the generic overload will do an interface check for ISpanFormattable. For value types, generic specialization will compile away this check. But reference types generally will end up paying for it.
  6. Dispose. Should InterpolatedStringBuilder implement Dispose? Doing so would enable us to always return a rented array to the pool, even in the case of exception, however exceptions should be very rare and we generally don't add try/finally blocks in order to return arrays to the pool. The most compelling argument would be that we can null out the references to the array in the calling frame, which would only be meaningful in a case where the calling frame kept the local alive and used some helper method to call ToString (which would then be the one doing the returning and nulling out). But, adding Dispose will end up adding a try/finally at every use site, further bloating the IL.
  7. ISpanFormattable : IFormattable. Should ISpanFormattable inherit IFormattable? It helps to clarify that TryFormat should do the same operation as IFormattable.ToString and that a consumer should always be able to fall back to using ToString if, for example, a destination span isn't available with enough space. It also means for reference types we can check for IFormattable first, and if the object isn't, then we don't have to pay for a second interface check. The downside is not all types we want to be able to format into a span take advantage of a string format or IFormatProvider, e.g. char, Rune, and Version will all just delegate to ToString().

Additional Builder Support

Part 2 covers additional builders for additional scenarios:
#50635
These are dependent on additional language support that’s not yet been committed to, but hopefully will be soon.

cc: @333fred, @jaredpar

@stephentoub stephentoub added area-System.Runtime blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 1, 2021
@stephentoub stephentoub added this to the 6.0.0 milestone Apr 1, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Apr 1, 2021
@stephentoub stephentoub removed the untriaged New issue has not been triaged by the area owner label Apr 1, 2021
@GrabYourPitchforks
Copy link
Member

@stephentoub In your sample, you had a compiler-generated call to InterpolatedStringBuilder.Format, but no such API exists in your proposal. Is something missing?

Also, is there an equivalent to FormattableString.Invariant($"...") for scenarios where we don't want to capture the current culture?

@stephentoub
Copy link
Member Author

stephentoub commented Apr 1, 2021

In your sample, you had a compiler-generated call to InterpolatedStringBuilder.Format, but no such API exists in your proposal. Is something missing?

Copy/paste error. Fixed.

Also, is there an equivalent to FormattableString.Invariant($"...") for scenarios where we don't want to capture the current culture?

There will be, just not in this proposal. That'll be part of the follow-on mentioned at the end with "Enabling the developer to provide an IFormatProvider when creating strings / formatting into spans". But the quick summary is you'll be able to write:

string result = string.Format(CultureInfo.InvariantCulture, $"{a} = {b}");

and it'll "just work". I wanted to split it into two parts: 1) things necessary to support the existing syntax when building strings, and 2) new dev-consumable surface are for customizing how such strings are built and building other things.

@Clockwork-Muse
Copy link
Contributor

.... what happens if the type needs more space than remains in the Span? Since I'm assuming we're eventually heading towards making this user-implementable...

@stephentoub
Copy link
Member Author

.... what happens if the type needs more space than remains in the Span? Since I'm assuming we're eventually heading towards making this user-implementable...

Which span?

@Tornhoof
Copy link
Contributor

Tornhoof commented Apr 1, 2021

In your example, i think the baseLength should be 24, you forgot the space after Hello,. Assuming your baseLength is the total length of the un-interpolated parts

@stephentoub
Copy link
Member Author

In your example, i think the baseLength should be 24, you forgot the space after Hello,. Assuming your baseLength is the total length of the un-interpolated parts

I was missing the space in my hand-generation of the compiler output, but I think 23 is correct.

@miyu
Copy link

miyu commented Apr 1, 2021

Can we include example implementations of TryFormat in this spec?

  • How does that work for nested objects (e.g. Rectangle -> Point/Size -> Int)?
    • [Rectangle Position={position} Size={size}]
    • [Point X={x} Y={y}]
    • [Size Width={width} Height={height}]
    • [Rectangle Position=[Point X=1 Y=3] Size=[Size Width=10 Height=30]]
  • How does that work for recursive or generic objects (e.g. TreeNode<Rectangle>)?
  • Can variable sized structures (e.g. List) request a number of bytes prior to filling? What if the parent object does not know the size of a child object it wishes to serialize?
  • Is ReadOnlySpan cacheable?

Another concern: I expected a StringBuilder / TextWriter-ish abstraction instead of a Span, which I perceive as a high-perf abstraction that might be less accessible. Perhaps an alternative to ISpanFormattable could leverage the InterpolatedStringBuilder APIs?

@stephentoub
Copy link
Member Author

stephentoub commented Apr 1, 2021

How does that work for nested objects (e.g. Rectangle -> Point -> Int)?

Types can implement ISpanFormattable just as they can implement IFormattable today. If they do, their TryFormat will be used. If they don't, their ToString will be used.

How does that work for recursive or generic objects (e.g. TreeNode)?

Same as it does today. TryFormat can call to nested type TryFormats, just as is possible for ToString.

Can variable sized structures (e.g. List) request a number of bytes prior to filling?

No

What if the parent object does not know the size of a child object it wishes to serialize?

TryFormat will return false, and the code providing the buffer will resize (typically double) and try again.

Is ReadOnlySpan cacheable?

No. Span is just a view over something else. The builder will get underlying space into which to format from somewhere that's an implementation detail. A benefit of span as a ref struct is actually that you can't cache it.

@AraHaan
Copy link
Member

AraHaan commented Apr 1, 2021

I think I would like it to use a normal StringBuilder from System.Text vs making a new class just for interpolated strings, where basically it would append each and every component, and anything that is not a string implies a conversion to a string by automatically calling the object's ToString() member (likehow interpolated strings do it automatically currently already).

So then in this case this would for example:

string name = "Stephen";
int year = 2021;
string result = $"Hello, {name}.  The year is {year}.";

be compiled as:

var builder = new StringBuilder();
// append a list of string's. I could see this as ok for performance as long as this generated code
// is in it's own scope or dummy method so it gets freed (GC'd) as soon as possible and the
// result returned from turning the builder to a normal string. Also ToString() only added if the
// type of the input is not a string, or is another StringBuilder.
_ = builder.Append(new string[] { "Hello, ", name, "  The year is ", year.ToString(), "."}); 
string result = builder.ToString();

Making each time an interpolated string is used a dummy method with the above code is generated by the compiler for each interpolated string and the variables needed passed as arguments to the dummy method and as such could be seen as not only an improvement but possibly performant as well too.

However I am not sure if StringBuilder could append each item from an array of strings into it currently however it could be faster to get done by reusing already existing logic for strings.

But even this I think it could be simplified even further by the compiler with:

// construct the StringBuilder with the array of strings.
var builder = new StringBuilder()
{
"Hello, ", name, "  The year is ", year.ToString(), ".",
};
string result = builder.ToString();

@miyu
Copy link

miyu commented Apr 1, 2021

TryFormat will return false, and the code providing the buffer will resize (typically double) and try again.

That makes sense. I wonder if the bool TryFormat(Span<char>, out int, ReadOnlySpan<char>, IFormatProvider?); API is great for simple types in Corelib (e.g. floats, DateTime) where redundant reserialization isn't a problem, but less awesome for more complex objects where logarithm-of-output-size redundant computation can be an unacceptable perf hit. A StringBuilder-style API would 1. avoid reserialization 2. likely be accessible to more .NET developers 3. in my mind is more easily composable 4. gives way for a "ReserveAdditionalBytes()..." API which avoids redundantly resizing the underlying buffer.

With a StringBuilder-like API I'd expect stringifying a rectangle to look like:

public void Serialize(ISomeWriterObject writer, string format) {
  writer.Visit("[Rectangle Position=");
  writer.Visit(position); // recursively invokes Point.Serialize(ISomeWriterObject) if possible, else Point.ToString()
  writer.Visit(" Size=");
  writer.Visit(size); // likewise as above
  writer.Visit("]");
}

With TryFormat what does this look like? Here's my best guess:

bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) {
  int totalCharsWritten = 0;
  int partCharsWritten;

  if (!"[Rectangle Position=".TryFormat(destination, out partCharsWritten)) return false;
  if (!TryBumpDestinationByOffsetAndAddCharsWritten(ref destination, partCharsWritten, ref totalCharsWritten)) return false;
  if (!position.TryFormat(destination, out partCharsWritten)) return false;
  if (!TryBumpDestinationByOffsetAndAddCharsWritten(ref destination, partCharsWritten, ref totalCharsWritten)) return false;
  ...
  return true;
}

// Does everyone have to write this themselves? Or perhaps there's an API I'm not familiar with?
bool TryBumpDestinationByOffsetAndAddCharsWritten(ref Span<char> span, int partCharsWrittenOffset, ref int totalCharsWritten) {
  // bump span to start [offset] indices earlier & have [offset] less length
  // return true if possible, false if span not large enough
  // also add partCharsWrittenOffset to totalCharsWritten
}

(Thank you for spearheading this API proposal - I've been wanting this in C# for so long!)

@miyu
Copy link

miyu commented Apr 1, 2021

I think it'd drastically simplify this API's end-user experience to support and optimize using string interpolations in TryFormat. (Edit: That's mentioned in "TryFormat... on Span receivers" in the spec)

At that point, I wonder if introducing a separate ToString concept to developers is even necessary. Could users just specify ToString() => $"Some {Interpolation}" and have the compiler magically generate an implementation of ISpanFormattable? I suspect this could also improve the performance of existing nested ToString() invocations (ToString() could invoke ISpanFormattable to allocate just 1 string for an entire object tree)

(As a side-note that probably won't fly, I think one could avoid introducing the ISpanFormattable API altogether. Example: Autogenerate ToString() such that nested invocations return a dummy value & actually write to a threadstatic writer object. Old ToString implementations allocate and return strings. New ToString implementations don't allocate memory; they write to the threadstatic writer)

@stephentoub
Copy link
Member Author

stephentoub commented Apr 2, 2021

I think it'd drastically simplify this API's end-user experience to support and optimize using string interpolations in TryFormat. (Edit: That's mentioned in "TryFormat... on Span receivers" in the spec)

This is the "Enabling the developer to use string interpolation syntax to format into an existing span (replacing current use of manual of slicing and TryFormat calls)" I call out at the end of the proposal, where I say I'll be writing it up separately. It depends on support that hasn't yet been approved for C# 10 but is very likely to be, so I kept it separate. If the lack of it is causing confusion, I can post the issue, regardless. But, for example, if you had a Point type, its TryFormat could look like:

struct Point
{
    public int X, Y;
    public bool TryFormat(Span<char> destination, out in charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
        destination.TryWrite(provider, $"X={X} Y={Y}", out charsWritten);
}

(As a side-note that probably won't fly, I think one could avoid introducing the ISpanFormattable API altogether. Example: Autogenerate ToString() such that nested invocations return a dummy value & actually write to a threadstatic writer object. Old ToString implementations allocate and return strings. New ToString implementations don't allocate memory; they write to the threadstatic writer)

I don't think so 😄

With TryFormat what does this look like? Here's my best guess:

destination.TryWrite($"[Rectangle Position={position} Size={size}]", out charsWritten);

I think I would like it to use a normal StringBuilder from System.Text vs making a new class just for interpolated strings

You can absolutely write such a builder using the C# 10 support; it's one of the really nice things about the design, it lets you use the interpolation syntax but implement it in various ways just by supporting the right pattern. For the built-in support, however, we are optimizing for simple cases, e.g. primitives and POCOs built up from primitives. Even string.Format assumes that the arguments won't be very long: it presizes assuming an average length of 8 characters per hole. We want to make these cases fast and with as little allocation as possible, as well as not having to introduce new pools of types to avoid such allocations.

But even this I think it could be simplified even further by the compiler with:

Again, this is where additional builders come in, which will be covered by a subsequent proposal. Assuming we add the relevant builder / corresponding method, you would be able to write:

StringBuilder sb = ...;
sb.AppendFormat($"{a} = {b}");

and that would be translated down into individual calls to append a, " = ", and b to the StringBuilder.

@stephentoub
Copy link
Member Author

I call out at the end of the proposal, where I say I'll be writing it up separately. It depends on support that hasn't yet been approved for C# 10 but is very likely to be, so I kept it separate. If the lack of it is causing confusion, I can post the issue, regardless.

Part 2 is here: #50635

@Tornhoof
Copy link
Contributor

Tornhoof commented Apr 2, 2021

Maybe add to your original introduction, that it should work with async methods too, if the holes are not async.
https://github.com/dotnet/csharplang/blob/main/proposals/improved-interpolated-strings.md#await-usage-in-interpolation-holes

@ladeak
Copy link
Contributor

ladeak commented Apr 2, 2021

InterpolatedStringBuilder stores an ArrayPool array.

Is it static ArrayPool<char> or static ArrayPool<char>[] ?

@stephentoub
Copy link
Member Author

Maybe add to your original introduction, that it should work with async methods too, if the holes are not async.

For strings with this builder it should really work even when await is used in holes. The compiler can special case it, with all holes precomputed just as when targeting string.Format.

@stephentoub
Copy link
Member Author

Is it static ArrayPool or static ArrayPool[] ?

It stores a char[] it rents from ArrayPool<T>.Shared.

@333fred
Copy link
Member

333fred commented Apr 2, 2021

Maybe add to your original introduction, that it should work with async methods too, if the holes are not async.

For strings with this builder it should really work even when await is used in holes. The compiler can special case it, with all holes precomputed just as when targeting string.Format.

Uh... That was not the plan as I understood it. I suppose we could do this, but the plan was to have very little implementation differences between this builder and the general pattern.

@stephentoub
Copy link
Member Author

stephentoub commented Apr 2, 2021

Uh... That was not the plan as I understood it. I suppose we could do this, but the plan was to have very little implementation differences between this builder and the general pattern

There's zero preventing the compiler from doing so. Whether you do is an implementation decision, and one that can change over time. The holes are already precomputed for use with string.Format; no reason it can't be the same here.

@Clockwork-Muse
Copy link
Contributor

.... what happens if the type needs more space than remains in the Span? Since I'm assuming we're eventually heading towards making this user-implementable...

Which span?

I meant the writable span, so this:

What if the parent object does not know the size of a child object it wishes to serialize?

TryFormat will return false, and the code providing the buffer will resize (typically double) and try again.

... answers my question (and really was about what I expected).

Rust, when it does its equivalent of ToString() provides a type that's a mutable buffer, meaning that it doesn't have to back out and try again. Effectively, they pass StringBuilder (although lifetime guarantees mean you can't keep the reference).

@333fred
Copy link
Member

333fred commented Apr 2, 2021

@stephentoub we may want to add a set of object overloads to this interpolated builder type, or target-typed scenarios aren't going to be able to use the builder. As an example of a problem with that today:

bool b = false;
C.M(b switch { true => 1, false => null }); // Chooses M(object)
C.M(1); // Chooses M<T>
C.M(""); // Chooses M<T>

class C
{
    public static void M<T>(T t) {}
    public static void M(object o) {} // Comment this out, and the first call will fail to compile
}

This uses a different set of methods to show the problem in C# 9 code, but just imagine any of those expressions as the interpolation hole elements. More insidiously, though, the first one would just cause the interpolated string builder to fall back to string.Format silently, with no user indication.

@333fred
Copy link
Member

333fred commented Apr 4, 2021

it doesn't because we have the object overload, in which case adding the overload now is a deoptimization in the future.

There's nothing we can do to avoid this. Interpolated strings today have this behavior because there is already an object target type. The only way for us to replicate the existing behavior is by adding one to the builder type.

@stephentoub
Copy link
Member Author

stephentoub commented Apr 4, 2021

There's nothing we can do to avoid this.

Of course there is. That's what this discussion is about. You've enumerated multiple ways it could work, you just don't like them ;-)

A key goal of this feature is to avoid the boxing. If we can't satisfy that, I'm tempted to say we should just let such cases fall back to string.Format until the language can do better.

Unless you're saying the language will never get better here, in which case i don't understand what you were saying about leaving room for the language to improve being a reason not to do something special here.

@333fred
Copy link
Member

333fred commented Apr 4, 2021

Of course there is. That's what this discussion is about. You've enumerated multiple ways it could work, you just don't like them ;-)

No, I've enumerated multiple ways we could avoid adding an object overload to this particular type by effectively making the language assume that one exists. There's no getting around the basic fact that C# 6 shipped interpolated strings that were a sugar over string.Format, which provides the object target type. If we want InterpolatedStringBuilder to be a drop-in replacement for the pattern, we have to maintain that.

A key goal of this feature is to avoid the boxing. If we can't satisfy that, I'm tempted to say we should just let such cases fall back to string.Format until the language can do better.

I think you're over-focusing on the one example I provided. There are many ways to get into a scenario that only compiles because there is an object target type, and large number of them are just reference types. I don't think we should arbitrarily throw out the savings for those scenarios. The language is almost certainly never going to be able to handle all of them.

@stephentoub
Copy link
Member Author

I think you're over-focusing on the one example I provided. There are many ways to get into a scenario that only compiles because there is an object target type, and large number of them are just reference types. I don't think we should arbitrarily throw out the savings for those scenarios. The language is almost certainly never going to be able to handle all of them.

You've only shared one example. Please share more.

It is also in no way arbitrary. I'm concerned that if we expose an object overload now, we will actually harm our ability to optimize in the future. I can fully imagine a world in these deficient situations where the language improves its target typing in the case where the target is T, but for compatibility says if there's an object target in another overload as well it has to prefer that. I want to avoid that, potentially at the expense of an optimization today in order to get better performance tomorrow. Are you saying that's never going to happen and that exposing an object based overload will never in the future negatively impact that?

@333fred
Copy link
Member

333fred commented Apr 4, 2021

You've only shared one example. Please share more.

For example, $"{boolThing ? derivedLeafNode1 : derivedLeafNode2}", where those nodes are not convertible to each other, but are convertible to a base type.

Are you saying that's never going to happen and that exposing an object based overload will never in the future negatively impact that?

No, I am not saying that. I am saying that, especially as we've added more target-typing in C# 9, that I consider it more detrimental to not introduce this overload now than would potentially be saved.

@stephentoub
Copy link
Member Author

I consider it more detrimental to not introduce this overload now than would potentially be saved.

How so?

@stephentoub
Copy link
Member Author

For example

Are all the problematic cases involving value types for nullable, or are there others? If they're all nullable, is there a signature we could expose that would target type all those cases well? I tried T? where T:struct, and it still fails to target type appropriately.

@333fred
Copy link
Member

333fred commented Apr 6, 2021

Are all the problematic cases involving value types for nullable, or are there others?

Nothing about my example specified nullable types. They could very easily be reference types.

If they're all nullable, is there a signature we could expose that would target type all those cases well? I tried T? where T:struct, and it still fails to target type appropriately.

I don't believe there's a signature we could expose that would work for nullable types specifically.

I consider it more detrimental to not introduce this overload now than would potentially be saved.

How so?

I'm mainly concerned about how people will start going "Oh, for best perf make sure that you're not using target-typing here or you're going to be implicitly using the wrong pattern."

@stephentoub
Copy link
Member Author

stephentoub commented Apr 6, 2021

Nothing about my example specified nullable types. They could very easily be reference types.

Huh? I think we're talking past each other. You specifically gave the example:

b switch { true => 1, false => null }

which can be successfully assigned to int?:

int? i = b switch { true => 1, false => null };

and I'm asking about whether the examples involving value types (which this one does) are all around nullables, i.e. Nullable<T>. I'm not asking about reference types because I care less about reference types here, since there's no boxing concern.

I'm mainly concerned about how people will start going "Oh, for best perf make sure that you're not using target-typing here or you're going to be implicitly using the wrong pattern."

That's going to be the case either way, if an example like b switch { true => 1, false => null } boxes when it doesn't have to, e.g. a dev can insert a cast to (int?) to get better perf.

@iSazonov
Copy link
Contributor

iSazonov commented Apr 6, 2021

  1. Dispose. Should InterpolatedStringBuilder implement Dispose?

This Try* pattern assumes no exceptions. Their presence could be considered a bug. (But if we cannot guarantee this, then we could have the parameter like out Exception exc. I only see this for one scenario with void* but it is not considered yet.)

@iSazonov
Copy link
Contributor

iSazonov commented Apr 6, 2021

  1. Pattern method names. The "TryFormatBaseString" and "TryFormatInterpolationHole" names are aspects of the design we can help C# to decide on.

InterpolatedStringBuilder name already contains "Interpolated" and "String" so the method names could be shorter.

  1. Pointers.

This looks like an rare edge case.

@stephentoub
Copy link
Member Author

This Try* pattern assumes no exceptions.

No, it doesn't. Try is just about using a bool for one condition, it doesn't mean there won't be exceptions.

InterpolatedStringBuilder name already contains "Interpolated" and "String" so the method names could be shorter.

We need to decide on the names that are part of the pattern, for all builders, not just this one.

@olivier-spinelli
Copy link

Instead of returning a bool, could the TryXXX returns an int, either:

  • 0 (I'm done - the current true),
  • -1 (I need more space, please double it - the current false) or
  • the positive exact number of missing characters I know is required?

It may help for big output (but the API is more complex to use).

@stephentoub
Copy link
Member Author

Instead of returning a bool, could the TryXXX returns an int

Are you asking about the ISpanFormattable interface or the methods on the builder?

For the builder the optional bool return value is about short-circuiting and whether the operation should continue at all. So int wouldn't really help there.

For ISpanFormattable, this is formalizing as an interface the TryFormat methods we've been exposing publicly already on various types. Can you share a real example where you'd be formatting something very large such that a typical grow-and-retry would be problematic? I understand the concern in theory, and it is something we debated heavily back in the .NET Core 2.1 time-frame. The most benefit would come if doubling was insufficient to meet the required size and a lot of other work would be required before getting back to that point again, but it would also add complexity to every operation and all of the consumers, to handle all the various outcomes, when most output is actually relatively small. It also couldn't necessity be trusted, at least not without forcing more work. Consider a composed type like a Person storing a string name and a double age. Its TryFormat sees that the destination buffer is insufficient to store the name, but it doesn't yet know how many chars the double will need... does it need to format the double into temp space just to get that count? If it does, then it's doing extra throw away work. If it doesn't, then the return count can't be trusted to actually be sufficient. So the return value can only end up being a hint rather than a guarantee.

@olivier-spinelli
Copy link

Are you asking about the ISpanFormattable interface or the methods on the builder?

My bad. The ISpanFormattable.

So the return value can only end up being a hint rather than a guarantee.

In my understanding it would always be a hint since, if it's not enough then another run will be done. I can also always maximize my estimated size (say a double is 32 chars). In either cases, there will be less runs, less intermediate allocations.
(I totally understand the higher complexity for the developper. But since we are here at a "low level", this MAY be acceptable.)

@stephentoub
Copy link
Member Author

stephentoub commented Apr 8, 2021

In my understanding it would always be a hint

Ah, ok, you'd written "the positive exact number of missing characters I know is required", hence my stressing the hint rather than guarantee.

I can also always maximize my estimated size (say a double is 32 chars)

Some consumers will be of fixed maximum length, e.g. if you were formatting into an existing span (rather than into something growable). It'd be one thing to return a value that says "I know I will need at least X to be successful", as such consumers can reasonably deal with that. But it's not clear what such a consumer could meaningfully do with something that might only require 1 char saying it would need 32. It could no longer make early-out decisions based on that, at which point you could very likely get into an infinite loop without making the pattern even more complicated, e.g.

  • I call into Person.TryFormat with a fixed size destination of length 10
  • The Person contains a 5 character name and a double, so TryFormat returns 37 (5 + 32)
  • Either the consumer fails, even though it could have succeeded (if the double actually did fit in 5 chars), or the consumer calls back in again, and we loop indefinitely.

In either cases, there will be less runs, less intermediate allocations

There generally shouldn't be intermediate allocations, at least not in the common case: consumers that grow (e.g. InterpolatedStringBuilder) will generally use some kind of pool for the arrays, e.g. string.Format today already uses ArrayPool. You will pay the cost of renting/returning arrays and repeating the formatting again up until the last point of failure.

Can you share a concrete example of a type you might format into a hole where you expect this will be helpful? Again, I understand it in theory, e.g. a type that wraps a string which could be long such that the type could pass out the length of the string. But I'd like to understand how it would actually play out. Imagine, for example, that InterpolatedStringBuilder always started with at least space for 256 characters (which string.Format does today). That means every doubling growth will be at least 256 as well, which means that for this to be meaningful you'd need to routinely be formatting such string wrappers for strings longer than 256. Is that very common? At that point, if you're using this with normal string interpolation, you're also allocating long strings for every overall interpolation operation.

@olivier-spinelli
Copy link

You are totally right on every aspects. This was just an idea that brings more complexity (and API ambiguities) than overall benefits. Thank you for having take the time to evict it!

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 11, 2021
@IDisposable
Copy link
Contributor

Tiny nit: 3. non-trivial overheads to every string.Format call. Every value type that fills in a whole ends up getting boxed. s/whole/hole

@bartonjs
Copy link
Member

bartonjs commented Apr 13, 2021

Video

API Review notes:

  • Why do we need the InterpolationHole version for string? (To specialize out of T/object and to avoid future "universal formatting")
  • InterpolatedStringBuilder is a ref struct. Is that important?
    • What's the behavior in async?
    • Answers were "yes", and "we'll make it work".
  • Do/can we do anything special for T : IFormattable or ISpanFormattable?
    • Seems like when we want it (value types) it's already handled by the JIT
  • TryFormatBaseString => AppendLiteral
  • TryFormatInterpolationHole => AppendFormatted
  • "Should we add support for void* or T*?" We're happy with "no".
  • Do we need InterpolatedStringBuilder to be IDisposable? We're happy with "no".
  • We decided ToString() shouldn't be destructive, so added ToStringAndClear() (which the compiler will call).
  • InterpolatedStringBuilder.Create, rename parameters to literalLength and formattedCount (to match the method renames)
namespace System
{
    public interface ISpanFormattable : IFormattable // currently internal
    {
        bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
    }
}

namespace System.Runtime.CompilerServices
{
    public ref struct InterpolatedStringBuilder
    {
        // Create the builder and extract its string / clean up
        public static InterpolatedStringBuilder Create(int literalLength, int formattedCount);
        public string ToStringAndClear();

        // To handle the base string portions of the interpolated string
        public void AppendLiteral(string value);

        // To handle most types, full set of overloads to minimize IL bloat at call sites
        public void AppendFormatted<T>(T value);
        public void AppendFormatted<T>(T value, string? format);
        public void AppendFormatted<T>(T value, int alignment);
        public void AppendFormatted<T>(T value, int alignment, string? format);

        // To allow for ROS to be in holes
        public void AppendFormatted(ReadOnlySpan<char> value);
        public void AppendFormatted(ReadOnlySpan<char> value, int alignment = 0, string? format = null);

        // To handle strings, because they're very common and we can optimize for them specially
        public void AppendFormatted(string? value);
        public void AppendFormatted(string? value, int alignment = 0, string? format = null);

        // Fallback for everything that can't be target typed
        public void AppendFormatted(object? value, int alignment = 0, string? format = null);
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Apr 13, 2021
@stephentoub stephentoub removed the blocking Marks issues that we want to fast track in order to unblock other important work label Apr 13, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 15, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 15, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Runtime
Projects
None yet
Development

Successfully merging a pull request may close this issue.