API Proposal: C# 10 interpolated strings support, part 1 #50601
Comments
@stephentoub In your sample, you had a compiler-generated call to Also, is there an equivalent to |
Copy/paste error. Fixed.
There will be, just not in this proposal. That'll be part of the follow-on mentioned at the end with "Enabling the developer to provide an IFormatProvider when creating strings / formatting into spans". But the quick summary is you'll be able to write: string result = string.Format(CultureInfo.InvariantCulture, $"{a} = {b}"); and it'll "just work". I wanted to split it into two parts: 1) things necessary to support the existing syntax when building strings, and 2) new dev-consumable surface area for customizing how such strings are built and building other things. |
.... what happens if the type needs more space than remains in the |
Which span? |
In your example, i think the baseLength should be 24, you forgot the space after |
I was missing the space in my hand-generation of the compiler output, but I think 23 is correct. |
Can we include example implementations of TryFormat in this spec?
Another concern: I expected a StringBuilder / TextWriter-ish abstraction instead of a Span, which I perceive as a high-perf abstraction that might be less accessible. Perhaps an alternative to ISpanFormattable could leverage the InterpolatedStringBuilder APIs? |
Types can implement ISpanFormattable just as they can implement IFormattable today. If they do, their TryFormat will be used. If they don't, their ToString will be used.
Same as it does today. TryFormat can call to nested type TryFormats, just as is possible for ToString.
No
TryFormat will return false, and the code providing the buffer will resize (typically double) and try again.
No. Span is just a view over something else. The builder will get underlying space into which to format from somewhere that's an implementation detail. A benefit of span as a ref struct is actually that you can't cache it. |
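A minimal sketch of the two halves described in this reply: a TryFormat-style method that delegates to nested TryFormat calls and returns false when the destination is too small, and a consumer that doubles its buffer and tries again. The helper names (`TryAppendLiteral`, `TryFormatPoint`, `FormatPointWithRetry`) are hypothetical, and the methods are written as local functions rather than as an ISpanFormattable implementation purely to keep the example short.

```csharp
using System;
using System.Buffers;

// Copies a literal into the destination at the current position, if it fits.
bool TryAppendLiteral(string literal, Span<char> destination, ref int pos)
{
    if (!literal.AsSpan().TryCopyTo(destination.Slice(pos))) return false;
    pos += literal.Length;
    return true;
}

// Hypothetical TryFormat-style helper for a point: it calls the nested
// Int32.TryFormat for each coordinate and returns false when space runs out.
bool TryFormatPoint(int x, int y, Span<char> destination, out int charsWritten)
{
    int pos = 0;
    charsWritten = 0;
    if (!TryAppendLiteral("X=", destination, ref pos)) return false;
    if (!x.TryFormat(destination.Slice(pos), out int written)) return false;
    pos += written;
    if (!TryAppendLiteral(" Y=", destination, ref pos)) return false;
    if (!y.TryFormat(destination.Slice(pos), out written)) return false;
    pos += written;
    charsWritten = pos;
    return true;
}

// Consumer side: rent a buffer and, on failure, double it and format again.
string FormatPointWithRetry(int x, int y, int initialSize)
{
    char[] buffer = ArrayPool<char>.Shared.Rent(initialSize);
    try
    {
        int charsWritten;
        while (!TryFormatPoint(x, y, buffer, out charsWritten))
        {
            char[] old = buffer;
            buffer = ArrayPool<char>.Shared.Rent(old.Length * 2);
            ArrayPool<char>.Shared.Return(old);
        }
        return new string(buffer, 0, charsWritten);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(buffer);
    }
}

Console.WriteLine(FormatPointWithRetry(3, 4, 1));
```

The pool may hand back a buffer larger than requested, so the retry loop will not necessarily run for an input this small; it only matters once the formatted text outgrows the rented array.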
I think I would like it to use a normal So then in this case this would for example: string name = "Stephen";
int year = 2021;
string result = $"Hello, {name}. The year is {year}."; be compiled as: var builder = new StringBuilder();
// Append a list of strings. I could see this as OK for performance as long as this generated code
// is in its own scope or dummy method so it gets freed (GC'd) as soon as possible and the
// result is returned by turning the builder into a normal string. Also, ToString() is only added if the
// type of the input is not a string, or is another StringBuilder.
_ = builder.Append(new string[] { "Hello, ", name, ". The year is ", year.ToString(), "." });
string result = builder.ToString(); The compiler could generate a dummy method containing the above code for each interpolated string, passing the needed variables as arguments. That could be seen not only as an improvement but possibly as more performant too. I am not sure whether StringBuilder can currently append each item from an array of strings, but reusing the existing logic for strings could make this faster. I think the compiler could simplify this even further with: // Construct the StringBuilder with the array of strings.
var builder = new StringBuilder()
{
"Hello, ", name, ". The year is ", year.ToString(), ".",
};
string result = builder.ToString(); |
That makes sense. I wonder if the With a StringBuilder-like API I'd expect stringifying a rectangle to look like:
With TryFormat what does this look like? Here's my best guess:
(Thank you for spearheading this API proposal - I've been wanting this in C# for so long!) |
I think it'd drastically simplify this API's end-user experience to support and optimize using string interpolations in TryFormat. (Edit: That's mentioned in "TryFormat... on Span receivers" in the spec) At that point, I wonder if introducing a separate ToString concept to developers is even necessary. Could users just specify (As a side-note that probably won't fly, I think one could avoid introducing the ISpanFormattable API altogether. Example: Autogenerate ToString() such that nested invocations return a dummy value & actually write to a threadstatic writer object. Old ToString implementations allocate and return strings. New ToString implementations don't allocate memory; they write to the threadstatic writer) |
This is the "Enabling the developer to use string interpolation syntax to format into an existing span (replacing current use of manual slicing and TryFormat calls)" I call out at the end of the proposal, where I say I'll be writing it up separately. It depends on support that hasn't yet been approved for C# 10 but is very likely to be, so I kept it separate. If the lack of it is causing confusion, I can post the issue, regardless. But, for example, if you had a Point type, its TryFormat could look like: struct Point
{
public int X, Y;
public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider) =>
destination.TryWrite(provider, $"X={X} Y={Y}", out charsWritten);
}
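A short usage sketch of the span-targeting call used in Point above, assuming the `TryWrite` extension method on `Span<char>` as it later shipped in .NET 6 (`System.MemoryExtensions.TryWrite`); at the time of this comment that API was still only proposed. The buffer sizes and values are illustrative.

```csharp
using System;
using System.Globalization;

int x = 3, y = 4;

// Enough room: the interpolated string is formatted directly into the span.
Span<char> roomy = stackalloc char[16];
bool fits = roomy.TryWrite(CultureInfo.InvariantCulture, $"X={x} Y={y}", out int written);
Console.WriteLine($"{fits}: {roomy.Slice(0, written).ToString()}");

// Not enough room: the call reports failure instead of throwing.
Span<char> cramped = stackalloc char[4];
bool tooSmall = cramped.TryWrite(CultureInfo.InvariantCulture, $"X={x} Y={y}", out int partial);
Console.WriteLine(tooSmall);
```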
I don't think so 😄
You can absolutely write such a builder using the C# 10 support; it's one of the really nice things about the design, it lets you use the interpolation syntax but implement it in various ways just by supporting the right pattern. For the built-in support, however, we are optimizing for simple cases, e.g. primitives and POCOs built up from primitives. Even string.Format assumes that the arguments won't be very long: it presizes assuming an average length of 8 characters per hole. We want to make these cases fast and with as little allocation as possible, as well as not having to introduce new pools of types to avoid such allocations.
Again, this is where additional builders come in, which will be covered by a subsequent proposal. Assuming we add the relevant builder / corresponding method, you would be able to write: StringBuilder sb = ...;
sb.AppendFormat($"{a} = {b}"); and that would be translated down into individual calls to append |
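As a rough, hand-written picture of what "individual calls to append" means here, using only existing StringBuilder methods (the values of `a` and `b` are assumptions, and alignment, format strings, and culture handling are ignored):

```csharp
using System;
using System.Text;

int a = 1;
int b = 2;
var sb = new StringBuilder();

// Hand-written equivalent of appending $"{a} = {b}" piece by piece:
// no intermediate string for the whole interpolation and no object[] of boxed arguments.
sb.Append(a);
sb.Append(" = ");
sb.Append(b);

Console.WriteLine(sb.ToString());
```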
Part 2 is here: #50635 |
Maybe add to your original introduction, that it should work with async methods too, if the holes are not async. |
Is it static |
For strings with this builder it should really work even when await is used in holes. The compiler can special case it, with all holes precomputed just as when targeting string.Format. |
It stores a |
Uh... That was not the plan as I understood it. I suppose we could do this, but the plan was to have very little implementation differences between this builder and the general pattern. |
There's zero preventing the compiler from doing so. Whether you do is an implementation decision, and one that can change over time. The holes are already precomputed for use with string.Format; no reason it can't be the same here. |
I meant the writable span, so this:
... answers my question (and really was about what I expected). Rust, when it does its equivalent of |
@stephentoub we may want to add a set of bool b = false;
C.M(b switch { true => 1, false => null }); // Chooses M(object)
C.M(1); // Chooses M<T>
C.M(""); // Chooses M<T>
class C
{
public static void M<T>(T t) {}
public static void M(object o) {} // Comment this out, and the first call will fail to compile
} This uses a different set of methods to show the problem in C# 9 code, but just imagine any of those expressions as the interpolation hole elements. More insidiously, though, the first one would just cause the interpolated string builder to fall back to |
There's nothing we can do to avoid this. Interpolated strings today have this behavior because there is already an |
Of course there is. That's what this discussion is about. You've enumerated multiple ways it could work, you just don't like them ;-) A key goal of this feature is to avoid the boxing. If we can't satisfy that, I'm tempted to say we should just let such cases fall back to string.Format until the language can do better. Unless you're saying the language will never get better here, in which case i don't understand what you were saying about leaving room for the language to improve being a reason not to do something special here. |
No, I've enumerated multiple ways we could avoid adding an
I think you're over-focusing on the one example I provided. There are many ways to get into a scenario that only compiles because there is an |
You've only shared one example. Please share more. It is also in no way arbitrary. I'm concerned that if we expose an object overload now, we will actually harm our ability to optimize in the future. I can fully imagine a world in these deficient situations where the language improves its target typing in the case where the target is T, but for compatibility says if there's an object target in another overload as well it has to prefer that. I want to avoid that, potentially at the expense of an optimization today in order to get better performance tomorrow. Are you saying that's never going to happen and that exposing an object based overload will never in the future negatively impact that? |
For example,
No, I am not saying that. I am saying that, especially as we've added more target-typing in C# 9, that I consider it more detrimental to not introduce this overload now than would potentially be saved. |
How so? |
Are all the problematic cases involving value types for nullable, or are there others? If they're all nullable, is there a signature we could expose that would target type all those cases well? I tried T? where T:struct, and it still fails to target type appropriately. |
Nothing about my example specified nullable types. They could very easily be reference types.
I don't believe there's a signature we could expose that would work for nullable types specifically.
I'm mainly concerned about how people will start going "Oh, for best perf make sure that you're not using target-typing here or you're going to be implicitly using the wrong pattern." |
Huh? I think we're talking past each other. You specifically gave the example: b switch { true => 1, false => null } which can be successfully assigned to int? i = b switch { true => 1, false => null }; and I'm asking about whether the examples involving value types (which this one does) are all around nullables, i.e.
That's going to be the case either way, if an example like |
This |
This looks like a rare edge case. |
No, it doesn't. Try is just about using a bool for one condition, it doesn't mean there won't be exceptions.
We need to decide on the names that are part of the pattern, for all builders, not just this one. |
Instead of returning a bool, could the TryXXX methods return an int, either:
It may help for big output (but the API is more complex to use). |
Are you asking about the ISpanFormattable interface or the methods on the builder? For the builder the optional bool return value is about short-circuiting and whether the operation should continue at all. So int wouldn't really help there. For ISpanFormattable, this is formalizing as an interface the TryFormat methods we've been exposing publicly already on various types. Can you share a real example where you'd be formatting something very large such that a typical grow-and-retry would be problematic? I understand the concern in theory, and it is something we debated heavily back in the .NET Core 2.1 time-frame. The most benefit would come if doubling was insufficient to meet the required size and a lot of other work would be required before getting back to that point again, but it would also add complexity to every operation and all of the consumers, to handle all the various outcomes, when most output is actually relatively small. It also couldn't necessarily be trusted, at least not without forcing more work. Consider a composed type like a Person storing a string name and a double age. Its TryFormat sees that the destination buffer is insufficient to store the name, but it doesn't yet know how many chars the double will need... does it need to format the double into temp space just to get that count? If it does, then it's doing extra throw away work. If it doesn't, then the return count can't be trusted to actually be sufficient. So the return value can only end up being a hint rather than a guarantee. |
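The Person case can be sketched as follows. `TryFormatPerson` and its extra `minimumRequiredHint` out parameter are hypothetical and exist only to illustrate the suggested "return a size" variant; they are not part of the proposed API.

```csharp
using System;
using System.Globalization;

// Hypothetical variant of TryFormat that also reports a size hint on failure.
bool TryFormatPerson(string name, double age, Span<char> destination,
                     out int charsWritten, out int minimumRequiredHint)
{
    charsWritten = 0;
    // The name plus the ", " separator is the only part whose size is known up front.
    int knownPrefix = name.Length + 2;
    if (destination.Length < knownPrefix)
    {
        // Without formatting the double into scratch space its length is unknown,
        // so the best available answer is a lower bound (at least one digit).
        minimumRequiredHint = knownPrefix + 1;
        return false;
    }

    name.AsSpan().CopyTo(destination);
    ", ".AsSpan().CopyTo(destination.Slice(name.Length));
    if (!age.TryFormat(destination.Slice(knownPrefix), out int ageChars,
                       default, CultureInfo.InvariantCulture))
    {
        // Still only a lower bound: one more char than we were given.
        minimumRequiredHint = destination.Length + 1;
        return false;
    }

    minimumRequiredHint = 0;
    charsWritten = knownPrefix + ageChars;
    return true;
}

Span<char> small = stackalloc char[4];
bool ok = TryFormatPerson("Stephen", 41.5, small, out _, out int hint);
Console.WriteLine($"{ok}, hint={hint}"); // False, hint=10
```

Here the full text "Stephen, 41.5" needs 13 chars, so a caller that grows to exactly the hinted 10 fails again, which is the sense in which the value is a hint rather than a guarantee.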
My bad. The ISpanFormattable.
In my understanding it would always be a hint since, if it's not enough, another run will be done. I can also always maximize my estimated size (say a double is 32 chars). In either case, there will be fewer runs and fewer intermediate allocations. |
Ah, ok, you'd written "the positive exact number of missing characters I know is required", hence my stressing the hint rather than guarantee.
Some consumers will be of fixed maximum length, e.g. if you were formatting into an existing span (rather than into something growable). It'd be one thing to return a value that says "I know I will need at least X to be successful", as such consumers can reasonably deal with that. But it's not clear what such a consumer could meaningfully do with something that might only require 1 char saying it would need 32. It could no longer make early-out decisions based on that, at which point you could very likely get into an infinite loop without making the pattern even more complicated, e.g.
There generally shouldn't be intermediate allocations, at least not in the common case: consumers that grow (e.g. InterpolatedStringBuilder) will generally use some kind of pool for the arrays, e.g. string.Format today already uses ArrayPool. You will pay the cost of renting/returning arrays and repeating the formatting again up until the last point of failure. Can you share a concrete example of a type you might format into a hole where you expect this will be helpful? Again, I understand it in theory, e.g. a type that wraps a string which could be long such that the type could pass out the length of the string. But I'd like to understand how it would actually play out. Imagine, for example, that InterpolatedStringBuilder always started with at least space for 256 characters (which string.Format does today). That means every doubling growth will be at least 256 as well, which means that for this to be meaningful you'd need to routinely be formatting such string wrappers for strings longer than 256. Is that very common? At that point, if you're using this with normal string interpolation, you're also allocating long strings for every overall interpolation operation. |
You are totally right on every aspect. This was just an idea that brings more complexity (and API ambiguities) than overall benefits. Thank you for having taken the time to evict it! |
Tiny nit: |
API Review notes:
namespace System
{
public interface ISpanFormattable : IFormattable // currently internal
{
bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}
}
namespace System.Runtime.CompilerServices
{
public ref struct InterpolatedStringBuilder
{
// Create the builder and extract its string / clean up
public static InterpolatedStringBuilder Create(int literalLength, int formattedCount);
public string ToStringAndClear();
// To handle the base string portions of the interpolated string
public void AppendLiteral(string value);
// To handle most types, full set of overloads to minimize IL bloat at call sites
public void AppendFormatted<T>(T value);
public void AppendFormatted<T>(T value, string? format);
public void AppendFormatted<T>(T value, int alignment);
public void AppendFormatted<T>(T value, int alignment, string? format);
// To allow for ROS to be in holes
public void AppendFormatted(ReadOnlySpan<char> value);
public void AppendFormatted(ReadOnlySpan<char> value, int alignment = 0, string? format = null);
// To handle strings, because they're very common and we can optimize for them specially
public void AppendFormatted(string? value);
public void AppendFormatted(string? value, int alignment = 0, string? format = null);
// Fallback for everything that can't be target typed
public void AppendFormatted(object? value, int alignment = 0, string? format = null);
}
} |
Background and Motivation
C# 6 added support for interpolated strings, enabling code to easily format strings by putting arbitrary C# into the format "holes" in the string. These holes are then evaluated prior to having their result added into the string, e.g.
The C# compiler has multiple strategies for generating the code behind such interpolation. If all of the holes are filled with string constants, e.g.
it can choose to simply emit a const string:
Or, it might choose to use string.Concat, which it will typically do if all of the components are strings, e.g. converting:
into something like:
Or, as the most general support, it may choose to use string.Format, e.g.
becomes:
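As a rough sketch of the three strategies just described (the literals and variable names are illustrative assumptions, and which strategy the compiler actually picks for a given string is its own decision):

```csharp
using System;

const string Greeting = "Hello";
string name = "Stephen";
int year = 2021;

// 1) Every hole is a string constant, so the compiler can fold this to "Hello, world".
string folded = $"{Greeting}, world";

// 2) Every component is a string, so string.Concat is enough.
string interpolatedStrings = $"Hello, {name}!";
string viaConcat = string.Concat("Hello, ", name, "!");

// 3) General case: string.Format, which parses the composite format string at
//    run time and boxes the int argument into an object.
string interpolatedMixed = $"Hello, {name}. The year is {year}.";
string viaFormat = string.Format("Hello, {0}. The year is {1}.", name, year);

Console.WriteLine(folded);
Console.WriteLine(viaConcat == interpolatedStrings);
Console.WriteLine(viaFormat == interpolatedMixed);
```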
There are various interesting deficiencies to notice here, both from a functionality and from a performance perspective:
ReadOnlySpan<char>s, which is desirable if you want to, for example, slice a string.
For C# 10, all of these issues are being addressed with a new pattern-based mechanism for implementing interpolated strings. The pattern involves a builder that exposes methods that can be called to append the individual components, which enables the builder to expose whatever overloads are necessary to efficiently implement and support whatever types are desired. While the compiler still retains the ability to choose which mechanism to use on a case-by-case basis, the original example of:
can now be compiled as:
https://github.com/dotnet/csharplang/blob/main/proposals/improved-interpolated-strings.md#improved-interpolated-strings
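A hand-written approximation of the builder-based lowering, under two stated assumptions: the interpolated string is the "Hello, {name}. The year is {year}." example quoted in the comments, and the builder is `System.Runtime.CompilerServices.DefaultInterpolatedStringHandler`, the name under which this type later shipped in .NET 6, where a constructor taking `(literalLength, formattedCount)` takes the place of the proposal's `InterpolatedStringBuilder.Create`. The exact code the compiler emits may differ.

```csharp
using System;
using System.Runtime.CompilerServices;

string name = "Stephen";
int year = 2021;

// Literal parts: "Hello, " (7) + ". The year is " (14) + "." (1) = 22 chars, with 2 holes.
var handler = new DefaultInterpolatedStringHandler(22, 2);
handler.AppendLiteral("Hello, ");
handler.AppendFormatted(name);   // string overload
handler.AppendFormatted1:
handler.AppendLiteral(". The year is ");
handler.AppendFormatted(year);   // generic overload, no boxing of the int
handler.AppendLiteral(".");
string result = handler.ToStringAndClear();

Console.WriteLine(result);
```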
Proposed API
This proposal covers the API surface area necessary to support targeting strings. A separate proposal will follow involving additional user-exposed API surface area for more advanced use of this functionality.
The builder must follow a general pattern established by the compiler. This pattern is still evolving, so we may need to tweak this slightly until the C# 10 support is fully baked. I would like to review/approve the general support and then just tweak it as the compiler’s needs dictate (e.g. there’s a discussion about whether Append methods must return bool instead of allowing void as I’ve done here, whether builders can be passed by ref, etc.)
We already implement ISpanFormattable on a bunch of types in Corelib: Byte, DateTime, DateTimeOffset, Decimal, Double, Guid, Half, Int16, Int32, Int64, IntPtr, SByte, Single, TimeSpan, UInt16, UInt32, UInt64, UIntPtr, Version. With the interface exposed publicly, we'll implement it in a few other places, at least the types that already expose the right TryFormat method: BigInteger and IPAddress.
Note that InterpolatedStringBuilder stores an ArrayPool array. We've been hesitant to expose such structs historically when they're meant to be general purpose. Here, though, it's a compiler helper type in System.Runtime.CompilerServices.
Issues for Discussion
Additional Builder Support
Part 2 covers additional builders for additional scenarios:
#50635
These are dependent on additional language support that’s not yet been committed to, but hopefully will be soon.
cc: @333fred, @jaredpar