This issue was moved to a discussion.

[Proposal]: Raw string literal #4304

Closed
4 tasks done
Tracked by #9 ...
CyrusNajmabadi opened this issue Jan 6, 2021 · 329 comments
Labels
Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification Proposal champion Proposal

Comments

@CyrusNajmabadi
Member

CyrusNajmabadi commented Jan 6, 2021

Raw string literal

Summary

Allow a new form of string literal that starts with a minimum of three """ characters (but no maximum), optionally followed by a new_line, the content of the string, and then ends with the same number of quotes that the literal started with. For example:

var xml = """
          <element attr="content"/>
          """;

Spec: https://github.com/dotnet/csharplang/blob/main/proposals/raw-string-literal.md
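
For readers who want to see the variable-length delimiter in action, here is a minimal sketch of the syntax as it later shipped in C# 11 (it needs `LangVersion` 11 or newer). The variable names and sample text are illustrative only and do not come from the proposal.

```csharp
using System;

// Three quotes are the minimum delimiter; the content may contain " freely.
string xml = """
    <element attr="content"/>
    """;

// Content that itself contains """ needs a longer delimiter: four quotes here.
string nested = """"
    She wrote """hello""" in the margin.
    """";

Console.WriteLine(xml);     // <element attr="content"/>
Console.WriteLine(nested);  // She wrote """hello""" in the margin.
```

The whitespace in front of the closing delimiter sets the margin that is removed from every content line, which is why neither string starts with spaces.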


Special thanks to @jnm2 for a deep review of this proposal

@AartBluestoke
Contributor

AartBluestoke commented Jan 6, 2021

Is there a way this literal can be combined with $ to embed tokens, or is this for completely literal strings only?

string exampleJson= $"""
                     {{
                         "name" = "{this.thingName}"
                     }}""";

(expecting that the answer is 'no - raw means raw with no interpolation' )

@CyrusNajmabadi
Member Author

Is there a way this literal can be combined with $ to embed tokens, or is this for completely literal strings only?

I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.

The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with json, will be just as painful as the past.

So, tbh, i believe this should just be for raw-strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)

@Tragen

Tragen commented Jan 6, 2021

In my opinion, Example 2 should throw an error and is confusing.
The ending string literal must be on its own line.
That way the string doesn't end with a new line, just as it doesn't start with one.
If you want an empty line at the end, add an empty line.
Perhaps this makes it also easier for the parser.

@AartBluestoke
Contributor

"Perhaps this makes it also easier for the parser."
@Tragen agreed, that would also allow strings of quotes to appear mid-string, however how would you indicate if the block of text ends in a new line or not?

@Tragen

Tragen commented Jan 6, 2021

That's easy. Add an empty line.

No empty line at the end. Last character in the string is >

var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>
          """;

with empty line at the end

var xml = """
          <element attr="contents">
            <body>
            </body>
          </element>

          """;

For me, that is much more intuitive and logical
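
For reference, this is the rule C# 11 ended up shipping: in a multi-line raw string the closing delimiter sits on its own line, the newline right before it is not part of the value, and a blank line before it adds a trailing newline. A minimal sketch (variable names are illustrative):

```csharp
using System;

// Closing delimiter on its own line: the value ends with '>'.
string noTrailingNewline = """
    <element attr="contents">
    </element>
    """;

// One blank line before the delimiter: the value ends with a newline.
string withTrailingNewline = """
    <element attr="contents">
    </element>

    """;

Console.WriteLine(noTrailingNewline.EndsWith('>'));     // True
Console.WriteLine(withTrailingNewline.EndsWith('\n'));  // True
```

The second check also holds when the source file uses CRLF, because the value then ends in `\r\n`.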

@merarischroeder

Is there a way this literal can be combined with $ to embed tokens, or is this for completely literal strings only?

I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.

The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with json, will be just as painful as the past.

So, tbh, i believe this should just be for raw-strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)

Actually, I believe the escaping problem is about double-quote marks specifically. Having a raw-string+interpolation should therefore be possible and useful for at least HTML, XML and non-C markup/languages, but this is something that can be deferred for the future.

Examples:

example[0] = $"""<a href="{url}">{label}</a>""";
example[1] = $"""<tiger age="{age}"><eyes colour="{eye_color}" count="2"/></tiger>""";

(I am using a single-line mode for these examples for brevity)

Examples with raw strings that would have braces:

var templateName = "C# Example Generator";
$"""
void Example(string Name)
{{
    Console.WriteLine($"Hello {{Name}}, Welcome to {templateName}");
}}
"""

Although the braces still need escaping, the ability to include raw double-quotes makes this much easier to read.

@merarischroeder

merarischroeder commented Jan 6, 2021

Is there a way this literal can be combined with $ to embed tokens, or is this for completely literal strings only?

I personally think our existing strings are good enough for the template literal case. This is only for raw strings, and is intended when you just want to take a real snippet of some other language out there and put it into C#.

The moment we allow things like { to mean something, then we run into the escaping problem again. You'll have the problem that doing this, along with json, will be just as painful as the past.

So, tbh, i believe this should just be for raw-strings. And the best way to handle that is to make sure that you can provide a literal that will never conflict with the contents, and that the contents can't ever have meaning. :)

I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.

example[0] = """raw string here"""; //closing quote is found on the same-line, so there is no multi-line processing to do
example[1] = """multiline string here
                with no de-indentation, because 
                the string opener was not followed by new-line""";
example[2] = """
                this string can be de-indented
                because the string opener
                was directly followed by new-line"""; //it shouldn't matter if the string closer is here, or on the following line, the first line's indentation is the reference-point.
example[3] = """
                this also means, that indentation
                       can increase above the base-line
                       the same amount of spaces are 
                       still removed according to the base-line
"""; //even if the string closer has zero indent

Perhaps it isn't impossible to implement, but it would be much more complex or the spec-system isn't flexible enough?

@HaloFour
Contributor

HaloFour commented Jan 6, 2021

@Tragen

In my opinion, Example 2 should throw an error and is confusing.
The ending string literal must be on its own line.
That way the string doesn't end with a new line, just as it doesn't start with one.
If you want an empty line at the end, add an empty line.
Perhaps this makes it also easier for the parser.

I disagree. There's nothing confusing about the closing quote being on the same line. It's exactly how text blocks work in Python and Java and it is not a problem in either of those languages.

@Tragen

Tragen commented Jan 6, 2021

@HaloFour
A lot of other languages disagree with you.
When you can have it on the same line, then you would have an empty line at the beginning in all of the examples in the first post.

@HaloFour
Contributor

HaloFour commented Jan 6, 2021

@Tragen

Other languages are welcome to do what they wish, but given two major languages have adopted the behavior proposed above it demonstrates that there is nothing inherently confusing about it.

@Tragen

Tragen commented Jan 6, 2021

Just because major languages have a feature doesn't automatically mean that it isn't confusing.
E.g. C++ is very confusing.

@HaloFour
Contributor

HaloFour commented Jan 6, 2021

@Tragen

Many different ways to skin that cat. To be honest I kind of prefer C++'s general approach to raw strings over text blocks since you're given a lot of flexibility to customize the delimiters while still retaining the syntax of a string (unlike heredocs in many languages). See the syntax I originally proposed here: #89

I will admit that having the closing delimiter on a separate line does make it easier to control the indentation without including that final newline character, and Cyrus was a little surprised that Java does include that newline when the delimiter is on the next line (so does Python).

@CyrusNajmabadi
Member Author

Just because major languages have a feature doesn't automatically mean that it isn't confusing.

It does help with the argument though. Ultimately, either approach will need to be learned. Given that this doesn't really seem to have been a problem for many other languages, I'm not too worried for us. That said, I'm certain we'll discuss that option when we design this.

@CyrusNajmabadi
Member Author

For me, that is much more intuitive and logical

I'm certain we'll discuss this during the design process.

@CyrusNajmabadi
Member Author

I don't see any conceptual paradox in having single-line and multi-line raw string literals with de-indentation.

I'm certain we'll discuss this during the design process.

@CyrusNajmabadi
Member Author

Although the braces still need escaping,

We'll likely discuss this, though I'm personally against it. It will depend on what the rest of the LDM wants here.

Needing to escape defeats the purpose here. Once you have to escape something, you're back where you started. The goal of these strings was to allow you to embed any content and not have to deal with escaping at all.

@YairHalberstadt
Contributor

There's a conflation of two different issues here:

  1. Supporting the ability to define raw string literals which require no escaping.
  2. Trimming indentation whitespace from literals.

I don't see that they necessarily have to come packaged together.

For example I would often want to indent interpolated strings as well.

It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them. I imagine the main use case would be tests, where such overhead would be marginal.
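
To make the runtime alternative concrete, here is a rough sketch of what such a helper could look like. `TrimIndentation` is a hypothetical function written for this illustration, not an existing .NET API, and it assumes the margin is the smallest indentation shared by all non-blank lines, since at run time it cannot see where the delimiters sat in the source.

```csharp
using System;
using System.Linq;

// Hypothetical helper (not a .NET API): strips the smallest indentation shared
// by all non-blank lines. Tabs and spaces each count as one character.
static string TrimIndentation(string text)
{
    string[] lines = text.Split('\n');

    // Margin = least leading whitespace among lines that have visible content.
    int margin = lines
        .Where(line => line.Trim().Length > 0)
        .Select(line => line.Length - line.TrimStart(' ', '\t').Length)
        .DefaultIfEmpty(0)
        .Min();

    // Cut the margin from each line; shorter (blank) lines just lose their indent.
    return string.Join("\n", lines.Select(line =>
        line.Length >= margin ? line.Substring(margin) : line.TrimStart(' ', '\t')));
}

string sql = TrimIndentation(@"
    SELECT Id, Name
      FROM Users
     WHERE Active = 1
");

Console.WriteLine(sql);
```

Note that the verbatim string still carries the leading and trailing newline, so a caller would likely need an extra `Trim()` on top, and the work is repeated on every call instead of being done once by the compiler.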

@CyrusNajmabadi
Member Author

It's also not clear how often raw literals have to be constants, and can't afford the overhead of calling something like .TrimIndentation() on them

My position is that that's what would be wanted the majority of times. As such, doing it by default should just be how the language works. Why foist it on the user to have to add that extra work when it can just be the default oob behavior?

@CyrusNajmabadi
Member Author

I don't see that they necessarily have to come packaged together.

They don't. But if we do raw strings, I think we might as well do both to allow the literals to be ergonomically formatted without any downsides.

I'm sure though that we'll discuss this in the design meetings.

@HaloFour
Contributor

HaloFour commented Jan 6, 2021

@YairHalberstadt

Java went through a similar design process and initially considered them separate with the inclusion of a helper method to align and trim the incidental whitespace. That was found to be more confusing and unattractive. Furthermore, since the helper method at runtime had less information regarding the formatting of the source around the string it ended up being necessary to include sentinel characters within the String to help inform it as to where the margin was supposed to be.

See: https://openjdk.java.net/jeps/326

I agree with Cyrus, the margin trimming is the most common thing you'd want to do and it's trivial to manage how the compiler behaves by the positioning of the delimiters. IDEs can include visual hints as to where the margin will be (as IntelliJ does with Java).


@CyrusNajmabadi
Member Author

IDEs can include visual hints as to where the margin will be

Yes. I intend to do this as part of the implementation.

@jnm2
Contributor

jnm2 commented Jan 6, 2021

This is great. By the time I got to the examples, they were already doing everything I intuitively wanted them to be doing. The indentation removal (or lack of indentation inclusion) is excellent and I would like to use it for things like EF/Dapper SQL queries.

I like the fact that you can explicitly include or exclude an ending newline by putting """ on the same line as the last line vs putting it on the next line. If there was a totally blank line before the ending """, I would strongly intuit that there would be two ending newlines. On the other hand, I could get used to anything. A newline is excluded at the top every time already.

There are a bunch of cases where I'd love to be able to use interpolation together with not having to escape double quote characters. For example: https://github.com/nunit/nunit3-vs-adapter/blob/master/src/NUnit.TestAdapter.Tests.Acceptance/SinglePassingTestResultTests.cs#L47-L60
Using raw strings without interpolation just for the benefit of excluding indentation and not having to escape quotes is probably something that would be quite hard to read if you have to inject values.

@jairbubbles

jairbubbles commented Jan 6, 2021

Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

I personally think it would be cool to have a way to specify what line endings the string should have and not rely on line endings of the file itself.

@IanYates

IanYates commented Jan 6, 2021

Any thoughts on line endings? I once saw a case where a multi-line text was used in a unit test. As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

I personally think it would be cool to have a way to specify what line endings the string should have and not rely on line endings of the file itself.

This is worthy of consideration. Some prefix sign ahead of the string (as we have $ and @ now) perhaps?
I was also thinking a lot of "tabs to/from spaces" converters may need to get smarter here too. Having the IDE clearly indicate the common indent and show if it has a mix of tabs and spaces in it would be very helpful.

Allowing string interpolation seems reasonable to me. This does reduce the "can paste anything" ability, and more makes it an easier way to include blocks of text with quotes in them. However that seems a reasonable trade-off as it's very opt-in (only works if user places the $ in front)

Finally, the original proposal to have the closing quotes on their own line seems sensible to me. Imagine overwrite-pasting a good chunk of text - so much easier to select whole lines than to select many lines and then all bar the last N characters of the last line. I would much prefer to have the closing quotes on their own line.

@CyrusNajmabadi
Member Author

CyrusNajmabadi commented Jan 6, 2021

Any thoughts on line endings?

I would preserve them as is. It's intentionally a raw string, not an interpreted one. :-)

As the source code was on git and autocrlf was set to true the string had different line endings when compiled on linux vs windows leading to different behaviors.

Sounds like a problem for all strings. Don't do that :-D
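
When the line endings do matter (for example in a test that compares against output produced on another OS), one workaround is to normalize the value after the fact. A minimal sketch, assuming .NET 6 or later, where `string.ReplaceLineEndings` is available; the variable names are illustrative:

```csharp
using System;

// The newline between the two lines is whatever the source file contains
// (LF or CRLF), because the literal is taken as-is.
string text = """
    first
    second
    """;

// Rewrite every newline sequence to one chosen explicitly.
string unixStyle = text.ReplaceLineEndings("\n");
string windowsStyle = text.ReplaceLineEndings("\r\n");

Console.WriteLine(unixStyle == "first\nsecond");        // True
Console.WriteLine(windowsStyle == "first\r\nsecond");   // True
```

Both comparisons hold no matter how git checked out the file, which sidesteps the autocrlf difference described above.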

@SimonCropp

would preserve them as is.

So different behavior based on what OS the code is built on?

@CyrusNajmabadi
Member Author

@amro93 because you still need a way to escape ${ if it is actual content in your string.

@amro93

amro93 commented Apr 11, 2022

@amro93 because you still need a way to escape ${ if it is actual content in your string.

No matter what the boundaries are, it will require an escape char 😅
$`...{ }` could be a better idea,
But I'm pretty sure that this idea is not considered for a good reason, that's why I was curious.

@333fred
Member

333fred commented Apr 11, 2022

@amro93 because you still need a way to escape ${ if it is actual content in your string.

No matter what the boundaries are, it will require an escape char 😅
$`...{ }` could be a better idea,
But I'm pretty sure that this idea is not considered for a good reason, that's why I was curious.

This proposal does not require an escape character for boundaries, as it allows you to change the boundary character.
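
A small illustration of that point, using the interpolated raw string syntax as it shipped in C# 11: the number of `$` signs sets how many braces open an interpolation hole, so single braces stay plain content. The variable names and JSON are made up for the example.

```csharp
using System;

string thingName = "widget";

// With two $ signs, a single { or } is literal text and {{expr}} is a hole.
string json = $$"""
    {
        "name": "{{thingName}}"
    }
    """;

Console.WriteLine(json);
// {
//     "name": "widget"
// }
```

If the content itself contained `{{`, the literal could be opened with `$$$` and holes written as `{{{expr}}}`, so the content never has to be escaped.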

@CyrusNajmabadi
Member Author

No matter what the boundaries are, it will require an escape char

The idea of this proposal is to make it so you never need an escape char. That's one of the top motivations for it :-)

@Xyncgas

This comment was marked as off-topic.

@KieranDevvs

Why not use AI in the future to guess what I am trying to do instead of trying to come up with new syntax to do it right

Because there would be no defined behavior and it would differ between each user environment.

@jrmoreno1

Is there a way to make this work when compiling a framework application (i.e. not 11) but using 17.0+?

This is an EXTREMELY useful feature, and it's a real shame if it can't be used with older projects since it is really only the compiler that needs to know about it.

@CyrusNajmabadi
Member Author

@jrmoreno1 not sure what you mean. There are no restrictions on using this feature.

@jrmoreno1

jrmoreno1 commented Jun 10, 2023

@CyrusNajmabadi: I am using the latest version of VS (17.6.2) with a console application targeting net48 (which automatically targets language version 7.3), and this results in CS8730.

But since this is entirely compiler magic and both my build server and any local builds will be with a compiler that supports this feature, it would be terrific if it worked.

It's very annoying to start typing and then get an error message telling me that I have to do a lot of escaping to get things to work...

@CyrusNajmabadi
Member Author

You can manually set the language version. This is a requirement. If you have your lang version set to something lower (implicitly or explicitly) it will of course be blocked as that's literally what a language version is (a claim about which versions of the language you desire to use).

Use the <LangVersion> tag for this.
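
As a sketch, the relevant part of an SDK-style project file could look like the fragment below. The `net48` target is taken from the question above; `11.0` is the C# version that introduced raw string literals. The rest of the project file is omitted, and this assumes the installed compiler supports C# 11.

```xml
<PropertyGroup>
  <TargetFramework>net48</TargetFramework>
  <!-- Overrides the default language version chosen for the target framework -->
  <LangVersion>11.0</LangVersion>
</PropertyGroup>
```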

@HaloFour
Contributor

@CyrusNajmabadi

This is technically not a supported scenario, though, right? I thought language versions were now tied to runtime versions, regardless of whether or not runtime changes were required to support any specific features, in order to avoid confusion as to which combination of features would work.

@CyrusNajmabadi
Member Author

It's not supported, but it's something that can be done. We're not going to stop you. We're not going to spend any effort on making these scenarios supported though.

smfeest added a commit to smfeest/buttercup that referenced this issue Aug 6, 2023
I've also taken this opportunity to use the new raw string literal syntax
[1] so that all the double quotes in the string no longer need to be
escaped.

[1] dotnet/csharplang#4304
@KyleMit

KyleMit commented Nov 13, 2024

Should this be closed now that the proposal has gone through?

@CyrusNajmabadi
Member Author

@KyleMit We have not added this to the ECMA standard yet.

@julealgon

This comment has been minimized.

@CyrusNajmabadi
Member Author

@julealgon Feel free to open a discussion on this. This issue is not the place for meta-discussion of process.

@dotnet dotnet locked and limited conversation to collaborators Nov 19, 2024
@CyrusNajmabadi CyrusNajmabadi converted this issue into discussion #8646 Nov 19, 2024
