[Proposal]: Embedded Language Indicators for raw string literals #8653
Replies: 48 comments
-
i would not limit this to identifier, as that would not allow things like
-
Not to argue either way, but that doesn't seem to limit markdown.
-
you can use class Foo { }
-
Markdown also accepts file extensions, which I find easier to type. Either way, backticks are not going to be (re)considered for string literals, right?
-
@alrz I would only support backticks if we were actually adding support for markdown (something I do want).
-
Why limit it to only language indicators? It could be anything, e.g. locale or color or any other editor hint.
-
Sure, we won't be stopping whatever you want to put there. However, my intention with this proposal is that editors will use it to drive interior language highlighting.
-
It's not limited to only language indicators. It's just that that's a primary consumption case.
These are also 'language indicators' :)
-
Why not this?
A single-line comment would fit after
-
Primarily verbosity. It seems esp. excessive given how markdown is commonly used to write
If you prefer that, that's already supported. You can do both:
// lang=c#
return """
class Foo { }
""";
Or
return /* lang=c# */ """
class Foo { }
""";
Given that, we don't need an interior-form of this comment. But having a simple interior form that is much less verbose than the comment form would be nice.
-
Something like this?
var example = """
SELECT * FROM table
"""sql;
var example = """SELECT * FROM table"""sql;
I prefer that TBH. It's low-importance metadata and is more "out of the way" when it's appended. It's also "outside of the string" this way, like a tag. Is the above technically possible? Is it possible to add whitespace before the "tag"?
var example = """
SELECT * FROM table
""" sql;
var example = """SELECT * FROM table""" sql;
It's cleaner/less "squashed" that way.
-
@glen-84 yes, those are potential alternatives we can consider. However, it is unlikely, as "text on the outside" already has meaning today and actually affects the semantics of the string. e.g.
-
I see. In a way that could also be seen as a "tag" or "metadata", so it could make sense to extend that in a more general sense, in a way that clearly indicates its user-defined nature.
// Would this be confused with a preprocessor directive?
var example = """SELECT * FROM table"""#sql;
var example = """SELECT * FROM table"""u8#sql;
This could apply equally to regular strings. "Tagged string literals"
-
@glen-84 We'll keep those alternatives in mind when designing this. Thanks!
-
Not to bikeshed too much but I'm a bit torn. I do like the markdown-like approach of having the tag on the opening line of the raw literal. I think it's easier to see what the dialect is without having to find the end of the literal, plus it's familiar. But that poses a problem with single-line raw literals. Having the tag at the end for a single-line literal looks nicer, but collides with the decision to use a suffix to denote UTF-8 literals. The "tag" approach does solve that, but IMO isn't very attractive. Maybe a prefix?
var example1 = sql"""SELECT * FROM table;""";
var example2 = sql"""
SELECT *
FROM table;
""";
And while I'm partial to the feature it does feel a little weird that the syntax would only exist to facilitate tooling. Almost feels like something better served through source attributes.
-
Thanks. So what is the
-
There's a Syntax.xml file that contains our syntactic definitions.
No. There is not. :-) Such a thing isn't really useful for us. Our language is much more about what we want it to be, not about feeding into tools with their restrictions.
No. :-) Because it's not really relevant for our compiler design. We don't want to limit ourselves to limitations often inherent in particular grammar models.
I don't know what a grammatical inconsistency is. Ambiguities are interesting, but easy to find. We don't really care though as lots of stuff are ambiguous in our language and we're ok with that. :-)
-
@Korporal if you have any questions about the grammar and/or syntax, def come to discord and we can totally help you out there. Thanks!
-
I hope there would be some extra considerations for user-defined string syntax rules, meaning raw string literals will be highlighted not only for built-in kinds defined in
-
@SunnieShine can you give an example?
-
@CyrusNajmabadi Sorry for the incomplete description and the late reply. For example:
using System;
string s =
"""C#
// A test snippet for C# language
Console.WriteLine("Hello, world!");
""";
Console.WriteLine(s);
If we can add an indicator such as "C#", the string literal will be highlighted using C# syntax rules, right? I want a mechanism so that indicators and their syntax highlighting rules are not limited to a few "commonly used" ones. Instead, we could use an indicator such as "abc". Although "abc" is not a built-in indicator, we could define it using Roslyn APIs (if available) to support syntax highlighting for strings marked with the indicator "abc".
string s =
"""abc
A test string that can be highlighted as "abc" rule,
which can be defined by us using Roslyn APIs.
""";
Writing a Visual Studio extension to highlight this is too difficult for me because the implementation would be very complex. I found that Roslyn uses It would be good if the C# language or Roslyn APIs (language level or compiler level) had a similar but easier way to achieve this.
-
@SunnieShine ... That's exactly what this proposal is :)
-
Ah... 🤣 Sorry. I might have missed the point of this proposal.
-
@SunnieShine very understandable. The issue description is incredibly cryptic - the "detailed design" is practically impossible to understand for people who don't speak fluent csharp-language-standards and it does not contain any examples of what it's proposing
-
I don't know that the proposal suggests anything other than allowing some kind of identifier to be embedded in the raw string literal. That identifier may be used to indicate a "type" of the raw string which can be used to influence tooling like syntax highlighting, but nothing in the proposal suggests how that would work, or even that it would be a part of the Roslyn compiler aside from metadata.
-
This feature would be extremely beneficial if the IDE could offer syntax highlighting, autocomplete, and validation for various data types / languages such as XML, JSON, and SQL.
-
Another use case for this: VS 2022 17.6 just released a new "spellchecker" feature that marks misspelled words in code. The spellchecker has exclusion lists that differ by language:
However, a problem arises when SQL code is embedded in C# string literals:
// Warning: "ROWCOUNT is misspelled"
string sqlQuery = "SELECT ROWCOUNT FROM users";
This proposal should solve that, as VS would know what language is in the literal string 🙂
// No spellcheck warning!
string sqlQuery = """
SELECT ROWCOUNT FROM users
"""sql;
(Although this would probably require that we be able to differentiate between different types of SQL, as different variants have different built-in functions and reserved words)
-
As far as I understand, there is a lot of support for the originally proposed syntax:
var s = """c#
var t = 5;
""";
but there is a concern about single-line raw string literals.
-
Will Roslyn provide semantic tokens for embedded syntax? Does it provide for EDIT: we do have semantic tokens for
-
Embedded Language Indicators for raw string literals
Summary
When we were designing raw string literals, we intentionally left the door open for putting a language indicator at the end of the opening """ for the multi-line form. This proposal adds the support to do that.
Motivation
In the BCL, we added StringSyntaxAttribute for applying to parameters, which allows parameters to indicate the strings passed to them contain some form of embedded language, which is then used for syntax highlighting. However, this only works for strings passed directly to the parameter. For strings first stored in a variable, the only solution is a // lang = x comment. This means that, if the IDE wants to extract a multi-line raw string literal, it cannot neatly preserve the highlighting that was used. This syntax form is intended to help bridge that gap.
Detailed design
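For context, the two existing mechanisms described in the motivation can be sketched as follows. This is only an illustration, assuming .NET 7 or later (where StringSyntaxAttribute is available in System.Diagnostics.CodeAnalysis); the class and method names are hypothetical, and the regex pattern is just sample data.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Text.RegularExpressions;

// Hypothetical demo type; not part of the proposal.
public static class EmbeddedLanguageDemo
{
    // The attribute tells tooling that arguments passed to `pattern`
    // contain a regular expression, so the IDE can highlight them.
    public static bool IsMatch(
        string input,
        [StringSyntax(StringSyntaxAttribute.Regex)] string pattern)
        => Regex.IsMatch(input, pattern);

    // Case 1: the string is passed directly to the annotated parameter,
    // so tooling can tell it is a regex from the attribute alone.
    public static bool Direct()
        => IsMatch("abc123", @"^[a-z]+\d+$");

    // Case 2: the string is stored in a variable first. The attribute no
    // longer applies at the declaration, so a comment marks the language.
    public static bool ViaVariable()
    {
        // lang=regex
        var pattern = @"^[a-z]+\d+$";
        return IsMatch("abc123", pattern);
    }
}
```

Both methods behave identically at run time; the difference is only in what the editor can infer about the string's contents, which is the gap the proposed indicator is meant to close.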
The existing raw string literal proposal has the following multi-line grammar:
multi_line_raw_string_literal : raw_string_literal_delimiter whitespace* new_line (raw_content | new_line)* new_line whitespace* raw_string_literal_delimiter ;
This is updated to the following:
multi_line_raw_string_literal : raw_string_literal_delimiter identifier? whitespace* new_line (raw_content | new_line)* new_line whitespace* raw_string_literal_delimiter ;
Where the identifier? token is added right after the delimiter.
Drawbacks
This form is not equally applicable to all string types, so it would only apply to multi-line raw string literals. Ideas on other forms that could be more broadly applied would be useful: maybe putting the identifier after the closing quote could work?
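To make the grammar change and this limitation concrete, here is a rough sketch of how a tool might pull the indicator out of the opening line of a multi-line raw string literal. It is not how Roslyn lexes these literals: `RawStringIndicator` and `TryGetIndicator` are hypothetical names, the input is assumed to start at the opening delimiter and run to the end of that line, and the identifier rule is simplified to ASCII letters, digits, and underscores.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; a simplified approximation of
// `raw_string_literal_delimiter identifier? whitespace*` followed by end of line.
public static class RawStringIndicator
{
    // Three or more quotes, an optional simplified identifier,
    // optional trailing spaces/tabs, then end of line.
    private static readonly Regex OpeningLine =
        new Regex(@"^(?<delim>""{3,})(?<id>[A-Za-z_][A-Za-z0-9_]*)?[ \t]*$");

    // Returns false when the text is not a valid multi-line opening line.
    // On success, `indicator` is the identifier, or "" when none is present.
    public static bool TryGetIndicator(string openingLine, out string indicator)
    {
        indicator = "";
        var match = OpeningLine.Match(openingLine);
        if (!match.Success)
        {
            return false;
        }

        if (match.Groups["id"].Success)
        {
            indicator = match.Groups["id"].Value;
        }
        return true;
    }
}
```

Under these assumptions, `"""sql` yields the indicator `sql` and a bare `"""` yields an empty indicator, while a single-line literal such as `"""SELECT * FROM table"""` is rejected because content follows the delimiter on the same line. An opening line like `"""c#` is also rejected by this sketch, since `#` is not an identifier character.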
Alternatives
Unresolved questions
Design meetings