[Proposal] Modify how literal strings are parsed. #301
Comments
I am sure allowing "UnicodeOpenDoubleQuote" would break something (it would in my code), but it would not be silent, and I personally think it is worth it. Side note: I can't even get this editor to accept a Unicode quote, so I type it as UnicodeOpenDoubleQuote. I prefer @"" to "Option CharacterEscapes On" unless we get local Options, because sometimes I want Unicode characters and sometimes I really want "\u" and don't want to escape it. If we use @"" you could just use the C# code unmodified to parse it, guaranteeing compatibility. |
And also add to the "fun" of having a diametrically different meaning for Escapes can be handy, but really they're not that essential. I think BTW, what sort of code base(s) are you translating that has all these crazy unicode double quotes in them? (which in turn are driving you crazy?) |
@rskar-git I didn't know @"" was different, I don't know C# that well. I was asking that they be the same. As for the code base I am using Roslyn as my test vehicle. I also had no idea VB supported escapes. |
Implementing escapes would have been less painful if they'd been added along with interpolation. Then $"" could be used for either, and people would have gotten used to it as part of switching to interpolation. Since we missed that opportunity, and since we already have a string suffix for char literals ("x"c), what about just using a different one for escaped strings, something like Dim s = "Look ma!\nSee that, new line! \u263A (smiles!)"E, or one could have different variants for ASCII and UNICODE. |
@pricerc That works for me. Hopefully what is inside the quotes is identical to C#. |
Sorry about the miscommunication, but presently VB does NOT support escapes. I was simply pointing out how C# has
But with
So C# has these two modes as part of its syntax. But to reiterate, VB does NOT support escapes as a part of its syntax; however, via the .NET Framework, VB can decode escapes via
Can you tell us a little more about what it is you need to get done? Perhaps there are better ways to get it accomplished in VB in dealing with unicode etc. |
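The framework method named in the comment above did not survive in this copy of the thread. As a minimal sketch of run-time decoding, assuming Regex.Unescape from System.Text.RegularExpressions is an acceptable stand-in (it is one .NET method that decodes sequences such as \n and \uXXXX; whether it is the one originally meant is an assumption), VB can do the following today. The module and variable names are made up for illustration.

```vb
Imports System
Imports Microsoft.VisualBasic
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        ' VB itself treats the backslashes below as ordinary characters.
        Dim raw As String = "Look ma!\nNew line \u263A"

        ' Regex.Unescape (assumed stand-in) decodes the escapes at run time.
        Dim decoded As String = Regex.Unescape(raw)

        ' Check that the escapes were replaced by the characters they name.
        Console.WriteLine(decoded.IndexOf(vbLf) >= 0)
        Console.WriteLine(decoded.IndexOf(ChrW(&H263A)) >= 0)
    End Sub
End Module
```

Because the decoding happens at run time, the result cannot be used where VB requires a constant expression, which is the kind of limitation the following comments discuss.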
The limitation to that idea is that
Actually, I wonder if that concept could be made to work. There are two downsides I can see: (1) The string will need to be rescanned (to be decoded) since the I still think that |
I am working on a C# to VB translator that preserves comments and formatting where possible. I started with some of the Roslyn Samples, but they throw away most, if not all, comments and formatting, so the resulting code is really hard to read and understand and some lines are 1,000's of characters long. Also, many features that are easy to translate are just skipped in most translators. The best example is Checked math, which is what VB does by default. At this point I can successfully translate and compile the first 2,000 C# files in the Roslyn src tree and preserve all the comments, but they are not all in the correct place (or sometimes even close to where they belong). In the process I am learning to read C#. Just because something compiles doesn't mean the code executes correctly; my misunderstanding of C# escaped strings is an example of that. I also have not found a general workaround for Unchecked Math, but I have special-cased typical uses, so it is not hopeless. What I really need is a VB comment that can be used in more places, or more flexibility around comments and blank lines in argument lists. Just looking at Roslyn, if there is a way to write a C# comment and place it, somewhere in Roslyn is an example. Except for 1 very small dll for Hash and Unchecked math, everything is written in VB using Roslyn. |
@pricerc Given VB's compatibility requirements, I can't think of any workaround for Unicode Double Quotes without an Option or a version-specific feature that breaks existing code. I think most users would be happy to accept a little pain removing the doubled Unicode quotes for the convenience of just being able to paste from a Word Processor or Web Page and not have to double all the quotes. |
Would it be possible to just do away with that limitation? i.e. if it's possible to evaluate an interpolated string at compile time (as in all inserted values are constants), the compiler does so and treats it just like any other string literal. |
@rskar-git VB has many artificial limitations around constants that are not obvious and maybe something to look at fixing even if only a few at a time. Chr(W) with a constant, Nothing... |
tangential to the topic. Since Chr and ChrW both return the same UNICODE Char datatype in VB.NET, does anyone know why ChrW wasn't retired or Chr and ChrW made synonyms with the advent of VB.NET? I get that you'd want some compatibility with VB6/VBA, but I'm not sure I see the value in the distinction. |
That's a big job, made all the more challenging by learning C# as you go - hats off to you for taking it on! Remarkably, each of these mess up on the fancy unicode double quotes!: Are you doing this for yourself or an employer? Asking because I'm curious if you've checked this out: https://www.tangiblesoftwaresolutions.com/product_details/csharp-to-vb-converter.html |
I suppose, but as I'm not among the somebodies whose job it is to maintain compiler, IDE, and "tooling", I've got no informed opinion on whether that's a big job or not. Actually why is this good enough:
when instead adding an
The other idea of:
doesn't strike me as doable at all. Right now any valid expression is what's expected between the curly braces. Adding in the complexity of detecting escapes sounds like a messy and painful job. |
So there you have it. |
@rskar-git Doing it for myself. I love the tangible software solutions converter and have provided feedback to them to improve it, but it doesn't do many of the things I need, and some of the code it produces converts but doesn't compile. What I have is already well beyond anything available. If I understood GitHub better I would be happy to share source, but my past experience trying to fix issues in Open Source projects without being an insider has proven very frustrating. I have a UI that translates folders recursively and compiles the result. I am working on a comment comparer to make sure I don't drop anything, but in some cases the Roslyn Syntax Walker is the cause of the issue, especially around document comments, where Roslyn, VB, C# and Visual Studio all allow malformed comments but you can't create them with a SyntaxFactory, and if you look at them they look perfectly correct. |
The former supports a larger set of uses than the latter for one. While the latter is strictly for escaping characters, the former would also enable stuff like:
Private Const IdentifierStartPattern = "(\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl})"
Private Const IdentifierEndPattern = "(\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})"
Private Const IdentifierPattern = $"{IdentifierStartPattern}({IdentifierStartPattern}|{IdentifierEndPattern})*"
Private Const QualifiedIdentifierPattern = $"({IdentifierPattern}\.)*{IdentifierPattern}" |
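For comparison, here is a sketch of how the same constants can be composed today without constant interpolation, using the & operator, which VB accepts in constant expressions. The module name and the Main routine are made up for illustration; the pattern text is copied from the comment above.

```vb
Imports System

Module IdentifierPatterns
    Private Const IdentifierStartPattern As String = "(\p{Lu}|\p{Ll}|\p{Lt}|\p{Lm}|\p{Lo}|\p{Nl})"
    Private Const IdentifierEndPattern As String = "(\p{Mn}|\p{Mc}|\p{Nd}|\p{Pc}|\p{Cf})"

    ' Concatenating constants with & is itself a constant expression,
    ' so the composed patterns can still be declared Const.
    Private Const IdentifierPattern As String =
        IdentifierStartPattern & "(" & IdentifierStartPattern & "|" & IdentifierEndPattern & ")*"
    Private Const QualifiedIdentifierPattern As String =
        "(" & IdentifierPattern & "\.)*" & IdentifierPattern

    Sub Main()
        ' Prints the fully expanded pattern text.
        Console.WriteLine(QualifiedIdentifierPattern)
    End Sub
End Module
```

The concatenated form works but is noticeably harder to read than the interpolated version, which is the point the comment is making.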
I am liking the idea of doing this in combination with the new
...which is readily understandable to existing users, requires almost no new syntax, and causes no breaking changes or ambiguity since expressions in VB can't begin with a Another idea, less readable but requires no new syntax at all:
...and just have the compiler realize that this is an escape, and inline it directly into the string, instead of an actual call to ChrW. It's almost a performance optimization rather than a new feature. (But I prefer the first way.) |
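The two snippets this comment referred to are missing from this copy of the thread. As a rough sketch of the kind of code the second idea appears to build on (an assumption, not a reconstruction of the original snippet), a ChrW call can already appear inside an interpolated string today. All names below are illustrative.

```vb
Imports System
Imports Microsoft.VisualBasic

Module ChrWDemo
    Sub Main()
        ' ChrW(&H263A) is an ordinary expression inside the interpolation hole.
        Dim viaInterpolation As String = $"Smiles: {ChrW(&H263A)}"

        ' The same text built by plain concatenation.
        Dim viaConcat As String = "Smiles: " & ChrW(&H263A)

        ' Both spellings should produce the same string.
        Console.WriteLine(viaInterpolation = viaConcat)
    End Sub
End Module
```

The suggestion in the comment is about what the compiler does with such a call, namely folding it into the literal at compile time, rather than about any change to what can be written.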
@bandleader Both have already been suggested. |
Item 1 of the proposal is a breaking change, so we are not going to do it. Item 2 introduces a new option, which would further complicate the language. The underlying problem of escape sequences deserves more thought, but this proposal isn't a solution we are happy with. |
@KathleenDollard I've been thinking about changing how strings are represented internally in the Visual Basic language, by having them inherit from a base that represents textual symbols in the language.
This textual base consists of up to three sections
The semantic validity of the char literal is lifted out of syntax analysis into semantic analysis. This then allows us to potentially extend or add additional forms of textual representations. |
@KathleenDollard : when this was considered by the LDM, was the “\u263A”e syntax considered? It’s not mentioned in the proposal, but in one of the comments by @pricerc |
Since this was a proposal, our decision to reject it applies only to this approach. I'd love to see one or more issues created from the underlying problems (unless they already exist, in which case, ping that issue with your thoughts) |
The underlying issue (difficulty working with Unicode strings) was brought up as a problem in |
Two items to this proposal:
Item 1: ANSI-quotation-marks versus Unicode-double-quotation-marks; if the string begins with " (i.e. Chr(34)) it must end only with ".
Item 2: A new Option which allows for C-style escape codes.
I believe these would address issues such as those stated in #276 and #299.
P.S. Alternative idea for Item 1 above: Introduce @"" to VB with a meaning and syntax identical with that of C# (and perhaps $@"" too). This avoids breaking existing code, and also would naturally complement Option of Item 2.
=== Item 1 ===
In VB today, a literal string can be composed with ANSI-quotation-marks (code &H22) and Unicode-double-quotation-marks (left and right, codes &H201C and &H201D). However, they are treated like exactly equivalent symbols. In other words, the literal can begin with any one of &H22, &H201C, or &H201D, but how it ends does not depend on how it begins - currently it can again be any one of them.
I am guessing the designers went this way as a help to those who code in a (or via a?) word-processor, such as Microsoft Word; or copy-paste from badly edited web pages. I'm not sure how many folks are out there who regularly do that - I can only guess they are the few and the proud. Considering how many languages in use today use either &H22 or ANSI-apostrophe (&H27) to form their literals, it doesn't seem like there are any real international/Unicode issues at play here.
So, I would like a change: If the literal begins with &H22, it ends with &H22. Therefore, this would then be valid:
Note there would no longer be a need to double-up on &H201C and &H201D in this mode.
I sincerely doubt this to be a breaking change in terms of how coding is actually done. I would leave it to others to decide on what to do about literals starting with &H201C or &H201D - I'm OK with keeping current behavior (any one of &H22, &H201C, or &H201D will do).
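To make the current rule concrete, here is a small sketch of what the description above implies compiles today. The module and variable names are made up for illustration, and the source file is assumed to be saved in a Unicode encoding so the curly quotes survive.

```vb
Imports System

Module QuoteDemo
    Sub Main()
        ' Opens and closes with &H22.
        Dim a As String = "plain"

        ' Opens with &H201C and closes with &H201D.
        Dim b As String = “plain”

        ' Opens with &H22 but closes with &H201D; accepted today because
        ' the closing delimiter does not depend on the opening one.
        Dim c As String = "plain”

        ' All three literals should denote the same five-character string.
        Console.WriteLine(a = b AndAlso b = c)
    End Sub
End Module
```

Under Item 1, the third literal would no longer be terminated by the curly quote, which is the kind of source incompatibility weighed elsewhere in this thread.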
Alternatively, we could instead introduce C#-style @"" (which would work nicely with Item 2 below).
=== Item 2 ===
Introduce a new Option which allows for C-style escape codes in string literals. I'll leave it for others to consider on whether we need yet another literal string format - perhaps we could simply follow C# here, and allow @"" and $@"" (both redundant in today's VB). Anyway, maybe we could call it CharacterEscapes with settings of On and Off.