An implementation for 0168-multi-line-string-literals.md #8813
Four of these new tests are currently disabled (by commenting them out) because they fail to compile. I’ll tackle them shortly.
To tell the truth, I had no idea tabs were forbidden in the first place.
Tests and a bug fix
lib/Parse/Lexer.cpp
// work back from the end to find whitespace to strip
while (start > Bytes.begin() && isWhitespace(start[-1])) {
  if (*--start == '\n' || *start == '\r') {
    if (start[-1] == '\r')
This is not right, I think: \r\r is two classic Mac newlines, not one. Surely, what you want is this:
switch (*--start) {
case '\n':
  if (start[-1] == '\r')
    --start;
  LLVM_FALLTHROUGH;
case '\r':
  return std::string(start, end-start);
}
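To make the \r\r point concrete, here is a minimal, self-contained sketch of such a backward scan, written against std::string rather than the lexer's Bytes buffer. The helper name lastLineBreak is hypothetical and not part of the PR; it assumes that only spaces and tabs may follow the final line break.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the PR's code: scan backwards from the end of
// the raw literal text over spaces and tabs, and return the offset where
// the final line break begins. "\r\n" counts as one break, but a lone
// '\r' is always a break by itself, so "\r\r" is two breaks.
// Returns std::string::npos if a non-whitespace character comes first
// or if there is no line break at all.
std::size_t lastLineBreak(const std::string &text) {
  std::size_t i = text.size();
  while (i > 0) {
    char c = text[i - 1];
    if (c == '\n') {
      // A '\r' immediately before '\n' belongs to the same CRLF break.
      if (i >= 2 && text[i - 2] == '\r')
        return i - 2;
      return i - 1;
    }
    if (c == '\r')
      return i - 1; // Classic Mac ending; never merged with another '\r'.
    if (c != ' ' && c != '\t')
      return std::string::npos;
    --i;
  }
  return std::string::npos;
}
```

The key design choice is that '\r' is only ever merged with a following '\n', never with a preceding '\r'.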
include/swift/Parse/Lexer.h
@@ -397,12 +402,13 @@ class Lexer {
/// If a copy needs to be made, it will be allocated out of the provided
/// Buffer.
static StringRef getEncodedStringSegment(StringRef Str,
SmallVectorImpl<char> &Buffer);
SmallVectorImpl<char> &Buffer,
unsigned Modifiers = 0, const std::string &ToReplace = "");
Nit: 80-character lines, here and throughout.
include/swift/Parse/Lexer.h
@@ -505,6 +511,9 @@ class Lexer {
/// Try to lex conflict markers by checking for the presence of the start and
/// end of the marker in diff3 or Perforce style respectively.
bool tryLexConflictMarker();

// new for multiline string literals
Nit: capitalize and punctuate comments, here and throughout.
lib/Parse/Lexer.cpp
case '\n': // String literals cannot have \n or \r in them.
case '\r':
if (Modifiers & StringLiteralMultiline) // ... unless they are mutli-line
Nit, typo. (Also, decide if you want to write "multi-line" or "multiline" throughout, and camel-case accordingly in the code.)
lib/Parse/Lexer.cpp
@@ -1192,6 +1207,15 @@ unsigned Lexer::lexCharacter(const char *&CurPtr, char StopQuote,
case '"': ++CurPtr; return '"';
case '\'': ++CurPtr; return '\'';
case '\\': ++CurPtr; return '\\';
case '\n':
LLVM_FALLTHROUGH;
Per existing code in these files, no need for LLVM_FALLTHROUGH; when your case is otherwise empty.
lib/Parse/Lexer.cpp
}

/// determine contents of literal to be normalised - either
/// to strip indenting or normalise line endings to a single \n
Nit: while you're capitalizing and re-wrapping these comments to 80 characters, U.S. English spellings. (Yes, I know; I'm Canadian, but them's the rules...)
lib/Parse/Lexer.cpp
}

// are there windows line endings in the source, if so return it to be replaced
const char *windowsLinesep = strnstr(Bytes.begin(), "\r\n", Bytes.end()-Bytes.begin());
A nit and a more substantive comment.
- Nit: windowsLineSeparator (capitalize).
- More substantive comment: strnstr isn't portable, is it? Ironic that this won't work on Windows...
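One portable option, sketched here under the assumption that the search runs over a raw [begin, end) byte range like Bytes, is std::search from <algorithm>. The function name findWindowsLineSeparator is hypothetical. Inside the lexer itself, llvm::StringRef::find would probably be the more idiomatic choice.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical portable stand-in for the BSD-only strnstr(): search the
// byte range [begin, end) for "\r\n" and return a pointer to the '\r',
// or nullptr if it is absent. Unlike strnstr(), std::search does not stop
// at an embedded '\0'.
const char *findWindowsLineSeparator(const char *begin, const char *end) {
  static const char crlf[] = {'\r', '\n'};
  const char *found = std::search(begin, end, crlf, crlf + 2);
  return found == end ? nullptr : found;
}
```

Because the range is bounded by an explicit end pointer, a '\r' that is the last byte of the range is not mistaken for the start of a CRLF pair.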
lib/Parse/Lexer.cpp
while ((BytesPtr = (const char *)memchr(BytesPtr, '\n', Bytes.end()-BytesPtr)) != nullptr) {
const char *NextPtr = BytesPtr + 1;
if (*NextPtr != '\n' && *NextPtr != '\r') {
if (BytesPtr[-1] == '\r')
See previous comment about how this treats \r\r.
lib/Parse/Lexer.cpp
@@ -1324,8 +1404,15 @@ void Lexer::lexStringLiteral() {
// NOTE: We only allow single-quote string literals so we can emit useful
// diagnostics about changing them to double quotes.

bool wasErroneous = false;

bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;
Nit: wasAllWhitespace, surely.
test/stdlib/MultilineString.swift
// CHECK: -2-
print("-2-")
// SKIP-CHECK-NEXT: <"Two Beta">
//print(delimit(""""Two Beta""""))
Why is this commented out?
Can we add a test for two quotes in the middle of the string?
BytesPtr = NextPtr;
}
}

/// lexStringLiteral:
/// string_literal ::= ["]([^"\\\n\r]|character_escape)*["]
Update this comment
lib/Parse/Lexer.cpp
// Strips any indent that corresponds to the indent
// of the multi-line string terminating line and
// normalises line endings in the source to \n.
// It also removes any intial empty line.
Wrap comments at 80 characters.
lib/Parse/Lexer.cpp
bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;
unsigned Modifiers = 0;

// is this the start of a multiline string litersl
typo
test/stdlib/MultilineString.swift
// CHECK: -4-
print("-4-")
// CHECK-NEXT: <FourDelta>
print(delimit("""Four\
As mentioned on swift-evolution, this should be disallowed.
test/stdlib/MultilineString.swift
// CHECK: -14-
print("-14-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Should be an error instead of a warning.
test/stdlib/MultilineString.swift
// CHECK: -11-
print("-11-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Should be an error.
test/stdlib/MultilineString.swift
// CHECK: -12-
print("-12-")
// Note: The next few tests use physical tab characters, not spaces.
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Should be an error.
test/stdlib/MultilineString.swift
// CHECK: -13-
print("-13-")
// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation
Should be an error.
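A rough sketch of a check that could back such an error, assuming the closing line's indentation is already known and each line is compared byte for byte, so a tab never matches spaces. raggedIndentLines is a hypothetical name, and exempting only completely empty lines is an assumption about the accepted rules, not something this PR confirms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical check, not the PR's code: every non-empty content line must
// start with exactly the closing line's indentation, compared byte for
// byte, so a tab in the source never matches a run of spaces. Completely
// empty lines are exempt. Returns the zero-based indices of offending
// lines; a non-empty result would be reported as an error.
std::vector<std::size_t>
raggedIndentLines(const std::vector<std::string> &lines,
                  const std::string &indent) {
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < lines.size(); ++i) {
    const std::string &line = lines[i];
    if (line.empty())
      continue;
    // compare() returns non-zero if the line is shorter than the indent
    // or if any byte of the prefix differs.
    if (line.compare(0, indent.size(), indent) != 0)
      bad.push_back(i);
  }
  return bad;
}
```

Reporting every offending line, rather than stopping at the first, would let a diagnostic point at each one.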
Hi, @johnno1962, thanks for sending this patch! Do you think you'll have time to address the review comments? I'd like to get this into Swift 4, and we're getting close to the deadline for it.
Thanks @xwu and @kubamracek for the comments, which I have endeavoured to address with this last commit. This should be compliant with the core team's decision, except that it includes a small amount of code to implement a new proposal to allow escaping of newlines in all strings (https://github.com/johnno1962c/swift-evolution/blob/master/proposals/0173-newline-escape-in-strings.md). This code is around line 1212 of Lexer.cpp if you want to remove it. There are limitations to the implementation: while it will normalise end-of-line to \n for sources that use \n, \r\n or \r as the line separator, they cannot be mixed within a literal and have this work.
@johnno1962, could you extract the newline escaping feature into a separate pull request? Since it's a separate proposal that wasn't accepted yet, we shouldn't block merging this PR on the other feature.
Lots of nits, sorry.
My main feedback would be that I'd like to see more tests. It'd be very important to guarantee that your line normalization code does not strip manually escaped \r\n (or, for that matter, \r + literal newline).
Also, what can be done about this limitation as to normalization of mixed newlines?
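Mixed line endings could be handled if normalization happens in a single forward pass over the raw source text of the literal, before escapes are expanded. A minimal sketch, using the hypothetical name normalizeLineEndings and assuming it is applied to the literal's source bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the PR's code: normalize every line ending in
// raw literal text to '\n' in one forward pass, so "\r\n", a lone "\r"
// and "\n" may be mixed freely within one literal. "\r\r" becomes two
// newlines, not one. This runs on source bytes before escape processing,
// so an escaped \r\n (backslash, 'r', backslash, 'n') is left untouched.
std::string normalizeLineEndings(const std::string &raw) {
  std::string out;
  out.reserve(raw.size());
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == '\r') {
      out += '\n';
      if (i + 1 < raw.size() && raw[i + 1] == '\n')
        ++i; // Consume the '\n' of a CRLF pair.
    } else {
      out += raw[i];
    }
  }
  return out;
}
```

Because each '\r' is examined together with the single byte after it, no global find-and-replace ordering question arises.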
ERROR(lex_unicode_escape_braces,none,
"expected hexadecimal code in braces after unicode escape", ())
ERROR(lex_illegal_multiline_string_start,none,
"inavlid start of multi-line string literal", ())
Typo
ERROR(lex_illegal_multiline_string_start,none,
"inavlid start of multi-line string literal", ())
ERROR(lex_illegal_multiline_string_end,none,
"inavlid end of multi-line string literal", ())
Typo
ERROR(lex_ambiguous_string_indent,none,
"invalid mix of multi-line string literal indentation", ())
WARNING(lex_trailing_multiline_whitespace,none,
"includes trailing space characters in multi-line string literal", ())
Warning on trailing whitespace is not a part of the approved proposal; please also split into separate PR.
include/swift/Parse/Lexer.h
@@ -505,6 +512,9 @@ class Lexer {
/// Try to lex conflict markers by checking for the presence of the start and
/// end of the marker in diff3 or Perforce style respectively.
bool tryLexConflictMarker();

/// New for multiline string literals
Comments should have punctuation; also, please describe what it does and not just label it as new.
include/swift/Parse/Token.h
@@ -46,6 +46,9 @@ class Token {
/// \brief Whether this token is an escaped `identifier` token.
unsigned EscapedIdentifier : 1;

/// modifiers for string literals
Nit: capitalize and punctuate.
lib/Parse/Lexer.cpp
Segments.push_back(
StringSegment::getLiteral(getSourceLoc(SegmentStartPtr),
Bytes.end()-SegmentStartPtr));
Bytes.end()-SegmentStartPtr, Modifiers, ToReplace));
80-character lines please.
test/stdlib/MultilineString.swift
// ===---------- Done --------===
// CHECK-NEXT: Done.
print("Done.")
Can you add more tests to ensure that the errors and warnings you have added behave as intended?
Some other ideas:
- Check that only one leading newline and one trailing newline are stripped.
- Check that all other newlines are preserved, including multiple consecutive ones.
- Check that manually escaped \t, \r, \n, \\ are all possible and correctly lexed, including at the end of a line.
- Check that a manually escaped \r\n is not normalized to \n.
- Check that string literal interpolation works correctly.
- Check that invalid ragged leading indents trigger the expected error.
- Check that escaping \""" works correctly.
Also, I would like to see, for the purposes of this particular implementation, that end-of-line newlines after \ are preserved and that trailing whitespace is correctly preserved.
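As a reference point for the first two test ideas, here is a hypothetical helper showing the intended trimming: exactly one leading and one trailing newline removed, everything else kept. It assumes line endings are already normalized to '\n' and the closing line's indentation has already been removed; trimDelimiterNewlines is an illustrative name, not part of the PR.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: given the normalized body between the delimiters,
// with the closing line's indentation already removed, drop exactly one
// leading newline (the one after the opening delimiter) and exactly one
// trailing newline (the one before the closing delimiter). Any further
// blank lines are content and are preserved.
std::string trimDelimiterNewlines(std::string body) {
  if (!body.empty() && body.front() == '\n')
    body.erase(0, 1);
  if (!body.empty() && body.back() == '\n')
    body.pop_back();
  return body;
}
```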
Thanks @xwu I’ve actioned most of your nits. “wasAllWhitespace” is there to be able to generate errors when there is non-whitespace before the closing delimiter. @kubamracek this commit should be in line with the proposal as accepted. I’ll put newline escapes back in as a separate PR if required. Looking at more tests while the toolchain builds.
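The check that wasAllWhitespace enables can be sketched in isolation: take the text of the closing line and reject it if anything other than spaces or tabs precedes the delimiter. closingIndent is a hypothetical name, and the sketch assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch, not the PR's code: given the raw literal body (the
// text between the opening and closing delimiters), return the closing
// line's indentation if that line holds only spaces and tabs, or
// std::nullopt if any other character precedes the closing delimiter,
// which would be diagnosed as an error. If the body has no line break,
// the whole body is treated as the closing line.
std::optional<std::string> closingIndent(const std::string &body) {
  std::string::size_type lineStart = body.find_last_of("\r\n");
  lineStart = (lineStart == std::string::npos) ? 0 : lineStart + 1;
  std::string indent = body.substr(lineStart);
  if (indent.find_first_not_of(" \t") != std::string::npos)
    return std::nullopt;
  return indent;
}
```

The returned string can then serve as the exact prefix to strip from every other line.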
One typo left :)
Why can find-and-replace operations not simply replace all literal \r with \n and all literal \r\n with \n?
I could do it the other way around I suppose. It tries to do all the replacing including indent stripping with a single loop due to how the code developed. Seems OK as it is apart from exotic mixes of line endings inside a single literal.
I can see the logic behind doing indent stripping in a single loop, because you need to get to the end of the literal before you know how much to strip. But, without having thought too deeply, it would seem that normalizing line endings can happen line-by-line as you go.
Oh, and your logic isn't stripping leading whitespace from escaped newlines, is it?
"""
\n \n
"""
...should give "\n \n" and not "\n\n".
Indent stripping is done on the program text, not on expanded escapes; escape expansion happens afterwards.
Good. I assumed so--just wanted to check.
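A short sketch of that ordering, assuming line endings are already normalized to '\n' and using a hypothetical helper named stripIndent: indentation is removed from raw source lines, so at this stage the escape \n is still a backslash followed by an 'n', and the space after it survives as content.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch, not the PR's code: remove `indent` from the start
// of each line of raw literal text. Because this runs on source text
// before escapes are expanded, the two-character escape \n is just a
// backslash and an 'n' here, so spaces after it are content, not
// indentation. Lines lacking the full indent are left untouched (a
// separate check would diagnose them).
std::string stripIndent(const std::string &raw, const std::string &indent) {
  std::string out;
  std::string::size_type pos = 0;
  for (;;) {
    std::string::size_type eol = raw.find('\n', pos);
    std::string::size_type end = (eol == std::string::npos) ? raw.size() : eol;
    std::string line = raw.substr(pos, end - pos);
    if (line.compare(0, indent.size(), indent) == 0)
      line.erase(0, indent.size());
    out += line;
    if (eol == std::string::npos)
      break;
    out += '\n';
    pos = eol + 1;
  }
  return out;
}
```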
include/swift/Parse/Lexer.h
std::string ToReplace;

static StringSegment getLiteral(SourceLoc Loc, unsigned Length,
unsigned Modifiers, const std::string &ToReplace) {
80-character lines, please.
include/swift/Parse/Lexer.h
@@ -495,7 +502,7 @@ class Lexer {
static unsigned lexUnicodeEscape(const char *&CurPtr, Lexer *Diags);

unsigned lexCharacter(const char *&CurPtr,
char StopQuote, bool EmitDiagnostics);
char StopQuote, bool EmitDiagnostics, unsigned Modifiers = 0);
80-character lines, please.
include/swift/Parse/Token.h
@@ -273,11 +276,17 @@ class Token {
void setText(StringRef T) { Text = T; }

/// \brief Set the token to the specified kind and source range.
void setToken(tok K, StringRef T, unsigned CommentLength = 0) {
void setToken(tok K, StringRef T, unsigned CommentLength = 0, unsigned Modifiers = 0) {
80-character lines, please.
One more time 🙄
BTW, you can get a quicker turnaround time by running "ninja swift" in your local swift build directory to just build the compiler (without the standard library).
@swift-ci please smoke test
Thanks for the tip. I’ve been building toolchains. Is there an easy way to build a toolchain without building all architectures?
Nothing supported that I know of. Of course, you can intercept the commands and hack something together for yourself, but that's pretty dirty. If you're just iterating fast on a .cpp file, "ninja swift" will turn around in about 20 seconds or so. You can then use e.g.
Toolchains weren't so bad. It’s just that if the day changes you have to rebuild llvm + clang. Looks like tests are ok. Merge away 👍
🍾
SR for diagnostics improvements: https://bugs.swift.org/browse/SR-4701 @kubamracek do you have an SR for multi-line literals inside of interpolations?
Thanks @milseman, @kubamracek and particularly @xwu for all your help. Final toolchain:
https://bugs.swift.org/browse/SR-4708: Add support for multiline strings inside string interpolations
@kubamracek I’ve just opened PR #9049 for multiline inside interpolations
Was the warning about trailing whitespaces removed? 😞
let s = """
abc
"""
print(s.characters.count) // prints 66
What would be the workaround if trailing whitespace is desired?
@milseman swiftlang/swift-evolution#695 Without the trailing backslash you'd need something like
The example from above is actually this:
let s = """
abc \("")
"""
print(s.characters.count) // prints 66
which has a visible indication that trailing whitespaces are involved. Ideally we still need the trailing backslash.
As already discussed on swift-evolution, the accepted proposal does not include warnings about trailing whitespace. This PR correctly implements the proposal as accepted.
@xwu I bet you've included all the points the core team mentioned in the accepted thread about which cases should be errors and which should be allowed (for instance, the blank line without any indent), but you seem to exclusively pick the things you like and silently ignore the things you don't, the same way you did during the discussion thread. I'm not being offensive by any means, but I'm highly critical of that. https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035931.html
https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035934.html I don't want to sit around and watch how we'll introduce a half-baked multi-line string literal.
In case you insist on a bug issue: https://bugs.swift.org/browse/SR-4874
Adrian, you have a subjective opinion that trailing whitespace is critical, which didn’t win over the majority of the thread on this proposal. It seems like you're a bit confused as to the idea behind my follow-up proposal about newline escapes (elided newlines). I'd not intended it to have anything to do with helping make trailing whitespace explicit.
A prototype implementation for proposal 0168, as discussed on the swift-evolution thread over the last week. It has been tested inside the Xcode source editor and is completely functional as a reference implementation, bar any changes that would need to be made to other parts of the toolchain, which seem to be minimal. There is a new test file.
Resolves #42792.