An implementation for 0168-multi-line-string-literals.md #8813

johnno1962 · 2017-04-17T14:12:11Z

A prototype implementation of proposal 0168, as discussed on the swift-evolution thread over the last week. It has been tested inside the Xcode source editor and is completely functional as a reference implementation, bar any changes that would need to be made to other parts of the toolchain, which seem to be minimal. There is a new test file.

Resolves #42792.

xwu · 2017-04-19T22:45:31Z

lib/Parse/Lexer.cpp

+  // work back from the end to find whitespace to strip
+  while (start > Bytes.begin() && isWhitespace(start[-1])) {
+    if (*--start == '\n' || *start == '\r') {
+      if (start[-1] == '\r')


This is not right, I think: \r\r is two classic Mac newlines, not one. Surely, what you want is this:

switch (*--start) {
case '\n':
  if (start[-1] == '\r')
    --start;
  LLVM_FALLTHROUGH;
case '\r':
  return std::string(start, end-start);
}
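To make the point about `\r\r` concrete, here is a minimal standalone sketch of stepping back over exactly one line ending. It is not the code from the patch: `startOfLineEnding` is a hypothetical helper that works on a `std::string` and an offset instead of the lexer's raw pointers, and it adds a bounds check that the snippet above leaves implicit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the patch. `pos` is an offset just past
// a line ending in `text`; the result is the offset where that single line
// ending begins. "\r\n" counts as one ending, "\r\r" counts as two.
std::size_t startOfLineEnding(const std::string &text, std::size_t pos) {
  if (pos == 0)
    return pos;
  char last = text[pos - 1];
  if (last == '\n') {
    --pos;
    // A '\r' directly before '\n' belongs to the same CRLF ending.
    if (pos > 0 && text[pos - 1] == '\r')
      --pos;
  } else if (last == '\r') {
    // A lone '\r' is a complete ending; never consume a second '\r'.
    --pos;
  }
  return pos;
}
```

With this shape, `"ab\r\r"` yields offset 3 (only the final `\r` is consumed), while `"ab\r\n"` yields offset 2.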

xwu · 2017-04-19T22:46:01Z

include/swift/Parse/Lexer.h

@@ -397,12 +402,13 @@ class Lexer {
  /// If a copy needs to be made, it will be allocated out of the provided
  /// Buffer.
  static StringRef getEncodedStringSegment(StringRef Str,
-                                           SmallVectorImpl<char> &Buffer);
+                                           SmallVectorImpl<char> &Buffer,
+                                           unsigned Modifiers = 0, const std::string &ToReplace = "");


Nit: 80-character lines, here and throughout.

xwu · 2017-04-19T22:46:45Z

include/swift/Parse/Lexer.h

@@ -505,6 +511,9 @@ class Lexer {
  /// Try to lex conflict markers by checking for the presence of the start and
  /// end of the marker in diff3 or Perforce style respectively.
  bool tryLexConflictMarker();
+
+  // new for multiline string literals


Nit: capitalize and punctuate comments, here and throughout.

xwu · 2017-04-19T22:47:50Z

lib/Parse/Lexer.cpp

  case '\n':  // String literals cannot have \n or \r in them.
  case '\r':
+    if (Modifiers & StringLiteralMultiline) // ... unless they are mutli-line


Nit, typo. (Also, decide if you want to write "multi-line" or "multiline" throughout, and camel-case accordingly in the code.)

xwu · 2017-04-19T22:48:34Z

lib/Parse/Lexer.cpp

@@ -1192,6 +1207,15 @@ unsigned Lexer::lexCharacter(const char *&CurPtr, char StopQuote,
  case '"': ++CurPtr; return '"';
  case '\'': ++CurPtr; return '\'';
  case '\\': ++CurPtr; return '\\';
+  case '\n':
+    LLVM_FALLTHROUGH;


Per existing code in these files, no need for LLVM_FALLTHROUGH; when your case is otherwise empty.

xwu · 2017-04-19T22:50:14Z

lib/Parse/Lexer.cpp

+}
+
+/// determine contents of literal to be normalised - either
+/// to strip indenting or normalise line endings to a single \n


Nit: while you're capitalizing and re-wrapping these comments to 80 characters, U.S. English spellings. (Yes, I know; I'm Canadian, but them's the rules...)

xwu · 2017-04-19T22:54:29Z

lib/Parse/Lexer.cpp

+  }
+
+  // are there windows line endings in the source, if so return it to be replaced
+  const char *windowsLinesep = strnstr(Bytes.begin(), "\r\n", Bytes.end()-Bytes.begin());


A nit and a more substantive comment.

Nit: windowsLineSeparator (capitalize).

More substantive comment: strnstr isn't portable, is it? Ironic that this won't work on Windows...
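One portable way to do the same search is `std::search` from the C++ standard library, which works on an arbitrary byte range and does not need a NUL terminator. The sketch below is only an illustration under that assumption; `findCRLF` is a hypothetical name, and inside the compiler something like LLVM's `StringRef::find` would probably be the more idiomatic choice.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: returns a pointer to the first "\r\n" in the byte
// range [begin, end), or nullptr if the range contains none.
const char *findCRLF(const char *begin, const char *end) {
  static const char crlf[] = {'\r', '\n'};
  const char *hit = std::search(begin, end, crlf, crlf + 2);
  return hit == end ? nullptr : hit;
}
```

Because the range is given by two pointers, a `\r` at the very end of the range with the `\n` just outside it is correctly reported as not found.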

xwu · 2017-04-19T22:55:18Z

lib/Parse/Lexer.cpp

+  while ((BytesPtr = (const char *)memchr(BytesPtr, '\n', Bytes.end()-BytesPtr)) != nullptr) {
+    const char *NextPtr = BytesPtr + 1;
+    if (*NextPtr != '\n' && *NextPtr != '\r') {
+      if (BytesPtr[-1] == '\r')


See previous comment about how this treats \r\r.

xwu · 2017-04-19T22:55:32Z

lib/Parse/Lexer.cpp

@@ -1324,8 +1404,15 @@ void Lexer::lexStringLiteral() {
  // NOTE: We only allow single-quote string literals so we can emit useful
  // diagnostics about changing them to double quotes.

-  bool wasErroneous = false;
-
+  bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;


Nit: wasAllWhitespace, surely.

kubamracek · 2017-04-21T18:44:33Z

test/stdlib/MultilineString.swift

+// CHECK: -2-
+print("-2-")
+// SKIP-CHECK-NEXT: <"Two Beta">
+//print(delimit(""""Two Beta""""))


Why is this commented out?

Can we add a test for two quotes in the middle of the string?

kubamracek · 2017-04-21T18:46:09Z

lib/Parse/Lexer.cpp

+    BytesPtr = NextPtr;
+  }
+}
+
 /// lexStringLiteral:
 ///   string_literal ::= ["]([^"\\\n\r]|character_escape)*["]


Update this comment

kubamracek · 2017-04-21T18:49:13Z

lib/Parse/Lexer.cpp

+  // Strips any indent that corresponds to the indent
+  // of the multi-line string terminating line and
+  // normalises line endings in the source to \n.
+  // It also removes any intial empty line.


Wrap comments at 80 columns.

kubamracek · 2017-04-21T18:49:27Z

lib/Parse/Lexer.cpp

+  bool wasErroneous = false, wasWhitespace = false, allWhitespace = true;
+  unsigned Modifiers = 0;
+
+  // is this the start of a multiline string litersl


kubamracek · 2017-04-21T20:20:41Z

test/stdlib/MultilineString.swift

+// CHECK: -4-
+print("-4-")
+// CHECK-NEXT: <FourDelta>
+print(delimit("""Four\


As mentioned on swift-evolution, this should be disallowed.

kubamracek · 2017-04-21T20:29:49Z

test/stdlib/MultilineString.swift

+
+// CHECK: -14-
+print("-14-")
+// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation


Should be an error instead of a warning.

kubamracek · 2017-04-21T20:30:43Z

test/stdlib/MultilineString.swift

+
+// CHECK: -11-
+print("-11-")
+// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation


Should be an error.

kubamracek · 2017-04-21T20:30:47Z

test/stdlib/MultilineString.swift

+// CHECK: -12-
+print("-12-")
+// Note: The next few tests use physical tab characters, not spaces.
+// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation


Should be an error.

kubamracek · 2017-04-21T20:30:53Z

test/stdlib/MultilineString.swift

+
+// CHECK: -13-
+print("-13-")
+// CHECK-WARNINGS: warning: invalid mix of multi-line string literal indentation


Should be an error.

kubamracek · 2017-04-22T01:06:02Z

Hi, @johnno1962, thanks for sending this patch! Do you think you'll have time to address the review comments? I'd like to get this into Swift 4, and we're getting close to the deadline for it.

johnno1962 · 2017-04-23T11:56:39Z

Thanks @xwu and @kubamracek for the comments, which I have endeavoured to address with this last commit. This should be compliant with the core team's decision, except that it includes a small amount of code to implement a new proposal, https://github.com/johnno1962c/swift-evolution/blob/master/proposals/0173-newline-escape-in-strings.md, to allow escaping of newlines in all strings. This code is around line 1212 of Lexer.cpp if you want to remove it.

There is a limitation in the implementation: it normalises end-of-line to \n for sources that use \n, \r\n or \r as the line separator, but this does not work if the separators are mixed within a single literal.

kubamracek · 2017-04-23T16:17:07Z

@johnno1962, could you extract the newline escaping feature into a separate pull request? Since it's a separate proposal that wasn't accepted yet, we shouldn't block merging this PR on the other feature.

xwu

Lots of nits, sorry.

My main feedback would be that I'd like to see more tests. It'd be very important to guarantee that your line normalization code does not strip manually escaped \r\n (or, for that matter, \r + literal newline).

Also, what can be done about this limitation as to normalization of mixed newlines?

xwu · 2017-04-23T16:21:11Z

include/swift/AST/DiagnosticsParse.def

 ERROR(lex_unicode_escape_braces,none,
      "expected hexadecimal code in braces after unicode escape", ())
+ERROR(lex_illegal_multiline_string_start,none,
+      "inavlid start of multi-line string literal", ())


xwu · 2017-04-23T16:21:22Z

include/swift/AST/DiagnosticsParse.def

+ERROR(lex_illegal_multiline_string_start,none,
+      "inavlid start of multi-line string literal", ())
+ERROR(lex_illegal_multiline_string_end,none,
+      "inavlid end of multi-line string literal", ())


xwu · 2017-04-23T16:22:10Z

include/swift/AST/DiagnosticsParse.def

+ERROR(lex_ambiguous_string_indent,none,
+      "invalid mix of multi-line string literal indentation", ())
+WARNING(lex_trailing_multiline_whitespace,none,
+        "includes trailing space characters in multi-line string literal", ())


Warning on trailing whitespace is not a part of the approved proposal; please also split into separate PR.

xwu · 2017-04-23T16:23:16Z

include/swift/Parse/Lexer.h

@@ -505,6 +512,9 @@ class Lexer {
  /// Try to lex conflict markers by checking for the presence of the start and
  /// end of the marker in diff3 or Perforce style respectively.
  bool tryLexConflictMarker();
+
+  /// New for multiline string literals


Comments should have punctuation; also, please describe what it does and not just label it as new.

xwu · 2017-04-23T16:23:26Z

include/swift/Parse/Token.h

@@ -46,6 +46,9 @@ class Token {
  /// \brief Whether this token is an escaped `identifier` token.
  unsigned EscapedIdentifier : 1;

+  /// modifiers for string literals


Nit: capitalize and punctuate.

xwu · 2017-04-23T16:57:04Z

lib/Parse/Lexer.cpp

  Segments.push_back(
      StringSegment::getLiteral(getSourceLoc(SegmentStartPtr),
-                                Bytes.end()-SegmentStartPtr));
+                                Bytes.end()-SegmentStartPtr, Modifiers, ToReplace));


80-character lines please.

xwu · 2017-04-23T17:02:46Z

test/stdlib/MultilineString.swift

+
+// ===---------- Done --------===
+// CHECK-NEXT: Done.
+print("Done.")


Can you add more tests to ensure that the errors and warnings you have added behave as intended?

Some other ideas:

Check that only one leading newline and one trailing newline is stripped.

Check that all other newlines are preserved, including multiple consecutive ones.

Check that manually escaped \t, \r, \n, \\ are all possible and correctly lexed, including at the end of a line.

Check that a manually escaped \r\n is not normalized to \n.

Check that string literal interpolation works correctly.

Check that invalid ragged leading indents trigger the expected error.

Check that escaping \""" works correctly.

Also, I would like to see, for the purposes of this particular implementation, that end-of-line newlines after \ are preserved and that trailing whitespace is correctly preserved.
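The stripping rules these tests target can be sketched in isolation. The code below is an assumption-laden illustration, not the lexer's implementation: `stripMultilineIndent` is a hypothetical function, its input is the raw text between the two `"""` delimiters with line endings already normalized to `\n`, and it signals errors with a `bool` instead of diagnostics. It removes exactly one leading and one trailing newline, strips the closing delimiter's indent from every non-empty line, and rejects ragged indents.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch. Returns false for input this sketch treats as an
// error; otherwise writes the processed literal to `out` and returns true.
bool stripMultilineIndent(const std::string &raw, std::string &out) {
  // The content must start on the line after the opening delimiter.
  if (raw.empty() || raw[0] != '\n')
    return false;

  // The closing delimiter's line supplies the indent to strip.
  std::size_t lastNewline = raw.rfind('\n');
  std::string indent = raw.substr(lastNewline + 1);
  if (indent.find_first_not_of(" \t") != std::string::npos)
    return false; // Non-whitespace before the closing delimiter.

  out.clear();
  if (lastNewline == 0)
    return true; // Only the delimiter lines: the literal is empty.

  // Drop exactly one leading and one trailing newline.
  std::string body = raw.substr(1, lastNewline - 1);
  std::size_t pos = 0;
  while (true) {
    std::size_t eol = body.find('\n', pos);
    std::size_t len = eol == std::string::npos ? eol : eol - pos;
    std::string line = body.substr(pos, len);
    if (!line.empty()) {
      if (line.compare(0, indent.size(), indent) != 0)
        return false; // Ragged indent: line does not start with the indent.
      line.erase(0, indent.size());
    }
    out += line;
    if (eol == std::string::npos)
      break;
    out += '\n';
    pos = eol + 1;
  }
  return true;
}
```

Empty lines are passed through untouched, so consecutive newlines inside the literal survive. Escape sequences are never inspected here, which matches the idea that stripping operates on program text before escapes are expanded.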

Thanks @xwu I’ve actioned most of your nits. “wasAllWhitespace” is there to be able to generate errors when there is non-whitespace before the closing delimiter. @kubamracek this commit should be in line with the proposal as accepted. I’ll put newline escapes back in as a separate PR if required. Looking at more tests while the toolchain builds.

One typo left :)

Why can find-and-replace operations not simply replace all literal \r with \n and all literal \r\n with \n?

I could do it the other way around I suppose. It tries to do all the replacing including indent stripping with a single loop due to how the code developed. Seems OK as it is apart from exotic mixes of line endings inside a single literal.

I can see the logic behind doing indent stripping in a single loop, because you need to get to the end of the literal before you know how much to strip. But, without having thought too deeply, it would seem that normalizing line endings can happen line-by-line as you go.
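A single left-to-right pass is one way to normalize even mixed line endings. The sketch below assumes the input is the literal's source bytes; `normalizeLineEndings` is a hypothetical name and this is not the code in the patch. Handling `\r\n` before a lone `\r` in the same pass avoids the ordering problem of two separate find-and-replace steps, where replacing every `\r` first would turn `\r\n` into `\n\n`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: maps "\r\n", lone '\r', and '\n' in the source
// bytes to a single '\n' each, in one pass.
std::string normalizeLineEndings(const std::string &source) {
  std::string out;
  out.reserve(source.size());
  for (std::size_t i = 0; i < source.size(); ++i) {
    if (source[i] == '\r') {
      // Treat "\r\n" as one ending by skipping the '\n' that follows.
      if (i + 1 < source.size() && source[i + 1] == '\n')
        ++i;
      out.push_back('\n');
    } else {
      out.push_back(source[i]);
    }
  }
  return out;
}
```

Because the pass only looks at physical carriage-return and line-feed bytes, an escape written in the source as a backslash followed by `r` is left alone.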

Oh, and your logic isn't stripping leading whitespace from escaped newlines, is it?

""" \n \n """

...should give "\n \n" and not "\n\n".

Indent stripping is done on the program text, not on expanded escapes; escape expansion happens afterwards.

Good. I assumed so--just wanted to check.

xwu · 2017-04-23T17:03:22Z

include/swift/Parse/Lexer.h

+    std::string ToReplace;
+
+    static StringSegment getLiteral(SourceLoc Loc, unsigned Length,
+                                    unsigned Modifiers, const std::string &ToReplace) {


80-character lines, please.

xwu · 2017-04-23T17:03:38Z

include/swift/Parse/Lexer.h

@@ -495,7 +502,7 @@ class Lexer {
  static unsigned lexUnicodeEscape(const char *&CurPtr, Lexer *Diags);

  unsigned lexCharacter(const char *&CurPtr,
-                        char StopQuote, bool EmitDiagnostics);
+                        char StopQuote, bool EmitDiagnostics, unsigned Modifiers = 0);


80-character lines, please.

xwu · 2017-04-23T17:04:03Z

include/swift/Parse/Token.h

@@ -273,11 +276,17 @@ class Token {
  void setText(StringRef T) { Text = T; }

  /// \brief Set the token to the specified kind and source range.
-  void setToken(tok K, StringRef T, unsigned CommentLength = 0) {
+  void setToken(tok K, StringRef T, unsigned CommentLength = 0, unsigned Modifiers = 0) {


80-character lines, please.

johnno1962 · 2017-04-26T00:13:49Z

One more time 🙄

milseman · 2017-04-26T00:13:53Z

@johnno1962

/Users/buildnode/jenkins/workspace/swift-PR-osx-smoke-test/branch-master/swift/lib/Parse/Lexer.cpp:1365:25: error: use of undeclared identifier 'getSourceLoc'; did you mean 'swift::Lexer::getSourceLoc'?
        Diags->diagnose(getSourceLoc(start),
                        ^~~~~~~~~~~~
                        swift::Lexer::getSourceLoc
/Users/buildnode/jenkins/workspace/swift-PR-osx-smoke-test/branch-master/swift/include/swift/Parse/Lexer.h:433:20: note: 'swift::Lexer::getSourceLoc' declared here
  static SourceLoc getSourceLoc(const char *Loc) {
                   ^
1 error generated.

BTW, you can get a quicker turn around time by running "ninja swift" in your local swift build directory to just build the compiler (without the standard library).

milseman · 2017-04-26T00:16:09Z

@swift-ci please smoke test

johnno1962 · 2017-04-26T00:19:08Z

Thanks for the tip. I’ve been building toolchains. Is there an easy way to build a toolchain without building all architectures?

milseman · 2017-04-26T00:26:23Z

Nothing supported that I know of. Of course, you can intercept the commands and hack something together for yourself, but that's pretty dirty.

If you're just iterating fast on a .cpp file, "ninja swift" will turn around in about 20 seconds or so. You can then use e.g. ./bin/swift -frontend -parse -verify foo.swift for the error message checking for a test.

johnno1962 · 2017-04-26T00:50:19Z

Toolchains weren't so bad. It’s just that if the day changes you have to rebuild llvm + clang. Looks like tests are ok. Merge away 👍

milseman · 2017-04-26T01:14:04Z

🍾

milseman · 2017-04-26T01:24:13Z

SR for diagnostics improvements: https://bugs.swift.org/browse/SR-4701

@kubamracek do you have an SR for multi-line literals inside of interpolations?

johnno1962 · 2017-04-26T12:56:47Z

Thanks @milseman, @kubamracek and particularly @xwu for all your help. Final toolchain:
http://johnholdsworth.com/swift-LOCAL-2017-04-27-a-osx.tar.gz

kubamracek · 2017-04-26T17:24:04Z

https://bugs.swift.org/browse/SR-4708: Add support for multiline strings inside string interpolations

johnno1962 · 2017-04-26T23:31:24Z

@kubamracek I’ve just opened PR #9049 for multiline inside interpolations

DevAndArtist · 2017-05-11T17:49:47Z

Was the warning about trailing whitespaces removed? 😞

let s = """
	abc                                                               
	"""

print(s.characters.count) // prints 66

milseman · 2017-05-11T18:22:48Z

What would be the workaround if trailing whitespace is desired?

DevAndArtist · 2017-05-11T18:24:57Z

@milseman swiftlang/swift-evolution#695

Without the trailing backslash you'd need something like \("").

DevAndArtist · 2017-05-11T18:28:56Z

The example from above is actually this:

let s = """
	abc                                                               \("")
	"""

print(s.characters.count) // prints 66

which has a visible indication that trailing whitespaces are involved. Ideally we still need the trailing backslash.

xwu · 2017-05-11T20:56:07Z

As already discussed on swift-evolution, the accepted proposal does not include warnings about trailing whitespace. This PR correctly implements the proposal as accepted.

DevAndArtist · 2017-05-12T05:09:19Z

@xwu I bet you've included all the points the core team mentioned in the accepted thread, which cases should be errors and which should be allowed, like for instance the blank line without any indent, but you seem to exclusively pick things that you like and silently ignore things that you don't like, the same way you did during the discussion thread. I'm not being offensive by any means, but I'm highly critical of that.

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035931.html

That seems like a reasonable thing to warn about. [...]

-Joe

https://lists.swift.org/pipermail/swift-evolution/Week-of-Mon-20170417/035934.html

I don't want to sit around and watch how we'll introduce a half baked multi-line string literal.

DevAndArtist · 2017-05-12T05:26:17Z

In case you insist for a bug issue: https://bugs.swift.org/browse/SR-4874

johnno1962 · 2017-05-12T06:47:05Z

Adrian, you have a subjective opinion that trailing whitespace is critical, and it didn’t win over the majority of the thread on this proposal. It seems like you're a bit confused as to the idea behind my follow-up proposal about newline escapes (elided newlines). I'd not intended it to have anything to do with helping make trailing whitespace explicit.

johnno1962 and others added 15 commits April 10, 2017 18:18

prototype implementation of multiline string literals

d37fa1c

tidyup of line ending normalisation

8ad6272

Add multiline string literal tests

12cfc63

Four of these new tests are currently disabled (by commenting them out) because they fail to compile. I’ll tackle them shortly.

Permit tabs in multiline strings

5c960fc

To tell the truth, I had no idea tabs were forbidden in the first place.

Merge pull request #1 from brentdax/multiline-literal

3f52458

Tests and a bug fix

innitial deleimter not required to be at end of line for indent strip…

201acde

…ping

Merge branch 'master' of https://github.com/DoubleSpeak/swift

9575546

initial deleimter not required to be at end of line for indent stripping

9129aaa

reinstate stripping of initial empty line

e36bbd5

warning for whitespace at eand of line

ecccee0

always trim first line

965378f

separate out tests

bccae22

separate out tests

3157849

better comment

5723d97

tweaks

cee194e

johnno1962 mentioned this pull request Apr 17, 2017

Updated description for 0168-multi-line-string-literals.md swiftlang/swift-evolution#685

Closed

fix test

4691cb2

johnno1962 mentioned this pull request Apr 18, 2017

Updated description for 0168-multi-line-string-literals.md swiftlang/swift-evolution#688

Closed

xwu reviewed Apr 19, 2017

kubamracek self-requested a review April 20, 2017 21:01

ematejska requested a review from milseman April 21, 2017 17:15

kubamracek reviewed Apr 21, 2017

updated to core team decision

ab9f827

xwu reviewed Apr 23, 2017

johnno1962 added 2 commits April 23, 2017 19:05

strictly compliant with core team decision

c7d651d

strictly compliant with core team decision

49ec123

jessesquires mentioned this pull request Apr 23, 2017

[67] Issue #67 - April 27, 2017 SwiftWeekly/swiftweekly.github.io#224

Closed

milseman merged commit 981e706 into swiftlang:master Apr 26, 2017

tkremenek mentioned this pull request Apr 27, 2017

[SR-170] SE 0168 Multi-Line String Literals #42792

Closed

milseman mannequin mentioned this pull request Apr 26, 2017

[SR-4701] Starter: better diagnostics for multi-line string literals #47278

Closed

kubamracek mannequin mentioned this pull request Apr 26, 2017

[SR-4708] Starter: Add support for multiline strings inside string interpolations #47285

Closed
