fix: escape sequences in text-blocks are kept #4409

Merged: monperrus merged 5 commits into INRIA:master on Jan 11, 2022

Conversation

xzel23
Contributor

@xzel23 xzel23 commented Jan 6, 2022

Fixes #4408
Besides adding the (currently disabled) test case, I changed the JUnit imports and fixed the parameter order of the assertEquals calls.

@xzel23 xzel23 changed the title from "test: escape sequences in text-blocks are not retained (bug #4408)" to "fix #4408: escape sequences in text-blocks are not retained in code" on Jan 7, 2022
@xzel23
Contributor Author

xzel23 commented Jan 7, 2022

The failure in the "Extra Checks" step seems to be a technical problem and not related to the changes:
Error: OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended

@xzel23 xzel23 changed the title from "fix #4408: escape sequences in text-blocks are not retained in code" to "review: fix #4408: escape sequences in text-blocks are not retained in code" on Jan 7, 2022
* @return source code representation of the literal
*/
public static String getTextBlockToken(CtTextBlock literal) {
String token = "\"\"\"\n"
-     + literal.getValue()
+     + literal.getValue().replaceAll("\\\\", "\\\\\\\\")
Collaborator

@MartinWitt MartinWitt Jan 9, 2022

Why does this work? From reading the code, I get no clue why.

Contributor Author

It escapes each single backslash by replacing it with a double backslash. It just looks weird because each backslash has to be escaped twice, once for the string literal and then again in the regular expression.

Contributor

@algomaster99 algomaster99 Jan 10, 2022

> It escapes each single backslash by replacing it with a double backslash.

Would it not be sufficient to write .replaceAll("\", "\\")? This is what I can infer from your explanation.

Contributor Author

@xzel23 xzel23 Jan 10, 2022

@algomaster99 no, that's a compile-time error. The backslash inside the quotes has to be escaped. Then you have a regex consisting of only a backslash, which is an error because the backslash also has to be escaped in the regex. So you need another backslash escaping that one, and it in turn has to be escaped inside the quotes. That's why you need four backslashes to represent a single backslash in the search regex.

The same then applies for the replacement, so this makes it four backslashes in the search string and eight backslashes in the replacement string.

But you have a point in that we don't really need a regex here. So we can use String.replace(CharSequence, CharSequence) instead (note the different method name), which does not use regular expressions. This will be both faster and require only half the escaping. I have updated the PR.
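
To make the backslash count concrete, here is a minimal, self-contained sketch comparing the two approaches discussed above. The class and method names (`BackslashEscapeDemo`, `escapeWithRegex`, `escapeLiteral`) are made up for illustration and are not part of Spoon.

```java
final class BackslashEscapeDemo {

    // Regex variant: the pattern literal "\\\\" is the two-character regex
    // that matches one backslash; the replacement literal "\\\\\\\\" is a
    // four-character string that replaceAll turns into two backslashes.
    static String escapeWithRegex(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Literal variant: String.replace(CharSequence, CharSequence) does no
    // regex processing, so only the Java string-literal escaping remains.
    static String escapeLiteral(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String input = "a\\b"; // three characters: a, backslash, b
        System.out.println(escapeWithRegex(input));
        System.out.println(escapeLiteral(input));
        System.out.println(escapeWithRegex(input).equals(escapeLiteral(input)));
    }
}
```

Both methods turn the three-character input into the four-character string `a\\b`, so the last line prints `true`; the literal variant just needs half as many backslashes in the source.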

// test text block value
CtTextBlock l1 = (CtTextBlock) stmt5.getValueByRole(CtRole.ASSIGNMENT);
assertEquals("no-break space: \\00a0\n" +
"newline: \\n\n" +
Contributor

Using replace("\\", "\\\\") would only escape the backslashes and not \n because the latter is treated as a single character and not a combination of \ + n. For example, if you run replace("\\", "\\\\") on \\\n, it would return \\\\\n. Notice that \n is not escaped. I think it would be better if we write a generic function to escape all characters or use one which already exists. I have something like this in mind:

public static String escape(String s){
  return s.replace("\\", "\\\\")
          .replace("\t", "\\t")
          .replace("\b", "\\b")
          .replace("\n", "\\n")
          .replace("\r", "\\r")
          .replace("\f", "\\f")
          .replace("\'", "\\'")
          .replace("\"", "\\\"");
}

Contributor Author

As long as it's written as "\n" in the source code, it works, because at that point it's still two characters. It is not supposed to escape the newline at the end of a line in a text block. The unit test for this passes, and the original problem I had with Spoon is also solved by this patch, so I am quite sure it works.
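
The distinction drawn here can be checked in isolation. The sketch below is plain Java with hypothetical names, not Spoon code. It assumes the value handed to the printer holds escape sequences as raw backslash-plus-letter pairs, while line ends are real newline characters.

```java
final class NewlineVersusEscapeDemo {

    // Doubles every backslash; all other characters, including real
    // newline characters, pass through unchanged.
    static String escapeBackslashes(String s) {
        return s.replace("\\", "\\\\");
    }

    public static void main(String[] args) {
        String realNewline = "a\nb";  // 3 chars: a, line feed, b
        String rawEscape = "a\\nb";   // 4 chars: a, backslash, n, b

        // The real newline contains no backslash, so nothing changes.
        System.out.println(escapeBackslashes(realNewline).equals(realNewline));

        // The raw pair gains one extra backslash: 4 chars become 5.
        System.out.println(escapeBackslashes(rawEscape).length());
    }
}
```

This prints `true` and then `5`: a line break inside the text block stays a line break, and only a backslash that is actually present in the value gets doubled.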

Contributor

> As long as in the source code it's written as "\n" it works because at that time it's still two characters.

Okay. So do we just want to escape backslashes when we put the literal inside a text block?

Contributor Author

Yes, "normal" String literals are handled in LiteralHelper.getLiteralToken(), which in turn calls LiteralHelper.getStringLiteral(). But if we used that one, all newlines, tabs, etc. would be replaced by their respective escape sequences, and every text block would be printed on a single line.
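
To illustrate the trade-off, here is a rough sketch with two hypothetical helpers, `textBlockToken` and `singleLineLiteral`. They are simplified stand-ins, not the actual bodies of `LiteralHelper.getTextBlockToken()` or `LiteralHelper.getStringLiteral()`; the handling of the closing delimiter in particular is an assumption.

```java
final class TextBlockTokenSketch {

    // Text-block style: only backslashes are doubled and real newlines are
    // kept, so the printed token still spans several lines.
    static String textBlockToken(String value) {
        return "\"\"\"\n" + value.replace("\\", "\\\\") + "\"\"\"";
    }

    // String-literal style: control characters become escape sequences,
    // which collapses the content onto a single line.
    static String singleLineLiteral(String value) {
        String escaped = value
                .replace("\\", "\\\\")  // must run first, before backslashes are added below
                .replace("\n", "\\n")
                .replace("\t", "\\t")
                .replace("\"", "\\\"");
        return "\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        String value = "first line\nsecond line";
        System.out.println(textBlockToken(value));
        System.out.println(singleLineLiteral(value));
    }
}
```

The first call prints a multi-line token, while the second prints everything on one line with a visible `\n`, which is the behaviour that makes the string-literal path unsuitable for text blocks.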

Contributor

Okay. Your patch looks good to me in that case.

@monperrus monperrus changed the title from "review: fix #4408: escape sequences in text-blocks are not retained in code" to "fix: escape sequences in text-blocks are kept" on Jan 11, 2022
@monperrus monperrus merged commit 8af9cce into INRIA:master Jan 11, 2022
@monperrus
Collaborator

Thanks a lot @xzel23 for the patch and @algomaster99 for the review.

@xzel23 xzel23 deleted the textblock-bug branch January 12, 2022 16:00
Development

Successfully merging this pull request may close these issues.

[Bug] escape sequences in text-blocks are not retained
4 participants