ascii=true and fullhex=true flags for escape_string #55099

Merged (2 commits) on Aug 1, 2024

@stevengj (Member) commented Jul 10, 2024

This PR adds two new optional keyword flags, ascii=true and fullhex=true, to the escape_string function. Both default to false, which preserves the current behavior.

If ascii=true is passed, then all non-ASCII characters are escaped. If fullhex=true is passed, then \u and \U escapes are printed with 4- and 8-digit hex values, respectively (instead of omitting leading zeros).
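As a rough illustration of the two flags, here is a minimal sketch. It assumes a Julia build that already includes this PR; the variable names are only for illustration.

```julia
# Assumes a Julia version that includes this PR (the ascii and fullhex keywords).
s = "\u00e4\u00f6\u00fc"   # "äöü" in NFC form, written with escapes to be explicit

# Default behavior: printable non-ASCII characters pass through unchanged.
default_out = escape_string(s)

# ascii=true: every non-ASCII character is escaped; leading zeros may be omitted.
ascii_out = escape_string(s; ascii=true)

# ascii=true, fullhex=true: \u escapes always carry 4 hex digits.
full_out = escape_string(s; ascii=true, fullhex=true)

println(default_out)  # äöü
println(ascii_out)    # \ue4\uf6\ufc
println(full_out)     # \u00e4\u00f6\u00fc
```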

Motivation:

  • I often find myself wanting to escape non-ASCII characters in order to see more easily what codepoints a string contains, both for debugging/pedagogy and for literal strings in code where character normalization is nonobvious. For example, if you want to be explicit about whether the string "äöü" is written in NFC normalization "\u00e4\u00f6\u00fc" or NFD normalization "a\u0308o\u0308u\u0308", and don't want to run the risk of an editor accidentally re-normalizing the string for you (as happens in some browsers), you have to write it escaped.
  • Printing the full 4/8-digit hex values is important if you want to output the string in a form that is compatible with C or other C-like languages.
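The normalization point can be sketched as follows, again assuming a Julia build that includes this PR. The names nfc and nfd are illustrative, and Unicode.normalize comes from the Unicode standard library.

```julia
using Unicode  # standard library; provides Unicode.normalize

nfc = "\u00e4\u00f6\u00fc"           # precomposed characters
nfd = Unicode.normalize(nfc, :NFD)   # base letters plus combining diaeresis U+0308

# The two strings render alike but are different code point sequences.
@assert nfc != nfd
@assert length(nfc) == 3 && length(nfd) == 6

# Escaping every non-ASCII character with 4-digit hex makes the difference visible.
nfc_escaped = escape_string(nfc; ascii=true, fullhex=true)
nfd_escaped = escape_string(nfd; ascii=true, fullhex=true)

println(nfc_escaped)  # \u00e4\u00f6\u00fc
println(nfd_escaped)  # a\u0308o\u0308u\u0308
```

Pasting the escaped form into source code keeps the chosen normalization fixed even if an editor re-normalizes visible non-ASCII text.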

@stevengj added the unicode and strings labels on Jul 10, 2024

@fingolfin (Member) left a comment:

Looks good to me

@IanButterworth merged commit 0f51a63 into master on Aug 1, 2024
12 checks passed
@IanButterworth deleted the sgj/more_escapes branch on August 1, 2024, 04:13
lazarusA pushed a commit to lazarusA/julia that referenced this pull request Aug 17, 2024