Sanitize multiple spaces in display names to protect against some security concerns #5703

pfrazee · 2024-10-11T03:09:25Z

Adds a display name sanitation rule which reduces runs of spaces into a single space.

Before	After

With apologies to rem.

…urity concerns

haileyok

works on both native and web

github-actions · 2024-10-11T03:15:51Z

Old size	New size	Diff
7.89 MB	7.89 MB	32 B (0.00%)

DavidBuchanan314 · 2024-10-11T13:49:51Z

It's possible to bypass this regex by interleaving zero-width codepoints (e.g. U+200B) with whitespace, for example: " "

const CHECK_MARKS_RE = /[\u2705\u2713\u2714\u2611]/gu
const CONTROL_CHARS_RE =
  /[\u0000-\u001F\u007F-\u009F\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g
const MULTIPLE_SPACES_RE = /[\s][\s]+/g
console.log("hello              world".replace(CHECK_MARKS_RE, '').replace(CONTROL_CHARS_RE, '').replace(MULTIPLE_SPACES_RE, ' ').trim())

…3 (comment))

pfrazee · 2024-10-11T20:58:48Z

@DavidBuchanan314 appreciate it #5729

…3 (comment)) (#5729)

JamesKoenig · 2024-10-12T00:09:22Z

What was the security concern?

pfrazee · 2024-10-12T00:24:07Z

Using whitespace to push the domain handle off screen so you can impersonate people

JamesKoenig · 2024-10-12T00:48:09Z

Since this is a change to the social-app frontend, usernames with tons of spaces will still be stored on the backend, no?
Meaning other clients using the service (or using the at protocol at large) would still have to face the same sorts of concerns?

if that's the case the api docs for app-bsky-actor-get-profile already state that the username could be 640 characters long, which should be enough of a warning for username length to be truncated appropriately, but it might be a good idea to have a style guide (if I missed one), that tells third party client developers to always have the domain handle visible (given that impersonation is a concern).

oestradiol · 2024-10-13T20:33:59Z

@pfrazee still managed to bypass it, btw: test-acc.bsky.social

I think trying to validate like this is not the best idea since there are so many damn characters that can be used to do this (here's a regex I made to validate spaces, although not every character might fit this scenario since our use case was slightly different, albeit similar: ensureMultilineValid.ts#L2)

Display names are so problematic... my suggestion is follow what we're doing there and truncate the display name rather than the handle: psky-atp/client@dd48b6a

For reference, the character I used in the bypass is Hangul Filler, but I bet there are other characters such as Mongolian Vowel Separator which might also work. Some of these characters probably can't even be just replaced out without breaking some stuff, considering that they have legitimate use cases in some languages.

pfrazee · 2024-10-14T19:47:43Z

Thanks so much @oestradiol. The only reason I haven't gone with that approach is because that kind of layout control is wildly finicky across all the react-native platforms, but it sounds like we just have to accept reality and attack it from that direction.

Really appreciate the pointers. You win again, unicode.

oestradiol · 2024-10-14T22:58:10Z

You're welcome, Paul. Good luck everyone, you got this! <3

JamesKoenig · 2024-10-14T23:24:41Z

@pfrazee

Really appreciate the pointers. You win again, unicode.

Please bear with me to the end.

The design of Javascript purposefully excludes whitespace and control characters outside of a certain scope, as outlined in ECMA-262 12.2 "White space", Note 2:

NOTE 2 Other than for the code points listed in Table 36, ECMAScript WhiteSpace intentionally excludes all code points that have the Unicode “White_Space” property but which are not classified in general category “Space_Separator” (“Zs”)

NB also the following paragraph from 12.3 "Line Terminators":

Line terminators are included in the set of white space code points that are matched by the \s class in regular expressions

ECMA-262 Annex F notes:

12.2: In ECMAScript 2016, Unicode 8.0.0 or higher is mandated, as opposed to ECMAScript 2015 which mandated Unicode 5.1. In particular, this caused U+180E MONGOLIAN VOWEL SEPARATOR, which was in the Space_Separator (Zs) category and thus treated as whitespace in ECMAScript 2015, to be moved to the Format (Cf) category (as of Unicode 6.3.0). This causes whitespace-sensitive methods to behave differently. For example, "\u180E".trim().length was 0 in previous editions, but 1 in ECMAScript 2016 and later. Additionally, ECMAScript 2017 mandated always using the latest version of the Unicode Standard

The conclusion then would be that there are intentional limitations of what \s is intended to capture, which a pathological user uploading arbitrary unicode strings may take advantage of, which makes attempt to stop them futile.

However

Javascripts regexp engine, in unicode aware mode (eg. /.+/u) has a different tool we can use to address this: the character class escape \p{}, (ECMA-262 22.2.1 Patterns, in the bnf under CharacterClassEscape). For a list of categories one can consult https://www.unicode.org/reports/tr44/#GC_Values_Table (though the paragraph above it is salient and a good reference). A more interactive version that more easily presents which characters can be seen in those categories is https://www.compart.com/en/unicode/category .

Given that the code already attempts to strip out control characters with a rather daunting capture class range:

social-app/src/lib/strings/display-names.ts

Line 9 in 725e8c1

/[\u0000-\u001F\u007F-\u009F\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g

And that all of the above problematic characters are in the unicode control class category, If no control characters are desired, lines 8-9 could be replaced with:

const CONTROL_CHARS_RE = /\p{C}/gu

if some control chars are desired I would highly recommend whitelisting those in:

const CONTROL_CHARS_RE = /\p{C}/gu  //node automatically orders these gu, which is cute, so I pronounce it guh
const CONTROL_CHARS_WHITELIST_RE = /[\u0600-\u08E2]/ //e.g. Arabic format characters, g is not needed for now, but can be used
const MULTIPLE_SPACES_RE = /\s{2,}/g //two or more captures of whitespace
///[...]
    return str
      .replace(CHECK_MARKS_RE, '')
      .replace(CONTROL_CHARS_RE, (reMatchStr) =>
              reMatchStr.match(CONTROL_CHARS_WHITELIST_RE) ? reMatchStr : '' )
      .replace(MULTIPLE_SPACES_RE, ' ')
      .trim()

If \s, being defined by ecma language needs, and the unicode separator categories differ, /\s{2,}/g above could be replaced with /\p{Z}{2,}/gu to showcase explicit intention, though it is harder to immediately understand intent and should have a comment.

The above replace function argument can be defined elsewhere to improve readability and formatting. The basic structure is that every control char match is checked to see if it's permitted, and if it is, passed through. Because it breaks down to a ternary, the truthy String.prototype.match call could be replaced with a Set.prototype.has call, where a set of what control characters are permissible could be outlined as follows:

const controlCharsWhitelist = new Set([
 "\uXXXX", //a reason why this is allowed
 // ...
]);

As all the zero width and nonprinting spaces listed in the ticket's feedback are all found in the control character category, the rough solution here provided would likely fit your needs.

Any questions are welcome, or feedback on the presentational or design aspects outlined here.

I personally disagree with the approach of truncation in this PR and in the above code, and think you should use the layout control option, however.

oestradiol · 2024-10-15T01:43:06Z

@JamesKoenig wow, thanks for the valuable input! I didn't know about that, and it's also gonna help me in my application.
I managed to clean up my regex a lot by replacing most of it with simply \p{C}\p{Z}\p{M}. That being said, I still think this is not enough for Bluesky, because for some reason some characters such as U+3164, U+115F and U+1160 are classified as... letters..? While doing my research, I found this source, if that helps at all.

Here's the final regex I'll be using, if anyone ever comes across this:
/[\p{C}\p{Z}\p{M}\u{115F}\u{1160}\u{2800}\u{3164}\u{FFA0}\u{FFFC}\u{1D159}]/gu

Sanitize multiple spaces in display names to protect against some sec…

8d2e9e9

…urity concerns

arcalinea temporarily deployed to fix-display-name-sanitization - social-app PR #5703 October 11, 2024 03:09 — with Render Destroyed

haileyok approved these changes Oct 11, 2024

View reviewed changes

pfrazee merged commit 7677bb3 into main Oct 11, 2024
6 checks passed

pfrazee deleted the fix-display-name-sanitization branch October 11, 2024 05:14

pfrazee added a commit that referenced this pull request Oct 11, 2024

Protect against zero-width chars in display name sanitation (see #570…

725e8c1

…3 (comment))

pfrazee mentioned this pull request Oct 11, 2024

Protect against zero-width chars in display name sanitation #5729

Merged

pfrazee added a commit that referenced this pull request Oct 11, 2024

Protect against zero-width chars in display name sanitation (see #570…

f7852d0

…3 (comment)) (#5729)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sanitize multiple spaces in display names to protect against some security concerns #5703

Sanitize multiple spaces in display names to protect against some security concerns #5703

pfrazee commented Oct 11, 2024 •

edited

Loading

haileyok left a comment

github-actions bot commented Oct 11, 2024

DavidBuchanan314 commented Oct 11, 2024

pfrazee commented Oct 11, 2024

JamesKoenig commented Oct 12, 2024

pfrazee commented Oct 12, 2024

JamesKoenig commented Oct 12, 2024

oestradiol commented Oct 13, 2024 •

edited

Loading

pfrazee commented Oct 14, 2024

oestradiol commented Oct 14, 2024

JamesKoenig commented Oct 14, 2024 •

edited

Loading

oestradiol commented Oct 15, 2024 •

edited

Loading

Sanitize multiple spaces in display names to protect against some security concerns #5703

Sanitize multiple spaces in display names to protect against some security concerns #5703

Conversation

pfrazee commented Oct 11, 2024 • edited Loading

haileyok left a comment

Choose a reason for hiding this comment

github-actions bot commented Oct 11, 2024

DavidBuchanan314 commented Oct 11, 2024

pfrazee commented Oct 11, 2024

JamesKoenig commented Oct 12, 2024

pfrazee commented Oct 12, 2024

JamesKoenig commented Oct 12, 2024

oestradiol commented Oct 13, 2024 • edited Loading

pfrazee commented Oct 14, 2024

oestradiol commented Oct 14, 2024

JamesKoenig commented Oct 14, 2024 • edited Loading

However

oestradiol commented Oct 15, 2024 • edited Loading

pfrazee commented Oct 11, 2024 •

edited

Loading

oestradiol commented Oct 13, 2024 •

edited

Loading

JamesKoenig commented Oct 14, 2024 •

edited

Loading

oestradiol commented Oct 15, 2024 •

edited

Loading