Sanitize multiple spaces in display names to protect against some security concerns #5703
works on both native and web
It's possible to bypass this regex by interleaving zero-width codepoints (e.g. U+200B) with whitespace, for example: " "

```javascript
const CHECK_MARKS_RE = /[\u2705\u2713\u2714\u2611]/gu
const CONTROL_CHARS_RE =
  /[\u0000-\u001F\u007F-\u009F\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g
const MULTIPLE_SPACES_RE = /[\s][\s]+/g
console.log("hello world".replace(CHECK_MARKS_RE, '').replace(CONTROL_CHARS_RE, '').replace(MULTIPLE_SPACES_RE, ' ').trim())
```
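The bypass can be reproduced with escaped code points so the invisible characters are explicit. This is a minimal sketch: the wrapper name `sanitize` and the test strings are illustrative, not taken from the PR. U+200B (ZERO WIDTH SPACE) is not in `CONTROL_CHARS_RE` and is not matched by `\s`, so placing it between single spaces leaves no two adjacent whitespace characters for `MULTIPLE_SPACES_RE` to collapse.

```javascript
// Regexes as quoted from the PR under review.
const CHECK_MARKS_RE = /[\u2705\u2713\u2714\u2611]/gu
const CONTROL_CHARS_RE =
  /[\u0000-\u001F\u007F-\u009F\u061C\u200E\u200F\u202A-\u202E\u2066-\u2069]/g
const MULTIPLE_SPACES_RE = /[\s][\s]+/g

// Hypothetical wrapper applying the same replace chain as the snippet above.
function sanitize(str) {
  return str
    .replace(CHECK_MARKS_RE, '')
    .replace(CONTROL_CHARS_RE, '')
    .replace(MULTIPLE_SPACES_RE, ' ')
    .trim()
}

// A plain run of spaces is collapsed as intended.
const plain = sanitize('hello      world')

// U+200B survives CONTROL_CHARS_RE and breaks up every run of \s,
// so the visual gap is never collapsed.
const bypass = sanitize('hello \u200B \u200B \u200B world')

console.log(JSON.stringify(plain)) // "hello world"
console.log(bypass.length) // 17
```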
@DavidBuchanan314 appreciate it #5729
What was the security concern?
Using whitespace to push the domain handle off screen so you can impersonate people.
Since this is a change to the … if that's the case, the API docs for …
@pfrazee still managed to bypass it, btw: test-acc.bsky.social

I think trying to validate like this is not the best idea, since there are so many damn characters that can be used to do this (here's a regex I made to validate spaces, although not every character might fit this scenario since our use case was slightly different, albeit similar: ensureMultilineValid.ts#L2). Display names are so problematic... My suggestion is to follow what we're doing there and truncate the display name rather than the handle: psky-atp/client@dd48b6a

For reference, the character I used in the bypass is Hangul Filler (U+3164), but I bet there are other characters, such as Mongolian Vowel Separator (U+180E), which might also work. Some of these characters probably can't just be replaced out without breaking some stuff, considering that they have legitimate use cases in some languages.
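Why Hangul Filler gets through can be checked with Unicode property escapes. A sketch, assuming a modern engine with ES2018 property escapes (e.g. current Node): U+3164 has general category Lo (a letter), so neither `\s` nor `\p{C}` matches it, while U+180E is a format character (Cf) in current Unicode. The `Default_Ignorable_Code_Point` check is a swapped-in idea for comparison, not something proposed in this thread, and the `classify` helper is hypothetical.

```javascript
// The two characters named in the comment above.
const HANGUL_FILLER = '\u3164'
const MONGOLIAN_VOWEL_SEPARATOR = '\u180E'

// The PR's run-collapsing regex only matches \s, so a filler placed between
// single spaces breaks every run and nothing gets collapsed.
const MULTIPLE_SPACES_RE = /[\s][\s]+/g
const padded = `test ${HANGUL_FILLER} ${HANGUL_FILLER} ${HANGUL_FILLER} acc`
const collapsed = padded.replace(MULTIPLE_SPACES_RE, ' ')

// Hypothetical helper: classify one character (property escapes need the u flag).
function classify(ch) {
  return {
    whitespace: /\s/u.test(ch), // what MULTIPLE_SPACES_RE relies on
    otherCategory: /\p{C}/u.test(ch), // general category C: Cc, Cf, Cs, Co, Cn
    letter: /\p{L}/u.test(ch), // general category L
    defaultIgnorable: /\p{Default_Ignorable_Code_Point}/u.test(ch),
  }
}

console.log(collapsed === padded) // true
console.log(classify(HANGUL_FILLER))
console.log(classify(MONGOLIAN_VOWEL_SEPARATOR))
```

Note that `Default_Ignorable_Code_Point` also covers ZERO WIDTH JOINER (U+200D), which emoji sequences depend on, so stripping by that property runs into the same legitimate-use problem raised above.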
Thanks so much @oestradiol. The only reason I haven't gone with that approach is that that kind of layout control is wildly finicky across all the React Native platforms, but it sounds like we just have to accept reality and attack it from that direction. Really appreciate the pointers. You win again, Unicode.
You're welcome, Paul. Good luck everyone, you got this! <3
Please bear with me to the end. The design of Javascript purposefully excludes whitespace and control characters outside of a certain scope, as outlined in …

NB also the following paragraph from 12.3 "Line Terminators": …

ECMA-262 Annex F notes: …

The conclusion, then, is that there are intentional limitations on what `\s` matches. However, Javascript's regexp engine, in Unicode-aware mode (e.g. with the `u` flag), can match characters by Unicode property. Given that the code already attempts to strip out control characters with a rather daunting character class range (`CONTROL_CHARS_RE`),
and given that all of the above problematic characters are in the Unicode "Other" general category (`\p{C}`, which covers control, format, surrogate, private-use and unassigned code points), if no control characters are desired, lines 8-9 could be replaced with:

```javascript
const CONTROL_CHARS_RE = /\p{C}/gu
```

If some control characters are desired, I would highly recommend whitelisting them:

```javascript
const CONTROL_CHARS_RE = /\p{C}/gu // node automatically orders these flags as gu, which is cute, so I pronounce it guh
const CONTROL_CHARS_WHITELIST_RE = /[\u0600-\u08E2]/ // e.g. Arabic format characters; g is not needed for now, but can be used
const MULTIPLE_SPACES_RE = /\s{2,}/g // two or more consecutive whitespace characters
// [...]
return str
  .replace(CHECK_MARKS_RE, '')
  .replace(CONTROL_CHARS_RE, (reMatchStr) =>
    reMatchStr.match(CONTROL_CHARS_WHITELIST_RE) ? reMatchStr : '')
  .replace(MULTIPLE_SPACES_RE, ' ')
  .trim()
```

If … The above replace function argument can be defined elsewhere to improve readability and formatting. The basic structure is that every control character match is checked to see if it's permitted; if it is, it is passed through unchanged, otherwise it is replaced with an empty string. Because it breaks down to a ternary, the truthy …

```javascript
const controlCharsWhitelist = new Set([
  "\uXXXX", // a reason why this is allowed
  // ...
]);
```

As all the zero-width and nonprinting spaces listed in the ticket's feedback are found in that category, the rough solution provided here would likely fit your needs. Any questions are welcome, as is feedback on the presentational or design aspects outlined here. However, I personally disagree with the approach of truncation in this PR and in the above code, and think you should use the layout control option.
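A runnable sketch of the whitelist idea above, using the `Set` variant instead of a whitelist regex. The function name `sanitizeDisplayName`, the replacer `keepIfWhitelisted`, and the two whitelisted code points are assumptions for illustration, not the PR's actual code. One caveat worth checking before relying on it: Hangul Filler (U+3164) is a letter (Lo), not a `\p{C}` character, so this approach does not remove it.

```javascript
// Regex copied from the PR under review.
const CHECK_MARKS_RE = /[\u2705\u2713\u2714\u2611]/gu
// General category C ("Other"): Cc, Cf, Cs, Co and Cn (unassigned) code points.
const CONTROL_CHARS_RE = /\p{C}/gu
// Two or more consecutive whitespace characters.
const MULTIPLE_SPACES_RE = /\s{2,}/g

// Hypothetical whitelist of format characters that real text needs.
// Any whitelisted invisible character can be interleaved with spaces again,
// so the list should stay as short as possible.
const CONTROL_CHARS_WHITELIST = new Set([
  '\u200C', // ZERO WIDTH NON-JOINER, used in e.g. Persian orthography
  '\u200D', // ZERO WIDTH JOINER, used in emoji ZWJ sequences
])

// Replacer for String.prototype.replace: keep whitelisted matches, drop the rest.
function keepIfWhitelisted(match) {
  return CONTROL_CHARS_WHITELIST.has(match) ? match : ''
}

// Hypothetical name; mirrors the replace chain in the comment above.
function sanitizeDisplayName(str) {
  return str
    .replace(CHECK_MARKS_RE, '')
    .replace(CONTROL_CHARS_RE, keepIfWhitelisted)
    .replace(MULTIPLE_SPACES_RE, ' ')
    .trim()
}

console.log(sanitizeDisplayName('hello \u200B \u200B world')) // hello world
```

Because `\p{C}` includes unassigned (Cn) code points, characters newer than the engine's Unicode data would also be stripped.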
@JamesKoenig wow, thanks for the valuable input! I didn't know about that, and it's also gonna help me in my application. Here's the final regex I'll be using, if anyone ever comes across this:
Adds a display name sanitization rule which reduces runs of spaces into a single space.
With apologies to rem.