-
Notifications
You must be signed in to change notification settings - Fork 795
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser vulnerable to Trojan Source attack #12352
Comments
I'd like to have a CI step that rules out that code making it into FSC and derivatives, don't have any such unicode hidden thing. |
It is certainly a novel attack, and one that open source repo's are going to be particularly sensitive to ... we will certainly have to figure out how to react to it. My naiive first thoughts are we need to ensure that:
/cc @dsyme , @vzarytovskii , @jonsequitur , @brettfo , @smoothdeveloper |
Oh the slippery slope of banning more and more Unicode characters from source code. Figure 1. Using a zero-width space. let access_level = "user"
if access_level <> "user" then
printf "You are an admin."
Figure 2. The homoglyphic attack as outlined in the same paper, here we use U+0435 CYRILLIC SMALL LETTER IE let access_level = "user"
if access_level <> "usеr" then
printf "You are an admin."
Figure 3. Another homoglyphic attack, here we have U+1D5BE Mathematical Sans-Serif Small E let access_level = "user"
if access_level <> "us𝖾r" then
printf "You are an admin."
Let's go ban all these characters in source! I can't wait for me having to escape all the special characters when pasting online text. Oh, resource files and text configurations will still face the same attacks anyways. Let's ban these characters from all text readers and config readers too. These characters can burn in hell! /s The proper way to prevent these is
Untested code invite bugs, and such so-called attacks can easily be prevented with proper architecture and testing. |
there are enough magic strings in the compiler, and probably no easy way to figure out if they could create issues such as showcase of that paper. To me, it seems good enough as first measure, that we have sanity check in the CI to rule out that the current and future code exhibit those tricks; I think fixing the compiler itself is another issue, and things @KevinRansom has outlined above gives a good overview of an approach that can be refined once it is set in motion. Step 3 should be a priority and the rest can probably wait to see how the industry is adjusting. Aside, there is also this paper: https://github.com/QiushiWu/qiushiwu.github.io/raw/main/papers/OpenSourceInsecurity.pdf that caused similar stir. I'm all in favor of adding static checks & all, but not so much about approach around "gauging contributions safety based on what owner knows about contributor" (which the paper kind of hints towards, as increasing security); contributions should be taken from whoever, without need for contributor to disclose anything, just on basis of merrit / adequation of the change with the evolution of the language.
I understand the concern, as a person also expressing myself beyond the realm of ASCII :) |
I'd like to see GitHub roll out a solution across the entire platform. |
If getting upstream support works just like that, I'll add, I'd like to see peace on earth :) |
Yeah, it makes sense for tooling to solve it in first place IMO, "fixing" it in compiler will mean a (potential) huge breaking change. I'm not saying we should do nothing though. I guess first step should be a CI leg which will be checking a PR for control characters. |
Oh I've no problem with solving it in the compiler and we should. Absolutely no one uses these characters in source code and if they do I don't care if we force them to use But the problem is so systemic it needs to be solved in GitHub too, who should immediately scan all of GitHub for these characters and emit dependabot issue warnings |
@dsyme Well I'd hate to have certain characters be banned just because they can be abused. You close one avenue of attack, you leave lots more open, this is not fixing anything, unless you ban all the characters as in my comment too. |
@smoothdeveloper I am okay with individual code base owners banning certain characters, anyone can enforce any coding standard. However to ban characters on the compiler level is another thing - even tabs are allowed in string literals and comments. |
To be clear, such characters would be allowed in strings using I'd have no particular problem with banning them in comments, though probably aren't as significant there. |
There is an official Microsoft response to this issue, I provide the text here: Microsoft was made aware of this issue prior to the disclosure, and MSRC made the recommendation that this be addressed in future releases of our code editors. Per our bug bar (https://aka.ms/windowsbugbar) and internal policies, this did not warrant an immediate security update. Teams are aware of this issue, and they are working on improvements to the code editing process while limiting impact to customers that may have legitimate uses for these characters based on the languages used on their systems. |
MSRC is- The Microsoft Security Response Center - https://www.microsoft.com/en-us/msrc |
@dsyme So are we requiring Cyrillic characters or Greek characters be escaped or not? |
I hope there is not a kneejerk reaction to ban or require escaping of certain Unicode characters. That strikes me as a narrow English language centric viewpoint. Surely there is legitimate use in strings when writing applications that will be viewed by non-English users. And coders, while obliged to use English keywords, might prefer to add comments in their native script. Escaping is not a good alternative; it hurts readability, especially in the case of the bidirectional codes in question -- imagine having to write all your strings/comments backwards because you are unable to embed the necessary codes. Perhaps a better and broader approach is to have proper Unicode handling overall, rather than the incomplete support currently in place (for example reliance on legacy System.Char methods that don't understand non-BMP characters - #9600), and having intelligent syntax rules based on real understanding of Unicode semantics. |
If we don't care about the homoglyphic attack and simply switching code across string termination, we can just ban unterminated bidirectionality across string termination as mentioned in the paper. |
Directional overrides are probably more useful in comments than in identifiers, because comments are exactly the kind of place where you’re likely to be mixing scripts, compared to an identifier, which will probably be all in one script for ease of typing. But I do think that the whole “Trojan source” thing is a bit of an imagined security concern. |
Recently paper called Trojan Source: Invisible Vulnerabilities demonstrates an attack against source code. It uses Unicode bi-direcional overrides to misguide the meaning of code to a human reader.
Repro steps
Only selecting text with mouse over condicional that is possible see some different thing.
Here I have an example to reproduce the problem
Expected behavior
Maybe compiler error which message
Invalid unicode character
Actual behavior
You are an admin.
Known workarounds
I don't know
Related information
Crystal lang discussion about this: crystal-lang/crystal#11392
Site about the problem: https://trojansource.codes/
The text was updated successfully, but these errors were encountered: