-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parser vulnerable to Trojan Source attack #11392
Comments
I will argue against this proposal as this is too aggressive. We might take insights from how other languages are handling such cases. Below is an excerpt from how Rust team is doing that:
We will be releasing Rust 1.56.1 today, 2021-11-01, with two new If your code has legitimate uses for the codepoints we recommend replacing them |
Isn't that what Rust is doing, except that they suggest the escape codes to use? |
I don't see much difference, actually. AFAIK there should be no possibilty of such a character being valid anywhere in Crystal source code outside of string literals or comments. As mentioned, in those contexts we could employ a more sophisticated analysis and allow bi-directional control overwrites if they're fully enclosed in the respective literal or comment. But that is an additional feature and requires more discussion on how to implement. The primary focus is to apply a defence against this kind of source code modification. |
FWIW, see also python approach for unicode handling for key words: https://stackoverflow.com/questions/41757886/how-can-i-recognise-non-printable-unicode-characters-in-python |
The difference between the proposed hard ban of these control characters in Crystal and Rust's implementation is the fact that the latter is a lint which throws an error by default (deny-by-default) but it can be overwritten by the user to not raise any error or to raise only a warning. This can be done at the level of a whole program but also at the level of individual modules, single functions or merely specific expressions. Just FYI. |
Yes, I understand Rust uses a different implementation approach. But I see little practical difference. We could consider adding an option ignore or warn instead of failing. In Crystal, however, we do not have a lint system which could offer fine-grained control for where it applies. I figure a global option would be less useful. But is there even a reasonably valid use case for having bi-directional control characters in your Crystal source code? What would make you disable the bi-direction control character validation? |
A even better question is whether or not there's any Crystal code out there with bi-directional control characters. Seems extremely unlikely that there is actual, legitimate, usage of this. |
I'd say we take the most aggressive approach, and if we find the community is having legitimate uses of these characters, then we make an opt-out option. (We can do the opt-out right away, but this is more work to maintain.) |
After some back and forth thinking, we decided to give us a time to discuss this properly, and not to rush it in the coming 1.2.2 version. In a way, we can see three vectors for this type of attack in a language: string literals, identifiers, and comments. With Crystal having only CR-terminating comments, it's not clear that there can be an actual attack using comments, so let's focus on the other two vectors. When it comes to identifiers, this is already discussed in #11216, and there are more Unicode characters than the bidi control ones that are problematic. I don't think there is much of a debate about that we need to properly handle these. As for string literals, an option could be to let the |
I created a PR that warns on those unescaped characters, now that parser warnings are available. Then the follow-up would be escaping them whenever possible, which includes more things than string literals. For demonstration purposes, assume that "😂"
%(😂)
:😂
:"😂"
`😂`
def foo(😂 x)
end
foo(😂: 1)
FOO😂 = 1
%w(😂) would be formatted into: "\u202E"
%(\u202E)
:"\u202E" # note the addition of quotation marks
:"\u202E"
`\u202E`
def foo("\u202E" x)
end
foo("\u202E": 1)
FOO😂 = 1 # unchanged because escape sequences are unavailable here
%w(😂) # ditto #12740 is another example where warnings + formatter are used to phase out deprecated syntax in the language, so we should follow suit here. |
Format + warning sounds good to me. |
To make it clear for everyone: We're not talking about replacing emojis. |
@HertzDevil What about comments? # 😂 Should that become this? # \u202E Comment's don't actively support unicode escape sequences, but then it's just comments 🤷 |
but isn't the problem in this example about bidi chars in comments? p "hello world" # مرحبا بالعالم |
I think that if all the problematic characters to the left of We cannot unconditionally escape all those characters in Crystal code blocks within comments, as that might produce invalid code. And then there are In any case, even if we don't escape the comments automatically, the compiler and formatter shall warn on them every time anyway. By the way, here is an example of an attack that involves no string literals at all, only comments, made possible because these control characters are allowed in identifiers: user_id = 1000
if user_id # Check if admin != 1000
puts "You are an admin!"
end The escaped form is: user_id\u202E\u2066 = 1000
if user_id\u202E\u2066 # Check if admin\u2069\u2066 != 1000
puts "You are an admin!"
end |
For identifiers, I think the compiler should eventually reject bidi control characters (ref #11216). |
There is now a draft Unicode Technical Standard for this: https://www.unicode.org/reports/tr55/#Spoofing-bidi |
Nice to see 😃
|
A recently released paper titled Trojan Source: Invisible Vulnerabilities demonstrates an attack against source code. It uses Unicode bi-directional overrides to disguise the meaning of code to a human reader. This can lead to seemingly harmless code introducing malicious behaviour.
Crystal demonstration
The following code demonstrates a stretched-string attack in Crystal:
https://carc.in/#/r/c6ka
The following code demonstrates a commenting-out attack in Crystal:
https://carc.in/#/r/c6kh
They looks mostly unsuspicious. You wouldn't expect either to print anything. But both programs actually print
You are an admin!
despiteaccess_level = "user"
.The second lines of each program's source code contain a number of Unicode control characters for bi-directional overrides. This is what the parser reads:
The only indicator that something might be off is the syntax highlighting, which should be pretty resistant to being fooled.
Github has already introduced a feature that shows a warning when bi-directional overrides are detected in a file: https://github.blog/changelog/2021-10-31-warning-about-bidirectional-unicode-text/
Mitigation
This vulnerability can be defended easily by disallowing bi-directional control characters in source code.
In many locations, such control characters are already a syntax error. But they are currently valid in comments and string literals. Those are the typical spots for most languages.
However, Crystal's parser currently even accepts Unicode control characters in identifiers, including bi-directional override characters. Restricting the allowed character set in general is another problem and tracked in #11216.
I propose to change the language specification and lexer rules such that valid Crystal source code must not contain any bi-directional control characters, regardles of location.
A more fine-grained approach would be possible as well, but this should be unnecessary considering there are little to no legitimate use cases for bidirectional control characters in Crystal source code (but for some specific exceptions mentioned in the following section).
Workarounds
Bi-directional override characters are legitimate contents for string literals. Instead of encoding them directly in the source code, a proper workaround is to use escape sequences for that.
Bi-directional overrides can technically be legitimate in comments if you want to mix languages with different directions in the comment text. That does not seem like a very important use case, though.
Still, as a further enhancement, bi-directional overrides could potentially be allowed in comments and possibly other locations such as string literals as long as they are fully enclosed inside the comment or literal.
The general vulnerability is tracked as CVE-2021-42574.
The text was updated successfully, but these errors were encountered: