Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix disallow Unicode bi-directional control characters #11393

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions spec/compiler/lexer/lexer_spec.cr
Original file line number Diff line number Diff line change
Expand Up @@ -558,4 +558,21 @@ describe "Lexer" do
assert_syntax_error %("\\x1z"), "invalid hex escape"

assert_syntax_error %("hi\\)

# CVE-2021-42574
describe "trojan source" do
straight-shoota marked this conversation as resolved.
Show resolved Hide resolved
['\u202A', '\u202B', '\u202C', '\u202D', '\u202E', '\u2066', '\u2067', '\u2068', '\u2069'].each do |char|
assert_syntax_error %(f#{char}), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %("#{char}"), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(%w(#{char})), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(:#{char}), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(:"#{char}"), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(%i(#{char})), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(##{char}), "Invalid unicode control character: #{char.dump}"
assert_syntax_error %(macro foo\n##{char}\nend), "Invalid unicode control character: #{char.dump}"

it_lexes_string char.to_s.dump, char.to_s
it_lexes_char char.dump, char
end
end
end
2 changes: 1 addition & 1 deletion spec/support/syntax.cr
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ class Crystal::ASTNode
end

def assert_syntax_error(str, message = nil, line = nil, column = nil, metafile = __FILE__, metaline = __LINE__, metaendline = __END_LINE__)
it "says syntax error on #{str.inspect}", metafile, metaline, metaendline do
it "says syntax error on #{str.dump}", metafile, metaline, metaendline do
begin
parse str
fail "Expected SyntaxException to be raised", metafile, metaline
Expand Down
7 changes: 4 additions & 3 deletions src/compiler/crystal/syntax/lexer.cr
Original file line number Diff line number Diff line change
Expand Up @@ -1349,9 +1349,7 @@ module Crystal
start_pos = current_pos
end

while char != '\n' && char != '\0'
char = next_char_no_column_increment
end
skip_comment

if doc_buffer = @token.doc_buffer
doc_buffer << '\n'
Expand Down Expand Up @@ -3078,6 +3076,9 @@ module Crystal
if error = @reader.error
::raise InvalidByteSequenceError.new("Unexpected byte 0x#{error.to_s(16)} at position #{@reader.pos}, malformed UTF-8")
end
if current_char.in?('\u202A', '\u202B', '\u202C', '\u202D', '\u202E', '\u2066', '\u2067', '\u2068', '\u2069')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do we handle nonbreaking space? (U+00A0) That also seems potentially confusing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, non-breaking space and other invisible characters can be confusing. But at the worst they're just invisible by themselves. They don't manipulate the display of of other visible characters like bidi control characters.
Of course, they should be invalid in most places as well, but I wouldn't necessarily go as far as disallow them entirely. That could be an option, but this definitely needs more discussion to happen in #11216.

raise "Invalid unicode control character: #{current_char.dump}"
end
char
end

Expand Down