Carriage return doesn't end // comments #7946
Comments
Discussion on IRC seemed to think that using Unicode character classes for newline determination in general is a reasonable approach, and I agree: https://en.wikipedia.org/wiki/Newline#Unicode |
The best thing to do here would be to look at what other Unicode-aware languages do, and then do that. This is purely for CYA, because I can contrive security risks regardless of which behavior we pick (thanks to idiosyncrasies in editors, which we can never control). (I'm not using the actual Unicode code points in these code examples, for purposes of visual illustration.)
Here's the risk in our current behavior. Imagine an editor that starts newlines on
// imagine that this comment ends in a \r
if user.is_not_the_president { return; }
// the above line is commented out!
launch_all_nuclear_missiles();
Now imagine that we adopt the plan where line comments end on any Unicode newline-equivalent. Per sp3d's Wikipedia link:
So we merely need to imagine an editor on Windows that doesn't recognize one of the less common newline characters, which to a user might look something like:
// whatever you do, DO NOT do anything like… send_user_data_to_hackers();
// the … in the above line ends the comment, leaving working code!
Realistically the "…" would probably be rendered as a "▯" (or other "unknown character" symbol), but that doesn't much change the risk here, which is caused by potential disregard for Unicode and incorrect syntax highlighting. |
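Whichever rule is chosen, the first scenario above depends on a `\r` that is not followed by `\n`. As a minimal sketch (not anything the compiler does; `find_bare_cr` is a hypothetical helper name), a tool could flag such bytes before a human reviews the file:

```rust
/// Returns the byte offsets of every '\r' that is not immediately
/// followed by '\n' (a "bare" carriage return).
/// Hypothetical helper for illustration; not part of rustc.
fn find_bare_cr(src: &str) -> Vec<usize> {
    let bytes = src.as_bytes();
    let mut offsets = Vec::new();
    for (i, &b) in bytes.iter().enumerate() {
        // "\r\n" is an ordinary Windows line ending, so only report
        // a '\r' whose next byte is missing or is not '\n'.
        if b == b'\r' && bytes.get(i + 1) != Some(&b'\n') {
            offsets.push(i);
        }
    }
    offsets
}
```

Scanning bytes is safe here because `\r` and `\n` are ASCII and never occur inside a multi-byte UTF-8 sequence. Calling `find_bare_cr("// c\rlaunch();\r\nok\n")` reports only the offset of the first `\r`, since the second one is part of a `\r\n` pair.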
nominating |
Here's a survey of a few modern languages: C#: Haskell: Java: Javascript [ecmascript]: Python 3: All these languages (excluding the ghc implementation of Haskell) accept at least \r, \n, and \r\n. Are there other languages worth looking at? |
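For illustration only, here is a rough sketch of what "accept \r, \n, and \r\n" means for a line splitter. The function name `split_lines` is made up for this example, and it models only these three conventions, not the other Unicode newline characters discussed earlier:

```rust
/// Splits `src` into lines, treating "\n", "\r\n", and a bare "\r"
/// each as a single line terminator.
/// Illustrative sketch; not taken from any of the surveyed languages.
fn split_lines(src: &str) -> Vec<&str> {
    let bytes = src.as_bytes();
    let mut lines = Vec::new();
    let mut start = 0; // byte offset where the current line begins
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'\n' => {
                lines.push(&src[start..i]);
                i += 1;
                start = i;
            }
            b'\r' => {
                lines.push(&src[start..i]);
                // Consume "\r\n" as one terminator, a bare "\r" as another.
                i += if bytes.get(i + 1) == Some(&b'\n') { 2 } else { 1 };
                start = i;
            }
            _ => i += 1,
        }
    }
    // Keep a final line that has no terminator.
    if start < bytes.len() {
        lines.push(&src[start..]);
    }
    lines
}
```

Under this rule, `"//\rfn main() {}"` splits into the two lines `//` and `fn main() {}`, which is how many editors would display it.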
I'd be interested to see the rules for C++, Ruby, and Go (the former so that we know what our potential audience is familiar with, and the latter because they also support Unicode identifiers (and Go, like Rust, requires all source files to be UTF-8)). |
C++: Go: Ruby: Between the latter two, the convention seems to be to treat 0x0d characters as whitespace and 0x0a characters as newlines, so that \n and \r\n newlines both work, somewhat similarly to the current Rust behavior. As I understand it, Ruby will consider "0x0d 0x0a" a single token without external whitespace, while Go will not. |
Accepted for well-defined |
cc me |
Thanks for all the research on this topic! We discussed this and decided that the use case for anything other than |
We came across this issue recently and were planning on reporting it when we found that you have been long aware. We originally uncovered a similar problem in the Solidity language during our Solidity compiler audit. While this is arguably more severe in a blockchain setting where auditability substitutes trust, we believe that improper handling of We'd love it if you would consider reopening this issue for further discussion :) |
This issue has been fixed by [commit](rust-lang/rust-clippy@8c1c763) This PR is used for close rust-lang#7946(Fixes rust-lang#7946). changelog: Add test for pattern_type_mismatch.
The manual and compiler agree that \n (newline) is the only character which can end // comments. However, a large amount of software, such as editors and document viewers, considers any of "\r", "\n", and "\r\n" to introduce the start of a new line. In a file that has a \r without a following \n, users can easily be tricked into thinking a line is not commented out when it actually is.
For example, the Rust program "//\rfn main() {}" is missing a main function.
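To make the example concrete, here is a toy model of the two behaviors, not the actual lexer. The function `strip_line_comments` and its `cr_ends_comment` flag are assumptions introduced for illustration, and the sketch deliberately ignores string literals and block comments:

```rust
/// Toy model: returns `src` with `//` comments removed.
/// With `cr_ends_comment == false`, only '\n' ends a comment (the rule
/// described in this issue). With `true`, a '\r' ends it as well.
/// Assumption: string literals and block comments are not handled.
fn strip_line_comments(src: &str, cr_ends_comment: bool) -> String {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '/' && chars.peek() == Some(&'/') {
            // Skip comment text up to, but not including, the terminator.
            while let Some(&next) = chars.peek() {
                if next == '\n' || (cr_ends_comment && next == '\r') {
                    break;
                }
                chars.next();
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

With `cr_ends_comment` set to `false`, `strip_line_comments("//\rfn main() {}", false)` returns an empty string, so no `main` survives. With it set to `true`, the `\r` ends the comment and `fn main() {}` is kept.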