assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32 #68730

Closed
dwrensha opened this issue Feb 1, 2020 · 0 comments · Fixed by #68735
Labels
A-diagnostics (Area: Messages for errors, warnings, and lints)
A-parser (Area: The parsing of Rust source code to an AST)
C-bug (Category: This is a bug.)
D-Unicode-unaware (Diagnostics: Diagnostics that are unaware of Unicode and trigger codepoint boundary assertions)
I-ICE (Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)

Comments

dwrensha commented Feb 1, 2020

I'm seeing an internal compiler error on the following input (found by fuzz-rustc):

$ echo ZW51bQBlbQDLgsuC | base64 --decode > main.rs
$ rustc main.rs
error: unknown start of token: \u{0}
 --> main.rs:1:5
  |
1 | enumem˂˂
  |     ^

error: unknown start of token: \u{0}
 --> main.rs:1:8
  |
1 | enumem˂˂
  |       ^

error: unknown start of token: \u{2c2}
 --> main.rs:1:9
  |
1 | enumem˂˂
  |       ^
  |
help: Unicode character '˂' (Modifier Letter Left Arrowhead) looks like '<' (Less-Than Sign), but it is not
  |
1 | enumem<˂
  |       ^

error: unknown start of token: \u{2c2}
 --> main.rs:1:10
  |
1 | enumem˂˂
  |        ^
  |
help: Unicode character '˂' (Modifier Letter Left Arrowhead) looks like '<' (Less-Than Sign), but it is not
  |
1 | enumem˂<
  |        ^

thread 'rustc' panicked at 'assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32', src/librustc_span/source_map.rs:840:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.42.0-nightly (cd1ef390e 2020-01-31) running on x86_64-unknown-linux-gnu

error: aborting due to 4 previous errors

The same error happens on stable, beta, and nightly.
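
For reference, the decoded input is 12 bytes long: `enum`, a NUL byte, `em`, another NUL byte, and then two copies of U+02C2, each of which takes two bytes in UTF-8. The NUL bytes do not show up in the rendered snippets above. The following sketch is not part of rustc; `char_layout` is a hypothetical helper that lists each character with its byte offset and UTF-8 width, to make it easier to see where byte offsets and character columns diverge:

```rust
/// Hypothetical helper: (byte offset, char, UTF-8 width) for every char in `src`.
fn char_layout(src: &str) -> Vec<(usize, char, usize)> {
    src.char_indices().map(|(i, c)| (i, c, c.len_utf8())).collect()
}

fn main() {
    // The bytes produced by `echo ZW51bQBlbQDLgsuC | base64 --decode`.
    let bytes: [u8; 12] = [
        0x65, 0x6e, 0x75, 0x6d, 0x00, // "enum" + NUL
        0x65, 0x6d, 0x00, // "em" + NUL
        0xcb, 0x82, 0xcb, 0x82, // U+02C2 twice, two bytes each
    ];
    let src = std::str::from_utf8(&bytes).expect("input is valid UTF-8");

    for (offset, ch, width) in char_layout(src) {
        println!("byte {offset:>2}: {ch:?} ({width} byte(s))");
    }
}
```

The last two characters start at byte offsets 8 and 10, so byte offsets 9 and 11 fall inside a character rather than on a boundary.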

jonas-schievink added the A-parser, C-bug, I-ICE, I-nominated, and T-compiler labels on Feb 1, 2020
bors closed this as completed in 01db581 on Feb 3, 2020
Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Feb 6, 2020
stop using BytePos for computing spans in librustc_parse/parser/mod.rs

Computing spans using logic such as `self.token.span.lo() + BytePos(1)` can cause internal compiler errors like rust-lang#68730 when non-ascii characters are given as input.

rust-lang#68735 partially addressed this problem, but only for one case. Moreover, its usage of `next_point()` does not actually align with what `bump_with()` expects. For example, given the token `>>=`, we should pass the span consisting of the final two characters `>=`, but `next_point()` advances the span beyond the end of the `=`.

This pull request instead computes the start of the new span by doing `start_point(self.token.span).hi()`. This matches `self.token.span.lo() + BytePos(1)` in the common case where the characters are ascii, and it gracefully handles multibyte characters.

Fixes rust-lang#68783.
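
To illustrate the point made in the commit message, here is a minimal sketch on plain `&str` byte offsets. It does not use rustc's `Span` or `BytePos` types, and `end_of_first_char` is a hypothetical helper that only mirrors the idea behind `start_point(self.token.span).hi()`, not its actual implementation:

```rust
/// Hypothetical helper: byte offset just past the first character that starts at `lo`.
/// Returns `None` if `lo` is out of range, not on a char boundary, or at the end.
fn end_of_first_char(src: &str, lo: usize) -> Option<usize> {
    let ch = src.get(lo..)?.chars().next()?;
    Some(lo + ch.len_utf8())
}

fn main() {
    let ascii = ">>=";
    let multibyte = "˂˂"; // U+02C2 twice, two bytes each in UTF-8
    let lo = 0;

    // ASCII: `lo + 1` and the character-aware offset agree.
    assert!(ascii.is_char_boundary(lo + 1));
    assert_eq!(end_of_first_char(ascii, lo), Some(1));
    // The rest of the token after the first character is `>=`.
    assert_eq!(&ascii[1..], ">=");

    // Multibyte: `lo + 1` lands inside the first character's encoding,
    // while the character-aware offset lands on a boundary.
    assert!(!multibyte.is_char_boundary(lo + 1));
    assert_eq!(end_of_first_char(multibyte, lo), Some(2));

    println!("ok");
}
```

Advancing by the width of the first character is the same as adding 1 for ASCII input, which is why the two approaches only differ on input like the one in this issue.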
jieyouxu added the A-diagnostics and D-Unicode-unaware labels on Aug 9, 2024