Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better support for Rust fuzzing #1407

Open
Dor1s opened this issue Feb 3, 2020 · 14 comments
Open

Better support for Rust fuzzing #1407

Dor1s opened this issue Feb 3, 2020 · 14 comments
Assignees
Labels
core feature Important feature

Comments

@Dor1s
Copy link
Contributor

Dor1s commented Feb 3, 2020

Similar to #860. There are some crashes reported in https://oss-fuzz.com/testcases?project=wasmtime and it seems that certain stackframes could be skipped (e.g. abort, abort_internal, rust_panic_with_hook, etc).

Not sure who assign this to. Perhaps @jonathanmetzman or maybe anyone else wants to own and champion Rust support //cc @inferno-chromium

@oliverchang
Copy link
Collaborator

@eepeep this could be something that would be nice to get you more familiar with the codebase, and will help Fuchsia once rust fuzzing there is ready. WDYT?

@alexcrichton
Copy link

FWIW the support so far I feel has been fantastic. We've ironed out a few things wrt to the build image (making sure to fuzzers are optimized, making sure optimizations are actually effective, turning on debuginfo, etc), and that's probably got some more tweaks that could happen to it one way or another.

The only thing I think the reporting could really improve on that's Rust specific is the case you called out above. The way cargo fuzz works is that it treats any "panics" in Rust as a fatal bug, where a panic in Rust is pretty similar to a failed assert in C/C++. Currently the cargo fuzz infrastructure translates panic! to call abort() which is how libfuzzer picks it up, but the reporting could, as you mentioned, probably prune all stack frames up to anything with std::panicking in it. The panic message is pretty immediately seen in the logs once you click through as well.

I can try to help out with any Rust-specific questions/etc if necessary as well!

@inferno-chromium inferno-chromium added the core feature Important feature label Mar 21, 2020
@inferno-chromium
Copy link
Collaborator

inferno-chromium commented May 21, 2020

This is done in #1790, #1792. We parse asserts properly, remove all the crap stack frames.

@alexcrichton - i had some questions on stackframes themselves

  1. What are these frames, looks like template functions right

#11 0x55a3e0467625 in _$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae mp4parse-rust/mp4parse_capi/src/lib.rs:377:9
for https://github.com/mozilla/mp4parse-rust/blob/9d58d7fef2ef90d509ba923c430d143c848cf4e2/mp4parse_capi/src/lib.rs#L377
Right now, we ignore any frames that start with _$LT

#13 0x55a3e0466c07 in mp4parse_capi::mp4parse_new_common::h5bcbdbc95c2c6730 mp4parse-rust/mp4parse_capi/src/lib.rs:467:15
In most stack frames, there is weird hash in every stack frame, e.g. h5bcbdbc95c2c6730 in this one. We right now keep it, but i am very worried if this changes across builds/dates, and then breaks deduplication (since based on crash state). Any way to remove this ?

@inferno-chromium inferno-chromium self-assigned this May 21, 2020
@inferno-chromium
Copy link
Collaborator

@alexcrichton - OSS-Fuzz Rust support is vaslty simplified - google/oss-fuzz#3830 , google/oss-fuzz#3840. Many projects didnt enable sanitizers, now ASan is properly enabled. Any comments to improve more things is appreciated. We dont enable MSan yet, but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

@alexcrichton
Copy link

@inferno-chromium awesome progress! On the topic of symbols, currently rustc uses a C++-like name-mangling scheme but doesn't match it entirely. What rustc does it has a bunch of path components separated by :: trailed by a compiler-generated hash. Each component uses its own internal escapes (e.g. $LT$ is <). This means that for something like:

_$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae 
  • Originally this was _ZN... but that's the start of a C++ symbol, so it was automatically demangled. This gave the ::-separated pieces
  • $LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$ is the first piece, which actually translates to <mp4parse_capi::Mp4parseAvifParser as mp4parse_capi::ContextParser>, but it requires extra rustc-specific knowledge to know that.
  • The next part read is just a method
  • The final part h3b7d4d3db512bcae is a compiler-synthesized hash.

The tl;dr; is that Rust symbols accidentally look like C++ symbols meaning that a name demangler for C++ will get some better information but doesn't go all the way in demangling Rust. There's a Rust crate, rustc-demangle, which demangles Rust symbols. For the purposes of stack trace collection it should be safe to ignore the compiler-generated hash.

but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

Sure! Do you have a link to those build failures?

@inferno-chromium
Copy link
Collaborator

@inferno-chromium awesome progress! On the topic of symbols, currently rustc uses a C++-like name-mangling scheme but doesn't match it entirely. What rustc does it has a bunch of path components separated by :: trailed by a compiler-generated hash. Each component uses its own internal escapes (e.g. $LT$ is <). This means that for something like:

_$LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$::read::h3b7d4d3db512bcae 
  • Originally this was _ZN... but that's the start of a C++ symbol, so it was automatically demangled. This gave the ::-separated pieces
  • $LT$mp4parse_capi..Mp4parseAvifParser$u20$as$u20$mp4parse_capi..ContextParser$GT$ is the first piece, which actually translates to <mp4parse_capi::Mp4parseAvifParser as mp4parse_capi::ContextParser>, but it requires extra rustc-specific knowledge to know that.
  • The next part read is just a method
  • The final part h3b7d4d3db512bcae is a compiler-synthesized hash.

Thanks for confirming this is a hash. Would really love to have some build flag to disable this feature. So far, it is not changing across builds, so we are fine, otherwise i can look into stripping it in future.

The tl;dr; is that Rust symbols accidentally look like C++ symbols meaning that a name demangler for C++ will get some better information but doesn't go all the way in demangling Rust. There's a Rust crate, rustc-demangle, which demangles Rust symbols. For the purposes of stack trace collection it should be safe to ignore the compiler-generated hash.

Adding a demangler just for these templated function is overkill. Most of the stack frames are demangled fine with just the default llvm symbolizer.

but was hoping if you can see build failures with infra/helper.py for some of those projects to see what build.sh changes are needed.

Sure! Do you have a link to those build failures?

Yes please try any OSS-Fuzz rust project.
E.g.
python infra/helper.py pull_images
python infra/helper.py build_fuzzers --sanitizer memory wasmtime

try wasmtime, or like mp4parse-rust, etc

it break build with errors like
/src/llvm-project/libcxxabi/src/private_typeinfo.cpp:1270: undefined reference to `__msan_warning_noreturn'
Ideally i want to simplify this for all rust projects, e.g. in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-builder/compile#L71. But can also modify build.sh for all projects if needed.

@alexcrichton
Copy link

Unfortunately the hash cannot be turned off, it's required for correctness to link libraries together in rustc. The compiler assumes it's there to ensure that two same-named libraries linked together don't conflict in symbols.

I agree that a demangler for these is likely overkill, but without a Rust-specific demangler the symbols are always going to be a bit wonky in Rust reports. There's a new symbol mangling scheme as well which is not yet enabled by default but will be eventually, and at that point Rust symbols won't look like C++ ones any more as well.


For building things, I ran:

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory mp4parse-rust

but I didn't get any errors?

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory wasmtime

and I did indeed get a lot of errors. The issue here though is that a dependency of wasmtime, binaryen, is a C++ project built with CMake. In building binaryen it looks like C/C++ is built with sanitizer intrinsics but they aren't linked in for the final build step.

I think the issue is that these lines should be defined for Rust as well (since Rust projects can include C/C++ code). I can't seem to figure out how to test that though because I can't seem to get the base-builder image to rebuild locally (it's always using a remotely-fetched one). Is there a way that I can rebuild the base-builder locally and build with that?

@inferno-chromium
Copy link
Collaborator

Unfortunately the hash cannot be turned off, it's required for correctness to link libraries together in rustc. The compiler assumes it's there to ensure that two same-named libraries linked together don't conflict in symbols.

I agree that a demangler for these is likely overkill, but without a Rust-specific demangler the symbols are always going to be a bit wonky in Rust reports. There's a new symbol mangling scheme as well which is not yet enabled by default but will be eventually, and at that point Rust symbols won't look like C++ ones any more as well.

For building things, I ran:

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory mp4parse-rust

but I didn't get any errors?

$ python infra/helper.py pull_images
$ python infra/helper.py build_fuzzers --sanitizer memory wasmtime

and I did indeed get a lot of errors. The issue here though is that a dependency of wasmtime, binaryen, is a C++ project built with CMake. In building binaryen it looks like C/C++ is built with sanitizer intrinsics but they aren't linked in for the final build step.

I think the issue is that these lines should be defined for Rust as well (since Rust projects can include C/C++ code). I can't seem to figure out how to test that though because I can't seem to get the base-builder image to rebuild locally (it's always using a remotely-fetched one). Is there a way that I can rebuild the base-builder locally and build with that?

It did work and let me reenable them. Dont know why other project owners disabled it in first place.

@inferno-chromium
Copy link
Collaborator

@alexcrichton - can you please keep us posted on when you will enable the new demangling scheme, it will break all projects if we dont support this. Can you also keep a build flag to disable that so that transition/migration is easier.

@alexcrichton
Copy link

Symbol mangling changes are tracked at rust-lang/rust#60705. I don't currently know the intentions of the compiler team about preserving an option to opt-in to the old scheme.

@inferno-chromium
Copy link
Collaborator

Thanks @alexcrichton , have subscribed to that.

From the point of work remaining, i think adding the demangler is the last thing left. Is rustc-demangle the official one or another one in plans for the new scheme ?

@alexcrichton
Copy link

The project AFAIK hasn't blessed something as an official demangler, but rustc-demangle is as close as anything can be to being official without actually being official. It's used by the Rust standard library, for example, to demangle backtraces.

@inferno-chromium
Copy link
Collaborator

@alexcrichton - that is good enough for us :)

@alexcrichton
Copy link

If it's helpful that crate also has a C API for calling into the demangler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core feature Important feature
Projects
None yet
Development

No branches or pull requests

4 participants