
Nallocfuzz: fuzzing engine to test allocations failure #9902

Open: wants to merge 4 commits into base: master


catenacyber
Contributor

Hello @inferno-chromium @jonathanmetzman @oliverchang @alan32liu

This pull request aims to find a new class of vulnerabilities: those that occur when allocations fail.

This is implemented as a fuzzing engine, see https://github.com/catenacyber/nallocfuzz.
The engine is simply libFuzzer, driven through LLVMFuzzerRunDriver, with:

  • alloc hooks that sometimes return NULL and otherwise allocate normally;
  • LLVMFuzzerTestOneInput wrapped in a function that makes crashes and allocation failures reproducible from the second run (hence the choice of a fuzzing engine).
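A minimal sketch of those two pieces, assuming nothing about nallocfuzz's real code: `nalloc_malloc`, `nalloc_run_one`, `demo_target`, the FNV-1a hash and the 1-in-16 failure rate are all hypothetical. The real engine overrides `malloc` itself (see the stack traces later in this thread); the sketch uses an explicit hook so it runs standalone.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-input state; the wrapper resets it before every run. */
static uint64_t run_seed;
static uint64_t alloc_index;

/* FNV-1a (64-bit), standing in for whatever hash the real engine uses. */
static uint64_t fnv1a64(const uint8_t *data, size_t size) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < size; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

/* Hypothetical allocation hook: fails roughly 1 call in 16, chosen
 * deterministically from the input hash and the allocation index. */
static void *nalloc_malloc(size_t size) {
    uint64_t h = (run_seed ^ alloc_index++) * 0x9E3779B97F4A7C15ULL;
    if ((h >> 60) == 0) {   /* top 4 bits all zero: about 1/16 */
        errno = ENOMEM;     /* behave like a real failing malloc() */
        return NULL;
    }
    return malloc(size);
}

typedef int (*target_fn)(const uint8_t *data, size_t size);

/* Hypothetical wrapper around the real fuzz target: the failure pattern
 * depends on the input alone, so replaying an input replays its failures. */
static int nalloc_run_one(target_fn target, const uint8_t *data, size_t size) {
    assert(target != NULL);
    run_seed = fnv1a64(data, size);
    alloc_index = 0;
    return target(data, size);
}

/* Demo target: makes 30 allocations and reports which ones failed. */
static int demo_target(const uint8_t *data, size_t size) {
    (void)data;
    (void)size;
    int failed_mask = 0;
    for (int i = 0; i < 30; i++) {
        void *p = nalloc_malloc(16);
        if (p == NULL)
            failed_mask |= 1 << i;
        free(p);
    }
    return failed_mask;
}
```

Resetting `alloc_index` in the wrapper is what matters: without it, the same input would hit different failures depending on how many allocations earlier inputs made.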

This work was inspired by a Suricata issue, https://redmine.openinfosecfoundation.org/issues/5701 (still private for now), where incomplete handling of an allocation failure led to a NULL pointer dereference (in Rust code).
The issue was fixed in OISF/suricata#8379 (by setting a size to 0 as well as the pointer to NULL).
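A hypothetical C illustration of that bug class (the Suricata code was Rust and is not reproduced here): when an allocation failure clears a buffer pointer, the matching size must be cleared too. `buffer_t`, `buffer_set` and the injected `alloc_fn` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer whose pointer and length must always agree. */
typedef struct {
    uint8_t *data;
    size_t len;
} buffer_t;

/* Allocator injected so the failure path can be exercised in tests. */
typedef void *(*alloc_fn)(size_t size);

/* Replace the buffer contents with a copy of src[0..n). */
static int buffer_set(buffer_t *b, const uint8_t *src, size_t n, alloc_fn alloc) {
    assert(b != NULL && alloc != NULL);
    free(b->data);
    b->data = alloc(n ? n : 1);
    if (b->data == NULL) {
        /* Reset the size together with the pointer: keeping the old
         * length would let a later reader walk a NULL buffer. */
        b->len = 0;
        return -1;
    }
    if (n > 0)
        memcpy(b->data, src, n);
    b->len = n;
    return 0;
}

/* Allocator that always fails, simulating an allocation failure. */
static void *always_fail(size_t size) {
    (void)size;
    return NULL;
}
```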

After the fuzz targets were fixed (see curl/curl-fuzzer#74), this engine found a first bug in curl after a few hours of fuzzing from the seed corpus (see curl/curl#10733), and that bug led to many variant findings.

What do you think of it?

You can test it with the following commands (or with other projects):

```shell
python3 infra/helper.py build_image --no-pull base-builder
python3 infra/helper.py build_image --no-pull curl
python3 infra/helper.py build_fuzzers curl --engine=nallocfuzz
```

@DavidKorczynski
Collaborator

DavidKorczynski commented Mar 10, 2023

Very cool!!

We've used a similar technique in fluent-bit as well, https://github.com/fluent/fluent-bit/blob/55fc5673fc6a5e52fa49ba37cd21c44c2680408f/include/fluent-bit/flb_mem.h#L67-L74, where it has probably found 10+ null dereferences.

An interesting observation was that it also increased code coverage a lot. For example, this calltree has a lot of red areas: https://storage.googleapis.com/oss-fuzz-introspector/fluent-bit/inspector-report/20230110/fuzz_report.html#Fuzzer:-flb-it-fuzz-config_map_fuzzer_OSSFUZZ, but after adding the possibility of failing allocations, a lot more of the callgraph is explored: https://storage.googleapis.com/oss-fuzz-introspector/fluent-bit/inspector-report/20230310/fuzz_report.html#Fuzzer:-flb-it-fuzz-config_map_fuzzer_OSSFUZZ

From the fluent-bit experience, we rolled this technique out gradually, as it can trigger quite a few issues.

Another OSS-Fuzz project has used it too (#8302), and judging from the comments the technique found some issues.

Should this be an engine, though? Maybe a custom sanitizer would be more appropriate? Or simply something that is enabled in ASan runs?

@catenacyber
Contributor Author

> Very cool!!

:-)

> We've used this technique in fluent-bit as well https://github.com/fluent/fluent-bit/blob/55fc5673fc6a5e52fa49ba37cd21c44c2680408f/include/fluent-bit/flb_mem.h#L67-L74 where it has found probably 10+ null dereferences.

I have also done this kind of thing for one project, but this time, with nallocfuzz, it is generic.

> From the fluent-bit experience, we rolled this technique out gradually as it can trigger a bit of issues.

I find that fuzz targets often have more bugs in handling allocation failures than the projects being fuzzed.

> Another OSS-Fuzz project has used it too #8302 and from the comment it seems the technique found some issues.

Thanks for the reference.

> Should be an engine though? Maybe a custom-sanitizer would be more appropriate? or, simply something that will be enabled in ASAN runs?

The difficulty with making it a sanitizer (or an ASan option) is the need to hook/wrap LLVMFuzzerTestOneInput to get reproducible allocation failures and crashes (instead of just failing every Nth allocation).
How would you do that, @DavidKorczynski?

@catenacyber
Contributor Author

By the way, @DavidKorczynski, you should also add that to realloc in fluent-bit ;-)

@DavidKorczynski
Collaborator

> The complexity to make it a sanitizer (or an ASAN option) is the need to hook/wrap LLVMFuzzerTestOneInput to have reproducible allocation failures and crashes (instead of just failing every Nth allocation)
> How would you do that @DavidKorczynski ?

Hmm, I'm not sure, to be honest. In this engine, is the probability of failing an allocation determined by some value in the fuzzer input (i.e. const uint8_t *data)?

In the Fluent Bit case, the two important things we looked for were (1) that the fuzzer can control the probability of when to fail allocations and (2) that the forced allocation failures are included in the code coverage reports, as it's useful to see which error handling code has been covered/not covered.

We achieved (1) in the Fluent Bit case by using the first bytes of data from the fuzzer to determine how often to fail. I am not sure how I would do this in a general manner, but I have a feeling I'd prefer a lighter approach than a new engine. Maybe some pre-processing? Is this something that could perhaps be introduced in ASan itself?

In order to get this included in the code coverage report ((2) above) would the code coverage generation have to be run using the new engine?
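A sketch of approach (1), assuming a project-level allocation hook. `fuzz_malloc`, `consume_failure_byte` and the "fail every Nth call" rule are hypothetical, not Fluent Bit's actual code: the first input byte picks the failure frequency, so the fuzzer mutates it like any other byte, and resetting the counter per input keeps a given input reproducible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical state, not Fluent Bit's actual variables. */
static unsigned fail_modulus;   /* 0 = never fail, N = fail every Nth call */
static unsigned alloc_count;

/* Hypothetical allocation hook used by the code under test. */
static void *fuzz_malloc(size_t size) {
    alloc_count++;
    if (fail_modulus != 0 && alloc_count % fail_modulus == 0)
        return NULL;
    return malloc(size);
}

/* Consume the first input byte as the failure frequency and advance
 * *data / *size past it, so the rest of the input is the real test case.
 * Returns -1 if the input is empty. */
static int consume_failure_byte(const uint8_t **data, size_t *size) {
    assert(data != NULL && size != NULL);
    if (*size == 0)
        return -1;
    fail_modulus = (*data)[0];
    alloc_count = 0;
    (*data)++;
    (*size)--;
    return 0;
}
```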

@catenacyber
Contributor Author

> is the probability of failing an allocation determined by some value in the fuzzer input (i.e. const uint8_t *data)?

Kind of: a CRC32 is computed over the fuzzing data to seed a pseudo-random generator that drives the allocation failures.

> that the fuzzer can control the probability of when to fail allocations

There is an environment variable to do that with nallocfuzz

Coverage reports are indeed not tested yet.
I guess it should work, but you will have to merge the coverage reports from nallocfuzz and other engines to get the full coverage...
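For reference, a sketch of CRC32-based seeding. It assumes the common bitwise CRC-32 (IEEE polynomial, reflected form 0xEDB88320, as in zlib) feeding a xorshift32 generator; the generator choice, `nalloc_seed`, `nalloc_should_fail` and its `freq` parameter are assumptions, not nallocfuzz's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const uint8_t *data, size_t size) {
    assert(data != NULL || size == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < size; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

static uint32_t prng_state;

/* Seed the generator from the whole input. */
static void nalloc_seed(const uint8_t *data, size_t size) {
    prng_state = crc32_ieee(data, size);
    if (prng_state == 0)
        prng_state = 1;   /* xorshift32 would stay stuck at 0 */
}

/* xorshift32 step. */
static uint32_t xorshift32(void) {
    uint32_t x = prng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return prng_state = x;
}

/* Hypothetical decision: fail about one allocation in `freq` (0 = never). */
static int nalloc_should_fail(uint32_t freq) {
    return freq != 0 && xorshift32() % freq == 0;
}
```

Reseeding from the same input always yields the same sequence of decisions, which is what makes a crashing input replayable.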

@evverx
Contributor

evverx commented Mar 12, 2023

> Very cool!!

Agreed. Those code paths are usually undertested (or aren't tested at all). I haven't looked at how it's implemented yet, but, implementation details aside, I agree that it would be really useful.

> I find that fuzz targets often have more bugs mishandling allocations failures than the projects being fuzzed

I think there are a lot of fuzz targets where something like

```c
m = malloc(...);
assert(m);
```

is used because they don't expect malloc to fail (especially when they reject giant blobs that can't reach functions being fuzzed in practice to avoid running into timeouts). They should probably be adjusted first to prevent false positives from popping up.
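A minimal sketch of such an adjusted target: allocation failure is treated as an uninteresting input rather than an assertion failure. `count_spaces` is a hypothetical stand-in for the code under test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function under test. */
static size_t count_spaces(const char *s) {
    assert(s != NULL);   /* a precondition, not an allocation check */
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == ' ')
            n++;
    return n;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Instead of assert(m): under allocation-failure fuzzing a NULL
     * result is an expected outcome, so just skip this input. */
    char *m = malloc(size + 1);
    if (m == NULL)
        return 0;
    if (size > 0)
        memcpy(m, data, size);
    m[size] = '\0';
    (void)count_spaces(m);
    free(m);
    return 0;
}
```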

Also there are projects where allocation failures are considered fatal and handled by panicking or something like that. There should probably be a way to turn this off for projects like that.

@catenacyber
Contributor Author

Thanks @evverx

> I find that fuzz targets often have more bugs mishandling allocations failures than the projects being fuzzed
>
> I think there are a lot of fuzz targets where something like
>
> ```c
> m = malloc(...);
> assert(m);
> ```

Indeed, OpenSSL for instance.
I find there are generally more "bugs" in the fuzz targets than in the software being fuzzed itself; see curl/curl-fuzzer#74 for another example.

> is used because they don't expect malloc to fail (especially when they reject giant blobs that can't reach functions being fuzzed in practice to avoid running into timeouts). They should probably be adjusted first to prevent false positives from popping up.

> Also there are projects where allocation failures are considered fatal and handled by panicking or something like that. There should probably be a way to turn this off for projects like that.

I think this should be off by default, and enabled only by the projects that are ready for it.

@evverx
Contributor

evverx commented Mar 13, 2023

> I think this should be off by default, and enabled by only the projects which are ready for it

Agreed. I just thought that it would be opt-out, but indeed it would be better if it were opt-in.

cc @mrc0mmand just in case. Looking at systemd/systemd#21872 it seems you played with allocation failures in systemd.

@catenacyber
Contributor Author

cc @IvanNardi, as I see that there is fuzz_ndpi_reader_alloc_fail

@IvanNardi
Contributor

IvanNardi commented Mar 26, 2023

> cc @IvanNardi as I see that there is fuzz_ndpi_reader_alloc_fail

Thanks for pointing me to this thread: interesting stuff.

nDPI already uses some simple logic to fuzz allocation failures (see ntop/nDPI@ada4fe4 and ntop/nDPI@5e8c1eb); an important detail is the ability to enable/disable this feature at runtime via a trivial API. This way, for example, you can be sure that the "init/configuration" phase always succeeds and trigger the failures only when processing the data (if you want that).

As expected, testing allocation failures triggered a huge number of issues in the error paths (NULL dereferences or simple leaks) and greatly improved the coverage.

+1 on trying to integrate alloc failures somehow into oss-fuzz/asan/sanitizer/..., but that is definitely beyond my area of expertise.
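A sketch of such a runtime switch, assuming a test-only allocation wrapper. `set_alloc_failures`, `test_malloc` and the "fail every 3rd call" rule are hypothetical and are not nDPI's API (the linked commits show the real one).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical switch; failures only happen while it is on. */
static bool fail_allocs_enabled;
static unsigned fail_every = 3;   /* hypothetical: fail every 3rd call */
static unsigned hooked_allocs;

static void set_alloc_failures(bool enabled) {
    fail_allocs_enabled = enabled;
    hooked_allocs = 0;            /* restart the pattern on each toggle */
}

/* Allocation wrapper used by the code under test. */
static void *test_malloc(size_t n) {
    if (fail_allocs_enabled && ++hooked_allocs % fail_every == 0) {
        errno = ENOMEM;
        return NULL;
    }
    return malloc(n);
}
```

A harness would keep failures off around initialization and teardown and call `set_alloc_failures(true)` only around the data-processing calls.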

@catenacyber
Contributor Author

Friendly ping to the OSS-Fuzz people: @jonathanmetzman @oliverchang @alan32liu?

@mrc0mmand

mrc0mmand commented May 20, 2023

I finally got time to play around with this and it's definitely an interesting tool! I attempted to do something similar (albeit in a very dumbed down way) for systemd a while back and it yielded quite interesting results.

A quick note: I had to monkey-patch[0] the infra helper; otherwise it kept fetching the "official" systemd build image instead of using the one I had just built when running build_fuzzers. Not sure why.

The main issue I see is that the fuzzers need to be written in a way that accepts allocation failure as a "success". For example, in systemd we make heavy use of assert() in tests/fuzzers to weed out unexpected behavior, which (unfortunately) includes allocation failure in this case. So running most of the systemd fuzzers with Nallocfuzz throws a SIGABRT right at the start, e.g.:

./fuzz-bus-match 
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 4082894902
INFO: Loaded 2 modules   (82764 inline 8-bit counters): 82726 [0x7f48276c2650, 0x7f48276d6976), 38 [0x5ffe98, 0x5ffebe), 
INFO: Loaded 2 PC tables (82764 PCs): 82726 [0x7f48276d6978,0x7f4827819bd8), 38 [0x5bb2f0,0x5bb550), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 108 ft: 109 corp: 1/1b exec/s: 0 rss: 44Mb
#3	NEW    cov: 116 ft: 129 corp: 2/2b lim: 4 exec/s: 0 rss: 44Mb L: 1/1 MS: 1 ChangeByte-
	NEW_FUNC[1/1]: 0x7f48272277c0 in bus_match_node_type_from_string /work/build/../../src/systemd/src/libsystemd/sd-bus/bus-match.c:566
#22	NEW    cov: 133 ft: 147 corp: 3/4b lim: 4 exec/s: 0 rss: 45Mb L: 2/2 MS: 4 ChangeBit-ChangeByte-ChangeBit-InsertByte-
Assertion 'g = open_memstream_unlocked(&out, &out_size)' failed at src/libsystemd/sd-bus/fuzz-bus-match.c:41, function LLVMFuzzerTestOneInput(). Aborting.
NULL alloc in 37 run: calloc(8192) 
#1 0x4df212 in (null) (null):0
#2 0x4decb7 in (null) (null):0
#3 0x4deb9f in (null) (null):0
#4 0x7f48268997d7 in (null) (null):0
#5 0x472b15 in __interceptor_open_memstream /src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:6253
#6 0x7f4827106fb1 in (null) /work/build/../../src/systemd/src/basic/fileio.c:99
#7 0x4de391 in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/libsystemd/sd-bus/fuzz-bus-match.c:41
#8 0x4def28 in (null) (null):0
#9 0x4fdeb3 in ExecuteCallback /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611
#10 0x4fd69a in RunOne /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:514
#11 0x4fed69 in MutateAndTestOne /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:757
#12 0x4ffa35 in Loop /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:895
#13 0x4eed9f in FuzzerDriver /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912
#14 0x4ef668 in (null) /work/llvm-stage2//src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:925
#15 0x4df185 in (null) (null):0
#16 0x7f4826832082 in (null) (null):0
#17 0x41f85d in (null) (null):0
#18 0xffffffffffffffff in (null) (null):0

AddressSanitizer:DEADLYSIGNAL
=================================================================
==14==ERROR: AddressSanitizer: ABRT on unknown address 0x00000000000e (pc 0x7f482685100b bp 0x7ffd6f8c1a30 sp 0x7ffd6f8c17e0 T0)
SCARINESS: 10 (signal)
    #0 0x7f482685100b in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300b) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #1 0x7f4826830858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #2 0x7f4827146399 in log_assert_failed /work/build/../../src/systemd/src/basic/log.c:929:9
    #3 0x4dea76 in LLVMFuzzerTestOneInput /work/build/../../src/systemd/src/libsystemd/sd-bus/fuzz-bus-match.c:41:17
    #4 0x4def28 in NaloFuzzerTestOneInput (/build/fuzz-bus-match+0x4def28)
    #5 0x4fdeb3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #6 0x4fd69a in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:514:3
    #7 0x4fed69 in fuzzer::Fuzzer::MutateAndTestOne() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:757:19
    #8 0x4ffa35 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile> >&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:895:5
    #9 0x4eed9f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912:6
    #10 0x4ef668 in LLVMFuzzerRunDriver /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:925:10
    #11 0x4df185 in main (/build/fuzz-bus-match+0x4df185)
    #12 0x7f4826832082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #13 0x41f85d in _start (/build/fuzz-bus-match+0x41f85d)

One option would be to ignore SIGABRTs and only look for SIGSEGVs, but unfortunately ASan throws SIGABRT if it finds something interesting as well.

Now for some results: when poking around, Nallocfuzz managed to trigger a couple of interesting issues in systemd (systemd/systemd#27719), so this is definitely something worth pursuing further.

Apart from the issues above I encountered a couple of segfaults I'm not sure who to blame for (yet), but I'll try to sort them out in the next couple of days.

Anyway, that's all for now, thanks for the nifty tool!

[0]

```diff
diff --git a/infra/helper.py b/infra/helper.py
index 050f1ed5..414f66ec 100755
--- a/infra/helper.py
+++ b/infra/helper.py
@@ -679,8 +679,7 @@ def docker_run(run_args, print_output=True, architecture='x86_64'):
   """Calls `docker run`."""
   platform = 'linux/arm64' if architecture == 'aarch64' else 'linux/amd64'
   command = [
-      'docker', 'run', '--rm', '--privileged', '--shm-size=2g', '--platform',
-      platform
+      'docker', 'run', '--rm', '--privileged', '--shm-size=2g'
   ]
   # Support environments with a TTY.
   if sys.stdin.isatty():
```

@evverx
Contributor

evverx commented May 20, 2023

> I had to monkey-patch[0] the infra helper, otherwise it kept fetching the "official" systemd build image instead of using the just built one when running build_fuzzers, not sure why

My guess would be that podman was used instead of docker. Welcome to the club :-) #4774 (comment)

@mrc0mmand

mrc0mmand commented May 20, 2023

> I had to monkey-patch[0] the infra helper, otherwise it kept fetching the "official" systemd build image instead of using the just built one when running build_fuzzers, not sure why
>
> My guess would be that podman was used instead of docker. Welcome to the club :-) #4774 (comment)

Bingo :) And I have a weird sense of deja vu that you've already told me about this issue in the past, hah.

Anyway, as for the segfaults I don't know who to blame for: all of them have the same root cause. If fopen() fails, it might not set errno correctly, so any code that does:

```c
int foo(...) {
...
    f = fopen("foo", "r");
    if (!f)
        return -errno;
...
```

and later on only checks whether foo(...) >= 0 will die horribly once it tries to use the file object. For example:

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#include "alloc.h"

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    FILE *f = NULL;

    errno = 0;
    f = fopen("foo", "r");
    if (!f) {
        assert(errno != 0);
        return 0;
    }

    fclose(f);

    return 0;
}
```
# clang -fsanitize=address,fuzzer-no-link foo.c alloc.c nallocfuzz.a -o foo
# ./foo 
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 935361499
INFO: Loaded 1 modules   (5 inline 8-bit counters): 5 [0x5f9088, 0x5f908d), 
INFO: Loaded 1 PC tables (5 PCs): 5 [0x5b6a00,0x5b6a50), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 2 ft: 3 corp: 1/1b exec/s: 0 rss: 29Mb
#982	NEW    cov: 3 ft: 4 corp: 2/8b lim: 11 exec/s: 0 rss: 38Mb L: 7/7 MS: 5 ChangeBit-ChangeByte-ChangeBit-
...
foo: foo.c:16: int LLVMFuzzerTestOneInput(const uint8_t *, size_t): Assertion `errno != 0' failed.
NULL alloc in 988 run: malloc(472) 
...

==12212== ERROR: libFuzzer: deadly signal
    #0 0x4aac71 in __sanitizer_print_stack_trace /src/llvm-project/compiler-rt/lib/asan/asan_stack.cpp:87:3
    #1 0x515178 in fuzzer::PrintStackTrace() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerUtil.cpp:210:5
    #2 0x4f9e53 in fuzzer::Fuzzer::CrashCallback() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:233:3
    #3 0x4dbeff in fuzz_nalloc_sig_handler nallocfuzz.c
    #4 0x7f1d0390141f  (/lib/x86_64-linux-gnu/libpthread.so.0+0x1441f) (BuildId: 7b4536f41cdaa5888408e82d0836e33dcf436466)
    #5 0x7f1d035c400a in raise (/lib/x86_64-linux-gnu/libc.so.6+0x4300a) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #6 0x7f1d035a3858 in abort (/lib/x86_64-linux-gnu/libc.so.6+0x22858) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #7 0x7f1d035a3728  (/lib/x86_64-linux-gnu/libc.so.6+0x22728) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #8 0x7f1d035b4fd5 in __assert_fail (/lib/x86_64-linux-gnu/libc.so.6+0x33fd5) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #9 0x4dbacb in LLVMFuzzerTestOneInput (/root/nallocfuzz/foo+0x4dbacb)
    #10 0x4dbf88 in NaloFuzzerTestOneInput (/root/nallocfuzz/foo+0x4dbf88)
    #11 0x4fb3f3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #12 0x4fabda in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:514:3
    #13 0x4fc2a9 in fuzzer::Fuzzer::MutateAndTestOne() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:757:19
    #14 0x4fcf75 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile> >&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:895:5
    #15 0x4ec2df in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912:6
    #16 0x4ecba8 in LLVMFuzzerRunDriver /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:925:10
    #17 0x4dc1e5 in main (/root/nallocfuzz/foo+0x4dc1e5)
    #18 0x7f1d035a5082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #19 0x41f75d in _start (/root/nallocfuzz/foo+0x41f75d)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 1 CopyPart-; base unit: 3957096f2318b26dac2adb7dd856877aa8727861
0xe1,0xe1,0xe1,0xe1,0xe1,0xe1,0xe1,
\341\341\341\341\341\341\341
artifact_prefix='./'; Test unit written to ./crash-3957096f2318b26dac2adb7dd856877aa8727861
Base64: 4eHh4eHh4Q==

Which is both interesting and unfortunate, as it breaks a lot of stuff.
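On the caller side, one defensive pattern is to never let `-errno` collapse into 0 ("success") when a failed call left errno unset. A sketch, assuming a generic error code is acceptable as the fallback; `negative_errno_or` and `open_config` are hypothetical names and EIO is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Return -errno, or -fallback if the failing call left errno at 0. */
static int negative_errno_or(int fallback) {
    return errno > 0 ? -errno : -fallback;
}

/* Open a file for reading; on failure return a negative error code
 * that is guaranteed to be non-zero, and leave *ret untouched. */
static int open_config(const char *path, FILE **ret) {
    errno = 0;
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return negative_errno_or(EIO);
    *ret = f;
    return 0;
}
```

This only hardens callers; an allocator hook that sets errno to ENOMEM on failure addresses the root cause.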

@evverx
Contributor

evverx commented May 20, 2023

> if fopen() fails, it might not set the errno correctly, so any code that does

Looks like glibc relies on malloc setting ENOMEM when it fails, and that doesn't seem to happen with nallocfuzz.

> SUSv2 requires malloc(), calloc(), and realloc() to set errno to ENOMEM upon failure. Glibc assumes that this is done (and the glibc versions of these routines do this); if you use a private malloc implementation that does not set errno, then certain library routines may fail without having a reason in errno.

@mrc0mmand

> if fopen() fails, it might not set the errno correctly, so any code that does
>
> Looks like glibc relies on malloc setting ENOMEM when it fails and it doesn't seem to happen with nallocfuzz.
>
> > SUSv2 requires malloc(), calloc(), and realloc() to set errno to ENOMEM upon failure. Glibc assumes that this is done (and the glibc versions of these routines do this); if you use a private malloc implementation that does not set errno, then certain library routines may fail without having a reason in errno.

Interesting, I didn't know about this. That would also explain why I didn't encounter it in my experiments: I always returned malloc(-1) instead of just NULL when mimicking allocation failure, so errno was set properly.

So, with a simple Nallocfuzz patch[0] and by replacing asserts with less aggressive error handling in one of the systemd fuzzers, everything seems to be working and running correctly, nice!

[0]

```diff
diff --git a/nallocfuzz.c b/nallocfuzz.c
index 21f9330..fb1e254 100644
--- a/nallocfuzz.c
+++ b/nallocfuzz.c
@@ -1,3 +1,4 @@
+#include <errno.h>
 #include <stdint.h>
 #include <stdlib.h>
 #include <string.h>
@@ -235,6 +236,7 @@ static bool fuzz_nalloc_fail(size_t size, const char *op) {
 
 void *calloc(size_t nmemb, size_t size) {
     if (fuzz_nalloc_fail(size, "calloc")) {
+        errno = ENOMEM;
         return NULL;
     }
     return __interceptor_calloc(nmemb, size);
@@ -242,6 +244,7 @@ void *calloc(size_t nmemb, size_t size) {
 
 void *malloc(size_t size) {
     if (fuzz_nalloc_fail(size, "malloc")) {
+        errno = ENOMEM;
         return NULL;
     }
     return __interceptor_malloc(size);
@@ -249,6 +252,7 @@ void *malloc(size_t size) {
 
 void *realloc(void *ptr, size_t size) {
     if (fuzz_nalloc_fail(size, "realloc")) {
+        errno = ENOMEM;
         return NULL;
     }
     return __interceptor_realloc(ptr, size);
```

@evverx
Contributor

evverx commented May 21, 2023

@mrc0mmand it would be great if you could point it at the "networkd" fuzz targets. With issues like systemd/systemd#25883 and systemd/systemd#25891 (where huge memory leaks can be triggered by friendly neighbours sending router advertisements), the scenario where memory allocations fail isn't far-fetched. I think the scenario where the OOM killer destroys networkd (or everything slows down) is more likely, though, but who knows how people configure their systems? :-)

@mrc0mmand

mrc0mmand commented May 21, 2023

> @mrc0mmand it would be great if you could point it to the "networkd" fuzz targets. With issues like systemd/systemd#25883 and systemd/systemd#25891 (where huge memory leaks can be triggered by friendly neighbours sending router advertisements) the scenario where memory allocations fail isn't far-fetched. I think another scenario where the OOM-killer destroys networkd (or everything slows down) is more likely though but who knows how people configure their systems? :-)

I'm slowly going through all the systemd fuzzers and managed to hit another non-systemd issue, this time in asprintf(): something apparently goes really wrong when one of the internal allocations fails. I managed to "isolate" it to the following example:

```c
#define _GNU_SOURCE

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

void freep(void *p) {
    free(*(void**) p);
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    __attribute__((__cleanup__(freep))) char *str = NULL, *a = NULL;

    str = calloc(size + 1, sizeof(*str));
    if (!str)
        return 0;

    memcpy(str, data, size);
    str[size] = 0;
    assert(strlen(str) >= 0);

    asprintf(&a, "foo %s bar", str);

    return 0;
}
```
# clang -std=gnu99 -fsanitize=address,fuzzer-no-link foo.c nallocfuzz.a -o foo
# ./foo 
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 542608753
INFO: Loaded 1 modules   (6 inline 8-bit counters): 6 [0x5f9048, 0x5f904e), 
INFO: Loaded 1 PC tables (6 PCs): 6 [0x5b6a00,0x5b6a60), 
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: A corpus is not provided, starting from an empty corpus
#2	INITED cov: 3 ft: 4 corp: 1/1b exec/s: 0 rss: 29Mb
#263	NEW    cov: 4 ft: 5 corp: 2/3b lim: 6 exec/s: 0 rss: 36Mb L: 2/2 MS: 1 InsertByte-
#571	NEW    cov: 5 ft: 6 corp: 3/8b lim: 8 exec/s: 0 rss: 36Mb L: 5/5 MS: 3 ChangeBit-CopyPart-InsertRepeatedBytes-
#8584	REDUCE cov: 5 ft: 6 corp: 3/7b lim: 86 exec/s: 0 rss: 39Mb L: 4/4 MS: 3 ChangeByte-EraseBytes-ChangeBit-
#23733	REDUCE cov: 5 ft: 6 corp: 3/6b lim: 233 exec/s: 0 rss: 45Mb L: 3/3 MS: 4 ShuffleBytes-InsertByte-ShuffleBytes-EraseBytes-
#33872	REDUCE cov: 5 ft: 6 corp: 3/5b lim: 333 exec/s: 0 rss: 49Mb L: 1/3 MS: 4 EraseBytes-ChangeByte-ChangeByte-ChangeByte-
#46234	REDUCE cov: 5 ft: 6 corp: 3/4b lim: 453 exec/s: 0 rss: 55Mb L: 2/2 MS: 2 EraseBytes-ChangeByte-
#54031	REDUCE cov: 5 ft: 6 corp: 3/3b lim: 526 exec/s: 0 rss: 56Mb L: 1/1 MS: 2 EraseBytes-ChangeBinInt-
=================================================================
==12346==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60b0004abc64 at pc 0x000000441dea bp 0x7ffce58d39c0 sp 0x7ffce58d3160
WRITE of size 101 at 0x60b0004abc64 thread T0
SCARINESS: 45 (multi-byte-write-heap-buffer-overflow)
    #0 0x441de9 in __interceptor_vasprintf /src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1693:1
    #1 0x442b03 in asprintf /src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1745:1
    #2 0x4dbd06 in LLVMFuzzerTestOneInput (/root/nallocfuzz/foo+0x4dbd06)
    #3 0x4dc248 in NaloFuzzerTestOneInput (/root/nallocfuzz/foo+0x4dc248)
    #4 0x4fb6b3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #5 0x4fae9a in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:514:3
    #6 0x4fc569 in fuzzer::Fuzzer::MutateAndTestOne() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:757:19
    #7 0x4fd235 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile> >&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:895:5
    #8 0x4ec59f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912:6
    #9 0x4ece68 in LLVMFuzzerRunDriver /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:925:10
    #10 0x4dc4a5 in main (/root/nallocfuzz/foo+0x4dc4a5)
    #11 0x7f0f9a393082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #12 0x41f75d in _start (/root/nallocfuzz/foo+0x41f75d)

DEDUP_TOKEN: __interceptor_vasprintf--asprintf--LLVMFuzzerTestOneInput
0x60b0004abc64 is located 0 bytes to the right of 100-byte region [0x60b0004abc00,0x60b0004abc64)
allocated by thread T0 here:
    #0 0x4a0b96 in __interceptor_malloc /src/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x4dc059 in malloc (/root/nallocfuzz/foo+0x4dc059)
    #2 0x7f0f9a3fab3d  (/lib/x86_64-linux-gnu/libc.so.6+0x8bb3d) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)
    #3 0x442b03 in asprintf /src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1745:1
    #4 0x4dc248 in NaloFuzzerTestOneInput (/root/nallocfuzz/foo+0x4dc248)
    #5 0x4fb6b3 in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:611:15
    #6 0x4fae9a in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:514:3
    #7 0x4fc569 in fuzzer::Fuzzer::MutateAndTestOne() /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:757:19
    #8 0x4fd235 in fuzzer::Fuzzer::Loop(std::__Fuzzer::vector<fuzzer::SizedFile, std::__Fuzzer::allocator<fuzzer::SizedFile> >&) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerLoop.cpp:895:5
    #9 0x4ec59f in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long)) /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:912:6
    #10 0x4ece68 in LLVMFuzzerRunDriver /src/llvm-project/compiler-rt/lib/fuzzer/FuzzerDriver.cpp:925:10
    #11 0x4dc4a5 in main (/root/nallocfuzz/foo+0x4dc4a5)
    #12 0x7f0f9a393082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)

DEDUP_TOKEN: __interceptor_malloc--malloc--
SUMMARY: AddressSanitizer: heap-buffer-overflow /src/llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:1693:1 in __interceptor_vasprintf
Shadow bytes around the buggy address:
  0x0c168008d730: fd fd fd fd fd fa fa fa fa fa fa fa fa fa fd fd
  0x0c168008d740: fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa fa
  0x0c168008d750: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
  0x0c168008d760: fd fa fa fa fa fa fa fa fa fa fd fd fd fd fd fd
  0x0c168008d770: fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa fa
=>0x0c168008d780: 00 00 00 00 00 00 00 00 00 00 00 00[04]fa fa fa
  0x0c168008d790: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c168008d7a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c168008d7b0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c168008d7c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c168008d7d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==12346==ABORTING

(This was hit by one of the systemd-resolved fuzzers, fuzz-resource-record.)

Looks like we're getting all the good stuff :)
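Separately from the overflow itself (which happens inside vasprintf), the reproducer ignores asprintf()'s return value. Per the glibc manual page, asprintf() returns -1 on failure and the contents of the output pointer are then undefined, so it should be neither used nor freed. A sketch of the checked form; `wrap_foo_bar` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format "foo <s> bar". Returns a malloc'ed string, or NULL on failure;
 * on failure asprintf()'s output pointer is undefined, so it is dropped. */
static char *wrap_foo_bar(const char *s) {
    char *out;
    if (asprintf(&out, "foo %s bar", s) < 0)
        return NULL;
    return out;
}
```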

@evverx
Contributor

evverx commented May 21, 2023

WRITE of size 101 at 0x60b0004abc64 thread T0

I wonder if it's always 101? Looking at vasprintf it seems it has something to do with init_string_size = 100 and the subsequent reallocs. As far as I can see glibc handles allocation failures properly so it seems to be a nallocfuzz issue.

@mrc0mmand

WRITE of size 101 at 0x60b0004abc64 thread T0

I wonder if it's always 101? Looking at vasprintf it seems it has something to do with init_string_size = 100 and the subsequent reallocs. As far as I can see glibc handles allocation failures properly so it seems to be a nallocfuzz issue.

Yep, it's always 101.

@evverx
Contributor

evverx commented May 21, 2023

I guess issues like that should be weeded out before rolling it out globally. Until then, dns_resource_record_to_string can probably be commented out to work around it.

@evverx
Contributor

evverx commented May 21, 2023

Judging from https://sourceware.org/git/?p=glibc.git;a=commit;h=af7f4165512ea242b5f711ee03a04f6afe22232d, that whole thing was rewritten recently, so whatever it is, it is no longer relevant upstream :-)

@evverx
Contributor

evverx commented May 21, 2023

As far as I can see glibc handles allocation failures properly so it seems to be a nallocfuzz issue

Having taken a closer look, it seems it's a bug in glibc after all. The realloc at https://elixir.bootlin.com/glibc/glibc-2.36.9000/source/libio/vasprintf.c#L79 is the culprit, in the sense that its result isn't properly checked. It was fixed in https://sourceware.org/git/?p=glibc.git;a=commit;h=af7f4165512ea242b5f711ee03a04f6afe22232d. Looks like it's the first buffer overflow nallocfuzz has found. Congratulations!

That being said, I can't say it's easy to track down the sequences of mallocs/reallocs leading to crashes, and the ASan backtraces aren't particularly helpful. I'm not sure how that can be improved, though.
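The bug class described above, a realloc() whose result is not checked before the buffer is written, can be contrasted with the checked pattern in a short C sketch. The `struct strbuf` type, `strbuf_append()` and the `strbuf_realloc` hook are hypothetical illustrations, not glibc's vasprintf code; the initial capacity of 100 only mirrors the `init_string_size = 100` mentioned above, and the hook exists solely so that a failing realloc() can be simulated.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer, loosely modelled on a routine that
 * starts with a small buffer and grows it with realloc(). */
struct strbuf {
    char  *data;
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
};

/* Hypothetical allocator hook so a test can make realloc() "fail". */
static void *(*strbuf_realloc)(void *, size_t) = realloc;

static void *failing_realloc(void *ptr, size_t size)
{
    (void)ptr;
    (void)size;
    return NULL;   /* simulated allocation failure */
}

/* Appends s to b. Returns 0 on success, -1 on failure.
 * On failure b is left untouched and its old buffer stays valid. */
static int strbuf_append(struct strbuf *b, const char *s)
{
    assert(b != NULL && s != NULL);
    size_t n = strlen(s);
    if (n > SIZE_MAX - b->len - 1)
        return -1;                          /* size computation would overflow */
    size_t need = b->len + n + 1;
    if (need > b->cap) {
        size_t new_cap = b->cap ? b->cap : 100;
        while (new_cap < need)
            new_cap = new_cap > SIZE_MAX / 2 ? need : new_cap * 2;
        /* Check the result before touching b: ignoring a NULL here and
         * writing anyway is exactly how a write past the old capacity happens. */
        char *tmp = strbuf_realloc(b->data, new_cap);
        if (tmp == NULL)
            return -1;
        b->data = tmp;
        b->cap = new_cap;
    }
    memcpy(b->data + b->len, s, n + 1);     /* copies the NUL terminator too */
    b->len += n;
    return 0;
}
```

Keeping the result in a temporary means a failed growth leaves the caller with the old, still-valid buffer instead of a dangling or undersized one.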

@evverx
Contributor

evverx commented May 22, 2023

I guess another question would be how to make its findings reproducible. As far as I can tell, in its current form it can't reliably reproduce crashes, so if it were integrated into, say, some sort of CI, its findings couldn't be used for regression testing (or for reproducing the same issues with the same inputs).
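One possible way to make findings reproducible, and to tell which allocation was forced to fail, is to derive every failure decision deterministically from the fuzz input, so that rerunning the same input fails the same allocations. The following C sketch is a hypothetical illustration of that idea and not nallocfuzz's actual implementation: `nalloc_seed()`, `nalloc_should_fail()` and `nalloc_malloc()` are invented names, the FNV-1a/xorshift64 combination is an arbitrary choice, and so is the 1-in-16 failure rate.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-input failure-injection state. */
static uint64_t nalloc_state;
static unsigned long nalloc_count;   /* allocations seen for the current input */

/* Seeds the generator from the input bytes (FNV-1a hash), so the same
 * input always yields the same sequence of failure decisions. */
static void nalloc_seed(const uint8_t *data, size_t size)
{
    uint64_t h = 14695981039346656037ULL;    /* FNV-1a offset basis */
    for (size_t i = 0; i < size; i++) {
        h ^= data[i];
        h *= 1099511628211ULL;               /* FNV-1a 64-bit prime */
    }
    nalloc_state = h ? h : 1;                /* xorshift state must be non-zero */
    nalloc_count = 0;
}

/* xorshift64: cheap, deterministic pseudo-random sequence. */
static uint64_t nalloc_next(void)
{
    uint64_t x = nalloc_state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return nalloc_state = x;
}

/* Returns non-zero for roughly one call in `period`. */
static int nalloc_should_fail(unsigned period)
{
    assert(period > 0);
    return nalloc_next() % period == 0;
}

static void *nalloc_malloc(size_t n)
{
    unsigned long idx = nalloc_count++;
    if (nalloc_should_fail(16)) {
        /* Logging the allocation index helps map a crash back to the
         * malloc that was forced to fail. */
        fprintf(stderr, "nalloc: injected failure at allocation #%lu (%zu bytes)\n",
                idx, n);
        return NULL;
    }
    return malloc(n);
}
```

A real engine would also have to hook calloc(), realloc() and friends and reseed before each run of the target; this sketch leaves that out.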

@evverx
Contributor

evverx commented May 24, 2023

You can try to reproduce old Suricata reports: I think the last public one with Rust is the one in the brotli crate

The mysterious crash with no backtrace I mentioned was from 2021, as far as I can remember, and I didn't mean to say that they are still like that :-) I actually keep track of Suricata to some extent because it's the only project on OSS-Fuzz where both Rust and C are used, so it paved the way, and it's probably the only reason why fuzzing stuff written in Rust and C isn't totally painful.

For coverage, I do not see a better solution than running the corpus multiple times with different "options" (options being allocation failures, architecture...) and then merging the different coverage reports.
Do you see one, @evverx?

I think that "global" coverage reports on OSS-Fuzz can be built by merging several coverage reports. I'm not sure about the local "helper.py coverage" command (where building projects twice would be annoying). Then again, I'm not sure it makes much sense to special-case it at this point, because it isn't the most common scenario.
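Conceptually, merging coverage from several runs of the same corpus (for example one normal run and one with injected allocation failures) means combining per-region execution counters, so a region counts as covered if any run reached it. The C sketch below assumes each run yields a plain array of counters over the same regions in the same order; that is a simplification, and real clang source-based coverage profiles would be combined with `llvm-profdata merge` rather than hand-written code. `coverage_merge()` and `coverage_covered()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds one run's counters into the accumulated report, saturating instead
 * of wrapping on overflow. Both arrays must describe the same regions in
 * the same order (an assumption of this sketch). */
static void coverage_merge(uint64_t *acc, const uint64_t *run, size_t n)
{
    assert(acc != NULL && run != NULL);
    for (size_t i = 0; i < n; i++)
        acc[i] = (run[i] > UINT64_MAX - acc[i]) ? UINT64_MAX : acc[i] + run[i];
}

/* Number of regions executed at least once across all merged runs. */
static size_t coverage_covered(const uint64_t *acc, size_t n)
{
    size_t covered = 0;
    for (size_t i = 0; i < n; i++)
        covered += acc[i] != 0;
    return covered;
}
```

Regions that only the allocation-failure run reaches (typically error-handling paths) show up as covered in the merged report, which is the point of running the corpus with both options.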

Anyway let's wait for the OSS-Fuzz maintainers to weigh in here.

@evverx
Contributor

evverx commented May 24, 2023

In the meantime, @mrc0mmand keeps finding various systemd issues using Nallocfuzz. They keep coming, so instead of linking this PR to all of them, https://github.com/search?q=repo%3Asystemd%2Fsystemd+Nallocfuzz&type=code can be used. Hopefully it will help decide whether Nallocfuzz should be integrated into OSS-Fuzz or not.

@catenacyber
Contributor Author

@mrc0mmand

@catenacyber I wonder - given my past experiments, would it be possible to cover reallocarray() as well, to have all the fundamental allocation functions covered?

@catenacyber
Contributor Author

I wonder - given my past experiments, would it be possible to cover reallocarray() as well, to have all the fundamental allocation functions covered?

Nice, I did not know this one, just added ;-)
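A reallocarray() hook differs from a realloc() hook mainly in the multiplication-overflow check, which should still be done before any failure is injected so that genuine overflows keep their normal behaviour. A minimal C sketch, assuming a hypothetical `inject_failure` predicate as a stand-in for the engine's real decision logic; it mirrors reallocarray()'s documented behaviour of returning NULL with errno set to ENOMEM on overflow, but it is not nallocfuzz's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical failure predicates; a real engine decides per call. */
static int never_fail(void) { return 0; }
static int always_fail(void) { return 1; }
static int (*inject_failure)(void) = never_fail;

static void *hooked_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    assert(inject_failure != NULL);
    /* Same rule as reallocarray(): reject nmemb * size > SIZE_MAX. */
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    if (inject_failure()) {
        /* Like a genuinely failed realloc(), leave the original block intact. */
        errno = ENOMEM;
        return NULL;
    }
    return realloc(ptr, nmemb * size);
}
```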

@DonggeLiu
Contributor

/gcbrun trial_build.py flac fluent-bit libpng ndpi suricata systemd --fuzzing-engine nallocfuzz

@DonggeLiu
Contributor

Running a trial build to test it in the gcloud environment : )

IvanNardi added a commit to IvanNardi/nDPI that referenced this pull request May 29, 2023
Some low hanging fruits found using nallocfuzz.
See: https://github.com/catenacyber/nallocfuzz
See: google/oss-fuzz#9902

Most of these errors are quite trivial to fix; the only exception is the
stuff in the uthash.
If the insertion fails (because of an allocation failure), we need to
avoid some memory leaks. But the only way to check if the `HASH_ADD_*`
failed, is to perform a new lookup: a bit costly, but we don't use that
code in any critical data-path.
IvanNardi added a commit to ntop/nDPI that referenced this pull request May 29, 2023
Some low hanging fruits found using nallocfuzz.
See: https://github.com/catenacyber/nallocfuzz
See: google/oss-fuzz#9902

Most of these errors are quite trivial to fix; the only exception is the
stuff in the uthash.
If the insertion fails (because of an allocation failure), we need to
avoid some memory leaks. But the only way to check if the `HASH_ADD_*`
failed, is to perform a new lookup: a bit costly, but we don't use that
code in any critical data-path.
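The lookup-after-insert check described in these commit messages can be sketched without uthash itself. Below, `table_add()` is a hypothetical stand-in for a `HASH_ADD_*`-style insert that returns no status and silently drops the entry when its internal allocation fails (how uthash really reacts to OOM depends on its configuration), and `table_oom` is a hypothetical switch that simulates that failure. The caller looks the key up again to learn whether ownership passed to the table, and frees the entry if it did not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define KEY_LEN     32
#define TABLE_SLOTS 8

/* Hypothetical entry and table standing in for a uthash-managed hash. */
struct entry {
    char key[KEY_LEN];
    int  value;
};

static struct entry *table[TABLE_SLOTS];
static int table_oom;   /* when set, simulates the insert's allocation failing */

static struct entry *table_find(const char *key)
{
    for (size_t i = 0; i < TABLE_SLOTS; i++)
        if (table[i] != NULL && strcmp(table[i]->key, key) == 0)
            return table[i];
    return NULL;
}

/* Like HASH_ADD_*: no return value; on failure the entry is simply not stored. */
static void table_add(struct entry *e)
{
    if (table_oom)
        return;
    for (size_t i = 0; i < TABLE_SLOTS; i++) {
        if (table[i] == NULL) {
            table[i] = e;
            return;
        }
    }
}

/* Returns 0 if the entry was stored, -1 otherwise; never leaks the entry. */
static int table_insert(const char *key, int value)
{
    assert(strlen(key) < KEY_LEN);
    if (table_find(key) != NULL)   /* keys are unique in this sketch */
        return -1;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    strcpy(e->key, key);
    e->value = value;
    table_add(e);
    /* The insert reports nothing, so look the key up again: a bit costly,
     * but it tells us whether the table now owns e. */
    if (table_find(key) != e) {
        free(e);
        return -1;
    }
    return 0;
}
```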
@catenacyber
Contributor Author

@alan32liu looks good, right?

@DonggeLiu
Contributor

@alan32liu looks good, right?

Yep, it passed the GCB trial build.
Waiting for @oliverchang to review the PR when he has the time : )

@oliverchang
Collaborator

It's quite costly to support custom fuzzing engines on our end. Could we ask that this be integrated into FuzzBench instead? We have further work planned for this year to get the benefits of all FuzzBench engines into OSS-Fuzz.

@catenacyber
Contributor Author

Thanks for your answer, Oliver.

It's quite costly to support custom fuzzing engines on our end.

Oh. How so?

We have further work planned for this year to get the benefits of all FuzzBench engines into OSS-Fuzz.

Does this include an optional opt-in for some of these FuzzBench engines?
I do not think every project will want NallocFuzz.

@evverx
Contributor

evverx commented Jun 11, 2023

It's quite costly to support custom fuzzing engines on our end.

Can't argue with that. But it's costly for projects like systemd to support this locally as well.

I think @mrc0mmand already maintains stuff that can be replaced with #7343 and that's not even a fuzzing engine. Stuff like systemd/systemd#26151, systemd/systemd#23894, systemd/systemd#23873 doesn't go through OSS-Fuzz either because custom stuff was rejected (though in that particular case I agree it would be too costly).

We have further work planned for this year to get the benefits of all FuzzBench engines into OSS-Fuzz

I'm not sure I understand how it would work but if the same issues would be reported by OSS-Fuzz as usual I think it should be fine.

1480c1 pushed a commit to m-ab-s/aom that referenced this pull request Jun 24, 2023
init_decoder() should not leave ctx->frame_worker partially allocated.
It should fully allocate ctx->frame_worker on success, and set
ctx->frame_worker to NULL on failure.

This bug was found by Philippe Antoine <[email protected]> using
nallocfuzz (see google/oss-fuzz#9902).

Bug: aomedia:3458
Change-Id: I1ab5bb26e396f2f1d9f7e42f570563403f0e2be2
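The invariant stated in this commit message (fully allocated on success, NULL on failure, never partially built) is a common pattern for init functions that perform several allocations. A minimal C sketch with hypothetical `struct worker` and `struct decoder_ctx` types and a hypothetical `worker_malloc` hook used only to simulate a failure; libaom's actual init_decoder() is structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical types; libaom's real structures differ. */
struct worker {
    unsigned char *scratch;
    int           *tables;
};

struct decoder_ctx {
    struct worker *frame_worker;   /* either NULL or fully initialised */
};

/* Hypothetical hook so a test can make the scratch allocation fail. */
static void *(*worker_malloc)(size_t) = malloc;

static void *null_malloc(size_t size)
{
    (void)size;
    return NULL;   /* simulated allocation failure */
}

/* On success sets ctx->frame_worker and returns 0. On any failure frees
 * whatever was allocated, leaves ctx->frame_worker NULL and returns -1. */
static int init_worker(struct decoder_ctx *ctx, size_t scratch_size, size_t n_tables)
{
    assert(ctx != NULL && ctx->frame_worker == NULL);
    struct worker *w = calloc(1, sizeof *w);
    if (w == NULL)
        return -1;
    w->scratch = worker_malloc(scratch_size);
    w->tables = calloc(n_tables, sizeof *w->tables);
    if (w->scratch == NULL || w->tables == NULL) {
        free(w->scratch);   /* free(NULL) is a no-op */
        free(w->tables);
        free(w);
        return -1;
    }
    ctx->frame_worker = w;  /* publish only once everything succeeded */
    return 0;
}

static void destroy_worker(struct decoder_ctx *ctx)
{
    if (ctx->frame_worker != NULL) {
        free(ctx->frame_worker->scratch);
        free(ctx->frame_worker->tables);
        free(ctx->frame_worker);
        ctx->frame_worker = NULL;
    }
}
```

Because the context is only updated at the very end, cleanup code elsewhere can rely on a simple NULL check instead of guessing which sub-allocations exist.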
@IvanNardi
Contributor

IvanNardi commented Apr 26, 2024

Any updates on this topic?

@catenacyber
Contributor Author

@IvanNardi I have no news on the OSS-Fuzz side, but I have been running this on some projects lately, like google/wuffs#135 (comment)

cyh5272 pushed a commit to cyh5272/aom that referenced this pull request May 6, 2024
init_decoder() should not leave ctx->frame_worker partially allocated.
It should fully allocate ctx->frame_worker on success, and set
ctx->frame_worker to NULL on failure.

This bug was found by Philippe Antoine <[email protected]> using
nallocfuzz (see google/oss-fuzz#9902).

Change-Id: I1ab5bb26e396f2f1d9f7e42f570563403f0e2be2
IvanNardi added a commit to IvanNardi/nDPI that referenced this pull request May 7, 2024
```
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x557f3a5b5100 in ndpi_get_host_domain /src/ndpi/src/lib/ndpi_domains.c:158:8
    #1 0x557f3a59b561 in ndpi_check_dga_name /src/ndpi/src/lib/ndpi_main.c:10412:17
    #2 0x557f3a51163a in process_chlo /src/ndpi/src/lib/protocols/quic.c:1467:7
    #3 0x557f3a469f4b in LLVMFuzzerTestOneInput /src/ndpi/fuzz/fuzz_quic_get_crypto_data.c:44:7
    #4 0x557f3a46abc8 in NaloFuzzerTestOneInput (/out/fuzz_quic_get_crypto_data+0x4cfbc8)
```

Some notes about the leak: if the insertion into the uthash fails (because of an
allocation failure), we need to free the just allocated entry. But the only
way to check if the `HASH_ADD_*` failed, is to perform a new lookup: a bit
costly, but we don't use that code in the fast-path.
See also efb261a

Credits for finding the issues to Philippe Antoine (@catenacyber) and its
`nallocfuzz` fuzzing engine
See: https://github.com/catenacyber/nallocfuzz
See: google/oss-fuzz#9902
IvanNardi added a commit to ntop/nDPI that referenced this pull request May 8, 2024
```
SCARINESS: 12 (1-byte-read-heap-buffer-overflow)
    #0 0x557f3a5b5100 in ndpi_get_host_domain /src/ndpi/src/lib/ndpi_domains.c:158:8
    #1 0x557f3a59b561 in ndpi_check_dga_name /src/ndpi/src/lib/ndpi_main.c:10412:17
    #2 0x557f3a51163a in process_chlo /src/ndpi/src/lib/protocols/quic.c:1467:7
    #3 0x557f3a469f4b in LLVMFuzzerTestOneInput /src/ndpi/fuzz/fuzz_quic_get_crypto_data.c:44:7
    #4 0x557f3a46abc8 in NaloFuzzerTestOneInput (/out/fuzz_quic_get_crypto_data+0x4cfbc8)
```

Some notes about the leak: if the insertion into the uthash fails (because of an
allocation failure), we need to free the just allocated entry. But the only
way to check if the `HASH_ADD_*` failed, is to perform a new lookup: a bit
costly, but we don't use that code in the fast-path.
See also efb261a

Credits for finding the issues to Philippe Antoine (@catenacyber) and his
`nallocfuzz` fuzzing engine
See: https://github.com/catenacyber/nallocfuzz
See: google/oss-fuzz#9902
@catenacyber
Contributor Author

@oliverchang, do you want to do something with this?
Could it just be documenting the dirty hack for using a custom fuzzing engine (like https://github.com/google/oss-fuzz/blob/master/projects/suricata/build.sh#L94)?
