gctest fails on Serenity OS #688

Open
ivmai opened this issue Dec 14, 2024 · 14 comments

Comments

@ivmai
Owner

ivmai commented Dec 14, 2024

Source: master (437c08c)
Also observed on v8.2.8+patches (SerenityOS/serenity@9201208)
Host: Serenity/x86_64

Reported in issue #685 by @linusg:

I updated the port here: https://github.com/linusg/serenity/tree/ports-bdwgc-update
However I can't tell if everything works correctly because gctest doesn't seem to work with the current patches either.

@linusg
Contributor

linusg commented Dec 14, 2024

Crash with serenity's bdwgc patches before upstreaming:

5.362 [#0 gctest(44:44)]: PV(error) fault in Region(0x00000020010bbd40)[0] at V0x0000000000145f90
5.362 [#0 gctest(44:44)]: Terminating gctest(44) due to signal 6
5.362 [#0 Finalizer Task(5:5)]: Backtrace:
0x00000000deadc0de  Kernel::ProcessorBase<Kernel::Processor>::switch_context(Kernel::Thread*&, Kernel::Thread*&) + 0x39f
0x00000000deadc0de  Kernel::Scheduler::context_switch(Kernel::Thread*) + 0x1ce
0x00000000deadc0de  Kernel::Scheduler::pick_next() + 0x123
0x00000000deadc0de  Kernel::ProcessorBase<Kernel::Processor>::clear_critical() + 0xcc
0x00000000deadc0de  Kernel::Thread::die_if_needed() + 0x267
0x00000000deadc0de  syscall_handler + 0x115d
0x00000000deadc0de  syscall_entry + 0x54
0x000000000cd8c45e  /usr/lib/libc.so: .text + 0x45e
0x000000093f49a416  /home/anon/gctest: .text + 0x14416
0x0000000b1777e00e  Signal trampoline + 0xe
0x000000093f48b738  /home/anon/gctest: .text + 0x5738
0x000000093f48c325  /home/anon/gctest: .text + 0x6325
0x000000093f499311  /home/anon/gctest: .text + 0x13311
0x000000093f48afe9  /home/anon/gctest: .text + 0x4fe9
0x000000093f487ce5  /home/anon/gctest: .text + 0x1ce5
0x000000093f487fe5  /home/anon/gctest: .text + 0x1fe5
5.371 [#0 Finalizer Task(5:5)]: Generating coredump for pid: 44
5.377 CrashDaemon(33): New coredump file: /tmp/coredump/gctest_44_1734203876
6.025 CrashReporter(45:46): Generating backtrace took 499 ms
6.025 CrashReporter(45:46): --- Backtrace for thread #0 (TID 44) ---
6.025 CrashReporter(45:46): 0x0000001284dd102b: [/usr/lib/libsystem.so] syscall2 +0xb (syscall.cpp:25 => syscall.cpp:24)
6.025 CrashReporter(45:46): 0x000000000cd8c45d: [/usr/lib/libc.so] abort +0x26 (stdlib.cpp:387)
6.025 CrashReporter(45:46): 0x000000093f49a415: [/home/anon/gctest] GC_write_fault_handler +0x1e5 (os_dep.c:3430)
6.025 CrashReporter(45:46): 0x0000000b1777e00d: ???
6.029 CrashReporter(45:46): 0x000000093f48b737: [/home/anon/gctest] GC_finish_collection +0x207 (alloc.c:1214)
6.029 CrashReporter(45:46): 0x000000093f48c324: [/home/anon/gctest] GC_try_to_collect_inner +0x104 (alloc.c:636)
6.029 CrashReporter(45:46): 0x000000093f499310: [/home/anon/gctest] GC_enable_incremental +0x230 (misc.c:1441)
6.029 CrashReporter(45:46): 0x000000093f48afe8: [/home/anon/gctest] enable_incremental_mode +0x8 (test.c:1941)
6.029 CrashReporter(45:46): 0x000000093f487ce4: [/home/anon/gctest] main +0x84 (test.c:2423)
6.029 CrashReporter(45:46): 0x000000093f487fe4: [/home/anon/gctest] _entry +0x24 (crt0.cpp:47)


Crash with the mentioned branch using bdwgc master after upstreaming:

5.545 [#0 gctest(44:44)]: PV(error) fault in Region(0x00000020010bbe40)[0] at V0x0000000000140f20
5.545 [#0 gctest(44:44)]: Terminating gctest(44) due to signal 6
5.545 [#0 Finalizer Task(5:5)]: Backtrace:
0x00000000deadc0de  Kernel::ProcessorBase<Kernel::Processor>::switch_context(Kernel::Thread*&, Kernel::Thread*&) + 0x39f
0x00000000deadc0de  Kernel::Scheduler::context_switch(Kernel::Thread*) + 0x1ce
0x00000000deadc0de  Kernel::Scheduler::pick_next() + 0x123
0x00000000deadc0de  Kernel::ProcessorBase<Kernel::Processor>::clear_critical() + 0xcc
0x00000000deadc0de  Kernel::Thread::die_if_needed() + 0x267
0x00000000deadc0de  syscall_handler + 0x115d
0x00000000deadc0de  syscall_entry + 0x54
0x0000000d42c9545e  /usr/lib/libc.so: .text + 0x45e
0x000000113667397d  /home/anon/gctest: .text + 0x1497d
0x0000001873cf800e  Signal trampoline + 0xe
0x0000001136664f33  /home/anon/gctest: .text + 0x5f33
0x0000001136665435  /home/anon/gctest: .text + 0x6435
0x000000113666c79a  /home/anon/gctest: .text + 0xd79a
0x000000113666cc7a  /home/anon/gctest: .text + 0xdc7a
0x0000001136661a34  /home/anon/gctest: .text + 0x2a34
0x0000001136662635  /home/anon/gctest: .text + 0x3635
5.551 [#0 Finalizer Task(5:5)]: Generating coredump for pid: 44
5.591 CrashDaemon(33): New coredump file: /tmp/coredump/gctest_44_1734204139
6.296 CrashReporter(45:46): Generating backtrace took 586 ms
6.300 CrashReporter(45:46): --- Backtrace for thread #0 (TID 44) ---
6.300 CrashReporter(45:46): 0x00000006292e502b: [/usr/lib/libsystem.so] syscall2 +0xb (syscall.cpp:25 => syscall.cpp:24)
6.300 CrashReporter(45:46): 0x0000000d42c9545d: [/usr/lib/libc.so] abort +0x26 (stdlib.cpp:387)
6.300 CrashReporter(45:46): 0x000000113667397c: [/home/anon/gctest] GC_write_fault_handler +0x21c (os_dep.c:3462)
6.300 CrashReporter(45:46): 0x0000001873cf800d: ???
6.300 CrashReporter(45:46): 0x0000001136664f32: [/home/anon/gctest] GC_finish_collection +0x1a2 (alloc.c:1336)
6.300 CrashReporter(45:46): 0x0000001136665434: [/home/anon/gctest] GC_collect_a_little_inner +0x234 (alloc.c:581)
6.304 CrashReporter(45:46): 0x000000113666c799: [/home/anon/gctest] GC_alloc_large +0xe9 (malloc.c:65)
6.304 CrashReporter(45:46): 0x000000113666cc79: [/home/anon/gctest] GC_generic_malloc_aligned +0x219 (malloc.c:287)
6.304 CrashReporter(45:46): 0x0000001136661a33: [/home/anon/gctest] main +0xdc3 (gctest.c:1815)
6.304 CrashReporter(45:46): 0x0000001136662634: [/home/anon/gctest] _entry +0x24 (crt0.cpp:47)


@ivmai
Owner Author

ivmai commented Dec 14, 2024

Okay, the above 2 crashes are identical. A null pointer dereference occurs.

GC_finish_collection +0x1a2 (alloc.c:1336)

Hmm, this line contains GC_start_reclaim(FALSE), so where is the null pointer dereferenced?

While investigating this crash, I recommend passing CFLAGS_EXTRA="-O0 -DGC_DISABLE_INCREMENTAL" to cmake or configure, so that SIGSEGV is not intercepted by GC_write_fault_handler (and compiler optimizations are turned off).
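
For context on why the crash surfaces inside GC_write_fault_handler: with mprotect-based dirty-page tracking, heap pages are write-protected and a SIGSEGV handler records the first write to each page. A fault at an address the handler does not recognize (such as a null dereference) ends in abort(). The following is a minimal sketch of that general technique, not bdwgc's implementation. All names in it (tracked_page, write_fault_handler, track_one_write) are hypothetical, and it assumes a Linux-like system where a write to a read-only page raises SIGSEGV with si_addr set.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for illustration; none of them come from bdwgc. */
static char *tracked_page;
static size_t page_size;
static volatile sig_atomic_t page_dirty;

static void write_fault_handler(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)ctx;
    if (addr >= tracked_page && addr < tracked_page + page_size) {
        /* A write hit the protected page: record it and unprotect. */
        page_dirty = 1;
        if (mprotect(tracked_page, page_size, PROT_READ | PROT_WRITE) == 0)
            return; /* The faulting store is retried and now succeeds. */
    }
    /* Any other fault (e.g. a null dereference) is a genuine crash. */
    abort();
}

/* Returns 1 if the write was seen by the handler, 0 if not, -1 on error. */
static int track_one_write(void)
{
    struct sigaction sa;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    tracked_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tracked_page == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = write_fault_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;

    page_dirty = 0;
    if (mprotect(tracked_page, page_size, PROT_READ) != 0)
        return -1;
    *(volatile char *)tracked_page = 42; /* Faults once, then succeeds. */
    return page_dirty;
}
```

Calling track_one_write() from a test program should return 1 on Linux. Because the handler aborts on any unrecognized address, an unrelated segfault appears with the handler on top of the backtrace, which is why disabling incremental mode makes the real faulting frame easier to see.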

@ivmai
Owner Author

ivmai commented Dec 14, 2024

Also, it might be good to start the investigation with a minimal version of the collector, like this:

gcc -I include -o gctest -O0 -g -D GC_ASSERTIONS -D SMALL_CONFIG tests/gctest.c extra/gc.c && ./gctest

If it works, then test it in the multi-threaded mode:

gcc -I include -o gctest -O0 -g -D GC_ASSERTIONS -D SMALL_CONFIG -D GC_THREADS -D GC_BUILTIN_ATOMIC tests/gctest.c extra/gc.c && ./gctest

If it works, then test it w/o SMALL_CONFIG, and so on: add -D USE_MUNMAP, add -D GC_GCJ_SUPPORT, add -D ENABLE_DISCLAIM, add -D THREAD_LOCAL_ALLOC.

@linusg
Contributor

linusg commented Dec 14, 2024

Thanks for the guidance! With -O0 -DGC_DISABLE_INCREMENTAL:

8.170 CrashReporter(45:46): 0x00000001d83e902b: [/usr/lib/libsystem.so] syscall2 +0xb (syscall.cpp:25 => syscall.cpp:24)
8.170 CrashReporter(45:46): 0x000000121a1d645d: [/usr/lib/libc.so] abort +0x26 (stdlib.cpp:387)
8.170 CrashReporter(45:46): 0x000000194da22765: [/home/anon/gctest] block_unmap_inner +0xea (os_dep.c:2794)
8.170 CrashReporter(45:46): 0x000000194da2280e: [/home/anon/gctest] GC_unmap +0x57 (os_dep.c:2830)
8.170 CrashReporter(45:46): 0x000000194da2b0d3: [/home/anon/gctest] GC_unmap_old +0x8d (allchblk.c:542)
8.170 CrashReporter(45:46): 0x000000194da13101: [/home/anon/gctest] GC_collect_or_expand +0x294 (alloc.c:1880)
8.174 CrashReporter(45:46): 0x000000194da19d2f: [/home/anon/gctest] GC_alloc_large +0x16c (malloc.c:76)
8.174 CrashReporter(45:46): 0x000000194da1a56f: [/home/anon/gctest] GC_generic_malloc_aligned +0x222 (malloc.c:287)
8.174 CrashReporter(45:46): 0x000000194da1b111: [/home/anon/gctest] GC_generic_malloc_ignore_off_page +0x27 (mallocx.c:211)
8.174 CrashReporter(45:46): 0x000000194da0f657: [/home/anon/gctest] run_one_test +0xdbe (gctest.c:1815)
8.174 CrashReporter(45:46): 0x000000194da10676: [/home/anon/gctest] thr_run_one_test +0x10 (gctest.c:2605)
8.174 CrashReporter(45:46): 0x000000194da1093d: [/home/anon/gctest] main +0x27a (gctest.c:2740)
8.178 CrashReporter(45:46): 0x000000194da0bc04: [/home/anon/gctest] _entry +0x24 (crt0.cpp:47)


errno 22 is EINVAL. The implementation of madvise is here: https://github.com/SerenityOS/serenity/blob/b88cd185a0ec40fc10405b555507aa6f0aab8222/Kernel/Syscalls/mmap.cpp#L355-L388

Looks like only MADV_SET_VOLATILE and MADV_SET_NONVOLATILE are supported, not MADV_DONTNEED.

And indeed, this patch makes it work:

--- a/os_dep.c
+++ b/os_dep.c
@@ -2770,7 +2770,7 @@ block_unmap_inner(ptr_t start_addr, size_t len)
 #    ifdef SN_TARGET_PS3
     ps3_free_mem(start_addr, len);
 #    elif defined(AIX) || defined(COSMO) || defined(CYGWIN32) \
-        || defined(HPUX) || defined(SERENITY)                 \
+        || defined(HPUX)                                      \
         || (defined(LINUX) && !defined(PREFER_MMAP_PROT_NONE))
     /* On AIX, mmap(PROT_NONE) fails with ENOMEM unless the       */
     /* environment variable XPG_SUS_ENV is set to ON.             */
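
The EINVAL diagnosis above can be checked in isolation with a small probe. This is a minimal sketch, assuming the target libc declares madvise and MADV_DONTNEED; the function name probe_madv_dontneed is hypothetical and not part of bdwgc.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if madvise(MADV_DONTNEED) is accepted, */
/* the errno value if it is rejected, or -1 if the mapping failed.       */
static int probe_madv_dontneed(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int result = 0;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len); /* Touch the page so it is actually committed. */
    if (madvise(p, len, MADV_DONTNEED) != 0)
        result = errno;
    munmap(p, len);
    return result;
}
```

On Linux the probe is expected to return 0. Based on the kernel source linked above, it would presumably return EINVAL (22) on Serenity, matching the abort seen in block_unmap_inner.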


@linusg
Contributor

linusg commented Dec 14, 2024

  • -O0 -g -D GC_ASSERTIONS -D SMALL_CONFIG works
  • -O0 -g -D GC_ASSERTIONS -D SMALL_CONFIG -D GC_THREADS -D GC_BUILTIN_ATOMIC works
  • -O0 -g -D GC_ASSERTIONS -D GC_THREADS -D GC_BUILTIN_ATOMIC crashes with the null deref:
4.926 CrashReporter(45:46): 0x0000000aa3c2c02b: [/usr/lib/libsystem.so] syscall2 +0xb (syscall.cpp:25 => syscall.cpp:24)
4.926 CrashReporter(45:46): 0x00000000f611145d: [/usr/lib/libc.so] abort +0x26 (stdlib.cpp:387)
4.926 CrashReporter(45:46): 0x0000001ff7c199ee: [/home/anon/gctest] GC_write_fault_handler +0x123 (os_dep.c:3462)
4.926 CrashReporter(45:46): 0x0000000f1868400d: ???
4.926 CrashReporter(45:46): 0x0000001ff7c1cba2: [/home/anon/gctest] GC_start_reclaim +0x1e0 (reclaim.c:720)
4.926 CrashReporter(45:46): 0x0000001ff7bfdb51: [/home/anon/gctest] GC_finish_collection +0x2f6 (alloc.c:1336)
4.926 CrashReporter(45:46): 0x0000001ff7bfbf9d: [/home/anon/gctest] GC_maybe_gc +0x1d7 (alloc.c:581)
4.926 CrashReporter(45:46): 0x0000001ff7bfc940: [/home/anon/gctest] GC_collect_a_little_inner +0x297 (alloc.c:815)
4.930 CrashReporter(45:46): 0x0000001ff7c0a5e2: [/home/anon/gctest] GC_alloc_large +0x300 (malloc.c:65)
4.930 CrashReporter(45:46): 0x0000001ff7c0b538: [/home/anon/gctest] GC_generic_malloc_aligned +0x3e5 (malloc.c:287)
4.930 CrashReporter(45:46): 0x0000001ff7c0cad7: [/home/anon/gctest] GC_generic_malloc_ignore_off_page +0x27 (mallocx.c:211)
4.930 CrashReporter(45:46): 0x0000001ff7bf9518: [/home/anon/gctest] run_one_test +0xcd7 (gctest.c:1815)
4.930 CrashReporter(45:46): 0x0000001ff7bfa5f9: [/home/anon/gctest] thr_run_one_test +0x10 (gctest.c:2605)
4.930 CrashReporter(45:46): 0x0000001ff7bfa8c0: [/home/anon/gctest] main +0x27a (gctest.c:2740)
4.933 CrashReporter(45:46): 0x0000001ff7bf5c04: [/home/anon/gctest] _entry +0x24 (crt0.cpp:47)

@ivmai
Owner Author

ivmai commented Dec 15, 2024

-O0 -g -D GC_ASSERTIONS -D GC_THREADS -D GC_BUILTIN_ATOMIC crashes with the null deref

Please remove `define MPROTECT_VDB' from SERENITY section in include/private/gcconfig.h

@ivmai
Owner Author

ivmai commented Dec 15, 2024

Also, I wonder about printed "using 1 marker threads" - are you running on multicore? If yes, then there is an issue with getting the number of CPU cores in pthread_support.c. See the GC_get_nprocs() implementation; I think you should enable the one based on sysconf(_SC_NPROCESSORS_ONLN), line 1093, if it works.
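
A sysconf-based core count can be sketched as follows. This is only an illustration of the approach mentioned above, not the GC_get_nprocs() code from pthread_support.c; the name get_nprocs_sketch is hypothetical, and it assumes the platform's unistd.h defines _SC_NPROCESSORS_ONLN.

```c
#include <unistd.h>

/* Hypothetical helper: number of online CPU cores, or 1 if the query */
/* fails or is unsupported.                                           */
static int get_nprocs_sketch(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);

    return n > 0 ? (int)n : 1;
}
```

Falling back to 1 keeps the caller safe when sysconf returns -1, at the cost of hiding the failure, so a result of 1 on a multicore machine is worth a second look.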

@ivmai
Owner Author

ivmai commented Dec 15, 2024

And indeed, this patch makes it work (about madvise)

Good, but please check that the memory is really returned to OS. For this, you could set a breakpoint in a debugger near exit, and check how much memory is committed (compared to the virtual process space). See the gctest message "Obtained ... bytes from OS".

There are several mechanisms to return (unmap) memory to the OS:

  • mmap(PROT_NONE) // default one
  • madvise(MADV_DONTNEED) only // used on Linux
  • mprotect(PROT_NONE) only // used on Cygwin
  • madvise(MADV_DONTNEED) + mprotect(PROT_NONE)

I think mmap(PROT_NONE) should work on Serenity; otherwise, check the "mprotect(PROT_NONE) only" variant.
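
The mechanisms listed above can be sketched with plain POSIX-style calls. This is a minimal illustration under stated assumptions, not the code in os_dep.c: every function name below is hypothetical, MAP_ANONYMOUS and MADV_DONTNEED are assumed to be available, and whether a given variant really releases committed memory depends on the kernel.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Variant 1: replace the range with a fresh inaccessible mapping. */
static int unmap_by_mmap_prot_none(void *start, size_t len)
{
    void *r = mmap(start, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

    return r == start ? 0 : -1;
}

/* Variant 2: tell the kernel the contents are no longer needed. */
static int unmap_by_madvise(void *start, size_t len)
{
    return madvise(start, len, MADV_DONTNEED);
}

/* Variant 3: only revoke access to the range. */
static int unmap_by_mprotect(void *start, size_t len)
{
    return mprotect(start, len, PROT_NONE);
}

/* Make a previously released range readable and writable again. */
static int remap_range(void *start, size_t len)
{
    return mprotect(start, len, PROT_READ | PROT_WRITE);
}

/* Exercise each variant on one page; returns 0 if all calls succeed. */
/* The fourth mechanism in the list is variant 2 followed by 3.       */
static int demo_unmap_variants(void)
{
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    p[0] = 1;
    if (unmap_by_mmap_prot_none(p, len) != 0 || remap_range(p, len) != 0)
        return -1;
    p[0] = 1;
    if (unmap_by_madvise(p, len) != 0)
        return -1;
    p[0] = 1;
    if (unmap_by_mprotect(p, len) != 0 || remap_range(p, len) != 0)
        return -1;
    p[0] = 1;
    return munmap(p, len);
}
```

Running demo_unmap_variants() on the target shows which calls the kernel accepts. It does not show whether committed memory actually drops, so the system monitor check described above is still needed.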

@ivmai ivmai closed this as completed Dec 15, 2024
@ivmai ivmai reopened this Dec 15, 2024
@linusg
Contributor

linusg commented Dec 15, 2024

Please remove `define MPROTECT_VDB' from SERENITY section in include/private/gcconfig.h

Thanks, that does make gctest work with no additional build flags!

Also, I wonder about printed "using 1 marker threads" - are you running on multicore?

Running with a single core in QEMU, something with enabling SMP seems to be broken in the run script. I wouldn't worry about that for now but good catch :)

Good, but please check that the memory is really returned to OS.

I'm not certain that's the case, I can see 10 MiB of committed memory being retained after check_heap_stats but before exit (I put a sleep there) via the system monitor. Printing out the Obtained ... bytes from OS message again at that point gives me the exact same numbers (~10 MiB, the majority unmapped).


@ivmai
Owner Author

ivmai commented Dec 15, 2024

Good! As I understand it, you made 2 changes (related to madvise and to MPROTECT_VDB), and now "make check" works, right? Could you please open a PR here?

linusg added a commit to linusg/bdwgc that referenced this issue Dec 15, 2024
This causes a null pointer dereference crash when building without
SMALL_CONFIG.

See: ivmai#688
linusg added a commit to linusg/bdwgc that referenced this issue Dec 15, 2024
@linusg
Contributor

linusg commented Dec 15, 2024

Correct! I have to run the tests manually because I'm cross-compiling, but it all seems to work.

Could you please open a PR here?

Sure - #689

@ivmai
Owner Author

ivmai commented Dec 15, 2024

@linusg
Contributor

linusg commented Dec 15, 2024

Having the proposed patches, we do not need SerenityOS/serenity@master/Ports/bdwgc/patches/0005-Make-the-collector-build-with-threads.patch, right?

The test runner still crashes with anything other than NTHREADS=0, so I intend to keep that part of the patchset, but that's definitely a bug in Serenity and should be solved on our end.

Specifically: https://github.com/linusg/serenity/blob/7388ec372ed27da9736b1513aaf201e2502c7e75/Ports/bdwgc/patches/0001-gctest-Set-NTHREADS-to-0.patch

@ivmai
Owner Author

ivmai commented Dec 15, 2024

Okay, got it.
