Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion failure in disclaim_test if manual VDB #235

Closed
ivmai opened this issue Sep 10, 2018 · 16 comments
Closed

Assertion failure in disclaim_test if manual VDB #235

ivmai opened this issue Sep 10, 2018 · 16 comments

Comments

@ivmai
Copy link
Owner

ivmai commented Sep 10, 2018

Build link: https://travis-ci.org/ivmai/bdwgc/jobs/425865062
Source: master (or 8.0.0)

Host: Linux/x64
How to build and run: ./configure --enable-gc-assertions --disable-munmap && make check CFLAGS_EXTRA="-D TEST_MANUAL_VDB"

Output (of disclaim_test):
Threaded disclaim test.
Assertion failure, line 119: !p->car || is_pair(p->car)

ivmai referenced this issue Sep 10, 2018
* tests/disclaim_test.c (my_assert): Call fflush().
* tests/disclaim_test.c (pair_s): Replace is_valid with "magic" field.
* tests/disclaim_test.c (pair_magic): New const static variable.
* tests/disclaim_test.c (is_pair): New function.
* tests/disclaim_test.c (pair_dct, pair_new): Set "magic" field and
use is_pair() instead of is_valid for assertion checking.
@ivmai
Copy link
Owner Author

ivmai commented Sep 10, 2018

If compiled with DEBUG_DISCLAIM_DESTRUCT, the output as follows ("car" is destroyed before "p") :

Threaded disclaim test.
...
Construct 0xe88d58 = ((nil), (nil))
...
Construct 0xe8c248 = ((nil), (nil))
...
Construct 0xe8ec08 = (0xe8c248, 0xe88d58)
...
Destruct 0xe8c248 = ((nil), (nil))
...
Destruct 0xe8ec08 = (0xe8c248, 0xe88d58)
Assertion failure, line 119: !p->car || is_pair(p->car)

@paurkedal
Copy link
Contributor

I cannot reproduce this myself from cac8158 with the given options on Ubuntu 18.04, but: A comment in mark.c gives me the suspicion that I might have failed to follow the logic of the GC cycles and that the following might be needed:

diff --git a/mark.c b/mark.c
index b1d59026..a7951374 100644
--- a/mark.c
+++ b/mark.c
@@ -1969,6 +1969,11 @@ STATIC struct hblk * GC_push_next_marked(struct hblk *h)
         if (NULL == h) ABORT("Bad HDR() definition");
 #     endif
     }
+#   ifdef ENABLE_DISCLAIM
+      if ((hhdr -> hb_flags & MARK_UNCONDITIONALLY) != 0) {
+        GC_push_unconditionally(h, hhdr);
+      } else
+#   endif
     GC_push_marked(h, hhdr);
     return(h + OBJ_SZ_TO_BLOCKS(hhdr -> hb_sz));
 }

It's a wild guess though.

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

I cannot reproduce this myself

To reproduce it, I run ./disclaim_test in a loop in 3 terminals in parallel (and got my_assert violation after around 100 iterations.

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

I think the assertion was because I resurrected the object by GC_ptr_store_and_dirty(&p->car, cd) in pair_dct. Now I've removed the store barrier which was not actually needed because we reclaim p after pair_dct completion (now the code is p->car = 0).

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

After the change (GC_ptr_store_and_dirty -> p->car=0), I got the following error:
Assertion failure, line 124: p->checksum == checksum

(It is reproduced after several dozens of disclaim_test runs even in a more simple configuration: ./configure --enable-gc-assertions --disable-parallel-mark --disable-munmap --disable-handle-fork --disable-thread-local-alloc && make check CFLAGS_EXTRA="-O0 -g -D TEST_MANUAL_VDB")

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

And, again, sometimes I see:
Assertion failure, line 120: !p->cdr || is_pair(p->cdr)

p is valid but p->cdr is already reclaimed (zero'ed, the 1st word points to the next element in the free list.)

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

Not reproduced w/o -D TEST_MANUAL_VDB

@ivmai
Copy link
Owner Author

ivmai commented Sep 11, 2018

A comment in mark.c gives me the suspicion that I might have failed to follow the logic of the GC cycles and that the following might be needed

I will check it tomorrow.

@paurkedal
Copy link
Contributor

I managed to trigger the error with 3 parallel runs as you suggested. My adjustments to mark.c did not make a difference, though.

@paurkedal
Copy link
Contributor

I think the issue is that GC_ptr_store_and_dirty calls set_pht_entry_from_index without locking. I believe the latter (|=) translates to separate read and write, so if another thread does the same between the two operations on the same word, that thread's dirty bit will be lost.

I'm running some long test now with the following:

diff --git a/include/private/gc_priv.h b/include/private/gc_priv.h
index 4f2d3085..4d2fd7a0 100644
--- a/include/private/gc_priv.h
+++ b/include/private/gc_priv.h
@@ -2197,7 +2197,8 @@ GC_EXTERN GC_bool GC_print_back_height;
 # define GC_auto_incremental (GC_incremental && !GC_manual_vdb)
 
   GC_INNER void GC_dirty_inner(const void *p); /* does not require locking */
-# define GC_dirty(p) (GC_manual_vdb ? GC_dirty_inner(p) : (void)0)
+  GC_API_PRIV void GC_dirty_async(const void *p);
+# define GC_dirty(p) (GC_manual_vdb ? GC_dirty_async(p) : (void)0)
 # define REACHABLE_AFTER_DIRTY(p) GC_reachable_here(p)
 #endif /* !GC_DISABLE_INCREMENTAL */
 
diff --git a/os_dep.c b/os_dep.c
index 03cd7ecb..efbf8708 100644
--- a/os_dep.c
+++ b/os_dep.c
@@ -3723,6 +3723,18 @@ GC_INNER GC_bool GC_dirty_init(void)
     set_pht_entry_from_index(GC_dirty_pages, index); /* FIXME: concurrent */
   }
 
+  GC_API_PRIV void GC_dirty_async(const void *p)
+  {
+    word index = PHT_HASH(p);
+
+#   if defined(MPROTECT_VDB)
+      /* Do not update GC_dirty_pages if it should be followed by the   */
+      /* page unprotection.                                             */
+      GC_ASSERT(GC_manual_vdb);
+#   endif
+    async_set_pht_entry_from_index(GC_dirty_pages, index); /* FIXME: concurrent */
+  }
+
   /* Retrieve system dirty bits for the heap to a local buffer (unless  */
   /* output_unneeded).  Restore the systems notion of which pages are   */
   /* dirty.  We assume that either the world is stopped or it is OK to  */

As you can see there is a FIXME which made me look bit closer.

@ivmai
Copy link
Owner Author

ivmai commented Sep 15, 2018

I see. I will check it as well but I'm not sure async_set_pht_entry_from_index has enough thread-safety for GC_dirty. It seems to me that we need AO_or here.

To @hboehm: what do you think?

@paurkedal
Copy link
Contributor

paurkedal commented Sep 15, 2018

The issue hasn't popped up after the change, but it also eventually deadlocks, so there is probably a double locking involved. AO_or should avoid that, so I'll try it.

@ivmai
Copy link
Owner Author

ivmai commented Sep 22, 2018

Yes, async_set_pht_entry_from_index should fix the race except there could be a deadlock if it is called also from the write fault handler but GC_dirty_inner and write fault handler usage should be mutually exclusive - if manual VDB is used then MPROTECT_VDB functionality is off, and vice versa. So, I don't understand the reason of the deadlock.
On Monday, I will try to reproduce this deadlock.

@ivmai
Copy link
Owner Author

ivmai commented Sep 22, 2018

AO_or is better but not always available.

@ivmai
Copy link
Owner Author

ivmai commented Sep 24, 2018

but it also eventually deadlocks, so there is probably a double locking involved.

Yes, this happens if thread is suspended in the middle of async_set_pht_entry_from_index.

ivmai added a commit that referenced this issue Sep 25, 2018
Issue #235 (bdwgc).

* darwin_stop_world.c [!GC_NO_THREADS_DISCOVERY]
(GC_suspend_thread_list): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around thread_suspend().
* darwin_stop_world.c (GC_stop_world): Likewise.
* include/private/gc_priv.h (set_pht_entry_from_index_safe): Remove
unused macro.
* include/private/gc_priv.h [THREADS] (GC_acquire_dirty_lock,
GC_release_dirty_lock): Define new macro; move comment from
win32_threads.c.
* include/private/gc_priv.h [THREADS && !GC_DISABLE_INCREMENTAL]
(GC_fault_handler_lock): Declare global variable (even if not
MPROTECT_VDB).
* os_dep.c [!GC_DISABLE_INCREMENTAL] (async_set_pht_entry_from_index):
Define even if not MPROTECT_VDB; reformat comments.
* os_dep.c [!GC_DISABLE_INCREMENTAL && AO_HAVE_test_and_set_acquire]
(GC_fault_handler_lock): Likewise.
* os_dep.c [THREADS && AO_HAVE_test_and_set_acquire]
(async_set_pht_entry_from_index): Use GC_acquire_dirty_lock and
GC_release_dirty_lock; remove wrong comment.
* os_dep.c [THREADS && !AO_HAVE_test_and_set_acquire]
(currently_updating, async_set_pht_entry_from_index): Remove incorrect
implementation.
* os_dep.c [!GC_DISABLE_INCREMENTAL] (GC_dirty_inner): Call
async_set_pht_entry_from_index instead of set_pht_entry_from_index;
remove FIXME.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread):
Refine comment for AO_store_release call; call GC_acquire_dirty_lock
and GC_release_dirty_lock around a block of RAISE_SIGNAL() and
sem_wait().
* pthread_stop_world.c [GC_OPENBSD_UTHREADS] (GC_suspend_all): Call
GC_acquire_dirty_lock and GC_release_dirty_lock around
pthread_suspend_np().
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS] (GC_suspend_all): Add
comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_world):
If GC_manual_vdb then call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of GC_suspend_all() and
suspend_restart_barrier(); add comment.
* win32_threads.c (GC_suspend): Use GC_acquire_dirty_lock and
GC_release_dirty_lock (regardless of MPROTECT_VDB).
@ivmai
Copy link
Owner Author

ivmai commented Sep 25, 2018

Fixed by 0c0e4cd.

@ivmai ivmai closed this as completed Sep 25, 2018
ivmai added a commit that referenced this issue Sep 26, 2018
(fix of commit 0c0e4cd)

Issue #235 (bdwgc).

* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread):
Remove unneeded comment for AO_store_release() call; invoke
GC_acquire_dirty_lock() and GC_release_dirty_lock() only if
GC_manual_vdb; add comment for GC_acquire_dirty_lock() call.
* pthread_stop_world.c [NACL] (GC_suspend_all): If GC_manual_vdb then
call GC_acquire_dirty_lock() and GC_release_dirty_lock() around the
code which ensures parking of threads.
* pthread_support.c [CAN_HANDLE_FORK] (fork_prepare_proc): Call
GC_acquire_dirty_lock().
* pthread_support.c [CAN_HANDLE_FORK] (fork_parent_proc,
fork_child_proc): Call GC_release_dirty_lock().
ivmai added a commit that referenced this issue Sep 28, 2018
(a cherry-pick of commits 7305e56, 0c0e4cd, d6db3f0 from 'master')

Issue #235 (bdwgc).

* darwin_stop_world.c [!GC_NO_THREADS_DISCOVERY]
(GC_suspend_thread_list): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around thread_suspend().
* darwin_stop_world.c (GC_stop_world): Likewise.
* include/private/gc_priv.h (clear_pht_entry_from_index,
set_pht_entry_from_index_safe): Remove unused macro.
* include/private/gc_priv.h [THREADS] (GC_acquire_dirty_lock,
GC_release_dirty_lock): Define new macro; move comment from
win32_threads.c.
* include/private/gc_priv.h [THREADS && !GC_DISABLE_INCREMENTAL]
(GC_fault_handler_lock): Declare global variable (even if not
MPROTECT_VDB).
* os_dep.c [!GC_DISABLE_INCREMENTAL] (async_set_pht_entry_from_index):
Define even if not MPROTECT_VDB; reformat comments.
* os_dep.c [!GC_DISABLE_INCREMENTAL && AO_HAVE_test_and_set_acquire]
(GC_fault_handler_lock): Likewise.
* os_dep.c [THREADS && AO_HAVE_test_and_set_acquire]
(async_set_pht_entry_from_index): Use GC_acquire_dirty_lock and
GC_release_dirty_lock; remove wrong comment.
* os_dep.c [THREADS && !AO_HAVE_test_and_set_acquire]
(currently_updating, async_set_pht_entry_from_index): Remove incorrect
implementation.
* os_dep.c [!GC_DISABLE_INCREMENTAL] (GC_dirty_inner): Call
async_set_pht_entry_from_index instead of set_pht_entry_from_index;
remove FIXME.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD] (GC_suspend_thread):
If GC_manual_vdb then call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of RAISE_SIGNAL() and sem_wait().
* pthread_stop_world.c [GC_OPENBSD_UTHREADS] (GC_suspend_all): Call
GC_acquire_dirty_lock and GC_release_dirty_lock around
pthread_suspend_np().
* pthread_stop_world.c [NACL] (GC_suspend_all): If GC_manual_vdb then
call GC_acquire_dirty_lock() and GC_release_dirty_lock() around the
code which ensures parking of threads.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS] (GC_suspend_all): Add
comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL] (GC_stop_world):
If GC_manual_vdb then call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of GC_suspend_all() and
suspend_restart_barrier(); add comment.
* pthread_support.c [CAN_HANDLE_FORK] (fork_prepare_proc): Call
GC_acquire_dirty_lock().
* pthread_support.c [CAN_HANDLE_FORK] (fork_parent_proc,
fork_child_proc): Call GC_release_dirty_lock().
* win32_threads.c (GC_suspend): Use GC_acquire_dirty_lock and
GC_release_dirty_lock (regardless of MPROTECT_VDB).
ivmai added a commit that referenced this issue Oct 1, 2018
(back-port of commit 815dde3 from 'release-8_0')

Issue #235 (bdwgc).

* darwin_stop_world.c [!GC_NO_THREADS_DISCOVERY]
(GC_suspend_thread_list): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around thread_suspend().
* darwin_stop_world.c (GC_stop_world): Likewise.
* include/private/gc_priv.h (clear_pht_entry_from_index,
set_pht_entry_from_index_safe): Remove unused macro.
* include/private/gc_priv.h [THREADS] (GC_acquire_dirty_lock,
GC_release_dirty_lock): Define new macro; move comment from
win32_threads.c.
* include/private/gc_priv.h [THREADS && (MANUAL_VDB || MPROTECT_VDB)]
(GC_fault_handler_lock): Declare global variable (even if not
MPROTECT_VDB).
* os_dep.c [MANUAL_VDB] (async_set_pht_entry_from_index): Define as
for MPROTECT_VDB case; reformat comments.
* os_dep.c [MANUAL_VDB && AO_HAVE_test_and_set_acquire]
(GC_fault_handler_lock): Likewise.
* os_dep.c [THREADS && AO_HAVE_test_and_set_acquire]
(async_set_pht_entry_from_index): Use GC_acquire_dirty_lock and
GC_release_dirty_lock; remove wrong comment.
* os_dep.c [THREADS && !AO_HAVE_test_and_set_acquire]
(currently_updating, async_set_pht_entry_from_index): Remove incorrect
implementation.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD && MANUAL_VDB]
(GC_suspend_thread): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of RAISE_SIGNAL() and sem_wait().
* pthread_stop_world.c [GC_OPENBSD_UTHREADS] (GC_suspend_all): Call
GC_acquire_dirty_lock and GC_release_dirty_lock around
pthread_suspend_np().
* pthread_stop_world.c [NACL && MANUAL_VDB] (GC_suspend_all):
Call GC_acquire_dirty_lock() and GC_release_dirty_lock() around the
code which ensures parking of threads.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS] (GC_suspend_all): Add
comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && MANUAL_VDB]
(GC_stop_world): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of GC_suspend_all() and
suspend_restart_barrier(); add comment.
* pthread_support.c [CAN_HANDLE_FORK] (fork_prepare_proc): Call
GC_acquire_dirty_lock().
* pthread_support.c [CAN_HANDLE_FORK] (fork_parent_proc,
fork_child_proc): Call GC_release_dirty_lock().
* win32_threads.c (GC_suspend): Use GC_acquire_dirty_lock and
GC_release_dirty_lock (regardless of MPROTECT_VDB).
ivmai added a commit that referenced this issue Oct 2, 2018
(back-port of commit 26e9e59 from 'release-7_6')

Issue #235 (bdwgc).

* darwin_stop_world.c [!GC_NO_THREADS_DISCOVERY]
(GC_suspend_thread_list): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around thread_suspend().
* darwin_stop_world.c (GC_stop_world): Likewise.
* include/private/gc_priv.h [THREADS] (GC_acquire_dirty_lock,
GC_release_dirty_lock): Define new macro; move comment from
win32_threads.c.
* include/private/gc_priv.h [THREADS && (MANUAL_VDB || MPROTECT_VDB)]
(GC_fault_handler_lock): Declare global variable (even if not
MPROTECT_VDB).
* os_dep.c [MANUAL_VDB] (async_set_pht_entry_from_index): Define as
for MPROTECT_VDB case; reformat comments.
* os_dep.c [MANUAL_VDB && AO_HAVE_test_and_set_acquire]
(GC_fault_handler_lock): Likewise.
* os_dep.c [THREADS && AO_HAVE_test_and_set_acquire]
(async_set_pht_entry_from_index): Use GC_acquire_dirty_lock and
GC_release_dirty_lock; remove wrong comment.
* pthread_stop_world.c [GC_ENABLE_SUSPEND_THREAD && MANUAL_VDB]
(GC_suspend_thread): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of pthread_kill() and sem_wait().
* pthread_stop_world.c [GC_OPENBSD_UTHREADS] (GC_suspend_all): Call
GC_acquire_dirty_lock and GC_release_dirty_lock around
pthread_suspend_np().
* pthread_stop_world.c [NACL && MANUAL_VDB] (GC_suspend_all):
Call GC_acquire_dirty_lock() and GC_release_dirty_lock() around the
code which ensures parking of threads.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS] (GC_suspend_all): Add
comment.
* pthread_stop_world.c [!GC_OPENBSD_UTHREADS && !NACL && MANUAL_VDB]
(GC_stop_world): Call GC_acquire_dirty_lock and
GC_release_dirty_lock around a block of GC_suspend_all() and
suspend_restart_barrier(); add comment.
* pthread_support.c [CAN_HANDLE_FORK] (fork_prepare_proc): Call
GC_acquire_dirty_lock().
* pthread_support.c [CAN_HANDLE_FORK] (fork_parent_proc,
fork_child_proc): Call GC_release_dirty_lock().
* win32_threads.c (GC_suspend): Use GC_acquire_dirty_lock and
GC_release_dirty_lock (regardless of MPROTECT_VDB).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants