-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FATAL: ThreadSanitizer CHECK failed #950
Comments
At the very least you may disable the deadlock detector ( If you have a reasonably small reproducer we may try to get this fixed. |
Thanks Kostya. I've already disabled the deadlock detector, but wanted to enable it if possible. It looks like the problem may be too many mutexes? That's quite possible, as the code that triggers this is an internal stress test pgm that creates many objects. If so, I could try building locally with a different size for all_locks_with_contexts_ -- do you know where that is set? Also, on a related note, it also looks like TSAN reports what appear to be false positives w/recursive mutexes -- is that true? Thanks again! |
The code in question is in lib/sanitizer_common/sanitizer_deadlock_detector.h I have to admit that I don't remember fine details of this code any more (haven't touched since 2014). |
@WallStProg you showed some "destroy of a locked mutex" reports. I suspect they cause the unbounded number of mutexes locked by a thread. POSIX is very clear on this:
http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_mutex_destroy.html |
It appears that the CHECK failure is in fact caused by creating > 64 mutexes in a single thread -- unusual, but this code is a stress test that does that deliberately. I hear you about the "destroy locked" reports, and I'm in the process of fixing that. (I inherited this code, which has been running for quite some time w/no apparent issues, but I agree that UB is not OK). For now, I've disabled the problematic tests when running w/"detect_deadlocks=1". Thanks! |
TSAN limits the number of simultaneous lock acquisitions in a single thread to 64 when using the deadlock detector[1]. However, compaction can select up to 128 (128MB budget / 1MB min rowset size) rowsets in a single op. kudu-tool-test's TestNonRandomWorkloadLoadgen almost always hits TSAN's limit when the KUDU-1400 changes following this patch are applied. This patch prevents this by limiting the number of rowsets selected for a compaction to 32 when running under TSAN. I ran the test with the KUDU-1400 changes on top and saw 97/100 failures. With the change, I saw 100 successes. [1]: google/sanitizers#950 Change-Id: I01ad4ba3a13995c194c3308d72c1eb9b611ef766 Reviewed-on: http://gerrit.cloudera.org:8080/11885 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
TSAN limits the number of simultaneous lock acquisitions in a single thread to 64 when using the deadlock detector[1]. However, compaction can select up to 128 (128MB budget / 1MB min rowset size) rowsets in a single op. kudu-tool-test's TestNonRandomWorkloadLoadgen almost always hits TSAN's limit when the KUDU-1400 changes following this patch are applied. This patch prevents this by limiting the number of rowsets selected for a compaction to 32 when running under TSAN. I ran the test with the KUDU-1400 changes on top and saw 97/100 failures. With the change, I saw 100 successes. [1]: google/sanitizers#950 Change-Id: I01ad4ba3a13995c194c3308d72c1eb9b611ef766 Reviewed-on: http://gerrit.cloudera.org:8080/11885 Tested-by: Kudu Jenkins Reviewed-by: Adar Dembo <[email protected]> Reviewed-by: Andrew Wong <[email protected]>
As we hold a mutex for our custom C++ Node, when calling reentrant backward from custom C++ function, we will cocurrently holding many mutexes up to MAX_DEPTH. TSAN only allow 65 mutexes at once, otherwise it will complain. This PR lower the limit according to TSAN. TSAN Reference: google/sanitizers#950 [ghstack-poisoned]
As we hold a mutex for our custom C++ Node, when calling reentrant backward from custom C++ function, we will cocurrently holding many mutexes up to MAX_DEPTH. TSAN only allow 65 mutexes at once, otherwise it will complain. This PR lower the limit according to TSAN. TSAN Reference: google/sanitizers#950 ghstack-source-id: de61c260ea671025b486c0118af782efdf07aab3 Pull Request resolved: #36745
Summary: Pull Request resolved: #36745 As we hold a mutex for our custom C++ Node, when calling reentrant backward from custom C++ function, we will cocurrently holding many mutexes up to MAX_DEPTH. TSAN only allow 65 mutexes at once, otherwise it will complain. This PR lower the limit according to TSAN. TSAN Reference: google/sanitizers#950 Test Plan: Imported from OSS Differential Revision: D21072604 Pulled By: wanchaol fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2
…utexes Under TSan you can lock only not more then 64 mutexes from one thread at once [1] [2], while RESTART REPLICAS can acquire more (it depends on the number of replicated tables). [1]: google/sanitizers#950 (comment) [2]: https://github.com/llvm/llvm-project/blob/b02eab9058e58782fca32dd8b1e53c27ed93f866/compiler-rt/lib/sanitizer_common/sanitizer_deadlock_detector.h#L67 And since stress tests run tests in parallel, you can have more then 64 ReplicatedMergeTree tables at once (even though it is unlikely). Fix this by using RESTART REPLICA table over RESTART REPLICAS.
This commit adds a new GitHub Actions workflow that checks Jolt with ThreadSanitizer under Ubuntu & Clang. * Replaces usage of atomic_thread_fences in Reference.h and JobSystem.h with regular atomic ops when building under TSAN, as it does not support fences and unlikely ever will do so. * Limits the max number of mutexes to use under TSAN to work around a TSAN limitation, see: google/sanitizers#950. * Replaces Semaphore::mCount with an atomic int that's used in relaxed mode. Co-authored-by: Jorrit Rouwe <[email protected]>
…read can hold (#116409) I've run into an issue where TSan can't be used on some code without turning off deadlock detection because a thread tries to hold too many mutexes. It would be preferable to be able to use deadlock detection as that is a major benefit of TSan. Its mentioned in google/sanitizers#950 that the 64 mutex limit was an arbitrary number. I've increased it to 128 and all the tests still pass. Considering the increasing number of cores on CPUs and how programs can now use more threads to take advantage of it, I think raising the limit to 128 would be some good future proofing --------- Co-authored-by: Vitaly Buka <[email protected]>
I'm getting the following error consistently w/several of my test pgms when built w/TSAN.
Any ideas on how to work around this would be much appreciated -- thanks in advance!
The text was updated successfully, but these errors were encountered: