-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition on all SetOptions on "block_based_table_factory" #10079
Comments
@mrambacher Is this something you can look into? The race condition seems rather "baked in" to the Configurable interface with GetOptionsPtr. |
@pdillinger I will see if I can reproduce this on my laptop. This may be related to the IsMutable #8928, which attempted to avoid some of these race conditions, but that is just conjecture at this point. |
Summary: I wanted to make filter_policy a mutable option, but the SetOptions race fix proposed in facebook#10480 doesn't work for object options like filter_policy. (Race still reported.) I found that when the options objects are copied for persisting to file, the table factory (and thus options) is not copied, therefore leading to a race between persisting the options and subsequent updates with SetOptions(). It appears that fixing this fixes the race that was being reported with SetOptions() in the Fixes facebook#10079 Intended follow-up: See facebook#10021 Test Plan: unit test added, crash test updated with more SetOptions and higher probability of calling it. Ran TSAN blackbox crashtest for hours with options to amplify potential race (see facebook#10079)
Summary: I wanted to make filter_policy a mutable option, but the SetOptions race fix proposed in facebook#10480 doesn't work for object options like filter_policy. (Race still reported.) I found that when the options objects are copied for persisting to file, the table factory (and thus options) is not copied, therefore leading to a race between persisting the options and subsequent updates with SetOptions(). It appears that fixing this fixes the race that was being reported with SetOptions() in the Fixes facebook#10079 Intended follow-up: See facebook#10021 Test Plan: unit test added, crash test updated with more SetOptions and higher probability of calling it. Ran TSAN blackbox crashtest for hours with options to amplify potential race (see facebook#10079)
Summary: An internal user wants to be able to dynamically switch between Bloom and Ribbon filters, without a custom FilterPolicy. Making `filter_policy` mutable would actually make issue facebook#10079 worse, because it would be a race on a pointer field, not just on scalars. As a reasonable compromise until that is fixed, I am enabling dynamic control over Bloom vs. Ribbon choice by making RibbonFilterPolicy::bloom_before_level mutable, and doing that safely by using an atomic. I've also slightly tweaked the interpretation of that field so that setting it to INT_MAX really means "always Bloom." Test Plan: unit tests added/extended. crash test updated for SetOptions call and tested under TSAN with amplified probability (lower set_options_one_in).
…11838) Summary: An internal user wants to be able to dynamically switch between Bloom and Ribbon filters, without a custom FilterPolicy. Making `filter_policy` mutable would actually make issue #10079 worse, because it would be a race on a pointer field, not just on scalars. As a reasonable compromise until that is fixed, I am enabling dynamic control over Bloom vs. Ribbon choice by making RibbonFilterPolicy::bloom_before_level mutable, and doing that safely by using an atomic. I've also slightly tweaked the interpretation of that field so that setting it to INT_MAX really means "always Bloom." Pull Request resolved: #11838 Test Plan: unit tests added/extended. crash test updated for SetOptions call and tested under TSAN with amplified probability (lower set_options_one_in). Reviewed By: ajkr Differential Revision: D49296284 Pulled By: pdillinger fbshipit-source-id: e4251c077510df9a9c719876f482448c0d15402a
Summary: In follow-up to #11922, fix a race in functions like CreateColumnFamily and SetDBOptions where the DB reports one option setting but a different one is left in effect. To fix, we can add an extra mutex around these rare operations. We don't want to hold the DB mutex during I/O or other slow things because of the many purposes it serves, but a mutex more limited to these cases should be fine. I believe this would fix a write-write race in #10079 but not the read-write race. Intended follow-up to this: * Should be able to remove write thread synchronization from DBImpl::WriteOptionsFile Pull Request resolved: #11929 Test Plan: Added two mini-stress style regression tests that fail with >1% probability before this change: DBOptionsTest::SetStatsDumpPeriodSecRace ColumnFamilyTest::CreateAndDropPeriodicRace I haven't reproduced such an inconsistency between in-memory options and on disk latest options, but this change at least improves safety and adds a test anyway: DBOptionsTest::SetStatsDumpPeriodSecRace Reviewed By: ajkr Differential Revision: D50024506 Pulled By: pdillinger fbshipit-source-id: 1e99a9ed4d96fdcf3ac5061ec6b3cee78aecdda4
As noted in #11929, after that change there seems to be only a read-write race (not a write-write race), which should be "generally OK / safe" for non-pointer, 64-bit-or-less options. |
Summary: In theory, there should be no danger in mutability, as table builders and readers work from copies of BlockBasedTableOptions. However, there is currently an unresolved read-write race that affecting SetOptions on BBTO fields. This should be generally acceptable for non-pointer options of 64 bits or less, but a fix is needed to make it mutability general here. See #10079 This change systematically sets all of those "simple" options (and future such options) as mutable. (Resurrecting this PR perhaps preferable to proposed #13063) Pull Request resolved: #10021 Test Plan: Some unit test updates. XXX comment added to stress test code Reviewed By: cbi42 Differential Revision: D64360967 Pulled By: pdillinger fbshipit-source-id: ff220fa778331852fe331b42b76ac4adfcd2d760
Summary: This is setting up for a fix to a data race in SetOptions on BlockBasedTableOptions (BBTO), facebook#10079 The race will be fixed by replacing `table_factory` with a modified copy whenever we want to modify a BBTO field. An argument could be made that this change creates more entaglement between features (e.g. BlobSource <-> MutableCFOptions), rather than (conceptually) minimizing the dependencies of each feature, but * Most of these things already depended on ImmutableOptions * Historically there has been a lot of plumbing (and possible small CPU overhead) involved in adding features that need to reach a lot of places, like `block_protection_bytes_per_key`. Keeping those wrapped up in options simplifies that. * SuperVersion management generally takes care of lifetime management of MutableCFOptions, so is not that difficult. (Crash test agrees so far.) There are some FIXME places where it is known to be unsafe to replace `block_cache` unless/until we handle shared_ptr tracking properly. HOWEVER, replacing `block_cache` is generally dubious, at least while existing users of the old block cache (e.g. table readers) can continue indefinitely. The change to cf_options.cc is essentially just moving code (not changing). I'm not concerned about the performance of copying another shared_ptr with MutableCFOptions, but I left a note about considering an improvement if more shared_ptr are added to it. Test Plan: existing tests, crash test. Unit test DBOptionsTest.GetLatestCFOptions updated with some temporary logic. MemoryTest required some refactoring (simplification) for the change.
Summary: This is setting up for a fix to a data race in SetOptions on BlockBasedTableOptions (BBTO), #10079 The race will be fixed by replacing `table_factory` with a modified copy whenever we want to modify a BBTO field. An argument could be made that this change creates more entaglement between features (e.g. BlobSource <-> MutableCFOptions), rather than (conceptually) minimizing the dependencies of each feature, but * Most of these things already depended on ImmutableOptions * Historically there has been a lot of plumbing (and possible small CPU overhead) involved in adding features that need to reach a lot of places, like `block_protection_bytes_per_key`. Keeping those wrapped up in options simplifies that. * SuperVersion management generally takes care of lifetime management of MutableCFOptions, so is not that difficult. (Crash test agrees so far.) There are some FIXME places where it is known to be unsafe to replace `block_cache` unless/until we handle shared_ptr tracking properly. HOWEVER, replacing `block_cache` is generally dubious, at least while existing users of the old block cache (e.g. table readers) can continue indefinitely. The change to cf_options.cc is essentially just moving code (not changing). I'm not concerned about the performance of copying another shared_ptr with MutableCFOptions, but I left a note about considering an improvement if more shared_ptr are added to it. Pull Request resolved: #13077 Test Plan: existing tests, crash test. Unit test DBOptionsTest.GetLatestCFOptions updated with some temporary logic. MemoryTest required some refactoring (simplification) for the change. Reviewed By: cbi42 Differential Revision: D64546903 Pulled By: pdillinger fbshipit-source-id: 69ae97ce5cf4c01b58edc4c5d4687eb1e5bf5855
Summary: WIP Fixes facebook#10079 Test Plan: added to unit tests and crash test Ran TSAN blackbox crashtest for hours with options to amplify potential race (see facebook#10079)
Summary: WIP Fixes facebook#10079 Test Plan: added to unit tests and crash test Ran TSAN blackbox crashtest for hours with options to amplify potential race (see facebook#10079)
Apply this patch to expand SetOptions testing in stress test and amplify bad behavior:
Run
USE_CLANG=1 COMPILE_WITH_TSAN=1 make blackbox_crash_test
and quickly observe race condition reported.The text was updated successfully, but these errors were encountered: