Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an assertion failure if an async write callback ran during a write transaction #6661

Merged
merged 1 commit into from
May 26, 2023

Conversation

tgoyne
Copy link
Member

@tgoyne tgoyne commented May 23, 2023

Between when the callback after acquiring the write lock is scheduled and when it's invoked a synchronous write transaction can be begun, and if it's not ended before the next time the scheduler gets to run, the scheduled callback will be invoked inside the write. When this happens we want to just do nothing. Ending the synchronous write transaction will take care of rescheduling the async write it preempted.

Fixes #6649.

@tgoyne tgoyne self-assigned this May 23, 2023
@cla-bot cla-bot bot added the cla: yes label May 23, 2023
Base automatically changed from tg/ub to master May 24, 2023 01:16
…e transaction

Between when the callback after acquiring the write lock is scheduled and when
it's invoked a synchronous write transaction can be begun, and if it's not
ended before the next time the scheduler gets to run, the scheduled callback
will be invoked inside the write. When this happens we want to just do nothing.
Ending the synchronous write transaction will take care of rescheduling the
async write it preempted.
@tgoyne tgoyne force-pushed the tg/async-write-assert branch from 7223509 to 9869e0d Compare May 24, 2023 01:34
@tgoyne tgoyne marked this pull request as ready for review May 24, 2023 15:45
@tgoyne tgoyne requested a review from nicola-cab May 24, 2023 15:45
@tgoyne
Copy link
Member Author

tgoyne commented May 24, 2023

The TSan failure reported here is real, but unrelated to this PR.

Copy link
Member

@nicola-cab nicola-cab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing this.

@@ -789,6 +789,12 @@ void Realm::run_writes()
// writes as we can't add commits while in that state
return;
}
if (is_in_transaction()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this check can cause starvation in case we have a lot of sync writes competing with async writes. I don't know if this is a realm problem, but maybe if we notice issues around this, we might think about adding some heuristic.

@tgoyne tgoyne merged commit 85a4e63 into master May 26, 2023
@tgoyne tgoyne deleted the tg/async-write-assert branch May 26, 2023 15:47
papafe added a commit that referenced this pull request May 30, 2023
…e-core-13.13

* commit '36b2df44453ba5e830369c76ab0683799a287605': (34 commits)
  core release 13.13.0
  Fix an assertion failure if an async write callback ran during a write transaction (#6661)
  Remove catch() clause to prevent truncating stack trace in AsyncOper::do_recycle_and_execute() (#6667)
  Allow geo coordinate numeric argument substitutions (#6663)
  Fix a lock order inversion in tests (#6666)
  Avoid performing unaligned reads in Array::get_chunk()
  Fix a few UBSan failures hit by tests
  Use std::optional to store cached leaves in query nodes (#6653)
  maybe fix a race in a test (#6651)
  Access token refresh for websockets was not updating the location metadata (#6631)
  Updated release notes
  Prepare release
  Updated baas server tag for CI (#6650)
  Geospatial basic queries benchmarks (#6621)
  Geospatial feedback (#6645)
  Add support for early exit in BPlusTree::for_all()
  Filter out unresolved links in Dictionary::get_any()
  Benchmark for full-text search
  geospatial validation of polygons (#6607)
  Fix importing Results with deleted collection
  ...
papafe added a commit that referenced this pull request Jun 7, 2023
Co-authored-by: Christian Melchior <[email protected]>
Co-authored-by: James Stone <[email protected]>
Co-authored-by: realm-ci <[email protected]>
Co-authored-by: Kirill Burtsev <[email protected]>
Co-authored-by: Daniel Tabacaru <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Jørgen Edelbo <[email protected]>
Co-authored-by: Michael Wilkerson-Barker <[email protected]>
Co-authored-by: Nicola Cabiddu <[email protected]>
fix entries that went to the wrong change version (#6632)
fix a race in a test (#6651)
Fix a lock order inversion in tests (#6666)
Fix an assertion failure if an async write callback ran during a write transaction (#6661)
kraenhansen added a commit that referenced this pull request Jun 19, 2023
* Updated release notes

* Update catch2 v3.3.2 (#6605)

* Make core infer platform and cpu_arch, while bundle_id must be provided by SDK's (#6612)

* platform and cpu_arch are inferred by core, bundle_id must be provided by SDK's

* update changelog

* Return proper value for X86_64 arch

Co-authored-by: Christian Melchior <[email protected]>

* Get fine-grained platform for Apple devices

* Fix tests

* small fixes

* fix more tests

* Fix mistake in changelog

---------

Co-authored-by: Christian Melchior <[email protected]>

* use consistent rounding, following SERVER-75392 (#6477)

* fix entries that went to the wrong change version (#6632)

* Special-case main thread runloop scheduler

* Improve SectionedResults performance

SectionedResults used a std::map in a few places where the keys are a dense
range (i.e. always [0..map.size())) and so they can be std::vector instead. The
maps keyed on Mixed are now std::unordered_map.

Change notifications now report changes as a `std::vector<IndexSet>` rather
than `std::map<size_t, IndexSet>`. This is slower and uses more memory when the
only sections that changed are near the end of a SectionedResults with a large
number of sections, but is much faster if all sections changed or if the
sections which changed are early in the SectionedResults. Change notifications
now reuse buffers, which increases persistent memory usage slightly but
significant reduces allocations.

Change notifications for a single section now only compute the changes for that
section rather than computing the full changes and then filtering out the
changes for other sections.

* use static_assert rather than a old home rolled one

* fix warning of redefine of CHECK macro

* fix unused function warning

* silence warnings in bid128_to_string

* Introduce BPlusTree::for_all

* Prevent program from crashing when removing backlinks

* Fix broken snapshot of collection of objects

* Fix importing Results with deleted collection

The result should be an empty result, not the whole table.

* geospatial validation of polygons (#6607)

* geospatial validation of polygons

* Loop->Ring, added tests

* use std::unique

* changelog

* Benchmark for full-text search

* Allow to filter benchmark and run only list of specified names
* Add simple benchmark for fulltext search with index

* Filter out unresolved links in Dictionary::get_any()

* Add support for early exit in BPlusTree::for_all()

* Geospatial feedback (#6645)

* verify local results match a server query

* disallow geowithin on top level tables

* fix geo queries with ANY/ALL/NONE

* geospatial validation of points

* rename GeoCenterSphere -> GeoCircle

* review feedback

* better testing and fix any/all/none geospatial

* format

* Geospatial basic queries benchmarks (#6621)

* Add basic benchmarks for Geospatial type and queries

* Less copying in GeoWithinCompare

* Bring back caching of s2 region into Geospatial

* remove transaction overhead from measurements

* a couple small optimizations

* formatting

* simplify geospatial query evaluations

* changelog

---------

Co-authored-by: James Stone <[email protected]>

* Updated baas server tag for CI (#6650)

* Prepare release

* Updated release notes

* Access token refresh for websockets was not updating the location metadata (#6631)

* Always refresh metadata on app login
* Updated changelog
* Always update location when requested; fix c_api test
* Update test to properly evaluate websocket redirections; added one more test
* Updated changelog and fixed compile warning
* Added location checks back to test
* added mutex locking around location updated state and reworked requesting location update to use flag
* clang format and fix incorrect timeout value
* Reworked update location logic a bit and removed unused function
* Free mutex before calling completion on early exit in init_app_metadata

* maybe fix a race in a test (#6651)

* Use std::optional to store cached leaves in query nodes (#6653)

Our use of aligned_storage was basically a complicated manual version of this.
I was hoping this'd have binary size benefits, but it ended up making the
library 100 bytes larger instead. Nonetheless, it greatly simplifies things.

* Fix a few UBSan failures hit by tests

* Avoid performing unaligned reads in Array::get_chunk()

* Fix a lock order inversion in tests (#6666)

The cycle was DaemonThread::m_running_on_change_mutex =>
RealmCoordinator::m_realm_mutex  => SyncManager::m_mutex  =>
RealmCoordinator::s_coordinator_mutex  =>
DaemonThread::m_running_on_change_mutex, and it happened due to
DaemonThread::remove() being called inside RealmCoordinator::clear_cache()
while holding s_coordinator_mutex. Fortunately we don't actually need to be doing that.

As the cycle required RealmCoordinator::clear_all_caches(), this was only
applicable to tests.

* Allow geo coordinate numeric argument substitutions (#6663)

* allow geo coordinate numeric argument substitutions

* review feedback

* explicit cast to address warning

* Remove catch() clause to prevent truncating stack trace in AsyncOper::do_recycle_and_execute() (#6667)

* Fix an assertion failure if an async write callback ran during a write transaction (#6661)

Between when the callback after acquiring the write lock is scheduled and when
it's invoked a synchronous write transaction can be begun, and if it's not
ended before the next time the scheduler gets to run, the scheduled callback
will be invoked inside the write. When this happens we want to just do nothing.
Ending the synchronous write transaction will take care of rescheduling the
async write it preempted.

* core release 13.13.0

* Updated release notes

* Allocate arguments for lists (#6674)

* Small documentation and code fixes (#6672)

* Fix crash when opening FLX realm after client reset failure (#6671)

* Fix crash when opening FLX realm after client reset failure

* Update changelog

* Don't superceed pending subscriptions in case of a client reset failure

* Add test

* Changes after code review

* Support sorting based on values from a dictionary (#5311)

Co-authored-by: Sebastian Valle <[email protected]>
Co-authored-by: James Stone <[email protected]>

* Filter out external sources from Eclipse (#6682)

Indexer has a hard time dealing with Catch2

* Use cross-compilers instead of CentOS image (#6559)

* Use cross-compilers instead of CentOS image

* changelog

* fix bad merge

* refactor toolchain files

* clarify useToolchain exception circumstances

* Remap github URL to ssh to fix BAAS dependency using https:// (#6685)

* core v13.14.0

* Updated release notes

* Switch to building with Xcode 14 (#6647)

* better fix explanation in the changelog for list of args in the query parser (#6692)

* Remove constructor for GeoPoint and GeoPolygon (#6679)

Co-authored-by: Mathias Stearn <[email protected]>

* Fix failing "sync: non-synced metadata table doesn't result in non-additive schema change" tests (#6697)

* Reporting correct error message on HTTP errors for Browser target

* User/Server API key provider becomes a single 'API key' provider (#6696)

* Allow frozen Realms to be opened with additive schema changes (#6693)

* allow frozen Realms to be opened with additive schema changes

* lint

* strengthen tests and comments

* Update src/realm/object-store/shared_realm.cpp

Co-authored-by: Thomas Goyne <[email protected]>

---------

Co-authored-by: Thomas Goyne <[email protected]>

* Reverted minimum swift version to fix failing CI tests (#6706)

* core release v13.15.0

* Updated release notes

* Fix client reset test with invalid query (#6711)

* Fix SessionWrapper use-after-free crash when tearing down sessions (#6676)

* Changed SessionWrapper pointer to bind_ptr; added session ident history
* Fix teardown if client is destroyed before session
* Session no longer holds bind_ptr to SessionWrapper; reverted some changes
* Fixed return and updated some comments
* Don't process errors if session is shutting down
* Added extra checks for session state
* Updates from review
* Updated some finalized checks
* Rolled back some changes
* Added output to ASSERTS and moved session history to unordered_set
* Remove session history entry on normal close
* Updated comment in sync tests

* Add [baas] and [local] tags to object store sync tests to identify the tests that rely on BAAS or not (#6710)

* Use Columns<Link> when property is Dictionary of links (#6705)

If a Dictionary property has links as value type, we can use Columns<Link> to handle
the links instead of the basic Columns<Dictionary>. This has the effect that when we
compare with a single value, we will optimize to use LinksToNode. So we need to make
LinksToNode handle the Dictionary case.

When we compare with a list of links, we must ensure that the list is converted to
a list obj ObjKeys - which is the type that Column<Link> evaluates to.

 Use LinksToNode for lists in QueryParser

* better changelog message for the fix related to queries with list of arguments (#6717)

* Fixes for Emscripten target (Passing header from fetch response. Using Config.path for inMemory Realm) (#6716)

* Fixes for Emscripten target: Passing header for fetch response. Passing the RealmConfig.path to be used for inMemory Realm, this is needed for registering SyncSession

Co-authored-by: Jørgen Edelbo <[email protected]>

* release 13.15.1

* Updated spec.yml to remove User & Server prefix from ApiKey credentials

---------

Co-authored-by: James Stone <[email protected]>
Co-authored-by: realm-ci <[email protected]>
Co-authored-by: Kirill Burtsev <[email protected]>
Co-authored-by: Daniel Tabacaru <[email protected]>
Co-authored-by: Christian Melchior <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Thomas Goyne <[email protected]>
Co-authored-by: Jørgen Edelbo <[email protected]>
Co-authored-by: Michael Wilkerson-Barker <[email protected]>
Co-authored-by: Nicola Cabiddu <[email protected]>
Co-authored-by: Sebastian Valle <[email protected]>
Co-authored-by: Yavor Georgiev <[email protected]>
Co-authored-by: Ferdinando Papale <[email protected]>
Co-authored-by: Mathias Stearn <[email protected]>
Co-authored-by: Nabil Hachicha <[email protected]>
Co-authored-by: Finn Schiermer Andersen <[email protected]>
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants