Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move confirming_set on to ledger and add a shared_mutex for keeping memory objects in sync with the ledger #4567

Closed
wants to merge 16 commits into from

Conversation

clemahieu
Copy link
Contributor

@clemahieu clemahieu commented Apr 16, 2024

This adds a shared_mutex to ledger transactions which allows shared read access and exclusive write access to the ledger. This should fix the problem identified here #4540, and we will have a similar need when synchronizing bounded backlog memory objects. I will break this PR up in to discrete chunks starting with #4566.

The existing ledger operations use store transactions as the synchronization method though this uses an MVCC strategy which doesn't play well when also synchronizing memory objects. Adding a shared_mutex to the transaction means ledger writes are done exclusively to ledger reads while reads can co-exist with other reads.

There are several seemingly unrelated changes added but there is a commonality in their purpose: they addressed deadlocks that occur under the new mutually exclusive write transaction behavior. The changes fall under 2 major categories:

Non-obvious recursive acquisition of ledger read transaction.

While the ledger transactions support shared/concurrent read access, they no longer support recursive read transaction acquisition e.g. acquiring a ledger read transaction while an existing one is on the thread stack. The reason for this is that if a write transaction with exclusive lock access is requested elsewhere in the program between two read transactions requests, the second inner read transaction will be deadlocked. This is due to the behavior of shared_mutex where once unique_lock access is requested, it will block new shared_lock acquisition until the unique_lock can be serviced.

Lock order inversion by holding a read transaction across component boundaries.

A common pattern is for the node to have a component with a thread that accepts input guarded by a mutex and the thread does some database operation during its loop. This means as a general rule components have a lock order of: component-mutex then ledger-mutex.
Components also issue notifications or callbacks during their processing loop which sends data to other components. We run into issues when a component needs to pass data to another component: if the database mutex is being held while dispatching this creates a ledger-mutex then component-mutex lock order inversion which TSAN identifies.
To solve these cases the component must not hold a database transaction while making a callback or notifying observers.

It was inconvenient at first to fix up all these cases, however, it seemed to generally address places that might have caused read transactions to stay open for a long time or give unexpected results because of MVCC behavior instead of mutex behavior.

… and can be expanded to include memory locking.
… within the block operation in order to break lock cycle to rolled_back observers.
# Conflicts:
#	nano/core_test/active_transactions.cpp
#	nano/core_test/conflicts.cpp
#	nano/core_test/election.cpp
#	nano/core_test/election_scheduler.cpp
#	nano/core_test/ledger.cpp
#	nano/core_test/network.cpp
#	nano/core_test/node.cpp
#	nano/node/node.cpp
#	nano/node/scheduler/priority.cpp
#	nano/node/scheduler/priority.hpp
#	nano/rpc_test/rpc.cpp
# Conflicts:
#	nano/core_test/backlog.cpp
#	nano/node/backlog_population.hpp
#	nano/node/node.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant