-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(docs): prevent deadlocks with streams returned from docs actor #2346
Conversation
So the downside of this is that you will commit write transactions much more often than before. This makes a big difference on mac and probably windows, but not on linux. For some of the iterators (replicas and authors) it might be better to just collect into a vec. Or are we designing for millions of authors or replicas? I guess it is a valid fix nevertheless. But: will this code be retained as we move to willow, or is it going to be replaced? If it is going to be replaced I would say this is fine if it works. |
In willow we will likely have the exact same situation: Iterators from the store have to yield if the consuming stream does not process the items fast enough, so they have to be static as well. |
Not sure what can be done about it short of writing our own storage. If you want a long lived iterator, you need a snapshot. The only thing that comes to mind is to attempt to get a few items and don't create a snapshot if you get all of them. E.g. if there are 10 authors in total, you just make a vec and use into_iter. If after 10 you are not at the end, you need a snapshot. But that's a bit of a rube goldberg construction that you might not want to do initially. |
pub fn snapshot_owned(&mut self) -> Result<ReadOnlyTables> { | ||
// make sure the current transaction is committed | ||
self.flush()?; | ||
assert!(matches!(self.transaction, CurrentTransaction::None)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we are already in a read transaction? (We don't use that frequently, but maybe we should).
Then flush is a noop and this assertion will fail as far as I can see.
E.g. you want multiple iterators without writing something.
Ah, never mind. Flush takes the transaction. But we should maybe not use flush here but just ensure that we are in read mode...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should be able to just open another read txn if you are already in read mode, and then wrap that in a ReadonlyTables...
I fully get the whole snapshot thing, unfortunate as it is. But what I don't get is why you need all that stuff with the single threaded executor. |
We have to yield from the iterator-to-channel loop if the channel is full. If we didn't, the actor can be deadlocked from the user-facing client API. On main the following hangs at the first let mut stream = node.docs.list().await?;
while let Some((id, _)) = stream.try_next().await.unwrap() {
let _doc = node.docs.open(id).await?;
} With the PR and the single threaded executor this works fine because forwarding the |
Ah, now I get it. This was a thread before, no async at all. But now we need the ability to yield, hence an async executor. But we still want things to be single threaded for redb, hence a local async executor? |
Description
Fixes #2345
The iroh-docs actor loop can easily be deadlocked from the client side: If you call any RPC method that returns a stream, and the stream is longer than what the RPC layer buffers, and you call and await any other docs method while consuming the stream, the docs actor will deadlock.
(It will only happen though if the stream is longer than the capacity of the intermediate channel that goes from the actor to the RPC layer, which is why this does not always happen)
This is the case for all methods that return iterators. The solution is twofold:
To be able to spawn the iterators onto a task, they have to be
'static
. Which they can be - but only when operating on snapshots.So this PR fixes the potential for deadlock. It has the downside, however, that whenever calling a docs client function that returns an iterator, the current write transaction will be committed first, which has a perfomance penalty. However this is preferable to deadlocks, IMO.
Breaking Changes
Notes & open questions
This will need tests and likely documentation of the perfomance implications.
Change checklist