Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in iroh-docs actor loop #2345

Closed
Frando opened this issue Jun 4, 2024 · 0 comments · Fixed by #2346
Closed

Deadlock in iroh-docs actor loop #2345

Frando opened this issue Jun 4, 2024 · 0 comments · Fixed by #2346

Comments

@Frando
Copy link
Member

Frando commented Jun 4, 2024

Calling iroh.docs.list, or other docs methods that return streams, is deadlock prone at the moment: If there are more items than the RPC layer will buffer a call to any iroh.docs RPC method while consuming the stream will deadlock.

Reason:
iroh-docs internally runs in a synchronous thread loop. When the client calls list_docs or other streaming methods, this sends a message over the RPC interface into that message loop. The docs thread will get an iterator from the store and push all items into a reply channel, until it is done. However it is a synchronous thread, so it cannot do anything else until it is done.

And this may lead to the folllowing deadlock sitution: If, while consuming such a stream, you happen to call any function that goes back into the docs thread, this will - deadlock! Because the docs thread is still not done with processing the previous iterator, it won't reply to the new call, and you won't continue consuming the stream - classical deadlock.

Thanks to @dev-msp for spotting this.

See #2346 for a fix.

github-merge-queue bot pushed a commit that referenced this issue Jun 6, 2024
…2346)

## Description

Fixes #2345 

The iroh-docs actor loop can easily be deadlocked from the client side:
If you call any RPC method that returns a stream, and the stream is
longer than what the RPC layer buffers, and you call and await any other
docs method *while consuming the stream*, the docs actor will deadlock.
(It will only happen though if the stream is longer than the capacity of
the intermediate channel that goes from the actor to the RPC layer,
which is why this does not *always* happen)

This is the case for all methods that return iterators. The solution is
twofold:

* Run single-threaded executor in iroh-docs actor loop
* For actions returning iterators/streams, spawn a task on that executor
to forward the store iterator into the stream, yielding when the
receiver is not consuming fast enough

To be able to spawn the iterators onto a task, they have to be
`'static`. Which they can be - but only when operating on snapshots.

So this PR fixes the potential for deadlock. It has the downside,
however, that whenever calling a docs client function that returns an
iterator, the current write transaction will be committed first, which
has a perfomance penalty. However this is preferable to deadlocks, IMO.

## Breaking Changes

<!-- Optional, if there are any breaking changes document them,
including how to migrate older code. -->

## Notes & open questions

This will need tests and likely documentation of the perfomance
implications.

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
@github-project-automation github-project-automation bot moved this to ✅ Done in iroh Jun 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

1 participant