
fix(iroh-dns-server): remove accidental blocking from store #2985

Open
wants to merge 2 commits into main

Conversation

dignifiedquire
Contributor

Description

I found this during investigation of #2972. It turned out that both blocking locks and blocking IO calls are being made in the DNS server's store implementation.
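
For illustration, this is the general shape of such a fix (not necessarily the exact patch in this PR): hold the redb Database in an Arc and run the blocking transaction work on tokio's blocking thread pool. Key and method names here are illustrative, not copied from the iroh-dns-server code.

```rust
use std::sync::Arc;

use anyhow::Result;
use redb::{Database, TableDefinition};

// Illustrative table definition; the real store keys on a SignedPacketsKey.
const SIGNED_PACKETS_TABLE: TableDefinition<&[u8; 32], &[u8]> =
    TableDefinition::new("signed-packets-v1");

#[derive(Debug, Clone)]
pub struct SignedPacketStore {
    // Arc so the handle can be cloned into spawn_blocking closures.
    db: Arc<Database>,
}

impl SignedPacketStore {
    pub async fn upsert(&self, key: [u8; 32], packet: Vec<u8>) -> Result<()> {
        let db = self.db.clone();
        // The blocking redb write transaction runs on the blocking thread pool,
        // not on the async executor threads.
        tokio::task::spawn_blocking(move || -> Result<()> {
            let tx = db.begin_write()?;
            {
                let mut table = tx.open_table(SIGNED_PACKETS_TABLE)?;
                table.insert(&key, packet.as_slice())?;
            }
            tx.commit()?;
            Ok(())
        })
        .await??;
        Ok(())
    }
}
```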

Breaking Changes

None

Notes & open questions

This might not be optimal, but for now it is the safest way to stop blocking the whole runtime.

Change checklist

  • Self-review.
  • Documentation updates following the style guide, if relevant.
  • Tests if relevant.
  • All breaking changes documented.

@dignifiedquire dignifiedquire added this to the v0.29.0 milestone Dec 2, 2024

github-actions bot commented Dec 2, 2024

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/iroh/pr/2985/docs/iroh/

Last updated: 2024-12-02T10:26:25Z


github-actions bot commented Dec 2, 2024

Netsim report & logs for this PR have been generated and are available at: LOGS
This report will remain available for 3 days.

Last updated for commit: d80ba1a

@rklaehn
Contributor

rklaehn commented Dec 2, 2024

Does this lead to a measurable improvement?

I would say the big performance problem is that we are creating a write transaction per op, so we are limited to a small number of txns per second (~1000) no matter how beefy the machine is.

If we really want to speed this up we need a design similar to what is done in the blob store: batch write transactions and sync at regular intervals. In the case of the discovery server we can even be more sloppy, e.g. losing the last second of updates in a crash is really no big deal, since discovery information gets republished.
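
For illustration, a rough sketch of what such batching could look like (this is not the blob store's actual implementation; channel sizes, the table definition, and the 100 ms pacing are made up). Writes are queued on a channel and drained into a single redb write transaction, so one commit (and one fsync) covers a whole batch.

```rust
use std::{sync::Arc, time::Duration};

use anyhow::Result;
use redb::{Database, TableDefinition};
use tokio::sync::mpsc;

// Illustrative table definition, matching the sketch above.
const SIGNED_PACKETS_TABLE: TableDefinition<&[u8; 32], &[u8]> =
    TableDefinition::new("signed-packets-v1");

pub fn spawn_batch_writer(db: Arc<Database>) -> mpsc::Sender<([u8; 32], Vec<u8>)> {
    let (tx, mut rx) = mpsc::channel::<([u8; 32], Vec<u8>)>(1024);
    tokio::task::spawn_blocking(move || -> Result<()> {
        loop {
            // Block until at least one write arrives; exit when all senders are gone.
            let Some(first) = rx.blocking_recv() else { break };
            let write_tx = db.begin_write()?;
            {
                let mut table = write_tx.open_table(SIGNED_PACKETS_TABLE)?;
                table.insert(&first.0, first.1.as_slice())?;
                // Drain whatever else arrived in the meantime into the same transaction.
                while let Ok((key, value)) = rx.try_recv() {
                    table.insert(&key, value.as_slice())?;
                }
            }
            // One fsync for the whole batch instead of one per operation.
            write_tx.commit()?;
            // Pace commits; anything queued during the sleep lands in the next batch.
            std::thread::sleep(Duration::from_millis(100));
        }
        Ok(())
    });
    tx
}
```

The trade-off is exactly the one described above: a crash loses at most the last un-committed batch of updates, which is acceptable for discovery data that gets republished anyway.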

```diff
@@ -14,7 +14,7 @@ const SIGNED_PACKETS_TABLE: TableDefinition<&SignedPacketsKey, &[u8]> =

 #[derive(Debug)]
 pub struct SignedPacketStore {
-    db: Database,
+    db: Arc<Database>,
```
Contributor

@matheus23 matheus23 Dec 2, 2024

redb should really implement Clone for Database.

Contributor

You mean Clone? This is how we do it with the protocols, they don't (need to) impl Clone, and people have to wrap them in an Arc... 🤣

Contributor

Yeah sorry, I edited my comment :D

@Arqu
Collaborator

Arqu commented Dec 2, 2024

I really should try to set up a benchmark for this.

Contributor

@matheus23 matheus23 left a comment

Yeah good fixes.

I also like @rklaehn's suggestion (happy to quickly add another PR for that if needed).

I'm also slightly confused by ZoneCache. Why is one cache an LRU, and the other a TTL?
We wrap them both with a mutex in the end, even though we don't need the mutex for the TTL.
And I don't think we need ZoneCache::cache to be an LruCache, do we? They're both caches; we could use TtlCache for both of them, avoiding the top-level mutex entirely.

@dignifiedquire
Contributor Author

If we really want to speed this up we need a design similar to what is done in the blob store: batch write transactions and sync at regular intervals. In the case of the discovery server we can even be more sloppy, e.g. losing the last second of updates in a crash is really no big deal, since discovery information gets republished.

I don't think we should do this until we have figured out the write amplification issues. Current write amplification is around 10x, which means we might need to drop redb anyway. This is just a stopgap until we have done so.

@dignifiedquire
Contributor Author

Does this lead to a measurable improvement?

Unclear as yet, since this hasn't been deployed. But we have seen reports that both read and write calls are quite slow under heavy load.

Labels: None yet
Projects: Status: 🏗 In progress
4 participants