iroh-dns-server: very heavy disk write load #2972

Closed
PaulOlteanu opened this issue Nov 28, 2024 · 4 comments

@PaulOlteanu (Contributor) commented Nov 28, 2024

The iroh-dns-server seems to do a lot of writing. I have about 30,000 clients at the moment with a republish interval of 2 minutes (so that should average out to about 250 publishes/sec), yet I'm seeing a write throughput of about 7 MB/s from 1,500 writes/sec on the DNS server's disk.

I'm surprised to see this much load considering the records should be around 1 KB, so I'd expect more like 0.25 MB/s.
This seems so wrong that I'm wondering if my Endpoint is somehow misconfigured?

let discovery = ConcurrentDiscovery::from_services(vec![
    Box::new(PkarrPublisher::with_options(
        self.secret_key.clone(),
        self.iroh_config.pkarr_url.clone(),
        30,                          // record TTL in seconds
        Duration::from_secs(60 * 2), // republish interval (2 minutes)
    )),
    Box::new(DnsDiscovery::new(self.iroh_config.dns_url.clone())),
]);

let endpoint = Endpoint::builder()
    .secret_key(self.secret_key.clone())
    .alpns(vec![ALPN.to_vec()])
    .relay_mode(RelayMode::Custom(relay_map))
    .discovery(Box::new(discovery))
    .transport_config(transport_config)
    .bind()
    .await
    .unwrap();

I don't have Prometheus set up to scrape the server metrics yet, but I'll check tomorrow whether any of them look off.

Update: the number of pkarr update requests lines up with the expected number of clients, so I don't think the write volume is caused by an unusually large number of client write requests.
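
For reference, a quick back-of-envelope sketch of the mismatch described above (the client count, record size, and observed rates are the figures from this report; nothing here is measured directly):

// Expected vs. observed write load, using the numbers reported above.
fn main() {
    let clients = 30_000u64;
    let republish_interval_secs = 120u64; // 2 minute republish interval
    let record_size_bytes = 1_024u64;     // assumed ~1 KB per record

    let expected_writes_per_sec = clients / republish_interval_secs; // 250
    let expected_bytes_per_sec = expected_writes_per_sec * record_size_bytes; // ~0.25 MB/s

    let observed_writes_per_sec = 1_500u64;
    let observed_bytes_per_sec = 7_000_000u64; // ~7 MB/s reported

    println!(
        "expected ~{} writes/s (~{:.2} MB/s), observed ~{} writes/s (~{:.1} MB/s)",
        expected_writes_per_sec,
        expected_bytes_per_sec as f64 / 1e6,
        observed_writes_per_sec,
        observed_bytes_per_sec as f64 / 1e6,
    );
    // Roughly 6x more writes and ~27x more bytes than expected.
}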

@flub (Contributor) commented Nov 28, 2024

My first thought is that the writes might be inflated due to the filesystem block size. But I haven't checked and am not familiar with this codebase.
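
As a rough illustration of this hypothesis (the 4 KiB block size is an assumed, typical value, not something confirmed for this deployment): if every small record write dirtied a full filesystem block, 1,500 writes/s would already account for roughly 6 MB/s.

// Implied throughput if each write touched a full (assumed 4 KiB) block.
fn main() {
    let writes_per_sec = 1_500u64; // observed on the dns server
    let fs_block_bytes = 4_096u64; // assumed typical filesystem block size

    let implied_bytes_per_sec = writes_per_sec * fs_block_bytes;
    println!("~{:.1} MB/s", implied_bytes_per_sec as f64 / 1e6); // ~6.1 MB/s
}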

@dignifiedquire (Contributor) commented Nov 28, 2024

The actual size seems to be only about half that:

let key = SecretKey::generate();
let node_id = key.public();
let node_info = NodeInfo::new(
    node_id,
    Some("https://my-relay.com".parse().unwrap()),
    [
        "127.0.0.1:1245".parse().unwrap(),
        "127.0.0.1:1246".parse().unwrap(),
        "127.0.0.1:1247".parse().unwrap(),
        "127.0.0.1:1248".parse().unwrap(),
        "[::]:1241".parse().unwrap(),
        "[::]:1242".parse().unwrap(),
        "[::]:1243".parse().unwrap(),
        "[::]:1244".parse().unwrap(),
    ]
    .into_iter()
    .collect(),
);
let ttl = 1024;
let packet = node_info.to_pkarr_signed_packet(&key, ttl).unwrap();

assert_eq!(packet.as_bytes().len(), 450);

So in the worst case we're storing ~500 bytes for the value plus 32 bytes for the key.


With 1,500 writes per second, this should give:

- 1500 * 500 = 750,000 B/s ≈ 732 KiB/s

@dignifiedquire (Contributor) commented:

> My first thought is that the writes might be inflated due to the filesystem block size.

That is unlikely: AFAIK that wouldn't show up as actual IO ops, it would only make things slow.

github-merge-queue bot pushed a commit that referenced this issue Dec 3, 2024
## Description

I found this during the investigation of #2972. It turned out that both blocking locks and blocking IO calls were being made in the DNS server's store implementation.

## Breaking Changes

None

## Notes & open questions

This might not be optimal, but it is the safest way to stop blocking the whole runtime for now.
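
For illustration, a minimal sketch of that pattern: moving a blocking lock plus blocking IO off the async runtime with tokio::task::spawn_blocking. The Store type and its put method below are hypothetical stand-ins, not the actual iroh-dns-server store API.

use std::sync::{Arc, Mutex};

use tokio::task;

// Hypothetical synchronous store standing in for the blocking, disk-backed store.
#[derive(Clone, Default)]
struct Store {
    inner: Arc<Mutex<Vec<(String, Vec<u8>)>>>,
}

impl Store {
    // Blocking lock plus (in a real store) a blocking disk write.
    fn put_blocking(&self, key: String, value: Vec<u8>) {
        self.inner.lock().unwrap().push((key, value));
    }

    // Async wrapper: run the blocking work on tokio's dedicated blocking
    // thread pool so it never stalls the runtime's async worker threads.
    async fn put(&self, key: String, value: Vec<u8>) {
        let store = self.clone();
        task::spawn_blocking(move || store.put_blocking(key, value))
            .await
            .expect("blocking task panicked");
    }
}

#[tokio::main]
async fn main() {
    let store = Store::default();
    store.put("node-id".into(), b"signed pkarr packet".to_vec()).await;
}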

## Change checklist

- [ ] Self-review.
- [ ] Documentation updates following the [style
guide](https://rust-lang.github.io/rfcs/1574-more-api-documentation-conventions.html#appendix-a-full-conventions-text),
if relevant.
- [ ] Tests if relevant.
- [ ] All breaking changes documented.
@PaulOlteanu (Contributor, Author) commented:
I believe this is fixed with #2995

github-project-automation bot moved this to ✅ Done in iroh on Dec 6, 2024