Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid collecting all RecordStore records into a Vec in the behaviour #3021

Closed

Conversation

rklaehn
Copy link
Contributor

@rklaehn rklaehn commented Oct 13, 2022

Context:

I have recently joined https://n0.computer/ and am working on https://github.com/n0-computer/iroh .

One of the first issues with iroh I have discovered was that it won't allow adding large files because the default size of the kad RecordStore is too small. We just increased the size of the mem store for now. But I am looking into what it would take to provide a lot of records.

One thing that is going to cause problems is that currently all records are materialized into a Vec<ProviderRecord> in the behaviour. I think that needs to be changed to be able to provide a large number of records. So this PR removes the Cow that is not really being used (all places that consume the Cow immediately call into_owned() on it) and makes the iterator standalone so it can be stored in the behaviour. A nice benefit of this is that the lifetime parameter can also be removed.

Currently the way I obtain a standalone iterator is to just materialize into a Vec inside the MemoryStore impl, so I have just shifted the problem. This obviously would have to be changed in the next step. E.g. in my draft persistent RecordStore I could create a snapshot of the database and pull records out of that snapshot one by one to avoid having a giant Vec<ProviderRecord>.

@rklaehn rklaehn changed the title Record store detached iter Avoid collecting all RecordStore records into a Vec in the behaviour Oct 13, 2022
@rklaehn rklaehn force-pushed the rklaehn/record-store-detached-iter branch from e834d6b to 6711981 Compare October 13, 2022 10:00
@dignifiedquire
Copy link
Member

@mxinden ping, improving this area is pretty blocking for us, would be great to get some feedback

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋

One thing that is going to cause problems

Have you been able to reproduce this in benchmarks?

So this PR removes the Cow that is not really being used (all places that consume the Cow immediately call into_owned() on it) and makes the iterator standalone so it can be stored in the behaviour.

Apart from the RecordStore::records method, right?

so I have just shifted the problem.

Can we introduce this change with the future pull request that actually does the improvement then?

E.g. in my draft persistent RecordStore I could create a snapshot of the database and pull records out of that snapshot one by one to avoid having a giant Vec.

In case this is an in-memory database, there would be no need for a snapshot, given that the NetworkBehaviour event loop is single threaded, correct?

@mxinden ping, improving this area is pretty blocking for us, would be great to get some feedback

Happy to help on improving the interface, but I am having difficulties understanding (a) what the current performance issue is (missing benchmarks) and (b) how these changes would improve the performance issues. See questions above.

All that said, happy to see more interest in improving the RecordStore `trait!

@@ -64,36 +66,33 @@ pub enum Error {
/// content. Just like a regular record, a provider record is distributed
/// to the closest nodes to the key.
///
pub trait RecordStore<'a> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't 'a have been on the individual methods instead of the entire trait definition?

std::mem::swap(&mut skipped, &mut self.skipped);
let record_ttl = self.record_ttl;
let local_id = self.local_id;
let records = Box::new(store.records().filter_map(move |r| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store.records now returns an owned Iter. With the filter_map right after we might be cloning many items in vain. Isn't that an issue?

@mergify
Copy link
Contributor

mergify bot commented Mar 22, 2023

This pull request has merge conflicts. Could you please resolve them @rklaehn? 🙏

@thomaseizinger
Copy link
Contributor

Closing this as stale. There is an open discussion about re-designing this interface here: #3035

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants