Storing bitswap ledgers on disk #215
I'm not really comfortable introducing a 20ms latency... as that's a significant portion of the time we take for a given block transfer.
One of the upsides of fetching ledgers from disk is total consistency, which is really important. However, read latency like that could be really bad. I think we should issue async flushes: read once from disk, keep the ledger in memory, and write updates back asynchronously.

Note that even if the ledger is kept on disk, it is possible to get into inconsistent ledger states if nodes crash (e.g. the first node to update its ledger state on disk crashes immediately after updating it and before externalizing the update to the peer). NB: a way around this is a write-ahead log that nodes can inspect to reconstruct the proper state of their ledgers.

Not sure yet what the decision should be. In the paper, I noted the importance of ensuring the ledgers are the same. To give you a sense of the trade-offs: in long-lived connections, a tiny discrepancy could be absorbed by one of the parties, but this opens the door to gaming up to that discrepancy threshold (which, in equilibrium, nodes will do; refs: BitTyrant, BitThief, and PropShare). Conversely, in trusted, single-datacenter settings, nodes can benefit significantly from not caring at all. This reinforces the "bitswap-as-barter" idea: the bitswap protocol definition should target the basic rules of transactions/exchanges, leaving it up to nodes' Strategies to provide further guarantees (like consistency or even network consensus).

Bitswap is a big protocol, and we won't get it exactly right without being able to test the different use cases. In the interest of time: for now, let's keep ledgers in memory and async-flush updates to disk. On comparing ledgers, we can log an ERROR when bitswap ledgers mismatch, to see whether it's a big problem early on.
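The write-ahead-log idea mentioned above could look something like the following. This is a minimal, stdlib-only sketch (all names are hypothetical, not from go-bitswap): ledger deltas are appended to a log before being applied, so a node that crashed mid-update can replay the log and recover a consistent ledger state.

```go
package main

import "fmt"

// walEntry records one ledger delta. Appending the entry before applying
// it means a crashed node can replay the log to recover its ledger state.
type walEntry struct {
	Peer      string
	BytesSent uint64
	BytesRecv uint64
}

// ledger is the running per-peer byte count.
type ledger struct {
	Sent, Recv uint64
}

// replay folds a write-ahead log into the final per-peer ledger states.
func replay(log []walEntry) map[string]ledger {
	ledgers := make(map[string]ledger)
	for _, e := range log {
		l := ledgers[e.Peer]
		l.Sent += e.BytesSent
		l.Recv += e.BytesRecv
		ledgers[e.Peer] = l
	}
	return ledgers
}

func main() {
	log := []walEntry{
		{Peer: "QmPeerA", BytesSent: 4096},
		{Peer: "QmPeerA", BytesRecv: 2048},
	}
	fmt.Println(replay(log)["QmPeerA"]) // prints {4096 2048}
}
```

In a real implementation the log would live on disk and be truncated after a successful checkpoint; here it is an in-memory slice purely to illustrate the recovery semantics.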
Acknowledged. Moving forward with:
Additionally, I'd like to get thoughts on these other points.
```proto
message Message {
  repeated string wantlist = 1;
  repeated bytes blocks = 2;
  optional Ledger partnerLedger = 3;
}

message Partners {
  repeated string peerIds = 1; // b58 encoded
}
```
sgtm
sgtm
This is still on the TODO list.
Switched to "needs refinement" as we've talked about having a more centralized ledger for other services. This isn't something someone should jump in and help on until we've worked through this a bit.
@Stebalien is there a place where the discussion about centralized ledgers is happening? Also, given that bitswap is now a separate service that can be used by clients other than IPFS, I think writing to disk should ideally be optional -- so that the datastore is not a required dependency for using go-bitswap.
Not yet. (back channel chatter...)
Agreed. We usually do this by using an in-memory datastore by default.
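The "in-memory datastore by default" pattern could be sketched as follows. This is a stdlib-only illustration, not go-bitswap's actual API: the `Datastore` interface stands in for go-datastore, and `mapDatastore` plays the role of its in-memory `MapDatastore`; the constructor name is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Datastore is a stand-in for the go-datastore interface, reduced to the
// two operations the ledger code would need.
type Datastore interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// mapDatastore is an in-memory datastore, analogous to go-datastore's
// MapDatastore, used when the caller supplies no persistent storage.
type mapDatastore struct{ m map[string][]byte }

func newMapDatastore() *mapDatastore {
	return &mapDatastore{m: make(map[string][]byte)}
}

func (d *mapDatastore) Put(key string, value []byte) error {
	d.m[key] = value
	return nil
}

func (d *mapDatastore) Get(key string) ([]byte, error) {
	v, ok := d.m[key]
	if !ok {
		return nil, errors.New("not found")
	}
	return v, nil
}

// NewLedgerStore (hypothetical name) falls back to the in-memory datastore
// when ds is nil, so users are not forced into a persistence dependency.
func NewLedgerStore(ds Datastore) Datastore {
	if ds == nil {
		ds = newMapDatastore()
	}
	return ds
}

func main() {
	store := NewLedgerStore(nil) // no datastore supplied: in-memory default
	store.Put("/bitswap/ledger/QmPeerA", []byte("v1"))
	v, _ := store.Get("/bitswap/ledger/QmPeerA")
	fmt.Println(string(v)) // prints v1
}
```

A caller who does want persistence would pass in a disk-backed implementation of the same interface; nothing else in the ledger code changes.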
Ok, in that case I'd suggest the next action item is to open an issue for discussion of a centralized ledger. I don't have context for this, so I don't feel qualified to write a decent issue, but I could just create a blank one if someone else can fill it in -- or someone else can volunteer to write it.
Actually, I think I'm being too hasty here. We do want a central service, but this feature is probably useful in the meantime. Really, I'm getting ahead of myself with the central service, as we don't even know what it's going to look like. However, this issue, while valid, still depends on using the ledger for something. We currently ignore it entirely.
Include rename from: github.com/ipfs/go-libipfs => github.com/ipfs/boxo

This migration was reverted:
- ./blocks => github.com/ipfs/go-block-format

Migrated repos:
- github.com/ipfs/interface-go-ipfs-core => ./coreiface
- github.com/ipfs/go-pinning-service-http-client => ./pinning/remote/client
- github.com/ipfs/go-path => ./path
- github.com/ipfs/go-namesys => ./namesys
- github.com/ipfs/go-mfs => ./mfs
- github.com/ipfs/go-ipfs-provider => ./provider
- github.com/ipfs/go-ipfs-pinner => ./pinning/pinner
- github.com/ipfs/go-ipfs-keystore => ./keystore
- github.com/ipfs/go-filestore => ./filestore
- github.com/ipfs/go-ipns => ./ipns
- github.com/ipfs/go-blockservice => ./blockservice
- github.com/ipfs/go-ipfs-chunker => ./chunker
- github.com/ipfs/go-fetcher => ./fetcher
- github.com/ipfs/go-ipfs-blockstore => ./blockstore
- github.com/ipfs/go-ipfs-posinfo => ./filestore/posinfo
- github.com/ipfs/go-ipfs-util => ./util
- github.com/ipfs/go-ipfs-ds-help => ./datastore/dshelp
- github.com/ipfs/go-verifcid => ./verifcid
- github.com/ipfs/go-ipfs-exchange-offline => ./exchange/offline
- github.com/ipfs/go-ipfs-routing => ./routing
- github.com/ipfs/go-ipfs-exchange-interface => ./exchange
- github.com/ipfs/go-unixfs => ./ipld/unixfs
- github.com/ipfs/go-merkledag => ./ipld/merkledag
- github.com/ipld/go-car => ./ipld/car

Fixes #215
Updates #202

This commit was moved from ipfs/boxo@038bdd2
Presently, ledgers are only held in memory. Let this issue track implementation of persisted ledgers.
Ledgers could be stored in the datastore under a unique key:
and the data could be stored as a []byte using the proto described here:
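The key scheme and proto referenced above are not reproduced in this thread, so the following stdlib-only sketch uses a purely hypothetical key format (`/bitswap/ledgers/<peer>`) and `encoding/gob` as a stand-in for the protobuf serialization; every name here is illustrative.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Ledger carries the fields a serialized ledger would hold. In a real
// implementation this would be the protobuf message; gob stands in here
// to keep the sketch dependency-free.
type Ledger struct {
	Partner   string
	BytesSent uint64
	BytesRecv uint64
}

// ledgerKey builds a hypothetical per-peer datastore key; the actual key
// scheme was not settled in this issue.
func ledgerKey(peerID string) string {
	return "/bitswap/ledgers/" + peerID
}

// marshalLedger produces the []byte value that would be written to the
// datastore under ledgerKey(l.Partner).
func marshalLedger(l Ledger) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(l); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func unmarshalLedger(b []byte) (Ledger, error) {
	var l Ledger
	err := gob.NewDecoder(bytes.NewReader(b)).Decode(&l)
	return l, err
}

func main() {
	l := Ledger{Partner: "QmPeerA", BytesSent: 1024}
	b, _ := marshalLedger(l)
	got, _ := unmarshalLedger(b)
	fmt.Println(ledgerKey(got.Partner), got.BytesSent)
}
```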
Two options:
a) keep ledgers in memory (introduces book-keeping and consistency concerns)
Can be implemented using an async actor that selects on a timer and iterates over map of in-memory ledgers, writing entries to the datastore.
On the upside, bitswap transactions are faster. Downsides: data loss in the event of a crash, and slightly more implementation complexity.
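Option (a) could be sketched like this, stdlib only and with hypothetical names throughout: transactions mutate an in-memory map, a dirty set tracks unflushed entries, and an async actor selects on a timer and writes dirty ledgers to the backing store (a plain map here standing in for the datastore).

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// flusher keeps ledgers in memory and periodically writes dirty entries
// to a backing store (a map here; a datastore in practice).
type flusher struct {
	mu      sync.Mutex
	ledgers map[string]uint64 // peer -> bytes exchanged (simplified)
	dirty   map[string]bool   // peers with unflushed changes
	store   map[string]uint64 // stand-in for the on-disk datastore
}

func newFlusher() *flusher {
	return &flusher{
		ledgers: make(map[string]uint64),
		dirty:   make(map[string]bool),
		store:   make(map[string]uint64),
	}
}

// Update mutates only the in-memory ledger, so bitswap transactions
// never block on disk.
func (f *flusher) Update(peer string, delta uint64) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.ledgers[peer] += delta
	f.dirty[peer] = true
}

// flushOnce writes every dirty ledger to the backing store.
func (f *flusher) flushOnce() {
	f.mu.Lock()
	defer f.mu.Unlock()
	for peer := range f.dirty {
		f.store[peer] = f.ledgers[peer]
		delete(f.dirty, peer)
	}
}

// Stored reads a flushed value under the lock.
func (f *flusher) Stored(peer string) uint64 {
	f.mu.Lock()
	defer f.mu.Unlock()
	return f.store[peer]
}

// run is the async actor: it selects on a timer, flushing until stopped,
// with one final flush on shutdown.
func (f *flusher) run(stop <-chan struct{}, interval time.Duration) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			f.flushOnce()
		case <-stop:
			f.flushOnce()
			return
		}
	}
}

func main() {
	f := newFlusher()
	f.Update("QmPeerA", 4096)
	f.Update("QmPeerA", 2048)
	f.flushOnce() // the actor would normally do this on its timer
	fmt.Println(f.Stored("QmPeerA")) // prints 6144
}
```

In real use the actor is launched once (`go f.run(stop, interval)`) and `main` above calls `flushOnce` directly only to keep the demonstration deterministic; the crash-loss window is exactly the interval between flushes.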
b) fetch ledgers from disk
On the upside, very simple Get and Put operations.
Downside: more costly.
With leveldb sync disabled, we pay only the cost of getting the data into the operating system's buffer cache. I don't have figures off the top of my head; do we pay the price of a syscall?
With the sync option enabled, mean latency could be in the 5-20ms range. This price feels a bit high for a potentially-hot code path.
NB: cost may be small compared to long-distance RTTs, but may be noticeable for communication within a single datacenter.
NB2: Many bitswap operations will already require multiple datastore Get operations. In those cases, this constitutes a mere incremental increase.

Thoughts? @whyrusleeping @jbenet
References:
- leveldb sync: https://github.com/google/leveldb/blob/master/include/leveldb/options.h#L170