
Accounts checkpoint is too slow and big #2499

Closed

aeyakovenko opened this issue Jan 20, 2019 · 4 comments

aeyakovenko commented Jan 20, 2019

Problem

Overlaid accounts (see #2289) are going to be slow. Get requests require a check in every checkpoint before failing. For testnet-perf, each checkpoint is going to have a full copy of the accounts in memory, which will get really huge, so persistent storage might be necessary.

This proposes a single data structure for all the accounts and forks, without the overlay merge/fork that #2289 does.

Proposed Solution

A data structure that allows for this mode of operation:

  • Append from a single thread at a time, concurrent with many readers.
  • Exclusive resize, or truncate from the start.

Let's call it AppendVec. The underlying memory can be memory-mapped to a file if T is serializable. Multiple AppendVecs with a shared index allow for concurrent commits without blocking reads. Writes are sequential to memory, SSD, or disk, and should be as fast as the hardware allows. The only in-memory data structure that requires a write lock is the index, which should be fast to update.
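A minimal sketch of the interface, assuming a raw backing region and a published-length watermark; all names and signatures here are illustrative, not a final design:

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch only: the backing region could be a memory-mapped file when
/// T is serializable. Bodies are elided; only the shape matters here.
pub struct AppendVec<T> {
    backing: *mut u8,       // start of the (possibly mmap'd) region
    capacity: usize,        // capacity in elements of T
    len: AtomicUsize,       // readers only look below this watermark
    _marker: PhantomData<T>,
}

/// Guard returned to the single appender; holding it acts as the
/// append lock.
pub struct Appender<'a, T> {
    vec: &'a AppendVec<T>,
}

impl<T> AppendVec<T> {
    /// Acquire the single append slot; None if another thread holds it.
    pub fn append(&self) -> Option<Appender<'_, T>> {
        unimplemented!("sketch")
    }

    /// Lock-free read of a previously appended element.
    pub fn get(&self, _index: usize) -> Option<&T> {
        unimplemented!("sketch")
    }

    /// Number of published elements.
    pub fn len(&self) -> usize {
        self.len.load(Ordering::Acquire)
    }

    /// Exclusive operations: grow the region, or reclaim a prefix that
    /// has been garbage collected.
    pub fn resize(&mut self, _new_capacity: usize) {
        unimplemented!("sketch")
    }
    pub fn truncate_front(&mut self, _elements: usize) {
        unimplemented!("sketch")
    }
}

impl<'a, T> Appender<'a, T> {
    /// Write the element first, then publish it by bumping `len`, so
    /// readers never observe a partially written slot.
    pub fn append(&mut self, _item: T) -> usize {
        unimplemented!("sketch")
    }
}
```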

To garbage collect, live data can be re-appended to defrag the store, and the dead prefix truncated from the start.
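A hedged sketch of that defrag pass, using the AppendVec interface sketched above; how offsets are rebased after the truncate, and how the shared index gets repointed to the returned locations, is left open here:

```rust
/// Re-append every still-live element so the old prefix holds only
/// garbage, then reclaim it in one exclusive truncate. Returns the new
/// index of each live entry, in the same order as `live`.
fn defrag<T: Clone>(vec: &mut AppendVec<T>, live: &[usize]) -> Vec<usize> {
    let old_len = vec.len();
    let mut appender = vec.append().expect("exclusive during defrag");
    let moved: Vec<usize> = live
        .iter()
        .map(|&i| appender.append(vec.get(i).expect("live entry").clone()))
        .collect();
    drop(appender);
    // Everything below `old_len` is now dead and can be dropped.
    vec.truncate_front(old_len);
    moved
}
```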

The proposed AccountDB would allow for

  • multiple readers
  • multiple writers
  • persistently backed memory
```rust
// Just an offset into the AccountDB::storage vector.
type AppendVecId = usize;

struct AccountDB {
    // Index: for each Pubkey, the account for a specific fork lives in a
    // specific AppendVec at a specific offset.
    index: RwLock<HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize)>>>,
    // Storage list.
    storage: RwLock<Vec<Arc<AppendVec<Account>>>>,
}
```
  • A concurrent transaction-processing thread that expects to commit acquires a vector via
    AccountDB::acquire_append_storage() -> Arc<AppendVec<Account>>
  • Each append returns an index; appends can run concurrently with reads:
```rust
{
    // Commit the accounts while still allowing reads.
    // Only one thread can append at a time, so the `append_vec` guard acts
    // as an append lock; it should be held to commit all the accounts at once.
    let mut appender = append_vec.append().unwrap();
    let index = appender.append(account);
    (append_vec.id, index)
}
```
```rust
// Write to the index: update all the accounts with one write lock, which is
// not held concurrently with appends.
db.index.write().unwrap().entry(pubkey).or_default().insert(fork, (append_vec.id, index));
```
  • Reads take three steps: look up the pubkey in the index, find the most recent ancestor fork, then read from storage:
```rust
fn get_account(&self, pubkey: &Pubkey, current_fork: Fork) -> Account {
    let (append_vec_id, index) = {
        let index = self.index.read().unwrap();
        let forks = index.get(pubkey).unwrap();
        // Find the most recent fork that is an ancestor of `current_fork`;
        // the ancestor check should be fast, a single hashtable lookup.
        let (append_vec_id, index) = ...;
        (append_vec_id, index)
    };
    // A separate lock lets storage reads proceed concurrently.
    self.storage.read().unwrap()[append_vec_id].get(index).unwrap().clone()
}
```
  • To bootstrap the index from a persistent store of AppendVecs, each entry should also include a "commit counter": a single global atomic that tracks the number of commits to the entire data store. On startup, the entry with the latest commit wins for each fork, as sketched below.
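A hedged sketch of that bootstrap scan. The `StoredEntry` record layout is a hypothetical stand-in (the real on-disk format is not specified in this issue); `Pubkey`, `Fork`, `Account`, and `AppendVecId` are the types defined above, assumed Copy + Hash where needed:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Assumed on-disk entry: enough metadata to rebuild the index, plus
/// the value of the global commit counter at the time it was written.
struct StoredEntry {
    pubkey: Pubkey,
    fork: Fork,
    commit: u64, // global commit counter at append time
    account: Account,
}

/// Rebuild the index by scanning every persisted AppendVec, keeping,
/// for each (Pubkey, Fork), the entry with the highest commit counter.
fn build_index(
    storage: &[Arc<AppendVec<StoredEntry>>],
) -> HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize)>> {
    // Track the winning commit counter alongside each location.
    let mut best: HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize, u64)>> =
        HashMap::new();
    for (id, vec) in storage.iter().enumerate() {
        for offset in 0..vec.len() {
            let e = vec.get(offset).expect("published entry");
            let slot = best
                .entry(e.pubkey)
                .or_default()
                .entry(e.fork)
                .or_insert((id, offset, e.commit));
            if e.commit >= slot.2 {
                *slot = (id, offset, e.commit); // later commit wins
            }
        }
    }
    // Strip the commit counters once the winners are known.
    best.into_iter()
        .map(|(pk, forks)| {
            let forks = forks
                .into_iter()
                .map(|(f, (id, off, _))| (f, (id, off)))
                .collect();
            (pk, forks)
        })
        .collect()
}
```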

This might be a speedier way to write to the ledger too.

tag: @garious @sambley @rob-solana @carllin @sakridge

@sakridge sakridge changed the title Accounts checkpoint is to slow and big Accounts checkpoint is too slow and big Jan 21, 2019

aeyakovenko commented Jan 27, 2019

@sambley @sakridge

So serde serialize/deserialize are not all that fast. Ideally we can use something like this

The VM would need a writable entry, so we could clone it to a new location or just copy it locally into a structure that has a Vec for userdata.
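A rough sketch of that local-copy idea; the field names (`tokens`, `owner`, `userdata`) are illustrative assumptions, not the real account layout:

```rust
/// Read-only view of an account as stored in an AppendVec; `userdata`
/// points directly into the memory-mapped region.
struct StoredAccount<'a> {
    tokens: u64, // illustrative fields, not the real layout
    owner: Pubkey,
    userdata: &'a [u8],
}

/// Owned, writable copy handed to the VM; `userdata` is a Vec so the
/// program can mutate (and potentially grow) it.
struct WritableAccount {
    tokens: u64,
    owner: Pubkey,
    userdata: Vec<u8>,
}

impl<'a> StoredAccount<'a> {
    /// Copy-on-write at whole-account granularity: clone into an owned
    /// structure before the VM runs, append the result back afterwards.
    fn to_writable(&self) -> WritableAccount {
        WritableAccount {
            tokens: self.tokens,
            owner: self.owner,
            userdata: self.userdata.to_vec(),
        }
    }
}
```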

@jackcmay is CoW in the VM easy to implement?

sakridge commented:
I experimented a bit with sakridge@aea2639

I used std::ptr::write and std::ptr::read, which seem pretty fast. Writing a 256 MB vector, I see ~1 GB/s for writes and about the same for random reads on sagan.
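Roughly what that std::ptr round-trip looks like; a sketch under the assumption that the element type is plain-old-data (no heap pointers) and that `offset` preserves T's alignment:

```rust
use std::{mem, ptr};

/// Blit a fixed-size value into a byte buffer, skipping serde.
/// Safety: T must be plain-old-data and `offset` must keep T aligned.
unsafe fn write_at<T>(buf: &mut [u8], offset: usize, value: T) {
    assert!(offset + mem::size_of::<T>() <= buf.len());
    ptr::write(buf.as_mut_ptr().add(offset) as *mut T, value);
}

/// Read it back with a single memcpy-sized load.
unsafe fn read_at<T>(buf: &[u8], offset: usize) -> T {
    assert!(offset + mem::size_of::<T>() <= buf.len());
    ptr::read(buf.as_ptr().add(offset) as *const T)
}
```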

sakridge commented May 6, 2019

I think this can be considered done. @aeyakovenko, do you agree?
