
Accounts checkpoint is too slow and big #2499

Closed

aeyakovenko opened this issue Jan 20, 2019 · 4 comments

aeyakovenko commented Jan 20, 2019

Problem

Overlaid accounts (see #2289) are going to be slow. Get requests require a check in every checkpoint before failing. For testnet-perf, each checkpoint is going to have a full copy of the accounts in memory, which will get really huge, so persistent storage might be necessary.

This proposes a single data structure for all the accounts and forks, without the overlay merge/fork that #2289 does.

Proposed Solution

A data structure that allows for this mode of operation:

  • Append from a single thread at a time, concurrent with many readers.
  • Exclusive resize, or truncate from the start.

Let's call it AppendVec. The underlying memory can be memory-mapped to a file if T is serializable. Multiple AppendVecs with a shared index allow for concurrent commits without blocking reads. Writes are sequential to memory, SSD, or disk, and should be as fast as the hardware allows. The only in-memory data structure that requires a write lock is the index, which should be fast to update.
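A minimal sketch of the interface, assuming a raw backing region and a published-length watermark; all names and signatures here are illustrative, not a final design:

```rust
use std::marker::PhantomData;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Sketch only: the backing region could be a memory-mapped file when
/// T is serializable. Bodies are elided; only the shape matters here.
pub struct AppendVec<T> {
    backing: *mut u8,       // start of the (possibly mmap'd) region
    capacity: usize,        // capacity in elements of T
    len: AtomicUsize,       // readers only look below this watermark
    _marker: PhantomData<T>,
}

/// Guard returned to the single appender; holding it acts as the
/// append lock.
pub struct Appender<'a, T> {
    vec: &'a AppendVec<T>,
}

impl<T> AppendVec<T> {
    /// Acquire the single append slot; None if another thread holds it.
    pub fn append(&self) -> Option<Appender<'_, T>> {
        unimplemented!("sketch")
    }

    /// Lock-free read of a previously appended element.
    pub fn get(&self, _index: usize) -> Option<&T> {
        unimplemented!("sketch")
    }

    /// Number of published elements.
    pub fn len(&self) -> usize {
        self.len.load(Ordering::Acquire)
    }

    /// Exclusive operations: grow the region, or reclaim a prefix that
    /// has been garbage collected.
    pub fn resize(&mut self, _new_capacity: usize) {
        unimplemented!("sketch")
    }
    pub fn truncate_front(&mut self, _elements: usize) {
        unimplemented!("sketch")
    }
}

impl<'a, T> Appender<'a, T> {
    /// Write the element first, then publish it by bumping `len`, so
    /// readers never observe a partially written slot.
    pub fn append(&mut self, _item: T) -> usize {
        unimplemented!("sketch")
    }
}
```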

To garbage collect, live data can be re-appended to defrag the store, and the dead prefix truncated from the start.
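A hedged sketch of that defrag pass, using the AppendVec interface sketched above; how offsets are rebased after the truncate, and how the shared index gets repointed to the returned locations, is left open here:

```rust
/// Re-append every still-live element so the old prefix holds only
/// garbage, then reclaim it in one exclusive truncate. Returns the new
/// index of each live entry, in the same order as `live`.
fn defrag<T: Clone>(vec: &mut AppendVec<T>, live: &[usize]) -> Vec<usize> {
    let old_len = vec.len();
    let mut appender = vec.append().expect("exclusive during defrag");
    let moved: Vec<usize> = live
        .iter()
        .map(|&i| appender.append(vec.get(i).expect("live entry").clone()))
        .collect();
    drop(appender);
    // Everything below `old_len` is now dead and can be dropped.
    vec.truncate_front(old_len);
    moved
}
```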

The proposed AccountDB would allow for

  • multiple readers
  • multiple writers
  • persistently backed memory
```rust
// Just an offset into the AccountDB::storage vector.
type AppendVecId = usize;

struct AccountDB {
    // Index: for each Pubkey, the account for a specific fork lives in a
    // specific AppendVec at a specific offset.
    index: RwLock<HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize)>>>,
    // Storage list.
    storage: RwLock<Vec<Arc<AppendVec<Account>>>>,
}
```
  • A concurrent transaction-processing thread that expects to commit acquires a vector via
    AccountDB::acquire_append_storage() -> Arc<AppendVec<Account>>
  • Each append returns an index; appends can run concurrently with reads:
```rust
{
    // Commit the accounts while still allowing reads.
    // Only one thread can append at a time, so the `append_vec` guard acts
    // as an append lock; it should be held to commit all the accounts at once.
    let mut appender = append_vec.append().unwrap();
    let index = appender.append(account);
    (append_vec.id, index)
}
```
```rust
// Write to the index: update all the accounts with one write lock, which is
// not held concurrently with appends.
db.index.write().unwrap().entry(pubkey).or_default().insert(fork, (append_vec.id, index));
```
  • Reads take three steps: look up the pubkey in the index, find the most recent ancestor fork, then read from storage:
```rust
fn get_account(&self, pubkey: &Pubkey, current_fork: Fork) -> Account {
    let (append_vec_id, index) = {
        let index = self.index.read().unwrap();
        let forks = index.get(pubkey).unwrap();
        // Find the most recent fork that is an ancestor of `current_fork`;
        // the ancestor check should be fast, a single hashtable lookup.
        let (append_vec_id, index) = ...;
        (append_vec_id, index)
    };
    // A separate lock lets storage reads proceed concurrently.
    self.storage.read().unwrap()[append_vec_id].get(index).unwrap().clone()
}
```
  • To bootstrap the index from a persistent store of AppendVecs, each entry should also include a "commit counter": a single global atomic that tracks the number of commits to the entire data store. On startup, the entry with the latest commit wins for each fork, as sketched below.
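A hedged sketch of that bootstrap scan. The `StoredEntry` record layout is a hypothetical stand-in (the real on-disk format is not specified in this issue); `Pubkey`, `Fork`, `Account`, and `AppendVecId` are the types defined above, assumed Copy + Hash where needed:

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Assumed on-disk entry: enough metadata to rebuild the index, plus
/// the value of the global commit counter at the time it was written.
struct StoredEntry {
    pubkey: Pubkey,
    fork: Fork,
    commit: u64, // global commit counter at append time
    account: Account,
}

/// Rebuild the index by scanning every persisted AppendVec, keeping,
/// for each (Pubkey, Fork), the entry with the highest commit counter.
fn build_index(
    storage: &[Arc<AppendVec<StoredEntry>>],
) -> HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize)>> {
    // Track the winning commit counter alongside each location.
    let mut best: HashMap<Pubkey, HashMap<Fork, (AppendVecId, usize, u64)>> =
        HashMap::new();
    for (id, vec) in storage.iter().enumerate() {
        for offset in 0..vec.len() {
            let e = vec.get(offset).expect("published entry");
            let slot = best
                .entry(e.pubkey)
                .or_default()
                .entry(e.fork)
                .or_insert((id, offset, e.commit));
            if e.commit >= slot.2 {
                *slot = (id, offset, e.commit); // later commit wins
            }
        }
    }
    // Strip the commit counters once the winners are known.
    best.into_iter()
        .map(|(pk, forks)| {
            let forks = forks
                .into_iter()
                .map(|(f, (id, off, _))| (f, (id, off)))
                .collect();
            (pk, forks)
        })
        .collect()
}
```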

This might be a speedier way to write to the ledger too.

tag: @garious @sambley @rob-solana @carllin @sakridge

@sakridge sakridge changed the title Accounts checkpoint is to slow and big Accounts checkpoint is too slow and big Jan 21, 2019

aeyakovenko commented Jan 27, 2019

@sambley @sakridge

So serde serialize/deserialize are not all that fast. Ideally we can use something like this

The VM would need a writable entry, so we could clone it to a new location or just copy it locally into a structure that has a Vec for userdata.
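A rough sketch of that local-copy idea; the field names (`tokens`, `owner`, `userdata`) are illustrative assumptions, not the real account layout:

```rust
/// Read-only view of an account as stored in an AppendVec; `userdata`
/// points directly into the memory-mapped region.
struct StoredAccount<'a> {
    tokens: u64, // illustrative fields, not the real layout
    owner: Pubkey,
    userdata: &'a [u8],
}

/// Owned, writable copy handed to the VM; `userdata` is a Vec so the
/// program can mutate (and potentially grow) it.
struct WritableAccount {
    tokens: u64,
    owner: Pubkey,
    userdata: Vec<u8>,
}

impl<'a> StoredAccount<'a> {
    /// Copy-on-write at whole-account granularity: clone into an owned
    /// structure before the VM runs, append the result back afterwards.
    fn to_writable(&self) -> WritableAccount {
        WritableAccount {
            tokens: self.tokens,
            owner: self.owner,
            userdata: self.userdata.to_vec(),
        }
    }
}
```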

@jackcmay is CoW in the VM easy to implement?

sakridge commented:
I experimented a bit with sakridge@aea2639

I used std::ptr::write and std::ptr::read, which seem pretty fast. Writing a 256 MB vector, I see ~1 GB/s for writes and about the same for random reads on sagan.
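Roughly what that std::ptr round-trip looks like; a sketch under the assumption that the element type is plain-old-data (no heap pointers) and that `offset` preserves T's alignment:

```rust
use std::{mem, ptr};

/// Blit a fixed-size value into a byte buffer, skipping serde.
/// Safety: T must be plain-old-data and `offset` must keep T aligned.
unsafe fn write_at<T>(buf: &mut [u8], offset: usize, value: T) {
    assert!(offset + mem::size_of::<T>() <= buf.len());
    ptr::write(buf.as_mut_ptr().add(offset) as *mut T, value);
}

/// Read it back with a single memcpy-sized load.
unsafe fn read_at<T>(buf: &[u8], offset: usize) -> T {
    assert!(offset + mem::size_of::<T>() <= buf.len());
    ptr::read(buf.as_ptr().add(offset) as *const T)
}
```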

sakridge commented May 6, 2019

I think this can be considered done. @aeyakovenko, do you agree?
