
by prefix iterating #303

Open · benoitc opened this issue Nov 20, 2019 · 2 comments

benoitc commented Nov 20, 2019

I'm looking at replacing rocksdb in some cases and I wonder how prefix iterating is actually optimised (i.e. if I have one binary key, or better a tuple/list, how do I iterate from a part of it?). Where should I look to optimise it if needed?

martinsumner (Owner) commented

Keys are specified for dialyzer as:

-type key() ::
    binary()|string()|{binary(), binary()}.
    % Keys SHOULD be binary()
    % string() support is a legacy of old tests

They could in theory, though, be any Erlang term. The ordering of keys in the ledger is Erlang term order - so if you fold over keys it will be in that order.
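As a minimal illustration (plain Erlang shell, not the leveled API), term order means composite tuple keys sort contiguously by their first element - which is what a prefix scan over {Prefix, Suffix} keys relies on:

1> lists:sort([{<<"b">>, <<"2">>}, {<<"a">>, <<"9">>}, {<<"a">>, <<"1">>}]).
[{<<"a">>,<<"1">>},{<<"a">>,<<"9">>},{<<"b">>,<<"2">>}]

So a fold started at {Prefix, <<>>} will visit every key sharing that prefix before reaching any key with a greater prefix.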

There is no special optimisation for prefix at present. There are two optimisations of key folds, both based on object metadata - an optimisation for last_modified_date and an optimisation for the merkle_tree segment of a key.

Any fold over keys is a fold organised by the penciller. The penciller has about 30K recently added keys in memory (just in a gb_trees-like structure), and this has no optimisations. Any less recent key is stored in the LSM tree of SST files, and it is within the SST files that optimisations can be made. It is assumed that the folds in the in-memory part are always fast enough.

There is nothing configurable for you, the developer, here - but there are other optimisation examples should you wish to fork and implement your own optimisation.

The SST files are immutable, so any hints useful to optimisation don't need to worry about modifications to the state of the file.

Hints useful to optimisation can be loaded into the state of the leveled_sst process. For example, there is a fetch_cache on the loop state which holds slots of recently fetched objects (keyed by a hash of the LedgerKey) - so if an application keeps requesting the same key it can be returned from the cache:

fetch_cache = array:new([{size, ?CACHE_SIZE}]),
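A minimal sketch of that pattern in plain Erlang (illustrative only, not the leveled implementation - the helper names and the ?CACHE_SIZE value here are hypothetical):

-define(CACHE_SIZE, 32).  % hypothetical size, for illustration

new_cache() ->
    array:new([{size, ?CACHE_SIZE}]).

%% map a key onto one of the fixed cache slots via a hash
cache_slot(LedgerKey) ->
    erlang:phash2(LedgerKey, ?CACHE_SIZE).

cache_store(LedgerKey, Obj, Cache) ->
    array:set(cache_slot(LedgerKey), {LedgerKey, Obj}, Cache).

cache_fetch(LedgerKey, Cache) ->
    case array:get(cache_slot(LedgerKey), Cache) of
        {LedgerKey, Obj} -> {ok, Obj};   % hit - same key stored in the slot
        _ -> not_found                   % miss, or slot reused by another key
    end.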

The SST file is divided into 256 slots (where a slot is made up of 5 ordered blocks of keys). The most expensive operation is generally the fetch of a block from disk/page-cache and its deserialisation for read. So metadata can be added to the slot to avoid reading blocks unnecessarily. For example, there is a max(last_modified_date) for all the keys in the slot stored in the slot metadata:

leveled/src/leveled_sst.erl, lines 1640 to 1648 at 22e7328:

case IndexModDate of
    true ->
        <<B1L:32/integer,
          B2L:32/integer,
          B3L:32/integer,
          B4L:32/integer,
          B5L:32/integer,
          LMD:32/integer,
          PosBinIndex/binary>>;

This allows for queries which are interested in more recent data (e.g. which have passed a last_mod_date range into the query) to potentially skip whole slots.
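Sketched in plain Erlang (illustrative, not the leveled code - maybe_fold_slot is a hypothetical helper), the skip decision amounts to:

%% Skip a slot entirely when the query carries a last-modified-date
%% range and the slot's max LMD proves no key in it can qualify.
maybe_fold_slot({LowLMD, _HighLMD}, SlotMaxLMD, _SlotFun, Acc)
        when SlotMaxLMD < LowLMD ->
    Acc;    % every key in the slot is too old - no block is deserialised
maybe_fold_slot(_LMDRange, _SlotMaxLMD, SlotFun, Acc) ->
    SlotFun(Acc).    % slot may contain qualifying keys - read its blocks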

Likewise the slot has an index - which uses the same key hash as used for Riak anti-entropy merkle trees - so slots can be skipped when a query is interested in only part of the merkle tree. This is the PosBinIndex.
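The same skip-the-slot idea applies here (again a hypothetical sketch, not the actual PosBinIndex handling): if none of the segments the query is interested in appear among the segments of the slot's keys, the slot's blocks never need to be read:

%% QuerySegments/SlotSegments as ordsets of merkle tree segment ids
slot_may_match(QuerySegments, SlotSegments) ->
    not ordsets:is_disjoint(QuerySegments, SlotSegments).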

So in summary, optimisations are currently embedded within the leveled_sst files, and are either kept in the overall file metadata or the slot metadata - so that the overhead of de-serialising whole slots can be potentially avoided. The current optimisations are targeted at the Riak use-case, and there is nothing that a user can change here without code modification. However there is a pattern to follow.

martinsumner (Owner) commented

If you want to optimise by prefix - folding over the store and returning just the unique prefixes (i.e. ignoring all the different suffixes) - then you can follow the pattern used for list buckets:

%% @doc
%% set Max Buckets to -1 to list all buckets, otherwise will only return
%% MaxBuckets (use 1 to confirm that there exists any bucket for a given Tag)
bucket_list(SnapFun, Tag, FoldBucketsFun, InitAcc, MaxBuckets) ->
    Runner =
        fun() ->
            {ok, LedgerSnapshot, _JournalSnapshot} = SnapFun(),
            BucketAcc =
                get_nextbucket(null, null,
                                Tag, LedgerSnapshot, [], {0, MaxBuckets}),
            AfterFun =
                fun() ->
                    ok = leveled_penciller:pcl_close(LedgerSnapshot)
                end,
            FoldRunner =
                fun() ->
                    lists:foldr(fun({B, _K}, Acc) -> FoldBucketsFun(B, Acc) end,
                                InitAcc,
                                BucketAcc)
                    % Buckets in reverse alphabetical order so foldr
                end,
            % For this fold, the fold over the store is actually completed
            % before results are passed to the FoldBucketsFun to be
            % accumulated. Using a throw to exit the fold early will not
            % in this case save significant time.
            wrap_runner(FoldRunner, AfterFun)
        end,
    {async, Runner}.

This code sample drops you in the middle of it a bit, but in essence it queries for the first bucket, and when it finds it, adds a <<"0">> and then queries for the next bucket. The same pattern is reduced to a sketch below.
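Here is that pattern as a self-contained sketch over a plain gb_trees store (illustrative only - unique_prefixes is a hypothetical helper, and keys are assumed to be {Prefix, Suffix} tuples with binary prefixes):

%% Collect each distinct Prefix without visiting every key: after
%% seeing a prefix, restart the iteration just past all keys sharing
%% it by appending a zero byte - the same trick as the <<"0">> above.
unique_prefixes(Tree) ->
    next_prefix(gb_trees:iterator(Tree), Tree, []).

next_prefix(Iter, Tree, Acc) ->
    case gb_trees:next(Iter) of
        none ->
            lists:reverse(Acc);
        {{Prefix, _Suffix}, _Value, _RestIter} ->
            SkipTo = {<<Prefix/binary, 0>>, <<>>},
            next_prefix(gb_trees:iterator_from(SkipTo, Tree), Tree,
                        [Prefix | Acc])
    end.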
