
by prefix iterating #303

Open · benoitc opened this issue Nov 20, 2019 · 2 comments

benoitc commented Nov 20, 2019

I'm looking at replacing rocksdb in some cases and I wonder how prefix iterating is actually optimised (i.e. if I have one binary key, or better a tuple/list, how do I iterate from a part of it?). Where should I look to optimise it if needed?

martinsumner (Owner) commented

Keys are specified for dialyzer as:

-type key() ::
    binary()|string()|{binary(), binary()}.
    % Keys SHOULD be binary()
    % string() support is a legacy of old tests

They could in theory, though, be any Erlang term. The ordering of keys in the ledger is Erlang term order - so if you fold over keys it will be in that order.
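As a minimal illustration (plain Erlang shell, not the leveled API), term order means composite tuple keys sort contiguously by their first element - which is what a prefix scan over {Prefix, Suffix} keys relies on:

1> lists:sort([{<<"b">>, <<"2">>}, {<<"a">>, <<"9">>}, {<<"a">>, <<"1">>}]).
[{<<"a">>,<<"1">>},{<<"a">>,<<"9">>},{<<"b">>,<<"2">>}]

So a fold started at {Prefix, <<>>} will visit every key sharing that prefix before reaching any key with a greater prefix.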

There is no special optimisation for prefix at present. There are two optimisations of key folds, both based on object metadata - an optimisation for last_modified_date and an optimisation for the merkle_tree segment of a key.

Any fold over keys is a fold organised by the penciller. The penciller has about 30K recently added keys in memory (just in a gb_trees-like structure), and this has no optimisations. Any less recent key is stored in the LSM tree of SST files, and it is within the SST files that optimisations can be made. It is assumed that the folds in the in-memory part are always fast enough.

There is nothing configurable for you, the developer, here - but there are other optimisation examples should you wish to fork and implement your own optimisation.

The SST files are immutable, so any hints useful to optimisation don't need to worry about modifications to the state of the file.

Hints useful to optimisation can be loaded into the state of the leveled_sst process. For example, there is a fetch_cache on the loop state which holds slots of recently fetched objects (keyed by a hash of the LedgerKey) - so if an application keeps requesting the same key it can be returned from the cache:

fetch_cache = array:new([{size, ?CACHE_SIZE}]),
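A minimal sketch of that pattern in plain Erlang (illustrative only, not the leveled implementation - the helper names and the ?CACHE_SIZE value here are hypothetical):

-define(CACHE_SIZE, 32).  % hypothetical size, for illustration

new_cache() ->
    array:new([{size, ?CACHE_SIZE}]).

%% map a key onto one of the fixed cache slots via a hash
cache_slot(LedgerKey) ->
    erlang:phash2(LedgerKey, ?CACHE_SIZE).

cache_store(LedgerKey, Obj, Cache) ->
    array:set(cache_slot(LedgerKey), {LedgerKey, Obj}, Cache).

cache_fetch(LedgerKey, Cache) ->
    case array:get(cache_slot(LedgerKey), Cache) of
        {LedgerKey, Obj} -> {ok, Obj};   % hit - same key stored in the slot
        _ -> not_found                   % miss, or slot reused by another key
    end.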

The SST file is divided into 256 slots (where a slot is made up of 5 ordered blocks of keys). The most expensive operation is generally the fetch of a block from disk/page-cache and its deserialisation for read. So metadata can be added to the slot to avoid reading blocks unnecessarily. For example, there is a max(last_modified_date) for all the keys in the slot stored in the slot metadata:

leveled/src/leveled_sst.erl, lines 1640 to 1648 at 22e7328:

case IndexModDate of
    true ->
        <<B1L:32/integer,
          B2L:32/integer,
          B3L:32/integer,
          B4L:32/integer,
          B5L:32/integer,
          LMD:32/integer,
          PosBinIndex/binary>>;

This allows for queries which are interested in more recent data (e.g. which have passed a last_mod_date range into the query) to potentially skip whole slots.
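Sketched in plain Erlang (illustrative, not the leveled code - maybe_fold_slot is a hypothetical helper), the skip decision amounts to:

%% Skip a slot entirely when the query carries a last-modified-date
%% range and the slot's max LMD proves no key in it can qualify.
maybe_fold_slot({LowLMD, _HighLMD}, SlotMaxLMD, _SlotFun, Acc)
        when SlotMaxLMD < LowLMD ->
    Acc;    % every key in the slot is too old - no block is deserialised
maybe_fold_slot(_LMDRange, _SlotMaxLMD, SlotFun, Acc) ->
    SlotFun(Acc).    % slot may contain qualifying keys - read its blocks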

Likewise the slot has an index - which uses the same key hash as used for Riak anti-entropy merkle trees - so slots can be skipped when a query is interested in only part of the merkle tree. This is the PosBinIndex.
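The same skip-the-slot idea applies here (again a hypothetical sketch, not the actual PosBinIndex handling): if none of the segments the query is interested in appear among the segments of the slot's keys, the slot's blocks never need to be read:

%% QuerySegments/SlotSegments as ordsets of merkle tree segment ids
slot_may_match(QuerySegments, SlotSegments) ->
    not ordsets:is_disjoint(QuerySegments, SlotSegments).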

So in summary, optimisations are currently embedded within the leveled_sst files, and are either kept in the overall file metadata or the slot metadata - so that the overhead of de-serialising whole slots can be potentially avoided. The current optimisations are targeted at the Riak use-case, and there is nothing that a user can change here without code modification. However there is a pattern to follow.

martinsumner (Owner) commented

If you want to optimise by prefix - folding over the store and returning just the unique prefixes (i.e. ignoring all the different suffixes) - then you can follow the pattern used for list buckets:

%% @doc
%% set Max Buckets to -1 to list all buckets, otherwise will only return
%% MaxBuckets (use 1 to confirm that there exists any bucket for a given Tag)
bucket_list(SnapFun, Tag, FoldBucketsFun, InitAcc, MaxBuckets) ->
    Runner =
        fun() ->
            {ok, LedgerSnapshot, _JournalSnapshot} = SnapFun(),
            BucketAcc =
                get_nextbucket(null, null,
                                Tag, LedgerSnapshot, [], {0, MaxBuckets}),
            AfterFun =
                fun() ->
                    ok = leveled_penciller:pcl_close(LedgerSnapshot)
                end,
            FoldRunner =
                fun() ->
                    lists:foldr(fun({B, _K}, Acc) -> FoldBucketsFun(B, Acc) end,
                                InitAcc,
                                BucketAcc)
                    % Buckets in reverse alphabetical order so foldr
                end,
            % For this fold, the fold over the store is actually completed
            % before results are passed to the FoldBucketsFun to be
            % accumulated. Using a throw to exit the fold early will not
            % in this case save significant time.
            wrap_runner(FoldRunner, AfterFun)
        end,
    {async, Runner}.

This code sample drops you in the middle of it a bit, but in essence it queries for the first bucket, and when it finds it, adds a <<"0">> and then queries for the next bucket. The same pattern is reduced to a sketch below.
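Here is that pattern as a self-contained sketch over a plain gb_trees store (illustrative only - unique_prefixes is a hypothetical helper, and keys are assumed to be {Prefix, Suffix} tuples with binary prefixes):

%% Collect each distinct Prefix without visiting every key: after
%% seeing a prefix, restart the iteration just past all keys sharing
%% it by appending a zero byte - the same trick as the <<"0">> above.
unique_prefixes(Tree) ->
    next_prefix(gb_trees:iterator(Tree), Tree, []).

next_prefix(Iter, Tree, Acc) ->
    case gb_trees:next(Iter) of
        none ->
            lists:reverse(Acc);
        {{Prefix, _Suffix}, _Value, _RestIter} ->
            SkipTo = {<<Prefix/binary, 0>>, <<>>},
            next_prefix(gb_trees:iterator_from(SkipTo, Tree), Tree,
                        [Prefix | Acc])
    end.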
