Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

an optimization for FindNextUserEntryInternal #151

Closed
BohuTANG opened this issue May 15, 2014 · 11 comments
Closed

an optimization for FindNextUserEntryInternal #151

BohuTANG opened this issue May 15, 2014 · 11 comments

Comments

@BohuTANG
Copy link

I've seen many innovations on rocksdb based on leveldb.
But, there is a hole we can do to improve.

As in skiplist, the internal key packed as:
{sequence|type|user_key|value}

if one key in skiplist is:

{100|0x01|'key1'|'val1'} -->node0

then, we insert same key with different value:

{101|0x01|'key1'|'val2'} -->node1

So, the same key is in two nodes, when we fall in the 'hot' function 'DBIter::FindNextUserEntryInternal', we have to do as follows:

if (ParseKey(&ikey) && ikey.sequence <= sequence_) {
      if (skipping &&
          user_comparator_->Compare(ikey.user_key, saved_key_.GetKey()) <= 0) {

if we changed the node structure from:

struct node {
   Key key;
}

to

struct node {
   Key keys[];
}

We don't need to do user_comparator_->Compare.
The 'iter->next' means that i am a 'restart'&'difference' key.

The OK version is here:
https://github.com/shuttler/nessDB/blob/master/engine/skiplist.h
https://github.com/shuttler/nessDB/blob/master/engine/skiplist.c

-BohuTANG

@dhruba
Copy link
Contributor

dhruba commented May 15, 2014

I had posted this earlier here:
https://www.facebook.com/groups/rocksdb.dev/permalink/566664156765436/
and people thought that this could be a good thing to do.

If you could provide a diff (or a pull request) against the rocksdb code base, that will be a great step to get it reviewed and committed.

The optimization would make sense when a workload inserts the same key multiple times within a relatively short period of time.

@igorcanadi
Copy link
Collaborator

I just realized a problem with this approach -- DBIter is also used on top of table files, not only memtables. I don't think this is fixable.

@BohuTANG
Copy link
Author

@dhruba
yeah, that was my old&simple version to levelDB group.

@igorcanadi
the 'better' way is to reserve the first bit of key packed buffer for multi flag.

@dhruba
Copy link
Contributor

dhruba commented May 27, 2014

shuttler: your observation is good. If you can provide a patch (after incorporating a fix for the problem mentioned by Igor), that would be great.

@BohuTANG
Copy link
Author

@dhruba
Cool.
If we use an array, another problem is that since skiplist is read lock-free, we need to add a write lock on the array, or else, it's a tough job.
Any ideas?

@BohuTANG
Copy link
Author

@igorcanadi

As you mentioned:
rescrv/HyperLevelDB@f6fa561

But I guess there is no(or a bit) affects with this patch.
If your compare functions as:

    uint32_t minlen = a->size < b->size ? a->size : b->size;
    r = memcmp(a->data, b->data, minlen);
    if (r == 0)
        return (a->size - b->size);

that is fast enough, because 'memcmp'(glibc) internal is blocks compare.

@iamjinlei
Copy link

this is an interesting idea. I feel the improvement will come from a very specific workload. I'd expect majority of keys will be unique in the memtable?

@BohuTANG
Copy link
Author

@ljinfb
No matter what the unique majority is, we don't need 'user_comparator_->Compare' here,
just to go next, because we're pretty sure the next entry is an restart key.

@dhruba
Copy link
Contributor

dhruba commented Oct 27, 2014

ljinfb: are you saying that this problem is fixed and we do not need this patch?

@iamjinlei
Copy link

@dhruba no, it is not fixed. His patch (link does not seem to work, based on my guess) only applies to skiplist. If we try to remove that compare in DBIter, the same thing needs to be fixed in all memtable formats as well as SSTs. I am not sure if that is possible

@igorcanadi
Copy link
Collaborator

Closing this issue due to inactivity. @BohuTANG if you could provide a pull request and show performance improvements, we will gladly merge it into our source tree.

Little-Wallace pushed a commit to Little-Wallace/rocksdb that referenced this issue Mar 19, 2020
Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <[email protected]>
BusyJay pushed a commit to BusyJay/rocksdb that referenced this issue Jul 25, 2022
Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <[email protected]>
Signed-off-by: tabokie <[email protected]>
acelyc111 pushed a commit to acelyc111/rocksdb that referenced this issue Jul 21, 2023
Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <[email protected]>
Signed-off-by: tabokie <[email protected]>
acelyc111 pushed a commit to acelyc111/rocksdb that referenced this issue Jul 25, 2023
Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <[email protected]>
Signed-off-by: tabokie <[email protected]>
acelyc111 pushed a commit to acelyc111/rocksdb that referenced this issue Aug 1, 2023
Summary:
Introduce `KeyManagedEncryptedEnv` which wraps around `EncryptedEnv` but provides an `KeyManager` API to enable key management per file. Also implements `AESBlockCipher` with OpenSSL.

Test Plan:
not tested yet. will update.

Signed-off-by: Yi Wu <[email protected]>
Signed-off-by: tabokie <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants