Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seek iterator with Reverse doesn't work #436

Closed
brk0v opened this issue Mar 9, 2018 · 15 comments
Closed

Seek iterator with Reverse doesn't work #436

brk0v opened this issue Mar 9, 2018 · 15 comments
Labels
kind/question Something requiring a response

Comments

@brk0v
Copy link

brk0v commented Mar 9, 2018

Code:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"

	"github.com/dgraph-io/badger"
)

func main() {
	dir, err := ioutil.TempDir("", "badger")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)

	opts := badger.DefaultOptions
	opts.Dir = dir
	opts.ValueDir = dir

	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	bkey := func(i int) []byte {
		return []byte(fmt.Sprintf("%09d", i))
	}
	bval := func(i int) []byte {
		return []byte(fmt.Sprintf("%025d", i))
	}

	txn := db.NewTransaction(true)

	// Fill in 1000 items
	n := 1000
	for i := 0; i < n; i++ {
		err := txn.Set(bkey(i), bval(i))
		if err != nil {
			log.Fatal(err)
		}
	}

	err = txn.Commit(nil)
	if err != nil {
		log.Fatal(err)
	}

	opt := badger.DefaultIteratorOptions
	opt.PrefetchSize = 10
	opt.Reverse = true

	// Iterate over 1000 items
	var count int
	err = db.View(func(txn *badger.Txn) error {
		it := txn.NewIterator(opt)
		for it.Seek([]byte("0")); it.ValidForPrefix([]byte("0")); it.Next() {
			count++
		}
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Counted %d elements", count)
}

This code returns 0 elements in reverse mode and 1000 elements with reverse = false

@janardhan1993 janardhan1993 self-assigned this Mar 12, 2018
@janardhan1993
Copy link

@brk0v That's the expected behaviour. Seek searches for a key greater than or equal to the specifed key. Say you have the following keys "00" ,"01", "02". When you seek for "01" in reverse mode the iteration would start at "00".
Similarly in your case since you are seeking for "0" it will go past all the values and you won't find anything.

@janardhan1993 janardhan1993 added the kind/question Something requiring a response label Mar 12, 2018
@brk0v
Copy link
Author

brk0v commented Mar 12, 2018

Ok. Got it.
My case is to emulate some kind of buckets inside badger db. I was adding a prefix to keys. This prefix is a bucket. But with current iterator logic I can't find the last item in my "bucket".

@manishrjain
Copy link
Contributor

You still can. You just need to choose the seek key wisely. Following from @janardhan1993's example, if you have [00, 01, 02], and you want to start from 01, then you can use 01~ (with tilde character). That way, the seek would stop just before 01, and it the first item would be 01.

@rebootcode
Copy link

rebootcode commented Jun 15, 2018

He wanted to get the last item in the bucket, not the first, @manishrjain - I think, you misunderstood his statement.

Getting the reverse latest data from badgerdb is still Pain IA.

@manishrjain
Copy link
Contributor

It's all about the choice of seek key. If you want the first data from reverse, you can just call Rewind after setting Reverse option.

@rebootcode
Copy link

There is no way to seek reverse and rewind based on prefix.

@manishrjain
Copy link
Contributor

You can set reverse and then Seek to prefix + 0xFF byte. That'd bring you to prefix.

@AlexMackowiak
Copy link

AlexMackowiak commented Sep 2, 2021

@manishrjain I realize this comment is going on 3 years old, but it's subtly incorrect, and reverse iterating over a prefix with badger is still very difficult to get right.

The method suggested in this issue: to find the last key with prefix [0x1, 0x2], set reverse=true and then Seek to [0x1, 0x2, 0xFF]
But this doesn't quite solve the problem when data exists with both the prefix and the 0xFF byte, as in the following example DB:

0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF <----- reverse-Seek([0x1, 0x2, 0xFF])
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2
0x1 0x3
0x1 0x3 0x1

I believe the solution that accounts for all cases is much more involved.

  1. You need to compute the first key lexicographically greater than [prefix, 0xFF, 0xFF, ..., 0xFF], call this increment(prefix)
    a. increment([0x1, 0x2]) = [0x1, 0x3]
    b. increment([0x1, 0xFF, 0xFF]) = [0x2]
    c. increment([0xFF, 0xFF, 0xFF]) = nil, will need to call it.Rewind() instead
    i. ^ it.Rewind() does NOT work with a prefix AND reverse=True, will need to append one more 0xFF byte than the key length
    ii. So one special case when all bytes are 0xFF: increment([0xFF, 0xFF, 0xFF]) = [0xFF, 0xFF, 0xFF, 0xFF]
  2. Then reverse-Seek on increment(prefix)
  3. Finally, adjust by calling it.Next() if you arrive at a valid key that doesn't match the prefix. In this case, you have overshot the last key with the prefix by exactly one.

DB without exact match on incremented key:

0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF 
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2 <--- reverse-Seek([0x1, 0x3]), it.Valid() == true, arrive at the correct final key for the prefix [0x1, 0x2]
0x1 0x3 0x1

DB with exact match on incremented key:

0x1 0x2 0x3
0x1 0x2 0x4
0x1 0x2 0xFF 
0x1 0x2 0xFF 0x1
0x1 0x2 0xFF 0x2 <--- after it.Next(), arrive at the correct final key for the prefix [0x1, 0x2]
0x1 0x3 <----- reverse-Seek([0x1, 0x3]), it.Valid() == false, it.Item() != nil
0x1 0x3 0x1

@msackman
Copy link

@AlexMackowiak I don't think what you have will work correctly for case 1c. I've tried ... it doesn't play ball. If your prefix is 0xFF and you want to iterate in reverse, without prior knowledge of the key length, I think it's impossible. It's almost like there's no choice but to reserve part of your key space for "impossible" values so that you can seek to them.
Therefore, IMO, the API / semantics for reverse iteration is quite bad, and needs overhauling.

@AlexMackowiak
Copy link

AlexMackowiak commented Nov 30, 2021

@msackman Yeah you're actually right, I stumbled across this myself in some tests about a week later and I guess I forgot to follow up here. it.Rewind() does not work with a prefix and reverse=true how I expected it should, and I even filed an issue on the forum: https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518 but as far as I can tell this is the expected behavior?

I believe the actual case 1c that I ended up writing in my code was just to append as many 0xFF values as (key length +1), and just hope that case doesn't come up much if ever in practice (which is a reasonable assumption for my application).

@msackman
Copy link

@AlexMackowiak Yep. I think in conclusion it's actually impossible to seek to the last item in a database without prior knowledge of the key length. Personally, I'm switching back to BoltDB as that seems to have a much better designed API IMO.

@PeterMcKinnis
Copy link

PeterMcKinnis commented Mar 19, 2022

Just wanted to post correct code for reverse iteration.

func incrementPrefix(prefix []byte) []byte {
	result := make([]byte, len(prefix))
	copy(result, prefix)
	var len = len(prefix)
	for len > 0 {
		if result[len-1] == 0xFF {
			len -= 1
		} else {
			result[len-1] += 1
			break
		}
	}
	return result[0:len]
}

// Seeks to the last key that is valid for the given prefix.  For 
// correct results iterator must be setup to iterate iterate in the
// reverse direction and must not be limited to a certain prefix
func SeekLast(it *badger.Iterator, prefix []byte) {
	i := incrementPrefix(prefix);
	it.Seek(i);
	if (it.Valid() && bytes.Equal(i, it.Item().Key())) {
		it.Next();
	}
}

Example Usage

func example(db *badger.DB, prefix []byte) error {
	return db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.Reverse = true
		var it = txn.NewIterator(opts)
		defer it.Close()
		for SeekLast(it, prefix); it.ValidForPrefix(prefix); it.Next() {
			// Do stuff with item...
		}
		return nil
	})
}

Test

func Test_seekForReverseIterate(t *testing.T) {
	type args struct {
		prefix []byte
	}
	tests := []struct {
		name string
		args args
		want []byte
	}{
		{"Standard Case", args{[]byte{0x05, 0x45, 0x77}}, []byte{0x05, 0x45, 0x78}},
		{"Some 0xFF", args{[]byte{0x05, 0xFF, 0xFF}}, []byte{0x06}},
		{"All 0XFF", args{[]byte{0xFF, 0xFF, 0xFF}}, []byte{}},
		{"Empty", args{[]byte{}}, []byte{}},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := incrementPrefix(tt.args.prefix); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("seekForReverseIterate() = %v, want %v", got, tt.want)
			}
		})
	}
}


func getKeys(db *badger.DB, prefix []byte) ([][]byte)  {
	result := [][]byte{};
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.Reverse = true
		var it = txn.NewIterator(opts)
		defer it.Close()
		for SeekLast(it, prefix); it.ValidForPrefix(prefix); it.Next() {
			result = append(result, it.Item().KeyCopy(nil));
		}
		return nil
	})
	if (err != nil) {
		log.Fatal(err);
	}
	return result;
}

func Test_SeekLast(t *testing.T) {

	// construct test database in memory
	db, err := badger.Open(badger.DefaultOptions("").WithInMemory(true));
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close();

	keys := [][]byte{
		{0xFF, 0xFF, 0xFF, 0xFF, 0xFF},
		{0xFF, 0x1, 0xFF},
		{0xFF, 0x1},
		{0xFF},
		{0x2},
		{0x1, 0xFF, 0xFF, 0xFF, 0xFF},
		{0x1, 0x1, 0xFF},
		{0x1, 0x1},
		{0x1},
	}


	err = db.Update(func(txn *badger.Txn) error {
		for _, key := range keys {
			err := txn.Set(key, []byte{});
			if (err != nil) {
				log.Fatal(err.Error());
			}
		}
		return nil;
	});
	if (err != nil) {
		log.Fatal(err);
	}

	// Function to get all keys
	empty := [][]byte{};

	// Do tests
	type args struct {
		prefix []byte
	}
	tests := []struct {
		name string
		args args
		want [][]byte
	}{
		{"Prefix [0x01]", args{[]byte{0x01}}, keys[5:]},
		{"Prefix []", args{[]byte{}}, keys},
		{"Prefix nil", args{nil}, keys},
		{"Prefix [0x00]", args{[]byte{0x00}}, empty},
		{"Prefix [0x01 0x01]", args{[]byte{0x01, 0x01}}, keys[6:8]},
		{"Prefix [0x02]", args{[]byte{0x02}}, keys[4:5]},
		{"Prefix [0xFF]", args{[]byte{0xFF}}, keys[:4]},
		{"Prefix [0xFF 0xFF]", args{[]byte{0xFF, 0xFF}}, keys[:1]},
		{"Prefix [0xFF 0xFF 0xFF 0xFF 0xFF 0xFF]", args{[]byte{0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF}}, empty},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := getKeys(db, tt.args.prefix); !reflect.DeepEqual(got, tt.want) {
				t.Errorf("getKeys(%v) = %v, want %v", tt.args, got, tt.want)
			}
		})
	}
}

FWIW, I believe Iterator.Seek should behave like SeekLast by default for reverse direction iterators. At least there is a simple work around using the above.

Edit - 3/20/2022

+Updated implementation and test coverage of SeekLast to correctly handle the case where incrementPrefix returns a key present in the store.
+Updated the comment on SeekLast to indicate that it will not work for iterators setup to only iterate over a specific prefix.

@AlexMackowiak
Copy link

AlexMackowiak commented Mar 20, 2022

@PeterMcKinnis I believe this solution is currently missing one case and loses some efficiency over optimal reverse prefix scanning, great idea though we should definitely post some code for a correct implementation!

  1. When the result of incrementPrefix() is itself a key in your database, the resulting iterator will not be valid for the prefix.
    a. This is the "DB with exact match on incremented key" case from my original post, and requires an it.Next() call to fix.

  2. You did legitimately find a workaround to the correctness issue when the prefix is entirely 0xFF!
    a. This workaround is to ditch the badger Prefix IteratorOption entirely and check the prefix match at each key yourself
    i. (Adding opts.Prefix = prefix to Test_SeekLast causes two test failures)
    b. There is one downside to this though, finding the first prefix match in BadgerDB is already decently slow
    and not specifying the Prefix IteratorOption will slow it down even further

The code I ended up writing to account for everything in the reverse prefix case is the following:

options := badger.DefaultIteratorOptions
options.Reverse = true
options.Prefix = rangePrefixKey
it := tx.NewIterator(options)
...
incRangePrefix := incrementPrefix(rangePrefixKey)
if len(incRangePrefix) > 0 {
    it.Seek(incRangePrefix) // Point to the first key <= the range prefix
    if it.Item() != nil && bytes.Equal(it.Item().Key(), incRangePrefix) {
        // Exact match on incremented range prefix, need to move back one key for the last key matching the prefix
        it.Next()
    }
} else {
    // Entire range prefix was just [0xFF, 0xFF, 0xFF, ..., 0xFF], use the last key in the database
    // There's currently a bug with it.Rewind() when in reverse and with a prefix, it does not correctly find the largest key
    // https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518
    it.Seek(bytes.Repeat([]byte{0xFF}, len(rangePrefixKey)+1))
}

EDIT: Even this is sometimes invalid for the case where the prefix is all 0xFF. Our goal here is to point the iterator to the last key in the database, workarounds to it.Rewind() being broken are to either know the length of the longest key in the database or make a new iterator without the prefix option set and then it.Rewind() will correctly go to the last key in the database.

@PeterMcKinnis
Copy link

PeterMcKinnis commented Mar 20, 2022

@AlexMackowiak

I updated SeekLast above to address the bug you mentioned. Good catch!

I implemented your solution but, it also fails the case where incrementPrefix is a valid key. Your implementation also works correctly except in noted edge case. It will also be faster.

@AlexMackowiak
Copy link

@PeterMcKinnis

I replaced your SeekLast with my implementation as follows and I don't see any test failures pertaining to that increment case

func SeekLast(it *badger.Iterator, prefix []byte) {
	incRangePrefix := incrementPrefix(prefix)
	if len(incRangePrefix) > 0 {
		it.Seek(incRangePrefix) // Point to the first key <= the range prefix
		if it.Item() != nil && bytes.Equal(it.Item().Key(), incRangePrefix) {
			// Exact match on incremented range prefix, need to move back one key for the last key matching the prefix
			it.Next()
		}
	} else {
		// Entire range prefix was just [0xFF, 0xFF, 0xFF, ..., 0xFF], use the last key in the database
		// There's currently a bug with it.Rewind() when in reverse and with a prefix, it does not correctly find the largest key
		// https://discuss.dgraph.io/t/iterator-rewind-invalid-with-reverse-true-and-prefix-option-set/15518
                it.Seek(bytes.Repeat([]byte{0xFF}, len(rangePrefixKey)+1))
	}
}

I also have extensive test cases on this particular case, although they were written as part of a much larger, proprietary system that I wouldn't be able to share here.

I did not however have test coverage of the following case!

  1. When the prefix is all 0xFF
  2. but another key exists which matches the 0xFF ... 0xFF prefix but is longer than the prefix

My code was incorrect in this case, and to be truly correct you would need to know the length of the longest key in the database, as a workaround to it.Rewind() being unusable when reverse=true and prefix is set. Or alternatively make a new iterator without prefix set, so it's going to get pretty gross either way. I'll update my code posted with a note about this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/question Something requiring a response
Development

No branches or pull requests

7 participants