Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behaviour difference between old and new merge iterator #1153

Closed
jarifibrahim opened this issue Dec 10, 2019 · 0 comments · Fixed by #1157
Closed

Behaviour difference between old and new merge iterator #1153

jarifibrahim opened this issue Dec 10, 2019 · 0 comments · Fixed by #1157
Assignees
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.

Comments

@jarifibrahim
Copy link
Contributor

jarifibrahim commented Dec 10, 2019

Commit 73ea6e6 introduced a tree-based merge iterator. The existing merge iterator was a heap-based iterator.

The following test (taken from here)shows different behavior on the heap-based merge iterator and the tree-based iterator. The test returns different results on both the implementations.

func TestMergeDuplicates(t *testing.T) {
	it := newSimpleIterator([]string{"1", "1", "1"}, []string{"a1", "a3", "a7"}, false)
	it2 := newSimpleIterator([]string{"1", "1", "1"}, []string{"b2", "b3", "b5"}, false)
	it3 := newSimpleIterator([]string{"1"}, []string{"c1"}, false)
	it4 := newSimpleIterator([]string{"1", "1", "2"}, []string{"d1", "d7", "d9"}, false)
	t.Run("forward", func(t *testing.T) {
		expectedKeys := []string{"1", "2"}
		expectedVals := []string{"a1", "d9"}
		mergeIt := NewMergeIterator([]y.Iterator{it, it2, it3, it4}, false)
		mergeIt.Rewind()
		k, v := getAll(mergeIt)
		require.EqualValues(t, expectedKeys, k)
		require.EqualValues(t, expectedVals, v)
		closeAndCheck(t, mergeIt, 4)
	})

	t.Run("reverse", func(t *testing.T) {
		it.reversed = true
		it2.reversed = true
		it3.reversed = true
		it4.reversed = true
		expectedKeys := []string{"2", "1"}
		expectedVals := []string{"d9", "a7"}
		mergeIt := NewMergeIterator([]y.Iterator{it, it2, it3, it4}, true)
		mergeIt.Rewind()
		k, v := getAll(mergeIt)
		require.EqualValues(t, expectedKeys, k)
		require.EqualValues(t, expectedVals, v)
		closeAndCheck(t, mergeIt, 4)
	})
}

The heap-based approach skips all the keys which are same
https://github.com/dgraph-io/badger/blob/385da9100e3534a5f82b3c37e0990305f8008a3a/y/iterator.go#L226-L233
The result is

k = {1, 2,}

while the tree-based implementation does not skip the same keys.

k = {1, 1, 1, 2}
@jarifibrahim jarifibrahim added kind/bug Something is broken. status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it labels Dec 10, 2019
@jarifibrahim jarifibrahim self-assigned this Dec 10, 2019
@jarifibrahim jarifibrahim added priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it. and removed status/needs-attention This issue needs more eyes on it, more investigation might be required before accepting/rejecting it labels Dec 12, 2019
jarifibrahim pushed a commit that referenced this issue Dec 18, 2019
Fixes: #1153

The tree-based merge iterator would not skip duplicate keys.
This commit fixes it. 
If merge iterator consists of 3 iterators
```
it1 = [1, 1, 1]
it2 = [1, 1, 1, 1, 2]
it3 = [1, 1, 1, 1, 2]
```
The result should be
```
[1, 2]
```
jarifibrahim pushed a commit that referenced this issue Mar 12, 2020
Fixes: #1153

The tree-based merge iterator would not skip duplicate keys.
This commit fixes it.
If merge iterator consists of 3 iterators
```
it1 = [1, 1, 1]
it2 = [1, 1, 1, 1, 2]
it3 = [1, 1, 1, 1, 2]
```
The result should be
```
[1, 2]
```

(cherry picked from commit 779d9a0)
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
Fixes: hypermodeinc/badger#1153

The tree-based merge iterator would not skip duplicate keys.
This commit fixes it. 
If merge iterator consists of 3 iterators
```
it1 = [1, 1, 1]
it2 = [1, 1, 1, 1, 2]
it3 = [1, 1, 1, 1, 2]
```
The result should be
```
[1, 2]
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

1 participant