Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: Insert on Duplicate Transaction #1289

Closed
swdee opened this issue Apr 6, 2020 · 6 comments · Fixed by #1328
Closed

Question: Insert on Duplicate Transaction #1289

swdee opened this issue Apr 6, 2020 · 6 comments · Fixed by #1328
Assignees
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.

Comments

@swdee
Copy link

swdee commented Apr 6, 2020

Go version: 1.13.9
Badger version: 2.0.3

Having used RDBMS for many years I am trying to implement Insert on Duplicate functionality. My first implementation was to create a Badger transaction then check if the key exists and if it doesn't insert the key/value. If the key already exists then the transaction is aborted and a duplicate error is returned.

The code for this program;

package main

import (
	"fmt"
	"github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
	"log"
	"sync"
)

type Repo struct {
	DB *badger.DB
	sync.Mutex
}

type entry struct {
	key int
	val string
}

const (
	duplicate    = true
	notDuplicate = false
)

func main() {

	opts := badger.DefaultOptions("").
		WithInMemory(true).
		WithCompression(options.Snappy)

	db, err := badger.Open(opts)

	if err != nil {
		log.Fatal("Error connecting to DB", err)
	}
	defer db.Close()

	repo := &Repo{DB: db}

	wg := sync.WaitGroup{}

	total := 16
	wg.Add(total)

	for i := 0; i < total; i++ {
		go func(gi int) {
			defer wg.Done()

			has, err := repo.insert(22, "cat")

			if err != nil {
				log.Fatal("insert error:", err)
			}

			if !has {
				fmt.Println("gi=", gi, "NOT DUPLICATE")
			} else {
				fmt.Println("gi=", gi, "was duplicate")
			}
		}(i)
	}

	wg.Wait()
	fmt.Println("done")
}

func (r *Repo) insert(key int, val string) (bool, error) {

	//r.Lock()
	//defer r.Unlock()

	tx := r.DB.NewTransaction(true)
	defer tx.Discard()

	// check if key exists before inserting
	entries, err := r.get(tx, key)

	if err != nil {
		return notDuplicate, err
	}

	if len(entries) > 0 {
		return duplicate, nil
	}

	// no key exists so insert entry
	err = tx.Set(primaryKey(key), []byte(val))

	if err != nil {
		return notDuplicate, err
	}

	return notDuplicate, tx.Commit()
}

func (r *Repo) get(tx *badger.Txn, key int) ([]entry, error) {

	it := tx.NewIterator(badger.DefaultIteratorOptions)
	defer it.Close()

	data := make([]entry, 0)

	prefix := primaryKey(key)

	for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
		item := it.Item()

		val, err := item.ValueCopy(nil)

		if err != nil {
			return nil, err
		}

		data = append(data, entry{key, string(val)})
	}

	return data, nil
}

func primaryKey(key int) []byte {
	return []byte(fmt.Sprintf("store:%d", key))
}

When running this program multiple times, occasionally we get instances where it reports successful insertion multiple times, instead of the expected once.

$ go run dbinsert-collision.go 
gi= 0 NOT DUPLICATE
gi= 15 NOT DUPLICATE
gi= 4 was duplicate
gi= 13 was duplicate
gi= 14 was duplicate
gi= 2 was duplicate
gi= 10 was duplicate
gi= 11 was duplicate
gi= 5 was duplicate
gi= 3 was duplicate
gi= 12 was duplicate
gi= 8 was duplicate
gi= 9 was duplicate
gi= 1 was duplicate
gi= 7 was duplicate
gi= 6 was duplicate
done
badger 2020/04/06 13:49:15 DEBUG: Storing value log head: {Fid:0 Len:0 Offset:0}
badger 2020/04/06 13:49:15 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2020/04/06 13:49:15 INFO: Running for level: 0
badger 2020/04/06 13:49:15 DEBUG: LOG Compact. Added 2 keys. Skipped 1 keys. Iteration took: 2.543024ms
badger 2020/04/06 13:49:15 DEBUG: Discard stats: map[]
badger 2020/04/06 13:49:15 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 3.336583ms
badger 2020/04/06 13:49:15 INFO: Compaction for level: 0 DONE
badger 2020/04/06 13:49:15 INFO: Force compaction on level 0 done

$ go run dbinsert-collision.go 
gi= 1 NOT DUPLICATE
gi= 8 was duplicate
gi= 0 was duplicate
gi= 7 was duplicate
gi= 6 was duplicate
gi= 15 was duplicate
gi= 10 was duplicate
gi= 2 was duplicate
gi= 11 was duplicate
gi= 12 was duplicate
gi= 13 was duplicate
gi= 14 was duplicate
gi= 3 was duplicate
gi= 5 was duplicate
gi= 4 was duplicate
gi= 9 was duplicate
done
badger 2020/04/06 13:49:18 DEBUG: Storing value log head: {Fid:0 Len:0 Offset:0}
badger 2020/04/06 13:49:18 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2020/04/06 13:49:18 INFO: Running for level: 0
badger 2020/04/06 13:49:18 DEBUG: LOG Compact. Added 2 keys. Skipped 0 keys. Iteration took: 46.107076ms
badger 2020/04/06 13:49:18 DEBUG: Discard stats: map[]
badger 2020/04/06 13:49:18 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 47.327832ms
badger 2020/04/06 13:49:18 INFO: Compaction for level: 0 DONE
badger 2020/04/06 13:49:18 INFO: Force compaction on level 0 done

$ go run dbinsert-collision.go 
gi= 15 NOT DUPLICATE
gi= 0 NOT DUPLICATE
gi= 8 was duplicate
gi= 1 was duplicate
gi= 14 was duplicate
gi= 3 was duplicate
gi= 6 was duplicate
gi= 5 was duplicate
gi= 9 was duplicate
gi= 7 was duplicate
gi= 12 was duplicate
gi= 13 was duplicate
gi= 10 was duplicate
gi= 11 was duplicate
gi= 4 was duplicate
gi= 2 was duplicate
done
badger 2020/04/06 13:49:20 DEBUG: Storing value log head: {Fid:0 Len:0 Offset:0}
badger 2020/04/06 13:49:20 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2020/04/06 13:49:20 INFO: Running for level: 0
badger 2020/04/06 13:49:20 DEBUG: LOG Compact. Added 2 keys. Skipped 1 keys. Iteration took: 2.167897ms
badger 2020/04/06 13:49:20 DEBUG: Discard stats: map[]
badger 2020/04/06 13:49:20 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 3.182679ms
badger 2020/04/06 13:49:20 INFO: Compaction for level: 0 DONE
badger 2020/04/06 13:49:20 INFO: Force compaction on level 0 done

If we uncomment the two mutex lock lines in the insert() function, this puts a global lock on the Repo which ensures only one entry is inserted. However why does a Badger Transaction not perform as I am expecting with the above use case?

@jarifibrahim jarifibrahim added the kind/question Something requiring a response label Apr 6, 2020
@jarifibrahim
Copy link
Contributor

Hey @swdee, thank you for raising the issue. This definitely looks odd to me. I'll investigate this further and figure out what's going on.

@jarifibrahim jarifibrahim added the status/accepted We accept to investigate or work on it. label Apr 6, 2020
@swdee
Copy link
Author

swdee commented Apr 13, 2020

@jarifibrahim Have you had a chance to look into this yet? Have you been able to reproduce the issue on your system?

@jarifibrahim
Copy link
Contributor

@swdee It looks like a bug in the iterator to me. I replaced the iterator in your code with DB.Get() and it works fine.

func (r *Repo) insert(key int, val string) (bool, error) {

	//r.Lock()
	//defer r.Unlock()

	tx := r.DB.NewTransaction(true)
	defer tx.Discard()

	e, err := tx.Get(primaryKey(key))
	if err == nil && e != nil {
		return duplicate, nil
	}

	// check if key exists before inserting
	//entries, err := r.get(tx, key)
	//if err != nil {
	//	panic(err) // This should never happen
	//}

	//if len(entries) > 0 {
	//	return duplicate, nil
	//}

	fmt.Println("Key not found. Trying to insert")
	// no key exists so insert entry
	err = tx.Set(primaryKey(key), []byte(val))

	if err != nil {
		panic(err) // This should never happen
	}
	return notDuplicate, tx.Commit()
}

This code returns Transaction Conflict which is as expected.

I'll spend some more time and try to figure out why iterator behaves differently.

@swdee
Copy link
Author

swdee commented Apr 14, 2020

Good to know, I have tested out DB.Get() and confirm the expected ErrConflict. I look forward to the bug fix when available. Thanks.

@swdee
Copy link
Author

swdee commented Apr 14, 2020

I believe this issue meets your Bug Bounty criteria #601

@jarifibrahim jarifibrahim added kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) and removed kind/question Something requiring a response labels May 14, 2020
@jarifibrahim jarifibrahim self-assigned this May 14, 2020
jarifibrahim pushed a commit that referenced this issue Jun 1, 2020
The SSI guarantees provided by badger are respected when 
multiple concurrent transactions try to update the same key at 
once using the txn.Get and txn.Set API
```
// txn.Get
// if not found; txn.Set and Commit
```
but the same guarantees fail when an `iterator` is used instead 
of the `txn.Get` API
```
// itr.Seek(key)
// if not itr.Valid(); txn.Set and Commit
```
If the code above is executed via multiple concurrent transactions,
we might overwrite the first update.

The test `TestConflict` added in this PR fails on master but works fine
on this PR.

Fixes #1289
@jarifibrahim
Copy link
Contributor

@swdee The issue has been fixed in master. Please try it out and let me know if it fixes your issue.

jarifibrahim pushed a commit that referenced this issue Oct 2, 2020
The SSI guarantees provided by badger are respected when 
multiple concurrent transactions try to update the same key at 
once using the txn.Get and txn.Set API
```
// txn.Get
// if not found; txn.Set and Commit
```
but the same guarantees fail when an `iterator` is used instead 
of the `txn.Get` API
```
// itr.Seek(key)
// if not itr.Valid(); txn.Set and Commit
```
If the code above is executed via multiple concurrent transactions,
we might overwrite the first update.

The test `TestConflict` added in this PR fails on master but works fine
on this PR.

Fixes #1289
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
The SSI guarantees provided by badger are respected when 
multiple concurrent transactions try to update the same key at 
once using the txn.Get and txn.Set API
```
// txn.Get
// if not found; txn.Set and Commit
```
but the same guarantees fail when an `iterator` is used instead 
of the `txn.Get` API
```
// itr.Seek(key)
// if not itr.Valid(); txn.Set and Commit
```
If the code above is executed via multiple concurrent transactions,
we might overwrite the first update.

The test `TestConflict` added in this PR fails on master but works fine
on this PR.

Fixes dgraph-io/badger#1289
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

2 participants