Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sequence generates duplicate values #1280

Closed
ghost opened this issue Mar 27, 2020 · 6 comments · Fixed by #1281
Closed

Sequence generates duplicate values #1280

ghost opened this issue Mar 27, 2020 · 6 comments · Fixed by #1281
Labels
area/api Issues related to current API limitations. kind/bug Something is broken. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it.

Comments

@ghost
Copy link

ghost commented Mar 27, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.1 windows/amd64

What version of Badger are you using?

e029e93

Does this issue reproduce with the latest master?

yes

What are the hardware specifications of the machine (RAM, OS, Disk)?

mem:32g
os:windows 10
disk:Samsung SSD 970 EVO Plus

What did you do?

package main

import (
	"fmt"
	"log"
	"os"

	"github.com/dgraph-io/badger"
)

func main() {
	opts := badger.DefaultOptions("./tmp")
	os.RemoveAll(opts.Dir)
	os.MkdirAll(opts.Dir, 0700)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	key := []byte("key")
	seq1, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}

	seq2, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}
	num, err := seq2.Next()
	if err != nil {
		panic(err)
	}

	fmt.Printf("num:%v\r\n", num)

	err = seq2.Release()
	if err != nil {
		panic(err)
	}
	err = seq1.Release()
	if err != nil {
		panic(err)
	}

	seq3, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}

	for i := 0; i < 5; i++ {
		num, err := seq3.Next()
		if err != nil {
			panic(err)
		}

		fmt.Printf("num:%v\r\n", num)
	}

	seq3.Release()
}

What did you expect to see?

Unique numbers

What did you see instead?

num:2
num:0
num:1
num:2
num:3
num:4

@jarifibrahim
Copy link
Contributor

Hey @GameXG, I made a small modification to you code and it works as expected.

        key := []byte("key")
	seq1, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}

	seq2, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}
	num, err := seq2.Next()
	if err != nil {
		panic(err)
	}

	fmt.Printf("num:%v\r\n", num)
      
       // Release seq1 before releasing seq2. 
	err = seq1.Release()
	if err != nil {
		panic(err)
	}

	err = seq2.Release()
	if err != nil {
		panic(err)
	}

	seq3, err := db.GetSequence(key, 2)
	if err != nil {
		panic(err)
	}

	for i := 0; i < 5; i++ {
		num, err := seq3.Next()
		if err != nil {
			panic(err)
		}

		fmt.Printf("num:%v\r\n", num)
	}

	seq3.Release()

and here's the output

num:2
num:3
num:4
num:5
num:6
num:7

It is important that the sequences are released in the order they're fetched. It should be okay if you release only the last sequence object.
Here's what seq.Release() does

badger/db.go

Lines 1232 to 1245 in 4b539b9

func (seq *Sequence) Release() error {
seq.Lock()
defer seq.Unlock()
err := seq.db.Update(func(txn *Txn) error {
var buf [8]byte
binary.BigEndian.PutUint64(buf[:], seq.next)
return txn.SetEntry(NewEntry(seq.key, buf[:]))
})
if err != nil {
return err
}
seq.leased = seq.next
return nil
}

So the last sequence object on which release is called will be stored in the DB and so the next call to seq.Next() will return the following numbers.

@ghost
Copy link
Author

ghost commented Mar 27, 2020

Hi,
Thank you for your reply.

There is no control over the release order when multithreading.
I feel the documentation needs to add warnings or fix this.

@jarifibrahim
Copy link
Contributor

I agree. This issue should be fixed by #1281.

Thank you for fixing it @GameXG

@jarifibrahim jarifibrahim reopened this Mar 27, 2020
@ghost
Copy link
Author

ghost commented Mar 28, 2020

hi, does this question involve #601?
In some cases, the auto-incrementing id will be used as the key. Duplicate keys will overwrite old entries by mistake.

@ghost
Copy link
Author

ghost commented Mar 29, 2020

simulate http concurrent upload data . Add dataList1, dataList2, dataList3 to the database.
dataList2 is completely missing.

package main

import (
	"encoding/binary"
	"encoding/json"
	"fmt"
	"log"
	"os"
	"sync"
	"time"

	"github.com/dgraph-io/badger"
)

type Data struct {
	V string
	// ...
}

func main() {
	opts := badger.DefaultOptions("./tmp/badger")
	_ = os.RemoveAll(opts.Dir)
	_ = os.MkdirAll(opts.Dir, 0700)
	db, err := badger.Open(opts)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	////////////////////////////////////////////////////
	// simulate http concurrent upload data
	// Add dataList1, dataList2, dataList3 to the database
	////////////////////////////////////////////////////

	dataList1 := make([]*Data, 5)
	dataList2 := make([]*Data, 5)
	dataList3 := make([]*Data, 20)

	for k, _ := range dataList1 {
		dataList1[k] = &Data{V: fmt.Sprintf("dataList1-%v", k)}
	}
	for k, _ := range dataList2 {
		dataList2[k] = &Data{V: fmt.Sprintf("dataList2-%v", k)}
	}
	for k, _ := range dataList3 {
		dataList3[k] = &Data{V: fmt.Sprintf("dataList3-%v", k)}
	}

	wg := sync.WaitGroup{}
	wg.Add(2)

	go func() {
		defer wg.Done()

		// Increased latency makes it easier to reproduce
		err := addDataList(db, dataList1, 100*time.Millisecond)
		if err != nil {
			panic(err)
		}
	}()

	go func() {
		defer wg.Done()

		// Increased latency makes it easier to reproduce
		time.Sleep(10 * time.Millisecond)

		err := addDataList(db, dataList2, 0)
		if err != nil {
			panic(err)
		}
	}()

	wg.Wait()

	err = addDataList(db, dataList3, 0)
	if err != nil {
		panic(err)
	}

	// Output dataList1, dataList2, dataList3
	printDataList(dataList1)
	printDataList(dataList2)
	printDataList(dataList3)

	// Output database data
	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchSize = 10
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			//k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("%s\n", v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		panic(err)
	}
}

// http Request
// add data in bulk
func addDataList(db *badger.DB, dataList []*Data, delay time.Duration) error {
	seq, err := db.GetSequence([]byte("dataId"), 10)
	if err != nil {
		return err
	}
	defer seq.Release()

	err = db.Update(func(txn *badger.Txn) error {
		for _, v := range dataList {
			id, err := seq.Next()
			if err != nil {
				return err
			}
			// Use incrementing id as primary key
			key := append([]byte("data-"), uint64ToBytes(id)...)

			value, err := json.Marshal(v)
			if err != nil {
				return err
			}
			err = txn.Set(key, value)
			if err != nil {
				return err
			}
		}

		return nil
	})

	// Increased latency makes it easier to reproduce
	time.Sleep(delay)

	if err != nil {
		return err
	}

	return nil
}

func uint64ToBytes(i uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], i)
	return buf[:]
}

func bytesToUint64(b []byte) uint64 {
	return binary.BigEndian.Uint64(b)
}

func printDataList(dataList []*Data) {
	for _, v := range dataList {
		fmt.Printf("v:%v\r\n", v.V)
	}
}

@jarifibrahim
Copy link
Contributor

Hey @GameXG, since the sequences were released in a different order, different values were stored in badger. This doesn't mean badger lost any data. All the data that was inserted is stored in badger and can be retrieved.

This isn't a data loss issue but more like an incorrect API use issue (or missing documentation issue). Thank you for looking into the issue and sending a PR for it. We appreciate the effort. :)

@jarifibrahim jarifibrahim added area/api Issues related to current API limitations. kind/bug Something is broken. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it. labels Apr 6, 2020
jarifibrahim pushed a commit that referenced this issue Apr 8, 2020
NamanJain8 pushed a commit that referenced this issue Sep 8, 2020
Fix issue #1280

(cherry picked from commit fa94030)
NamanJain8 added a commit that referenced this issue Sep 9, 2020
Fix issue #1280

(cherry picked from commit fa94030)

Co-authored-by: GameXG <[email protected]>
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
mYmNeo pushed a commit to mYmNeo/badger that referenced this issue Jan 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api Issues related to current API limitations. kind/bug Something is broken. priority/P2 Somehow important but would not block a release. status/accepted We accept to investigate or work on it.
Development

Successfully merging a pull request may close this issue.

1 participant