
Data loss while reading from DB #1126

Closed
ForCraft2013 opened this issue Nov 19, 2019 · 8 comments
Assignees
Labels
bounty/dataloss: These issues are related to the Data Loss Bounty Program
kind/bug: Something is broken.
platform/windows: Issues specific to Windows
priority/P0: Critical issue that requires immediate attention.
status/confirmed: The issue has been triaged but still not reproduced.

Comments

@ForCraft2013

Hello. I have an issue. Badger loses data while reading when the app suddenly stops, via Ctrl+C for example. And this happens even when I read data only through db.View (read-only mode).

Go: 1.13.4
Badger: 2.0.0
Windows 10 build 1903

After the stop, the vlog file becomes too big (2,147,483,646 bytes, i.e. 2 GB). And if I run the app again, Badger gives an error:
During db.vlog.open: Value log truncate required to run DB. This might result in data loss
And finally, if I run Badger with the WithTruncate(true) option, data is lost. The vlog file size becomes 20 bytes.

I want to use Badger in my app, but because of this problem I can't. Any error/stop while reading can cause data loss.

The problem can be reproduced with this code (Ctrl+C while the program prints keys and values to the console):

read.go

package main

import (
	"fmt"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("tmp/badger").WithSyncWrites(true))
	if err != nil {
		log.Panic(err)
	}
	defer db.Close()

	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchSize = 10
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("key=%d, value=%s\n", k, v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic(err)
	}
}

write.go

package main

import (
	"bufio"
	"encoding/binary"
	"log"
	"os"
	"strings"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	db, err := badger.Open(badger.DefaultOptions("tmp/badger").WithSyncWrites(true))
	if err != nil {
		log.Panic(err)
	}
	defer db.Close()

	seq, err := db.GetSequence([]byte("0"), 1)
	if err != nil {
		log.Panic(err)
	}
	defer seq.Release()

	file, err := os.Open("some_big_data_file.txt")
	if err != nil {
		log.Panic(err)
	}
	defer file.Close()

	scanner := bufio.NewScanner(file)
	err = db.Update(func(txn *badger.Txn) error {
		for scanner.Scan() {
			line := []byte(strings.TrimSpace(scanner.Text()))
			if len(line) > 0 {
				id, err := seq.Next()
				if err != nil {
					return err
				}
				key := make([]byte, 8)
				binary.BigEndian.PutUint64(key, id)
				err = txn.Set(key, line)
				if err != nil {
					return err
				}
			}
		}
		return scanner.Err()
	})
	if err != nil {
		log.Panic(err)
	}
}
@shekarm

shekarm commented Nov 20, 2019

ForCraft2013,

Thank you for reporting the issue. I will see if one of our engineers can take a look at it.

Shekar

@jarifibrahim added the kind/bug, platform/windows, priority/P0, and status/confirmed labels Nov 21, 2019
@jarifibrahim self-assigned this Nov 21, 2019
@jarifibrahim
Contributor

Hey @ForCraft2013

> After the stop, the vlog file becomes too big (2,147,483,646 bytes, i.e. 2 GB). And if I run the app again, Badger gives an error:
> During db.vlog.open: Value log truncate required to run DB. This might result in data loss
> And finally, if I run Badger with the WithTruncate(true) option, data is lost. The vlog file size becomes 20 bytes.

Badger uses vlog files to store data, and vlog files are memory-mapped. On Windows, we cannot map a file beyond its actual size, so we increase the size of the value log file (to 2 GB in this case). When the file is closed cleanly, we truncate it back to its original size. This happens only on Windows.
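
The pattern looks roughly like this (a simplified sketch, not badger's actual code; the mapFn parameter is a hypothetical stand-in for the real memory-mapping call):

```go
import "os"

// growAndMap sketches the grow-before-mmap pattern described above: grow the file
// to the mapping size before mapping, and remember the original size so a clean
// Close can truncate it back down. mapFn stands in for the real mmap call.
func growAndMap(path string, mapSize int64, mapFn func(*os.File, int64) ([]byte, error)) (*os.File, []byte, int64, error) {
	f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0666)
	if err != nil {
		return nil, nil, 0, err
	}
	fi, err := f.Stat()
	if err != nil {
		return nil, nil, 0, err
	}
	origSize := fi.Size()
	// Windows cannot map a region larger than the file, so the file is grown first.
	// This is why the vlog shows up as ~2 GB on disk while the DB is open.
	if err := f.Truncate(mapSize); err != nil {
		return nil, nil, 0, err
	}
	buf, err := mapFn(f, mapSize)
	if err != nil {
		return nil, nil, 0, err
	}
	// A clean db.Close() truncates the file back to origSize; killing the process
	// (Ctrl+C) skips that step and leaves the grown, zero-padded file behind.
	return f, buf, origSize, nil
}
```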

You were seeing data loss because you were inserting all entries in a single transaction. A transaction is saved only when you commit it, so with everything in one uncommitted transaction, any crash means no data is saved. I made a small change to your test and I don't see any data loss. Can you try the following test and confirm it, please?

func TestDataLoss(t *testing.T) {
	readData := false
	opt := DefaultOptions("./badger-data").WithSyncWrites(true)
	if readData {
		opt.Truncate = true
	}
	db, err := Open(opt)
	require.NoError(t, err)
	defer db.Close()
	if readData {
		read(db)
	} else {
		write(db)
	}
}
// Write will insert 3 entries
func write(db *DB) {
	for i := 0; i < 2000; i++ {
		// It is important that we create different transactions for each request.
		err := db.Update(func(txn *Txn) error {
			key := []byte(fmt.Sprintf("%d", i))
			fmt.Printf("inserting k: %d\n", i)
			err := txn.Set(key, []byte("barValue"))
			if err != nil {
				return err
			}
			return nil
		})
		if err != nil {
			panic(err)
		}
		// Crash DB after inserting 3 entries
		if i == 2 {
			fmt.Println("Poof! DB crashed!")
			os.Exit(0) // Use os.Exit instead of panic.
		}
	}
}
func read(db *DB) {
	err := db.View(func(txn *Txn) error {
		opts := DefaultIteratorOptions
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("key=%s, value=%s\n", k, v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic(err)
	}
}

Here's the output of the test


# First run with readData set to false.
$ go test -run TestDataLoss -v
=== RUN   TestDataLoss
badger 2019/11/22 09:20:51 INFO: All 0 tables opened in 0s
inserting k: 0
inserting k: 1
inserting k: 2
Poof! DB crashed!
ok      github.com/dgraph-io/badger/v2  0.268s

# Second run with readData set to true.
$ go test -run TestDataLoss -v
=== RUN   TestDataLoss
badger 2019/11/22 09:21:06 INFO: All 0 tables opened in 0s
badger 2019/11/22 09:21:06 INFO: Replaying file id: 0 at offset: 0
badger 2019/11/22 09:21:06 INFO: Replay took: 962.6µs
badger 2019/11/22 09:21:06 DEBUG: Value log discard stats empty
key=0, value=barValue
key=1, value=barValue
key=2, value=barValue
badger 2019/11/22 09:21:06 DEBUG: Storing value log head: {Fid:0 Len:0 Offset:185}
badger 2019/11/22 09:21:06 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
badger 2019/11/22 09:21:06 INFO: Running for level: 0
badger 2019/11/22 09:21:06 DEBUG: LOG Compact. Added 4 keys. Skipped 0 keys. Iteration took: 0s
badger 2019/11/22 09:21:06 DEBUG: Discard stats: map[]
badger 2019/11/22 09:21:06 INFO: LOG Compact 0->1, del 1 tables, add 1 tables, took 1.9953ms
badger 2019/11/22 09:21:06 INFO: Compaction for level: 0 DONE
badger 2019/11/22 09:21:06 INFO: Force compaction on level 0 done
--- PASS: TestDataLoss (0.02s)
PASS
ok      github.com/dgraph-io/badger/v2  0.278s
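
As a follow-up on the per-transaction commits above: if committing one transaction per entry is too slow for a bulk load, badger v2 also has a WriteBatch API (db.NewWriteBatch / Set / Flush; method names assumed from the v2 API) that groups entries into internally managed transactions, so a crash loses at most the unflushed tail instead of the whole load. A minimal sketch:

```go
// Sketch only: bulk-load helper using badger v2's WriteBatch (assumed API).
func bulkLoad(db *badger.DB, entries map[string][]byte) error {
	wb := db.NewWriteBatch()
	defer wb.Cancel() // discards anything not flushed
	for k, v := range entries {
		if err := wb.Set([]byte(k), v); err != nil {
			return err
		}
	}
	return wb.Flush() // commits whatever is still pending
}
```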

@ForCraft2013
Copy link
Author

@jarifibrahim, hello. I've tried your code and it's working fine, even when I removed the "crash DB on the third iteration" part and increased the number of inserts to 10000.
But when I modified my code to "one insert - one transaction", I still get an error. Maybe the problem is in something different? Key-value encoding, for example. It's weird that your code works fine but my code does not, considering that the code base is the same.
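
For reference, the "one insert - one transaction" variant of the earlier write.go presumably looks roughly like this (a sketch based on that snippet, not the reporter's exact code; it assumes the same imports as write.go plus io):

```go
func writePerEntry(db *badger.DB, seq *badger.Sequence, r io.Reader) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := []byte(strings.TrimSpace(scanner.Text()))
		if len(line) == 0 {
			continue
		}
		id, err := seq.Next()
		if err != nil {
			return err
		}
		key := make([]byte, 8)
		binary.BigEndian.PutUint64(key, id)
		// One transaction per entry: each Update commits (and, with SyncWrites
		// enabled, syncs) before the next line is read.
		if err := db.Update(func(txn *badger.Txn) error {
			return txn.Set(key, line)
		}); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```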

@jarifibrahim
Contributor

> But when I modified my code to "one insert - one transaction", I still get an error.

@ForCraft2013 What kind of error?

> Maybe the problem is in something different? Key-value encoding, for example. It's weird that your code works fine but my code does not, considering that the code base is the same.

It's hard to tell what might be going on without looking at the code. Can you share your code?

@ForCraft2013
Author

ForCraft2013 commented Nov 22, 2019

@jarifibrahim, I've found the cause of the problem! If the value length is more than 31 bytes, the error occurs.
UPD: I also checked with a key longer than 31 bytes and it works fine. It seems the problem exists only for values.
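
The 31-byte boundary is consistent with badger's default ValueThreshold, which I believe is 32 bytes in v2.0.0 (an assumption, not stated in this thread): values at or above the threshold are written to the value log, while shorter values are stored inline in the LSM tree and are untouched by vlog truncation. A sketch of raising the threshold so such values stay out of the vlog (a workaround sketch under that assumption, not a fix for the bug):

```go
// Assumes the default ValueThreshold is what sends 32-byte values to the vlog.
opt := badger.DefaultOptions("tmp/badger").
	WithSyncWrites(true).
	WithValueThreshold(64) // values shorter than 64 bytes stay in the LSM tree
db, err := badger.Open(opt)
if err != nil {
	log.Panic(err)
}
defer db.Close()
```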

In your example Badger gives this error (without the truncate option):

badger 2019/11/22 16:28:58 INFO: All 1 tables opened in 1ms
badger 2019/11/22 16:28:58 INFO: Replaying file id: 0 at offset: 847804
2019/11/22 16:28:58 During db.vlog.open: Value log truncate required to run DB. This might result in data loss
panic: During db.vlog.open: Value log truncate required to run DB. This might result in data loss

With the truncate option:

badger 2019/11/22 16:30:10 INFO: All 1 tables opened in 1ms
badger 2019/11/22 16:30:10 INFO: Replaying file id: 0 at offset: 847804
badger 2019/11/22 16:30:10 INFO: Replay took: 2.0028ms
badger 2019/11/22 16:30:10 DEBUG: Value log discard stats empty
2019/11/22 16:30:10 Invalid value pointer offset: 20 greater than current offset: 20
badger 2019/11/22 16:30:10 INFO: Got compaction priority: {level:0 score:1.73 dropPrefix:[]}
panic: Invalid value pointer offset: 20 greater than current offset: 20

I've modified your code a bit so there is time to stop the program while it's printing keys and values to the console:

func TestDataLoss(t *testing.T) {
	readData := false
	opt := DefaultOptions("./badger-data").WithSyncWrites(true)
	if readData {
		opt.Truncate = true
	}
	db, err := Open(opt)
	require.NoError(t, err)
	defer db.Close()
	if readData {
		read(db)
	} else {
		write(db)
	}
}
// Write will insert 10000 entries
func write(db *DB) {
	for i := 0; i < 10000; i++ {
		// It is important that we create different transactions for each request.
		err := db.Update(func(txn *Txn) error {
			key := []byte(fmt.Sprintf("%d", i))
			fmt.Printf("inserting k: %d\n", i)
			// 32-byte value and now it's not working
			err := txn.Set(key, []byte("barValuebarValuebarValuebarValue"))
			if err != nil {
				return err
			}
			return nil
		})
		if err != nil {
			panic(err)
		}
		// Crash DB after inserting 3 entries
		//if i == 2 {
		//	fmt.Println("Poof! DB crashed!")
		//	os.Exit(0) // Use os.Exit instead of panic.
		//}
	}
}
func read(db *DB) {
	err := db.View(func(txn *Txn) error {
		opts := DefaultIteratorOptions
		it := txn.NewIterator(opts)
		defer it.Close()
		// Ctrl+C to stop the program while it's running
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("key=%s, value=%s\n", k, v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic(err)
	}
}

@danielmai
Contributor

@ForCraft2013 Thanks for reporting and reproducing this data loss issue on Windows. Please email me (the email in my GitHub profile) and we can discuss the next steps for your data-loss bounty reward.

@danielmai added the bounty/dataloss label Nov 26, 2019
@ForCraft2013
Author

ForCraft2013 commented Nov 27, 2019

@danielmai, I've sent you an email. And I made my real email public in my GitHub profile so you can be sure it's me writing to you. I'm really glad that the problem will be solved soon.

jarifibrahim pushed a commit that referenced this issue Dec 6, 2019
Windows doesn't allow memory mapping a file to a size greater than the
file's actual size. To circumvent this, we increase the file size by
truncating it.
https://github.com/dgraph-io/badger/blob/f5b63211d7f3e2f5f8b698893313b2a54e4df7de/y/mmap_windows.go#L41-L48

When badger re-opens, we try to replay this "truncated" file.
Since this truncated file consists of all zeros, the replay would
return the last offset as `zero` and then we would truncate the
original file to size `zero`.

The replay function would return `zero` as the last valid offset,
which was wrong. The last valid offset is the start offset plus the
forward movement of the file offset.
So instead of
https://github.com/dgraph-io/badger/blob/f5b63211d7f3e2f5f8b698893313b2a54e4df7de/value.go#L433
```go
var validEndOff uint32 // notice we're starting from zero, not the start point.
```
we should be doing
```go
var validEndOff uint32 = offset
```
Fixes - #1126
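
In other words (a simplified illustration of the accounting the commit message describes, not the actual badger replay code; readEntry is a hypothetical entry decoder and bufio is the only import needed):

```go
// replayEnd returns the last valid vlog offset after a replay that starts at
// `offset`. Starting validEndOff at `offset` (the fix) means a replay that reads
// zero entries still reports the start offset instead of zero, so the vlog is
// never truncated below the point where the replay began.
func replayEnd(r *bufio.Reader, offset uint32, readEntry func(*bufio.Reader) (int, error)) uint32 {
	validEndOff := offset // the fix: start from the replay offset, not zero
	for {
		n, err := readEntry(r)
		if err != nil {
			break // the first unreadable entry (e.g. the zero-filled tail) ends the replay
		}
		validEndOff += uint32(n) // advance by the bytes actually consumed
	}
	return validEndOff
}
```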
@jarifibrahim
Contributor

This has been fixed in badger via 969a8e8.

Thank you for helping with the issue @ForCraft2013 🎉

jarifibrahim pushed a commit that referenced this issue Mar 12, 2020
(Same commit message as above; cherry picked from commit 969a8e8.)
manishrjain pushed a commit to outcaste-io/outserv that referenced this issue Jul 6, 2022
(Same commit message as above, applied in outcaste-io/outserv; fixes dgraph-io/badger#1126.)