Data loss while reading from DB #1126
@ForCraft2013, thank you for reporting the issue. I will see if one of our engineers can take a look at it. Shekar
Hey @ForCraft2013
Badger uses vlog (value log) files to store data, and vlog files are replayed when the DB is reopened. You were seeing data loss because you were inserting all entries in a single transaction; the test below creates a separate transaction per entry. Run it once with `readData := false` to write (the process exits mid-run to simulate a crash), then set `readData` to `true` to read the data back:

```go
package badger

import (
	"fmt"
	"log"
	"os"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestDataLoss(t *testing.T) {
	readData := false
	opt := DefaultOptions("./badger-data").WithSyncWrites(true)
	if readData {
		opt.Truncate = true
	}
	db, err := Open(opt)
	require.NoError(t, err)
	defer db.Close()
	if readData {
		read(db)
	} else {
		write(db)
	}
}

// write inserts entries until the simulated crash below.
func write(db *DB) {
	for i := 0; i < 2000; i++ {
		// It is important that we create a different transaction for each request.
		err := db.Update(func(txn *Txn) error {
			key := []byte(fmt.Sprintf("%d", i))
			fmt.Printf("inserting k: %d\n", i)
			return txn.Set(key, []byte("barValue"))
		})
		if err != nil {
			panic(err)
		}
		// Crash the DB after inserting 3 entries.
		if i == 2 {
			fmt.Println("Poof! DB crashed!")
			os.Exit(0) // Use os.Exit instead of panic.
		}
	}
}

func read(db *DB) {
	err := db.View(func(txn *Txn) error {
		opts := DefaultIteratorOptions
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("key=%s, value=%s\n", k, v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic(err)
	}
}
```

Here's the output of the test:
@jarifibrahim, hello. I've tried your code and it's working fine, even when I removed the "crash DB on third iteration" step and increased the number of inserts to 10,000.
@ForCraft2013 What kind of error?
It's hard to tell what might be going on without looking at the code. Can you share your code?
@jarifibrahim, I've found the cause of the problem! If the value length is more than 31 bytes, the error occurs (likely because such values reach Badger's default 32-byte ValueThreshold and are therefore written to the value log instead of being stored inline in the LSM tree). With your example, Badger gives the error (without the truncate option):
With the truncate option:
I've modified your code a bit so there's time to stop the program while it's printing keys and values to the console:

```go
package badger

import (
	"fmt"
	"log"
	"testing"

	"github.com/stretchr/testify/require"
)

func TestDataLoss(t *testing.T) {
	readData := false
	opt := DefaultOptions("./badger-data").WithSyncWrites(true)
	if readData {
		opt.Truncate = true
	}
	db, err := Open(opt)
	require.NoError(t, err)
	defer db.Close()
	if readData {
		read(db)
	} else {
		write(db)
	}
}

// write inserts 10000 entries.
func write(db *DB) {
	for i := 0; i < 10000; i++ {
		// It is important that we create a different transaction for each request.
		err := db.Update(func(txn *Txn) error {
			key := []byte(fmt.Sprintf("%d", i))
			fmt.Printf("inserting k: %d\n", i)
			// 32 bytes long, and now it's not working.
			return txn.Set(key, []byte("barValuebarValuebarValuebarValue"))
		})
		if err != nil {
			panic(err)
		}
		// The simulated crash from the original test is disabled here:
		//if i == 2 {
		//	fmt.Println("Poof! DB crashed!")
		//	os.Exit(0) // Use os.Exit instead of panic.
		//}
	}
}

func read(db *DB) {
	err := db.View(func(txn *Txn) error {
		opts := DefaultIteratorOptions
		it := txn.NewIterator(opts)
		defer it.Close()
		// Ctrl+C to stop the program while it's running.
		for it.Rewind(); it.Valid(); it.Next() {
			item := it.Item()
			k := item.Key()
			err := item.Value(func(v []byte) error {
				fmt.Printf("key=%s, value=%s\n", k, v)
				return nil
			})
			if err != nil {
				return err
			}
		}
		return nil
	})
	if err != nil {
		log.Panic(err)
	}
}
```
@ForCraft2013 Thanks for reporting and reproducing this data-loss issue on Windows. Please email me (the address is in my GitHub profile) and we can discuss the next steps for your data-loss bounty reward.
@danielmai, I've sent you an email. I also made my real email public in my GitHub profile so you can be sure it's me writing to you. I'm really glad the problem will be solved soon.
Windows doesn't allow memory-mapping a file to a size greater than the file's actual size. To circumvent this, we increase the file size by truncating it. https://github.com/dgraph-io/badger/blob/f5b63211d7f3e2f5f8b698893313b2a54e4df7de/y/mmap_windows.go#L41-L48

When Badger re-opens, it tries to replay this "truncated" file. Since the truncated portion of the file consists of all zeros, the replay would return `zero` as the last valid offset, and we would then truncate the original file to size `zero`. That was wrong: the last valid offset is the start offset plus the forward movement of the file offset during replay. So instead of

https://github.com/dgraph-io/badger/blob/f5b63211d7f3e2f5f8b698893313b2a54e4df7de/value.go#L433

```go
var validEndOff uint32 // notice we're starting from zero, not the start point.
```

we should be doing

```go
var validEndOff uint32 = offset
```

Fixes #1126
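To make that concrete, here is a minimal, hypothetical sketch of the replay bookkeeping described above; the `entry` type and `replay` function are illustrative stand-ins, not Badger's actual internals:

```go
package main

import "fmt"

// entry stands in for a decoded value-log record; illustrative only.
type entry struct {
	size   uint32
	zeroed bool // all-zero padding left behind by the Windows mmap workaround
}

// replay walks the records that follow startOffset and returns the last
// valid end offset, i.e. where the file should be truncated to.
func replay(records []entry, startOffset uint32) uint32 {
	validEndOff := startOffset // the fix: start from the offset, not from zero
	for _, r := range records {
		if r.zeroed {
			break // reached the zero-filled tail; stop replaying
		}
		validEndOff += r.size
	}
	return validEndOff
}

func main() {
	records := []entry{{size: 40}, {size: 40}, {zeroed: true}}
	fmt.Println(replay(records, 120)) // prints 200 (= 120 + 40 + 40)
}
```

With the buggy initialization (`validEndOff := 0` instead of the start offset), a replay that hit the zero-filled tail right away would report offset 0, and truncating to that offset would discard the entire file.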
This has been fixed in badger via 969a8e8. Thank you for helping with the issue @ForCraft2013 🎉
Hello. I have an issue. Badger loses data when the app suddenly stops (via Ctrl+C, for example) while reading. This happens even when I read data only through db.View (read-only mode).
Go: 1.13.4
Badger: 2.0.0
Windows 10 build 1903
After the stop, the vlog file becomes too big (2,147,483,646 bytes, i.e. 2 GB). And if I run the app again, Badger gives an error:
During db.vlog.open: Value log truncate required to run DB. This might result in data loss
And finally, if I run Badger with the WithTruncate(true) option, the data is lost: the vlog file size becomes 20 bytes.
I want to use Badger in my app, but because of this problem I can't. Any error/stop while reading can cause data loss.
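For context, here is a minimal sketch of how that truncate option is set when reopening the store, assuming the badger v2 module path; the data directory is illustrative:

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

func main() {
	// Opt in to truncating the value log back to the last valid offset
	// after an unclean shutdown. Before the fix, that offset was computed
	// as zero on Windows, so the truncation wiped the file.
	opt := badger.DefaultOptions("./badger-data").WithTruncate(true)
	db, err := badger.Open(opt)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```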
The problem itself can be reproduced with the following code (press Ctrl+C while the program is printing keys and values to the console):
read.go
write.go