-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RunValueLogGC crashed #1139
Comments
Hey @cedricfung , does the issue reproduce after a couple of hours? And do you have more badger logs? The logs you've posted show only the line on which the crash happened but I'd like to understand what the state of badger was when the crash happened. Logs would definitely have some more information. |
Hi, I have tried again, it crashes. Our code is https://github.com/MixinNetwork/mixin
Above is the full log |
@cedricfung are you inserting |
I have checked my code and find no `nil` inserting, only `txn.Set("key",
[]byte{})`
…On Mon, Dec 9, 2019 at 4:00 PM Ibrahim Jarif ***@***.***> wrote:
@cedricfung <https://github.com/cedricfung> are you inserting nil values?
Something like txn.Set("key", nil)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1139>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARKANWH6S7K56XP4GYML33QXX3JBANCNFSM4JSOM2EA>
.
|
The |
Hey @cedricfung, just out of curiosity, why are you inserting empty values? What are you trying to accomplish with key-only badger? |
Hi @jarifibrahim, I want to maintain many states to indicate that we have those keys set. So they are actually some bool values. I have thought those key only writes and reads would be faster with badger. |
@cedricfung how can I run https://github.com/MixinNetwork/mixin and reproduce the issue? I just realized that we have a check in badger for empty values and the execution shouldn't have reached the line on which it crashed for you. |
Ignore other connection logs and wating for about 10 minutes, it should crash. |
@cedricfung Is there any way I can reproduce this without using your |
I don't know yet. Which entry is that? |
Sorry, I wasn't clear. My question is does the issue show up every time you run your |
It only produces the error on that snapshots data, for a fresh start it won't occur. |
Forgot to mention that the snapshots directory is converted from an old version. I upgraded badger from an 2.0-rc version to the latest 2.0 version, following the guide of upgrade from v1.6 to v2.0 |
Ah! Maybe backup is doing something mischievous. I'll look into the backup code and get back to you. |
Any updates on this? How could we enable RunValueLogGC again? |
@cedricfung Unfortunately, I haven't been able to find a test to produce this. I can, however, write a patch for you which will drop the key which seems to be crashing badger. You can run badger with that patch and GC should work fine. NOTE - You might lose data. I think you just have one key which has the wrong bits set and we'll have to drop it. Do you think this would unblock you for now? |
Which key is that? I'm sure that we have many keys with empty values, because we have code like |
Hey @cedricfung, I haven't been able to find the root cause for this issue but you can use the following diff to clean up the invalid bit set. Here's how you can use it
We created a backup that removed the (The patch also disables compression. This would reduce the time taken to diff --git a/backup.go b/backup.go
index aa03339..ebab32f 100644
--- a/backup.go
+++ b/backup.go
@@ -59,14 +59,20 @@ func (stream *Stream) Backup(w io.Writer, since uint64) (uint64, error) {
}
var valCopy []byte
+ // No need to copy value, if item is deleted or expired.
if !item.IsDeletedOrExpired() {
- // No need to copy value, if item is deleted or expired.
- var err error
- valCopy, err = item.ValueCopy(nil)
- if err != nil {
- stream.db.opt.Errorf("Key [%x, %d]. Error while fetching value [%v]\n",
- item.Key(), item.Version(), err)
- return nil, err
+ if item.meta&bitValuePointer > 0 && len(item.val) == 0 {
+ // This entry has wrong bit set. Unset the `bitValuePoiter` bit.
+ item.meta = item.meta &^ (bitValuePointer)
+ } else {
+ // This is a valid entry. Copy its value.
+ var err error
+ valCopy, err = item.ValueCopy(nil)
+ if err != nil {
+ stream.db.opt.Errorf("Key [%x, %d]. Error while fetching value [%v]\n",
+ item.Key(), item.Version(), err)
+ return nil, err
+ }
}
}
diff --git a/options.go b/options.go
index 4374fc3..0bb9699 100644
--- a/options.go
+++ b/options.go
@@ -107,6 +107,9 @@ func DefaultOptions(path string) Options {
if !y.CgoEnabled {
defaultCompression = options.Snappy
}
+
+ defaultCompression = options.None
+
return Options{
Dir: path,
ValueDir: path, I ran the same patch on the data directory provided by @cedricfung and all the invalid entries were fixed. |
Closing this issue since it is not reproducible. The invalid entries might have been added somehow in the older version of badger (I guess). Please re-open this issue if it shows up again. |
Hi @jarifibrahim will this process remove any entries? We need to keep all entries, even empty entries. |
@cedricfung No. It would not. The code above would take a backup of your current badger directory and unset the bit which was causing the issue. We're reinserting all the entries when you restore that backup. The process should not cause any data to be lost since your values are |
Thank you. I'm curious about the backup file size, my data is 47GB, and the backup is only 11GB, but the backup of v1.6 is 37GB. Is this normal? |
After I restored the data, the directory is only 20GB. And my program can't start with this new data set. |
The size reduction is expected because backup uses protobuf to store data which is supposed to reduce the disk space.
There shouldn't be such a huge size difference between v1.6.0 and v2.0.0
Are you able to open badger directory? Can you show me some logs? |
I have kept the old backup from v1.6, so I compared the size. I tried the steps again and get the same error. The database opened, but I think some value lost. The error is:
So it means the ival has zero length, it should have at least 32 bytes. |
Was this value read from badger? If badger has zero-length values, why does this code need 32 bytes? |
This value read from badger. This value should have 64 bytes before those steps. It has 64 bytes in the original dataset. |
@cedricfung Can you check the length of the |
It should not return an empty value. Before I run the process of backup and restore, the value is at least 32 bytes. After the process, the value is zero length, or maybe the entry is missing. E.g. the backup and restore steps have lost some data. The old snapshots data is about 47GB, after the process it's only 20GB. And the new data has no compression, but the old has compression. So I think there must be large quantity of entries lost. |
@cedricfung the original data would have some invalid data too. For instance, the keys which were deleted would be stored on the disk but they would be dropped after backup-restore. I had run the patch on my end on your directory and most of the keys were present in the new data. Let me double-check this. Some internal keys would be dropped and that's okay.
As you mentioned in this comment #1139 (comment), you're inserting |
I'm inserting []byte{} entries, but it is not this entry. I'm sure that this entry should have at least 32 bytes value. This exact entry is not empty. |
Hi @jarifibrahim any updates on this? |
Hey @cedricfung , sorry it took me so long to reply.
Initially, I understood that you're inserting zero-length values in badger, but I guess that's not the case. |
Hi, badger works very well. It's your modified patch that doesn't work. The
patch backup our data, and lost some data. Because I'm sure that the
original data has more than 32 bytes for that key, but the data restored
from your patch is zero length or just lost.
…On Fri, Feb 7, 2020 at 5:57 PM Ibrahim Jarif ***@***.***> wrote:
Hey @cedricfung <https://github.com/cedricfung> , sorry it took me so
long to reply.
Can you help me understand what do you mean by
I'm inserting []byte{} entries, but it is not this entry. I'm sure that this entry should have at least 32 bytes value. This exact entry is not empty.
Initially, I understood that you're inserting zero-length values in
badger, but I guess that's not the case.
Are you saying that you inserted 32 bytes long values and received
zero-length values back?
I tried to look at the code of https://github.com/MixinNetwork/mixin to
figure out what's going on but lost my way around the codebase.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1139>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARKANRZVZRZOVXIG53R4CLRBUV7RANCNFSM4JSOM2EA>
.
|
Hey @cedricfung, I ran the following patch (I made one small change so that we don't delete any key. We'll keep even the deleted/discard ones on restore) on the In your snapshots directory
In the badger directory generated after backup-restore
I am not sure why the directory sizes defer by so much but as long as all the keys are present in the new badger directory, we should be fine. Please try out this patch and let me know how it goes diff --git a/backup.go b/backup.go
index aa03339..97b469c 100644
--- a/backup.go
+++ b/backup.go
@@ -59,14 +59,18 @@ func (stream *Stream) Backup(w io.Writer, since uint64) (uint64, error) {
}
var valCopy []byte
+ // No need to copy value, if item is deleted or expired.
if !item.IsDeletedOrExpired() {
- // No need to copy value, if item is deleted or expired.
- var err error
- valCopy, err = item.ValueCopy(nil)
- if err != nil {
- stream.db.opt.Errorf("Key [%x, %d]. Error while fetching value [%v]\n",
- item.Key(), item.Version(), err)
- return nil, err
+ if item.meta&bitValuePointer > 0 && len(item.val) == 0 {
+ item.meta = item.meta &^ (bitValuePointer)
+ } else {
+ var err error
+ valCopy, err = item.ValueCopy(nil)
+ if err != nil {
+ stream.db.opt.Errorf("Key [%x, %d]. Error while fetching value [%v]\n",
+ item.Key(), item.Version(), err)
+ return nil, err
+ }
}
}
diff --git a/badger/cmd/backup.go b/badger/cmd/backup.go
index 79fa9fd..189c296 100644
--- a/badger/cmd/backup.go
+++ b/badger/cmd/backup.go
@@ -18,6 +18,7 @@ package cmd
import (
"bufio"
+ "math"
"os"
"github.com/dgraph-io/badger/v2"
@@ -53,7 +54,8 @@ func doBackup(cmd *cobra.Command, args []string) error {
// Open DB
db, err := badger.Open(badger.DefaultOptions(sstDir).
WithValueDir(vlogDir).
- WithTruncate(truncate))
+ WithTruncate(truncate).WithNumVersionsToKeep(math.MaxUint32))
+
if err != nil {
return err
}
diff --git a/badger/cmd/restore.go b/badger/cmd/restore.go
index d5ee8f5..ee074f3 100644
--- a/badger/cmd/restore.go
+++ b/badger/cmd/restore.go
@@ -18,6 +18,7 @@ package cmd
import (
"errors"
+ "math"
"os"
"path"
@@ -65,7 +66,8 @@ func doRestore(cmd *cobra.Command, args []string) error {
}
// Open DB
- db, err := badger.Open(badger.DefaultOptions(sstDir).WithValueDir(vlogDir))
+ db, err := badger.Open(badger.DefaultOptions(sstDir).
+ WithValueDir(vlogDir).WithNumVersionsToKeep(math.MaxUint32))
if err != nil {
return err
} |
You can use this test to count the number of keys (put it anywhere in the badger directory) var dir1 string = "/home/ibrahim/mixin-data/snapshots"
func TestDumpKey(t *testing.T) {
opt := DefaultOptions(dir1)
opt.NumVersionsToKeep = math.MaxUint32
//opt.ReadOnly = true
db, err := Open(opt)
require.NoError(t, err)
txn := db.NewTransaction(false)
defer txn.Discard()
iopt := DefaultIteratorOptions
//iopt.AllVersions = true // Uncomment this if you want to see all the keys in badger.
it := txn.NewIterator(iopt)
defer it.Close()
var keys, keysWithEmptyValue int
for it.Rewind(); it.Valid(); it.Next() {
keys++
item := it.Item()
item.Value(func(val []byte) error {
if len(val) == 0 && item.meta&bitValuePointer > 0 {
keysWithEmptyValue++
}
return nil
})
}
fmt.Printf("keys: %d\n Keys with zero length value: %d\n", keys, keysWithEmptyValue)
} |
the same
|
This issue seems related to #1313. The |
How could we fix our existing data?
Thank you for these efforts.
…On Thu, Apr 30, 2020 at 7:54 PM Ibrahim Jarif ***@***.***> wrote:
This issue seems related to #1313
<#1313>. The bitValuePointer
might be incorrectly set because of backup restore.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1139 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARKANVTR2MPS7LT7UFF5QTRPFRGZANCNFSM4JSOM2EA>
.
|
Any further steps to fix the existing data? Use the master branch to backup and restore? Any special params to use in the backup and restore? |
@jarifibrahim yes I'm using the default options. |
@cedricfung No. If you're using badger with default options, backup and restore on the |
@cedricfung Are you still seeing crashes? If not, I'd like to mark this issue as fixed and close this issue. |
After backup and restore with the master code, no crashes anymore.
…On Wed, Jun 3, 2020, 14:42 Ibrahim Jarif ***@***.***> wrote:
@cedricfung <https://github.com/cedricfung> Are you still seeing crashes?
If not, I'd like to mark this issue as fixed and close this issue.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#1139 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AARKANVVJYRKLANLUHEKFGDRUXWF7ANCNFSM4JSOM2EA>
.
|
Perfect. Thank you so much @cedricfung :) |
What version of Go are you using (
go version
)?What version of Badger are you using?
v2.0.0
Does this issue reproduce with the latest master?
Never tried
What are the hardware specifications of the machine (RAM, OS, Disk)?
Linux 64 SSD
What did you do?
What did you expect to see?
Run value log gc should work
What did you see instead?
badger.go:68 db.RunValueLogGC(0.5)
The text was updated successfully, but these errors were encountered: