Corrupted large badger repo #5213
Comments
Thanks for reporting this issue; it's my main concern regarding the Badger transition. There are two separate issues:
|
Let me know an SSH key :) |
There's a Windows-related check in the Badger truncation function that's catching my attention: it rejects the truncation if the value log has been loaded with |
Actually, if you're running the |
It did change something:
|
Yes, this is a different problem, I'll raise the corresponding issues at Badger. |
Can you try not passing the slash at the end, so that it doesn't have two slashes in the file path: Also, what version of Badger are you on? P.S. If you have more logs, they would help us better understand what's happening here. |
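To illustrate the double-slash concern above, here is a minimal Go sketch of how a path with a trailing or repeated slash can be normalized before a file name is appended. The helper name `valueLogPath`, the directory `/data/badgerds/`, and the file name `000001.vlog` are assumptions made up for this example; they are not taken from Badger or go-ipfs.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// valueLogPath is a hypothetical helper (not a Badger or go-ipfs function).
// filepath.Join cleans its result, so repeated separators are collapsed and
// a trailing slash on repoDir does not produce "dir//file".
func valueLogPath(repoDir, name string) string {
	return filepath.Join(repoDir, name)
}

func main() {
	// Both inputs are illustrative paths, with and without the extra slash.
	fmt.Println(valueLogPath("/data/badgerds/", "000001.vlog"))
	fmt.Println(valueLogPath("/data/badgerds", "000001.vlog"))
}
```

Both calls should print the same path on a Unix-like system, which makes it easy to rule out the trailing slash as the cause.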
That didn't help unfortunately, neither as a relative nor absolute path. Is there anything I could pull out of :8080 while it's still running? |
Can you say more about the Badger version and the environment? Also, if you have access to the Badger directory, could you tar, gzip and upload it and send me a link, so I can debug what's going on? |
It's 5 TB unfortunately. Can give you access to the host though. |
Sure. My email id is my first name at dgraph.io. Also, tell me the steps about what to do after logging in. |
It looks like this is the line which is causing the issue. For some reason, it is unable to truncate the file: https://github.com/dgraph-io/badger/blob/master/value.go#L329 |
The Badger version currently used in https://ipfs.io/ipfs/QmeAEa8FDWAmZJTL6YcM1oEndZ4MyhCr5rTsjYZQui1x1L/badger although @lgierth was using a much more recent version to run the |
What filesystem is the environment using? Is it VFAT or EXT4 or something else? |
@lgierth Could you provide @manishrjain more details about the setup? |
I've also just encountered this issue on Windows with v0.4.16 on an NTFS partition. The update completed successfully (as far as I could tell) and I was able to use the repo for a while afterwards, but now I'm getting this error. The repo I lost is a lot smaller at 88GB, so I can share if it would be helpful. |
Hey @leerspace, there are different errors mentioned in this issue, are you getting the |
@schomatis sorry for not being more clear. I'm getting the error in the first post: |
Ok, this may be a consequence of many possible factors, but most likely a crash or a hard-kill of an If you want, you could try the |
@schomatis I just finished running |
So, Go's truncate function is failing:
Code:
I see that the root folder is on a RAID array. I wonder if that's what's causing the issue -- this looks like a problem with either the standard file.Truncate function in Go, or a problem with the system itself.
|
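As a rough way to check whether truncation fails outside Badger as well, the following Go sketch truncates a file with the standard library and unwraps the resulting `*os.PathError` to show the underlying cause. The helpers `truncateTo`, `describe` and `demo` are hypothetical names for this example, and the temporary file stands in for a value log; this is not the code at the Badger line linked above.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// truncateTo opens the file read-write and resizes it to size bytes.
// Hypothetical helper for illustration only.
func truncateTo(path string, size int64) error {
	f, err := os.OpenFile(path, os.O_RDWR, 0)
	if err != nil {
		return fmt.Errorf("open: %w", err)
	}
	defer f.Close()
	if err := f.Truncate(size); err != nil {
		return fmt.Errorf("truncate to %d: %w", size, err)
	}
	return f.Sync()
}

// describe exposes the operation, path and underlying cause when the error
// wraps an *os.PathError, which is what the os package returns for file ops.
func describe(err error) string {
	var pe *os.PathError
	if errors.As(err, &pe) {
		return fmt.Sprintf("op=%s path=%s cause=%v", pe.Op, pe.Path, pe.Err)
	}
	return err.Error()
}

// demo writes a 1024-byte temporary file, truncates it to 512 bytes and
// returns the size reported by the filesystem afterwards.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "truncate-demo")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "example.vlog") // illustrative file name
	if err := os.WriteFile(path, make([]byte, 1024), 0o600); err != nil {
		return 0, err
	}
	if err := truncateTo(path, 512); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := demo()
	if err != nil {
		fmt.Println("truncate failed:", describe(err))
		return
	}
	fmt.Println("size after truncate:", size)
}
```

If a sketch like this fails on the affected mount but works on a healthy one, that would point at the filesystem or the storage underneath it rather than at the application.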
Have you tried doing a health check and/or repair of your RAID array? |
Yeeah spot on, one (of four) disks has died without us noticing. I don't even see log lines of when it died. The filesystem seems to be intact and complete, but whatever, let's call this host dead. The data in the repo can be reproduced relatively easily. (It's really just the cdn.media.ccc.de mirror that needs reproducing.) |
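Since the failed disk went unnoticed here, a small check for degraded Linux software RAID (md) arrays may be useful. The sketch below assumes the usual `/proc/mdstat` layout, where a status such as `[4/3] [UUU_]` marks a missing member with `_`; the function name `degradedArrays` is hypothetical, and the sample data in the test is made up rather than taken from the host in this thread.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
	"strings"
)

// statusRe matches the member status in /proc/mdstat, e.g. "[4/3] [UUU_]".
var statusRe = regexp.MustCompile(`\[(\d+)/(\d+)\]\s+\[([U_]+)\]`)

// degradedArrays returns the names of md arrays whose status string contains
// "_", meaning at least one member device is missing or failed.
func degradedArrays(mdstat string) []string {
	var degraded []string
	current := ""
	sc := bufio.NewScanner(strings.NewReader(mdstat))
	for sc.Scan() {
		line := sc.Text()
		// Array header lines look like "md0 : active raid5 sda1[0] ...".
		if strings.HasPrefix(line, "md") {
			current = strings.Fields(line)[0]
			continue
		}
		m := statusRe.FindStringSubmatch(line)
		if m != nil && strings.Contains(m[3], "_") {
			degraded = append(degraded, current)
		}
	}
	return degraded
}

func main() {
	data, err := os.ReadFile("/proc/mdstat")
	if err != nil {
		fmt.Println("cannot read /proc/mdstat:", err)
		return
	}
	for _, name := range degradedArrays(string(data)) {
		fmt.Println("degraded array:", name)
	}
}
```

Running something like this periodically, or relying on the monitoring that mdadm itself provides, would likely have surfaced the dead disk earlier.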
You could copy over this data to another host, and verify that Badger is doing the right thing. Not sure there's anything else we need to do from Badger's end, so I'm considering this issue closed. |
Agreed, I'm closing the issue on the Badger end, thanks for investigating this issue @manishrjain which wasn't actually related to Badger.
Could you do this @lgierth to be extra sure? Or is the DB too big to perform a full copy? |
This has been solved -- the underlying mdadm RAID got into a weird state and might have corrupted/lost data. |
Version information: 0.4.16-rc2
Type: bug
Description:
I've got this large badgerds repo (~5TB) which I recently updated to 0.4.16-rc2. After the included 6-to-7 repo migration, my repo is corrupted. I'm not sure whether I hard-killed the daemon before the update. The version it was previously running is 8b383da, which was first included in 0.4.15-rc1.
I tried doing a badger backup with truncation enabled, but that didn't actually go and truncate stuff: