Unexpected list of size zero while creating reverse index #2752
Fixed in v1.0.11-rc2. Can you try with that? The on-disk representation has not changed between v1.0.10 and the rc.
Feel free to reopen if it still persists in rc2.
Unfortunately, it still fails. Dgraph version: v1.0.11-rc2
Yeah, I see the assert is still there. It's been on my TODO list to refactor this a bit and speed it up. I will take care of this. If you want to get past this issue right away, and are not afraid to compile Dgraph, just replace the assert with a continue here:
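The suggested workaround can be sketched roughly as follows. This is a minimal illustration, not Dgraph's actual rebuild code: `postingList` and `rebuildSkippingEmpty` are hypothetical names invented for the example. The only point is that an empty list is skipped with `continue` instead of tripping an assert.

```go
package main

import "fmt"

// postingList is a hypothetical stand-in for a posting list.
// It is NOT Dgraph's real type; it exists only for this sketch.
type postingList struct {
	key  string
	uids []uint64
}

// rebuildSkippingEmpty walks the given lists and returns the keys it
// processed. Empty lists are skipped with `continue` rather than being
// treated as a fatal assumption violation.
func rebuildSkippingEmpty(lists []postingList) []string {
	var processed []string
	for _, pl := range lists {
		if len(pl.uids) == 0 {
			// An assert at this point would stop the whole process.
			// Skipping the entry lets the rebuild carry on.
			continue
		}
		processed = append(processed, pl.key)
	}
	return processed
}

func main() {
	lists := []postingList{
		{key: "a", uids: []uint64{1, 2}},
		{key: "empty", uids: nil},
		{key: "b", uids: []uint64{3}},
	}
	fmt.Println(rebuildSkippingEmpty(lists))
}
```

Whether skipping is actually safe depends on why the empty list exists in the first place, which this sketch does not address.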
Thank you! On zero:
Well, the bug itself is not that important, but to me it shows a much deeper problem here: I’m coming from the Java universe, where bugs of this kind rarely crash even a thread. For my application I can accept a TX rollback or even a read-only state for some predicates. Even a slow, read-only cluster with warnings, alarms and blinking lights is still a possible solution: it still works. But losing the ability to read committed data doesn't seem like a viable option.
It is actually not a crash, but an assert failure. Part of a defensive coding technique that we use at Dgraph (where we put asserts at places we never expect an error). In this case, that assert is actually not that useful, and as you're witnessing, does trigger. Java is a different universe, where a few unhandled exceptions here and there are just part and parcel of life. I'm personally not a fan of that.
Manish, please don't get me wrong: I really don't want to start a programming holy war here. I fully understand the value of asserts. But please try to imagine my situation right now: Dgraph is at the core of my data product. Somebody clicked something, and this or some other assert misfired. Now my database is down and I have absolutely no way to recover it. What are my options before customers start calling? Before trying anything, I read the entire Dgraph documentation and found a suggestion to delete the WAL. Sadly, that does not help in my case. What I expect from a production-ready product is quite modest:
Instead, at step 3, the alpha goes down, killing 1/3 of the database. I believe that if it were part of a replication group, all related alphas would hit the same assert and behave identically.
I think you're conflating an assert failure with a stack trace (in Java) or error handling (in Go). Both of the latter do what you expect, i.e. keep the process running and just error the request out. Assert failures are different: the idea behind them is not that the request has an error. The idea is that there are assumptions that a developer makes about how the code should work, and the asserts (the equivalent of a log fatal) enforce those assumptions, so if one ever triggers, the developer knows that their assumptions are flawed. This is part of a defensive coding technique and has nothing to do with the language that the code is written in. This discussion, however, ventures into programming philosophy, something that I don't intend to go into. @danielmai is putting together the fix that I'd mentioned earlier, and we'll cut a release candidate today, so you could use that. I'd also recommend not basing your entire perception of Dgraph on this bug. As you mentioned, bugs do and will happen. If you're concerned about Dgraph's stability in production, I'm happy to have a chat.
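To make the distinction concrete, here is a small Go sketch of the two styles. Everything in it is an assumption made for illustration: `assertTruef`, `sizeChecked`, `sizeAsserted` and `errEmptyList` are hypothetical names, and the assert helper panics rather than exiting the process (as a log-fatal style assert would) only so that the example can be run and tested.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmptyList is a hypothetical error value used by this sketch only.
var errEmptyList = errors.New("empty list")

// assertTruef is a hypothetical assert helper. It enforces a developer
// assumption: if cond is false, execution stops here. A log-fatal style
// assert would terminate the process; this sketch panics instead.
func assertTruef(cond bool, format string, args ...interface{}) {
	if !cond {
		panic(fmt.Sprintf(format, args...))
	}
}

// sizeChecked uses ordinary error handling: the caller gets an error
// back for this one request and the process keeps running.
func sizeChecked(uids []uint64) (int, error) {
	if len(uids) == 0 {
		return 0, errEmptyList
	}
	return len(uids), nil
}

// sizeAsserted encodes the assumption "this list is never empty".
// If the assumption is wrong, the assert fires.
func sizeAsserted(uids []uint64) int {
	assertTruef(len(uids) > 0, "Unexpected list of size zero")
	return len(uids)
}

func main() {
	if _, err := sizeChecked(nil); err != nil {
		fmt.Println("request failed, process keeps running:", err)
	}
	fmt.Println("asserted size:", sizeAsserted([]uint64{1, 2, 3}))
}
```

Calling `sizeAsserted(nil)` would stop the program at the assert, which is the behaviour being debated in this thread.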
Fix confirmed! Thank you
Thank god it works :) and thank you for the fast fix. I think it would be good practice to do regular backups of the database.
I'm using the latest Dgraph v1.0.10 (8b801bd) on a powerful Windows laptop, with all default settings (e.g. replication factor 1).
Creating a reverse index on actor.film in ratel.
One server runs at 100% CPU on one core for 10 minutes and then crashes:
2018/11/15 13:00:06 Unexpected list of size zero: "\x00\x00\nactor.film\f\x00\x00\x00\x00"
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.AssertTruef
/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/error.go:62
github.com/dgraph-io/dgraph/posting.(*rebuild).Run
/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:587
github.com/dgraph-io/dgraph/posting.RebuildReverseEdges
/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:710
github.com/dgraph-io/dgraph/worker.(*node).rebuildOrDelRevEdge
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/index.go:55
github.com/dgraph-io/dgraph/worker.runSchemaMutationHelper
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:169
github.com/dgraph-io/dgraph/worker.runSchemaMutation
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:99
github.com/dgraph-io/dgraph/worker.(*node).applyMutations
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:198
github.com/dgraph-io/dgraph/worker.(*node).applyCommitted
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:280
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:401
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:429
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333
Upon restart, it reads the WAL of the schema migration, re-starts the index and crashes again.