Unexpected list of size zero while creating reverse index #2752

Closed
kosiakk opened this issue Nov 15, 2018 · 13 comments
@kosiakk

kosiakk commented Nov 15, 2018

I'm using the latest Dgraph v1.0.10 (commit 8b801bd) on a powerful Windows laptop, with all default settings (e.g. replication factor 1).

  • Steps to reproduce the issue (command/config used to run Dgraph):
  1. Import the 1million example from the tutorial into a fresh installation (3 alpha servers on the same host, 1 zero).
  2. Open the Schema page in Ratel.
  3. Add a reverse index on actor.film (an equivalent alter call is sketched at the end of this comment).
  • Expected behaviour and actual result:
    One server runs at 100% CPU on one core for 10 minutes and then crashes:

2018/11/15 13:00:06 Unexpected list of size zero: "\x00\x00\nactor.film\f\x00\x00\x00\x00"
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.AssertTruef
/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/error.go:62
github.com/dgraph-io/dgraph/posting.(*rebuild).Run
/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:587
github.com/dgraph-io/dgraph/posting.RebuildReverseEdges
/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:710
github.com/dgraph-io/dgraph/worker.(*node).rebuildOrDelRevEdge
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/index.go:55
github.com/dgraph-io/dgraph/worker.runSchemaMutationHelper
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:169
github.com/dgraph-io/dgraph/worker.runSchemaMutation
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:99
github.com/dgraph-io/dgraph/worker.(*node).applyMutations
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:198
github.com/dgraph-io/dgraph/worker.(*node).applyCommitted
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:280
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:401
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:429
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1333

Upon restart, it replays the WAL of the schema migration, restarts the index rebuild, and crashes again.
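
For reference, the schema change in step 3 is equivalent to the alter call below. This is a minimal sketch using the dgo client, assuming a local alpha on the default gRPC port 9080 (the actual steps used Ratel's UI):

package main

import (
	"context"
	"log"

	"github.com/dgraph-io/dgo"
	"github.com/dgraph-io/dgo/protos/api"
	"google.golang.org/grpc"
)

func main() {
	// Connect to a local alpha on the default gRPC port.
	conn, err := grpc.Dial("localhost:9080", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	dg := dgo.NewDgraphClient(api.NewDgraphClient(conn))

	// Adding @reverse to actor.film kicks off the reverse-index
	// rebuild that produces the crash above.
	op := &api.Operation{Schema: "actor.film: uid @reverse ."}
	if err := dg.Alter(context.Background(), op); err != nil {
		log.Fatal(err)
	}
}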

@kosiakk
Author

kosiakk commented Nov 15, 2018

Complete log:

I1115 12:50:46.178210   23844 init.go:80] 

Dgraph version   : v1.0.10
Commit SHA-1     : 8b801bd7
Commit timestamp : 2018-11-05 17:52:33 -0800
Branch           : HEAD

For Dgraph official documentation, visit https://docs.dgraph.io.
For discussions about Dgraph     , visit https://discuss.dgraph.io.
To say hi to the community       , visit https://dgraph.slack.com.

Licensed under Apache 2.0. Copyright 2015-2018 Dgraph Labs, Inc.


I1115 12:50:46.180210   23844 server.go:115] Setting Badger table load option: mmap
I1115 12:50:46.180210   23844 server.go:127] Setting Badger value log load option: mmap
I1115 12:50:46.180210   23844 server.go:155] Opening write-ahead log BadgerDB with options: {Dir:w ValueDir:w SyncWrites:true TableLoadingMode:1 ValueLogLoadingMode:2 NumVersionsToKeep:1 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:65500 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:10000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/11/15 12:50:46 INFO: Replaying file id: 0 at offset: 0
badger2018/11/15 12:50:46 INFO: Replay took: 148ms
badger2018/11/15 12:50:46 INFO: Replaying file id: 1 at offset: 0
badger2018/11/15 12:50:46 INFO: Replay took: 7ms
I1115 12:50:46.347210   23844 server.go:115] Setting Badger table load option: mmap
I1115 12:50:46.347210   23844 server.go:127] Setting Badger value log load option: mmap
I1115 12:50:46.347210   23844 server.go:169] Opening postings BadgerDB with options: {Dir:p ValueDir:p SyncWrites:true TableLoadingMode:2 ValueLogLoadingMode:2 NumVersionsToKeep:2147483647 MaxTableSize:67108864 LevelSizeMultiplier:10 MaxLevels:7 ValueThreshold:1024 NumMemtables:5 NumLevelZeroTables:5 NumLevelZeroTablesStall:10 LevelOneSize:268435456 ValueLogFileSize:1073741823 ValueLogMaxEntries:1000000 NumCompactors:3 managedTxns:false DoNotCompact:false maxBatchCount:0 maxBatchSize:0 ReadOnly:false Truncate:true}
badger2018/11/15 12:50:46 INFO: Replaying file id: 6 at offset: 37183251
badger2018/11/15 12:50:46 INFO: Replay took: 463ms
badger2018/11/15 12:50:46 INFO: Replaying file id: 7 at offset: 0
badger2018/11/15 12:50:50 INFO: Replay took: 3.214s
I1115 12:50:50.134210   23844 groups.go:92] Current Raft Id: 21
I1115 12:50:50.136210   23844 worker.go:80] Worker listening at address: [::]:7080
I1115 12:50:50.137210   23844 run.go:338] gRPC server started.  Listening on port 9080
I1115 12:50:50.137210   23844 run.go:339] HTTP server started.  Listening on port 8080
I1115 12:50:50.141210   23844 pool.go:118] CONNECTED to localhost:5080
I1115 12:50:50.144210   23844 groups.go:119] Connected to group zero. Assigned group: 0
I1115 12:50:50.146210   23844 pool.go:118] CONNECTED to localhost:7082
I1115 12:50:50.148210   23844 pool.go:118] CONNECTED to localhost:7081
I1115 12:50:50.148210   23844 draft.go:74] Node ID: 21 with GroupID: 3
I1115 12:50:50.148210   23844 node.go:152] Setting raft.Config to: &{ID:21 peers:[] ElectionTick:100 HeartbeatTick:1 Storage:0xc0129520d0 Applied:2905 MaxSizePerMsg:1048576 MaxInflightMsgs:256 CheckQuorum:false PreVote:true ReadOnlyOption:0 Logger:0x1718900}
I1115 12:50:50.149210   23844 node.go:271] Found Snapshot.Metadata: {ConfState:{Nodes:[21] XXX_unrecognized:[]} Index:2905 Term:2 XXX_unrecognized:[]}
I1115 12:50:50.149210   23844 node.go:282] Found hardstate: {Term:6 Vote:21 Commit:3118 XXX_unrecognized:[]}
I1115 12:50:50.152210   23844 node.go:291] Group 3 found 214 entries
I1115 12:50:50.152210   23844 draft.go:1047] Restarting node for group: 3
I1115 12:50:50.152210   23844 node.go:174] Setting conf state to nodes:21 
I1115 12:50:50.153210   23844 node.go:84] 15 became follower at term 6
I1115 12:50:50.154210   23844 node.go:84] newRaft 15 [peers: [15], term: 6, commit: 3118, applied: 2905, lastindex: 3118, lastterm: 6]
I1115 12:50:50.155210   23844 draft.go:313] Skipping snapshot at 2905, because found one at 2905
I1115 12:50:50.156210   23844 index.go:48] Deleting reverse index for actor.film
I1115 12:50:50.242210   23844 index.go:54] Rebuilding reverse index for actor.film
I1115 12:50:51.084210   23844 groups.go:519] Got address of a Zero server: localhost:5080
I1115 12:50:53.374210   23844 node.go:84] 15 is starting a new election at term 6
I1115 12:50:53.374210   23844 node.go:84] 15 became pre-candidate at term 6
I1115 12:50:53.374210   23844 node.go:84] 15 received MsgPreVoteResp from 15 at term 6
I1115 12:50:53.374210   23844 node.go:84] 15 became candidate at term 7
I1115 12:50:53.374210   23844 node.go:84] 15 received MsgVoteResp from 15 at term 7
I1115 12:50:53.374210   23844 node.go:84] 15 became leader at term 7
I1115 12:50:53.374210   23844 node.go:84] raft.node: 15 elected leader 15 at term 7
I1115 12:50:54.154210   23844 groups.go:718] Leader idx=21 of group=3 is connecting to Zero for txn updates
E1115 12:51:22.157210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:51:50.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:52:18.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:52:46.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:53:14.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:53:42.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:54:10.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:54:38.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:55:06.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:55:34.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:56:02.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:56:30.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:56:58.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:57:26.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:57:54.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:58:22.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:58:50.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:59:18.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
E1115 12:59:46.161210   23844 groups.go:826] While proposing delta: max_assigned:11757 . Error=Server unavailable. Please retry later. Retrying...
2018/11/15 13:00:06 Unexpected list of size zero: "\x00\x00\nactor.film\f\x00\x00\x00\x00"
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/error.go:62
github.com/dgraph-io/dgraph/posting.(*rebuild).Run
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:587
github.com/dgraph-io/dgraph/posting.RebuildReverseEdges
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:710
github.com/dgraph-io/dgraph/worker.(*node).rebuildOrDelRevEdge
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/index.go:55
github.com/dgraph-io/dgraph/worker.runSchemaMutationHelper
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:169
github.com/dgraph-io/dgraph/worker.runSchemaMutation
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:99
github.com/dgraph-io/dgraph/worker.(*node).applyMutations
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:198
github.com/dgraph-io/dgraph/worker.(*node).applyCommitted
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:280
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:401
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:429
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1333

@manishrjain
Contributor

Fixed in v1.0.11-rc2. Can you try with that? The on-disk representation has not changed between v1.0.10 and the RC.

@manishrjain
Contributor

Feel free to reopen if it still persists in rc2.

@kosiakk
Author

kosiakk commented Nov 16, 2018

Unfortunately, it still fails:

Dgraph version : v1.0.11-rc2
Commit SHA-1 : bec18bc
Commit timestamp : 2018-11-14 15:03:36 -0800
Branch : HEAD

2018/11/16 20:20:48 Unexpected list of size zero: "\x00\x00\nactor.film\f\x00\x00\x00\x00"
github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y.AssertTruef
	/ext-go/1/src/github.com/dgraph-io/dgraph/vendor/github.com/dgraph-io/badger/y/error.go:62
github.com/dgraph-io/dgraph/posting.(*rebuild).Run
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:587
github.com/dgraph-io/dgraph/posting.RebuildReverseEdges
	/ext-go/1/src/github.com/dgraph-io/dgraph/posting/index.go:710
github.com/dgraph-io/dgraph/worker.(*node).rebuildOrDelRevEdge
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/index.go:55
github.com/dgraph-io/dgraph/worker.runSchemaMutationHelper
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:179
github.com/dgraph-io/dgraph/worker.runSchemaMutation
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/mutation.go:100
github.com/dgraph-io/dgraph/worker.(*node).applyMutations
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:198
github.com/dgraph-io/dgraph/worker.(*node).applyCommitted
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:289
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh.func1
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:402
github.com/dgraph-io/dgraph/worker.(*node).processApplyCh
	/ext-go/1/src/github.com/dgraph-io/dgraph/worker/draft.go:429
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1333

@manishrjain manishrjain reopened this Nov 16, 2018
@manishrjain
Contributor

Yeah, I see the assert is still there. It's been on my TODO list to refactor this a bit and speed it up. Will take care of this.

If you want to get out of this issue right away, and are not afraid to compile Dgraph, just replace the assert with a continue here:
https://github.com/dgraph-io/dgraph/blob/a1475713bc7f5ce666b5740349283d1c22f46adc/posting/index.go#L587
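
For anyone following along, here is a minimal, self-contained sketch of that pattern. It is not the verbatim Dgraph source; the key names and list type are stand-ins. The point is that badger's y.AssertTruef ends in a log fatal, which exits the whole process, while a continue merely skips the offending key:

package main

import (
	"fmt"
	"log"
)

// assertTruef mimics badger's y.AssertTruef: on failure it calls
// log.Fatalf, which exits the entire process, not just one goroutine.
func assertTruef(b bool, format string, args ...interface{}) {
	if !b {
		log.Fatalf("Assert failed: "+format, args...)
	}
}

func main() {
	postings := map[string][]uint64{
		"actor.film": {},         // an empty list like the one in this issue
		"film.genre": {101, 102},
	}
	for key, plist := range postings {
		if len(plist) == 0 {
			continue // the suggested workaround: skip instead of asserting
		}
		assertTruef(len(plist) > 0, "Unexpected list of size zero: %q", key)
		fmt.Printf("rebuilding reverse edges for %q (%d postings)\n", key, len(plist))
	}
}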

@kosiakk
Author

kosiakk commented Nov 16, 2018

Thank you!
As a workaround I've tried deleting the w folder of the crashing group. It does not try to redo the schema migration, but it cannot restart either:

E1116 23:22:35.102485 11928 groups.go:109] Error while connecting with group zero: rpc error: code = Unknown desc = Invalid address

On zero:

I1116 23:23:07.194485   13700 zero.go:477] Connected: id:89 group_id:4 addr:"localhost:7080"
I1116 23:23:10.412485   13700 zero.go:375] Got connection request: addr:"localhost:7080"
E1116 23:23:10.413485   13700 raft.go:516] While applying proposal: Invalid address
I1116 23:23:10.415485   13700 zero.go:477] Connected: id:90 group_id:4 addr:"localhost:7080"

@kosiakk
Author

kosiakk commented Nov 16, 2018

Well, the bug itself is not that important, but to me it points to a much deeper problem:
Why does a normal bug in TX processing crash the entire node?

I'm coming from the Java universe, where bugs of this kind rarely crash even a thread.
In this case, I'd expect some kind of transaction abort, or some other way of preserving the status quo.

For my application I can accept a TX rollback, or even a read-only state for some predicates. Even a slow, read-only cluster with warnings, alarms and blinking lights is still a possible solution: it still works.

But losing the ability to read committed data doesn't seem like a viable option.

@manishrjain
Contributor

It is actually not a crash, but an assert failure, part of a defensive coding technique we use at Dgraph (we put asserts at places where we never expect an error). In this case, that assert is actually not that useful, and as you're witnessing, it does trigger.

Java is a different universe, where a few unhandled exceptions here and there are just part and parcel of life. I'm personally not a fan of that.

@kosiakk
Author

kosiakk commented Nov 17, 2018

Manish, please understand me right: I really don't want to start a programming holy war here. I fully understand the value of asserts and use them as well. That's okay.
Bugs happen, asserts misfire, people fool around. That is inevitable, no matter which language or tech is used. Go is quite obviously a popular and reasonable choice.

But please try to imagine my situation right now:

Dgraph is at the core of my data product. Somebody clicked something, and this or some other assert misfired. Now my database is down and I have absolutely no way to recover it. What are my options before customers start calling?

Before trying that, I read the entire Dgraph documentation and found a suggestion to delete the WAL. Sadly, that does not help in my case.

What I expect from a production ready product is quite modest:

  1. Badger/Dgraph discovers an unexpected situation; an assert fires.
  2. I get all the info and the stack trace in the log.
  3. The operation/request is aborted.
  4. The database continues to work as if nothing happened.

Instead, at step 3, the alpha goes down, killing 1/3 of the database. I believe that if it were part of a replication group, all related alphas would hit the same assert and behave identically.

@manishrjain
Contributor

I think you're conflating an assert failure with a stack trace (in Java) or error handling (in Go). Both of the latter do what you expect, i.e. keep the process running and just error the request out. Assert failures are different: the idea behind them is not that the request has an error. The idea is that there are assumptions a developer makes about how the code should work, and the asserts (the equivalent of a log fatal) enforce those assumptions, so if one ever triggers, the developer knows that their assumptions are flawed. This is part of a defensive coding technique, and has nothing to do with the language the code is written in.
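
A minimal sketch of that distinction in Go: a panic can be recovered and turned into a failed request, while an assert-style log fatal calls os.Exit and skips deferred handlers entirely, so the process dies. (This illustrates the general mechanism, not Dgraph's actual request-handling code.)

package main

import (
	"fmt"
	"log"
)

// handle runs one "request" and converts a panic into an error,
// the way a server keeps serving after a bad request.
func handle(f func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("request aborted: %v", r)
		}
	}()
	f()
	return nil
}

func main() {
	// A panic is recoverable: the process survives this request.
	fmt.Println(handle(func() { panic("bad input") }))

	// log.Fatalf is not: it calls os.Exit(1), so the deferred recover
	// never runs and the process dies right here.
	handle(func() { log.Fatalf("Assert failed: unexpected list of size zero") })
	fmt.Println("never reached")
}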

This discussion, however, ventures into programming philosophy, something that I don't intend to go into. @danielmai is putting together the fix that I'd mentioned earlier, and we'll cut a release candidate today, so you could use that.

I'd also recommend not to base your entire perception of Dgraph on this bug. As you mentioned, bugs do and will happen. If you're concerned about Dgraph's stability in production, I'm happy to have a chat.

@danielmai
Contributor

@kosiakk v1.0.11-rc3 has been released containing a fix for this issue.

Link to binaries: Linux, Darwin, Windows.
Available on Docker with docker pull dgraph/dgraph:v1.0.11-rc3.

@kosiakk
Author

kosiakk commented Nov 20, 2018

Fix confirmed! Thank you.
I'll gain some more experience and will certainly get back to you with questions.

@u007

u007 commented Dec 24, 2018

Thank god it works :)

And thank you for the fast fix.

I think it would be good practice to take regular backups of the database.
