Node falls behind Metastore updates #2343
In this test, the node is given 2 minutes to catch up, so this is clearly a real problem.
This was referenced Apr 19, 2015
jwilder added a commit that referenced this issue Apr 21, 2015
jwilder added a commit that referenced this issue Apr 21, 2015
I think this is fixed with PR #2353.
Reproduced locally w/ raft tracing enabled: https://gist.github.com/jwilder/1af8a408c98b2916c131
Gist link doesn't work for me.
No longer applicable with the new clustering design.
Every so often during 3-node integration testing, a failure occurs because a single node complains that it does not recognise a measurement being queried. This happens because that node has fallen behind on metastore updates. Even after many seconds it has not caught up.
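The diagnostics dumps are below. They show that the broadcast index is too low on one of the nodes: the index column of server_diag reads 290 on two nodes but only 273 on the third (the node whose data path ends in /2).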
{"results":[{"series":[{"name":"server_go","tags":{"serverID":"1"},"columns":["time","goMaxProcs","numGoRoutine","version"],"values":[["2015-04-19T17:16:40.518967226Z",1,235,"go1.4"]]},{"name":"server_system","tags":{"serverID":"1"},"columns":["time","hostname","pid","os","arch","numCPU"],"values":[["2015-04-19T17:16:40.518968249Z","box446",12600,"linux","amd64",2]]},{"name":"server_memory","tags":{"serverID":"1"},"columns":["time","alloc","totalAlloc","sys","lookups","mallocs","frees","heapAlloc","heapSys","heapIdle","heapInUse","heapReleased","heapObjects","pauseTotalNs","numGG"],"values":[["2015-04-19T17:16:40.518969695Z",2631152,6857999304,26564856,117545,21069460,21055289,2631152,20938752,16162816,4775936,0,14171,5367642346,1711]]},{"name":"server_build","tags":{"serverID":"1"},"columns":["time","version","commitHash"],"values":[["2015-04-19T17:16:40.518972055Z","0.9",""]]},{"name":"server_diag","tags":{"serverID":"1"},"columns":["time","startTime","uptime","id","path","authEnabled","index","retentionAutoCreate","numShards","cqLastRun"],"values":[["2015-04-19T17:16:40.518874138Z","2015-04-19 17:14:42.896718315 +0000 UTC","1m57.622224804s","1","/tmp/influxdb-809097743/data-integration-test/0",false,290,true,8,"0001-01-01 00:00:00 +0000 UTC"]]},{"name":"shardGroups_diag","tags":{"serverID":"1"},"columns":["time","database","retentionPolicy","id","startTime","endTime","duration","numShards"],"values":[["2015-04-19T17:16:40.518874138Z","mydb","myrp","8","2000-01-01 00:00:00 +0000 UTC","2000-01-01 01:00:00 +0000 
UTC","1h0m0s",1]]},{"name":"shards_diag","tags":{"serverID":"1"},"columns":["time","id","dataNodes","index","path","path","path","path","path","path","path","path"],"values":[["2015-04-19T17:16:40.518874138Z","6","1","222"],["/tmp/influxdb-809097743/data-integration-test/0/shards/6"],["2015-04-19T17:16:40.518874138Z","7","1","244"],["/tmp/influxdb-809097743/data-integration-test/0/shards/7"],["2015-04-19T17:16:40.518874138Z","8","1","292"],["/tmp/influxdb-809097743/data-integration-test/0/shards/8"],["2015-04-19T17:16:40.518874138Z","1","1","37"],["/tmp/influxdb-809097743/data-integration-test/0/shards/1"],["2015-04-19T17:16:40.518874138Z","2","1","52"],["/tmp/influxdb-809097743/data-integration-test/0/shards/2"],["2015-04-19T17:16:40.518874138Z","3","1","71"],["/tmp/influxdb-809097743/data-integration-test/0/shards/3"],["2015-04-19T17:16:40.518874138Z","4","1","87"],["/tmp/influxdb-809097743/data-integration-test/0/shards/4"],["2015-04-19T17:16:40.518874138Z","5","1","174"],["/tmp/influxdb-809097743/data-integration-test/0/shards/5"]]}]}]}
{"results":[{"series":[{"name":"server_go","tags":{"serverID":"0"},"columns":["time","goMaxProcs","numGoRoutine","version"],"values":[["2015-04-19T17:16:40.520398863Z",1,235,"go1.4"]]},{"name":"server_system","tags":{"serverID":"0"},"columns":["time","hostname","pid","os","arch","numCPU"],"values":[["2015-04-19T17:16:40.520399766Z","box446",12600,"linux","amd64",2]]},{"name":"server_memory","tags":{"serverID":"0"},"columns":["time","alloc","totalAlloc","sys","lookups","mallocs","frees","heapAlloc","heapSys","heapIdle","heapInUse","heapReleased","heapObjects","pauseTotalNs","numGG"],"values":[["2015-04-19T17:16:40.52040084Z",4249800,6859617952,26564856,117550,21070606,21055382,4249800,20922368,14589952,6332416,0,15224,5367642346,1711]]},{"name":"server_build","tags":{"serverID":"0"},"columns":["time","version","commitHash"],"values":[["2015-04-19T17:16:40.520402794Z","0.9",""]]},{"name":"server_diag","tags":{"serverID":"0"},"columns":["time","startTime","uptime","id","path","authEnabled","index","retentionAutoCreate","numShards","cqLastRun"],"values":[["2015-04-19T17:16:40.520331728Z","2015-04-19 17:14:42.896718315 +0000 UTC","1m57.623664339s","0","/tmp/influxdb-809097743/data-integration-test/1",false,290,true,8,"0001-01-01 00:00:00 +0000 UTC"]]},{"name":"shardGroups_diag","tags":{"serverID":"0"},"columns":["time","database","retentionPolicy","id","startTime","endTime","duration","numShards"],"values":[["2015-04-19T17:16:40.520331728Z","mydb","myrp","8","2000-01-01 00:00:00 +0000 UTC","2000-01-01 01:00:00 +0000 
UTC","1h0m0s",1]]},{"name":"shards_diag","tags":{"serverID":"0"},"columns":["time","id","dataNodes","index"],"values":[["2015-04-19T17:16:40.520331728Z","8","1","0"],["2015-04-19T17:16:40.520331728Z","1","1","0"],["2015-04-19T17:16:40.520331728Z","2","1","0"],["2015-04-19T17:16:40.520331728Z","3","1","0"],["2015-04-19T17:16:40.520331728Z","4","1","0"],["2015-04-19T17:16:40.520331728Z","5","1","0"],["2015-04-19T17:16:40.520331728Z","6","1","0"],["2015-04-19T17:16:40.520331728Z","7","1","0"]]}]}]}
{"results":[{"series":[{"name":"server_go","tags":{"serverID":"0"},"columns":["time","goMaxProcs","numGoRoutine","version"],"values":[["2015-04-19T17:16:40.52486571Z",1,233,"go1.4"]]},{"name":"server_system","tags":{"serverID":"0"},"columns":["time","hostname","pid","os","arch","numCPU"],"values":[["2015-04-19T17:16:40.524866761Z","box446",12600,"linux","amd64",2]]},{"name":"server_memory","tags":{"serverID":"0"},"columns":["time","alloc","totalAlloc","sys","lookups","mallocs","frees","heapAlloc","heapSys","heapIdle","heapInUse","heapReleased","heapObjects","pauseTotalNs","numGG"],"values":[["2015-04-19T17:16:40.524867741Z",4174392,6861259592,26564856,117567,21071744,21057507,4174392,20922368,14573568,6348800,0,14237,5370146389,1712]]},{"name":"server_build","tags":{"serverID":"0"},"columns":["time","version","commitHash"],"values":[["2015-04-19T17:16:40.52486972Z","0.9",""]]},{"name":"server_diag","tags":{"serverID":"0"},"columns":["time","startTime","uptime","id","path","authEnabled","index","retentionAutoCreate","numShards","cqLastRun"],"values":[["2015-04-19T17:16:40.524797911Z","2015-04-19 17:14:42.896718315 +0000 UTC","1m57.628131383s","0","/tmp/influxdb-809097743/data-integration-test/2",false,273,true,8,"0001-01-01 00:00:00 +0000 UTC"]]},{"name":"shardGroups_diag","tags":{"serverID":"0"},"columns":["time","database","retentionPolicy","id","startTime","endTime","duration","numShards"],"values":[["2015-04-19T17:16:40.524797911Z","mydb","myrp","8","2000-01-01 00:00:00 +0000 UTC","2000-01-01 01:00:00 +0000 
UTC","1h0m0s",1]]},{"name":"shards_diag","tags":{"serverID":"0"},"columns":["time","id","dataNodes","index"],"values":[["2015-04-19T17:16:40.524797911Z","3","1","0"],["2015-04-19T17:16:40.524797911Z","4","1","0"],["2015-04-19T17:16:40.524797911Z","5","1","0"],["2015-04-19T17:16:40.524797911Z","6","1","0"],["2015-04-19T17:16:40.524797911Z","7","1","0"],["2015-04-19T17:16:40.524797911Z","8","1","0"],["2015-04-19T17:16:40.524797911Z","1","1","0"],["2015-04-19T17:16:40.524797911Z","2","1","0"]]}]}]}
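For anyone comparing dumps like the ones above by hand, here is a minimal sketch of how the lagging node could be spotted automatically. It is not part of the InfluxDB test suite; the helper names (`server_diag_index`, `lagging_nodes`, `trimmed_dump`) are made up for illustration, and the sample data is a trimmed copy of the `server_diag` series above, keeping only the `path` and `index` columns. Nodes are keyed by data path because two of the dumps report the same `id`.

```python
def server_diag_index(dump):
    """Return (path, index) from the server_diag series of one diagnostics dump."""
    for result in dump["results"]:
        for series in result.get("series", []):
            if series["name"] == "server_diag":
                # Pair column names with the single row of values.
                row = dict(zip(series["columns"], series["values"][0]))
                return row["path"], row["index"]
    raise KeyError("no server_diag series in dump")


def lagging_nodes(dumps):
    """Map each node path that is behind the highest index to its lag."""
    indexes = dict(server_diag_index(d) for d in dumps)
    highest = max(indexes.values())
    return {path: highest - idx for path, idx in indexes.items() if idx < highest}


def trimmed_dump(path, index):
    """Hypothetical helper: build a dump holding only the fields used here."""
    return {"results": [{"series": [{
        "name": "server_diag",
        "columns": ["path", "index"],
        "values": [[path, index]],
    }]}]}


BASE = "/tmp/influxdb-809097743/data-integration-test/"
# Index values copied from the three dumps in this issue.
SAMPLE_DUMPS = [
    trimmed_dump(BASE + "0", 290),
    trimmed_dump(BASE + "1", 290),
    trimmed_dump(BASE + "2", 273),
]

if __name__ == "__main__":
    print(lagging_nodes(SAMPLE_DUMPS))
    # → {'/tmp/influxdb-809097743/data-integration-test/2': 17}
```

With full dumps, each JSON line would first be parsed with `json.loads` and then passed to `lagging_nodes` in the same way.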