[0.9.6.1 tsm1] panic: unexpected fault address #5283

Closed
ivanscattergood opened this issue Jan 6, 2016 · 13 comments

Comments

@ivanscattergood

Hi
I have been running an instance of 0.9.6.1 using tsm1 since 27th December. It crashed this morning just after midnight.

At midnight I run a summarisation routine which inserts aggregate data for the day (I have tried using continuous queries for this without much success), so I assume this is the reason for the crash.
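
For anyone unfamiliar with this kind of routine, here is a minimal sketch of what one summarised point might look like when written as line protocol. It is an assumption-laden illustration, not the actual routine: `mean`, `dayLine`, the sample values, and the reduced tag set are all hypothetical. The measurement name `metric_day` and the field name `mean` are taken from the queries in the log below. Spaces, commas, and equals signs in tag keys and values are backslash-escaped.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tagEscaper backslash-escapes the characters that are special in
// line-protocol tag keys and tag values: comma, equals sign, and space.
var tagEscaper = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)

// mean returns the arithmetic mean of values, or 0 for an empty slice.
func mean(values []float64) float64 {
	if len(values) == 0 {
		return 0
	}
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// dayLine builds one line-protocol point holding the daily mean.
// Hypothetical helper: it assumes the measurement name needs no escaping
// and that dayUnix is a timestamp in seconds.
func dayLine(measurement string, tags map[string]string, values []float64, dayUnix int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // emit tags in a stable, sorted order

	var sb strings.Builder
	sb.WriteString(measurement)
	for _, k := range keys {
		sb.WriteString("," + tagEscaper.Replace(k) + "=" + tagEscaper.Replace(tags[k]))
	}
	fmt.Fprintf(&sb, " mean=%g %d", mean(values), dayUnix)
	return sb.String()
}

func main() {
	tags := map[string]string{
		"Data View": "ClientStats",
		"Gateway":   "mdoe_ny_dev",
	}
	// Sample values are made up for illustration.
	fmt.Println(dayLine("metric_day", tags, []float64{10, 15}, 1452038400))
}
```

A line built this way would be sent in the body of a `POST /write` request with a matching `precision` parameter, in the same way as the write requests visible in the log.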

I am restarting the database now to see if it recovers.

Here is the output from the log:

[http] 2016/01/06 00:12:48 127.0.0.1 - root [06/Jan/2016:00:11:48 +0000] GET /query?db=open_access_db&p=%5BREDACTED%5D&q=select+value+from+metric++WHERE+%28%22Data+View%22+%3D+%27ClientStats%27%29+AND+%28%22Row+name%22+%3D+%27nycdlpaperf08.ny.rbcds.com%2F28691%2F%2300010002%27%29+AND+%28%22Gateway%22+%3D+%27mdoe_ny_dev%27%29+AND+%28%22Column+name%22+%3D+%27Total+Client+Bytes+Received%27%29+AND+%28%22Managed+Entity%22+%3D+%27sol-nyplab-geqty1%27%29+and+time+%3E+now%28%29+-+5d+order+by+time+asc+limit+1&u=root HTTP/1.1 200 40 - okhttp/2.4.0 13f895db-b40a-11e5-bb10-000000000000 1m0.13940845s
[http] 2016/01/06 00:12:48 127.0.0.1 - root [06/Jan/2016:00:12:39 +0000] POST /write?consistency=any&db=open_access_db&p=%5BREDACTED%5D&precision=n&rp=detail&u=root HTTP/1.1 204 0 - okhttp/2.4.0 328e6b70-b40a-11e5-bb13-000000000000 9.242759968s
[query] 2016/01/06 00:12:48 SELECT mean FROM open_access_db.day.metric_day WHERE ("Data View" = 'ClientStats') AND ("Row name" = 'nycdlpaperf07/17898/#0001001e') AND (Gateway = 'mdoe_ny_dev') AND ("Column name" = 'Denied Subscribe Permission') AND ("Managed Entity" = 'sol-nyplab-geqty1') AND time > now() - 5d ORDER BY time DESC LIMIT 1
[http] 2016/01/06 00:12:48 127.0.0.1 - root [06/Jan/2016:00:12:48 +0000] GET /query?db=open_access_db&p=%5BREDACTED%5D&q=select+mean+from+day.metric_day++WHERE+%28%22Data+View%22+%3D+%27ClientStats%27%29+AND+%28%22Row+name%22+%3D+%27nycdlpaperf07%2F17898%2F%230001001e%27%29+AND+%28%22Gateway%22+%3D+%27mdoe_ny_dev%27%29+AND+%28%22Column+name%22+%3D+%27Denied+Subscribe+Permission%27%29+AND+%28%22Managed+Entity%22+%3D+%27sol-nyplab-geqty1%27%29+and+time+%3E+now%28%29+-+5d+order+by+time+desc+limit+1&u=root HTTP/1.1 200 40 - okhttp/2.4.0 381dae70-b40a-11e5-bb14-000000000000 22.881099ms
[query] 2016/01/06 00:12:48 SELECT value FROM open_access_db.detail.metric WHERE ("Data View" = 'ClientStats') AND ("Row name" = 'nycdlpaperf07/17898/#0001001e') AND (Gateway = 'mdoe_ny_dev') AND ("Column name" = 'Denied Subscribe Permission') AND ("Managed Entity" = 'sol-nyplab-geqty1') AND time > now() - 5d ORDER BY time ASC LIMIT 1
[http] 2016/01/06 00:13:17 127.0.0.1 - root [06/Jan/2016:00:13:10 +0000] POST /write?consistency=any&db=open_access_db&p=%5BREDACTED%5D&precision=n&rp=detail&u=root HTTP/1.1 204 0 - okhttp/2.4.0 44b9b02f-b40a-11e5-bb16-000000000000 7.388316785s
[http] 2016/01/06 00:13:56 127.0.0.1 - root [06/Jan/2016:00:13:45 +0000] POST /write?consistency=any&db=open_access_db&p=%5BREDACTED%5D&precision=n&rp=detail&u=root HTTP/1.1 204 0 - okhttp/2.4.0 59bec709-b40a-11e5-bb17-000000000000 10.914947646s
unexpected fault address 0x7f8c91c9a2b4
fatal error: fault
[signal 0xb code=0x1 addr=0x7f8c91c9a2b4 pc=0x803db1]

goroutine 626594 [running]:
runtime.gothrow(0xbd0e00, 0x5)
        /usr/local/go/src/runtime/panic.go:503 +0x8e fp=0xc9f702af50 sp=0xc9f702af38
runtime.sigpanic()
        /usr/local/go/src/runtime/sigpanic_unix.go:29 +0x265 fp=0xc9f702afa0 sp=0xc9f702af50
github.com/influxdb/influxdb/tsdb/engine/tsm1.btou16(0x7f8c91c9a2b3, 0x2, 0xc30e4ea, 0xcb51bda1c0)
        /tmp/tmp.NZsW7kxYIE/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/data_file.go:1444 +0x31 fp=0xc9f702afa8 sp=0xc9f702afa0
github.com/influxdb/influxdb/tsdb/engine/tsm1.func·004(0x811a4, 0x17007f8d6f1b8400)
        /tmp/tmp.NZsW7kxYIE/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/data_file.go:524 +0xe0 fp=0xc9f702b030 sp=0xc9f702afa8
sort.Search(0xff36c, 0xc9f702b150, 0xcb51bda1c0)
        /usr/local/go/src/sort/search.go:66 +0x59 fp=0xc9f702b060 sp=0xc9f702b030
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*indirectIndex).search(0xc6ec066000, 0xcb51bda1c0, 0x1ac, 0x1c0, 0x1c0)
        /tmp/tmp.NZsW7kxYIE/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/data_file.go:528 +0x69 fp=0xc9f702b170 sp=0xc9f702b060
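
For readers following the trace: the frames show `sort.Search` invoking a comparison closure from `(*indirectIndex).search`, which in turn calls `btou16`, and the fault is raised there. The trace alone does not say why that address was unreadable. As a rough illustration only (this is not InfluxDB's code; the index layout and the names `readUint16`, `keyAt`, `search`, `appendEntry`, and `errShortBuffer` are all hypothetical), a binary search over length-prefixed keys can validate each offset before decoding, so that a bad offset surfaces as an error rather than a fault:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// errShortBuffer is a hypothetical error for reads past the end of the buffer.
var errShortBuffer = errors.New("index: read past end of buffer")

// readUint16 decodes a big-endian uint16 at off, checking bounds first.
func readUint16(b []byte, off int) (uint16, error) {
	if off < 0 || off+2 > len(b) {
		return 0, errShortBuffer
	}
	return binary.BigEndian.Uint16(b[off : off+2]), nil
}

// keyAt returns the key of the entry starting at off. Each entry is
// assumed to be a 2-byte key length followed by the key bytes.
func keyAt(b []byte, off int) ([]byte, error) {
	n, err := readUint16(b, off)
	if err != nil {
		return nil, err
	}
	start, end := off+2, off+2+int(n)
	if end > len(b) {
		return nil, errShortBuffer
	}
	return b[start:end], nil
}

// search returns the position of the first entry whose key is >= key.
// offsets must list entry start positions in ascending key order.
func search(b []byte, offsets []int, key []byte) (int, error) {
	var firstErr error
	i := sort.Search(len(offsets), func(i int) bool {
		k, err := keyAt(b, offsets[i])
		if err != nil {
			if firstErr == nil {
				firstErr = err
			}
			return true // result is discarded; the error is returned below
		}
		return bytes.Compare(k, key) >= 0
	})
	if firstErr != nil {
		return 0, firstErr
	}
	return i, nil
}

// appendEntry appends one length-prefixed key to b.
func appendEntry(b []byte, key string) []byte {
	var n [2]byte
	binary.BigEndian.PutUint16(n[:], uint16(len(key)))
	b = append(b, n[:]...)
	return append(b, key...)
}

func main() {
	var b []byte
	var offsets []int
	for _, k := range []string{"cpu", "disk", "mem"} {
		offsets = append(offsets, len(b))
		b = appendEntry(b, k)
	}

	// A valid lookup returns the position of the matching entry.
	fmt.Println(search(b, offsets, []byte("disk")))

	// An offset beyond the buffer is reported as an error.
	fmt.Println(search(b, []int{0, len(b) + 10}, []byte("zzz")))
}
```

With an ordinary Go slice an out-of-range read would panic with an index error rather than a signal, so the check here mainly matters when the bytes come from somewhere the runtime cannot bounds-check.
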
@ivanscattergood
Author

I have restarted the database and it's working OK.

@jwilder
Contributor

jwilder commented Jan 6, 2016

This may have been fixed by #5264 which is in the latest nightlies.

@jwilder
Contributor

jwilder commented Jan 6, 2016

Also, can you put the full trace of panic in a gist?

@ivanscattergood
Author

@jwilder changed the title from "[0.9.6.1 tsm1] Crash on insert of summary data" to "[0.9.6.1 tsm1] panic: unexpected fault address" on Jan 7, 2016
@jwilder
Contributor

jwilder commented Jan 7, 2016

@ivanscattergood @dswarbrick Are you able to run one of the nightlies compiled with go 1.4.2 and see if this panic goes away?

@dswarbrick

@jwilder I've built golang 1.4.3 from source packages that are still lurking in Debian's repos, and built InfluxDB from git master (7ccbbec as of writing). It's starting up now (which seems to take about 30 minutes, as it wades through every shard, but that's an issue for another day). Due to the nature of the bug, I won't know if it's resolved until 24 hours from now.

@jwilder
Contributor

jwilder commented Jan 7, 2016

@dswarbrick Great. Could you log an issue for the startup and attach a copy of your startup log? I'm aware of some slow parts in startup, but would like to see what is taking so long on your system.

@rossmcdonald
Contributor

If anyone else would like to try and recreate this issue, I've generated Go 1.4.2 packages from master here:

@dswarbrick

@jwilder Slow startup issue created, #5311

@dswarbrick

@jwilder My golang 1.4 build did not panic or get oom-killed, but as you can see from the attached graphs, things still went quite pear-shaped after it had been running for 24h.

[Image: influxdb-golang1 4]

@jwilder
Contributor

jwilder commented Jan 11, 2016

@dswarbrick I'm curious whether you still see that pear-shaped graph each night. It may be that the server had been crashing, so no full compactions had completed, and that pear is just the first time all of them succeeded.

@dswarbrick

@jwilder Here is another screenshot of my dashboard. Unfortunately Grafana seems to mess up the CPU graphs when zoomed right out, but you can see on the left of the load average graph the "pear shape" hump which was the golang 1.4 build. At 16:00 on 1/9 I restarted InfluxDB, using 0.10.0-nightly-72c6a51 from repos (i.e., golang 1.5 build). Interestingly, the load average is noticeably higher with the golang 1.5 build (but still quite acceptable).

Since installing 0.10.0-nightly-72c6a51, there have been a few moments at intervals of 24h since starting, where it looked like it was getting ready for its daily spin cycle, but the load subsided once again. Also no crazy disk utilization spikes, no oom-kills, and no panics.

Incidentally, I took your advice and started using second-precision timestamps - which resulted in visibly lower network traffic around 03:00 on 1/10. Yesterday's shard was also 7.9 GB instead of the usual 11 GB.

[Image: pearshape]

@jwilder
Contributor

jwilder commented Jan 11, 2016

@dswarbrick Great. #5331 will be switching the builds back to 1.4.3.
