
read tombstones: EOF after upgrade to 0.14.0~n201605160800 #6641

Closed
daviesalex opened this issue May 16, 2016 · 9 comments
@daviesalex
Contributor

We just upgraded from 0.12.1-1 to the latest nightly (0.14.0~n201605160800).

On restart, performance was great, but it seems that only data from the last ~20 hours is accessible. The other data is still on disk.

There are a bunch of startup errors:

[root@carf-metrics-influx01 influxdb]# grep err influxd.log
[store] 2016/05/16 16:14:55 Failed to open shard: 1: [shard 1] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:55 Failed to open shard: 61: [shard 61] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/61/000000001-000000001.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:56 Failed to open shard: 20: [shard 20] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/20/000003221-000000041.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:57 Failed to open shard: 14: [shard 14] error opening memory map for file /data/influxdb-data/metrics/ryan_test/default/14/000000001-000000001.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:58 Failed to open shard: 9: [shard 9] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/9/000003226-000000030.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:59 Failed to open shard: 64: [shard 64] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/64/000000001-000000001.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:14:59 Failed to open shard: 68: [shard 68] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/68/000000160-000000007.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:09 Failed to open shard: 14: [shard 14] error opening memory map for file /data/influxdb-data/metrics/ryan_test/default/14/000000001-000000001.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:10 Failed to open shard: 1: [shard 1] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:10 Failed to open shard: 68: [shard 68] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/68/000000640-000000007.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:10 Failed to open shard: 9: [shard 9] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/9/000003226-000000029.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:11 Failed to open shard: 64: [shard 64] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/64/000000001-000000001.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:13 Failed to open shard: 20: [shard 20] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/20/000003221-000000041.tsm: init: read tombstones: EOF
[store] 2016/05/16 16:18:13 Failed to open shard: 61: [shard 61] error opening memory map for file /data/influxdb-data/metrics/tg_udp/default/61/000000001-000000001.tsm: init: read tombstones: EOF

These files seem legit, have the correct permissions, and have an mtime from months ago:

[root@carf-metrics-influx01 influxdb]# ls -lh /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm
-rw-r--r-- 1 influxdb influxdb 940M Feb  2 07:18 /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm
[root@carf-metrics-influx01 influxdb]# ls -lh /data/influxdb-data/metrics/tg_udp/default/61/000000001-000000001.tsm
-rw-r--r-- 1 influxdb influxdb 6.3M Mar 11 19:00 /data/influxdb-data/metrics/tg_udp/default/61/000000001-000000001.tsm
[root@carf-metrics-influx01 influxdb]# ls -lh /data/influxdb-data/metrics/tg_udp/default/20/000003221-000000041.tsm
-rw-r--r-- 1 influxdb influxdb 1.5G Feb 16 09:04 /data/influxdb-data/metrics/tg_udp/default/20/000003221-000000041.tsm

Suggestions?

@jwilder
Contributor

jwilder commented May 16, 2016

@daviesalex Can you see if the corresponding tombstone files are 0 sized? For /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm, the tombstone file would be /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm.tombstone.

If they are 0 sized, you can remove them to allow the DB to start up. There was a bug that allowed 0-length files to be written incorrectly. These should be ignored, but might not be currently.

@daviesalex
Contributor Author

@jwilder they are (without the .tsm in the name). The mtime is months ago, though, and the data was accessible pre-restart (and the server will have been restarted, and upgraded, multiple times since March).

From what I can see from a quick grafana check, 100% of data more than about 20 hours old is now not accessible.

[root@carf-metrics-influx01 influxdb]# ls -lh /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tombstone
-rw------- 1 influxdb influxdb 0 Mar 18 10:33 /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tombstone
[root@carf-metrics-influx01 influxdb]# ls -lh /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm.tombstone
ls: cannot access /data/influxdb-data/metrics/tg_udp/default/1/000002485-000000032.tsm.tombstone: No such file or directory

@jwilder
Contributor

jwilder commented May 16, 2016

You can remove any 0 sized ones as a workaround for now.

@daviesalex
Contributor Author

Thanks - giving that a go. FWIW, they are all zero sized:

[root@carf-metrics-influx01 influxdb-data]# find . -name *tombstone | wc -l
138
[root@carf-metrics-influx01 influxdb-data]# find . -name *tombstone -size 0 | wc -l
138
[root@carf-metrics-influx01 influxdb-data]# mkdir /tmp/tombs
[root@carf-metrics-influx01 influxdb-data]# find . -name *tombstone -size 0 -exec mv {} /tmp/tombs/ \;
[root@carf-metrics-influx01 influxdb-data]# find . -name *tombstone -size 0 | wc -l
0

@daviesalex
Contributor Author

Right, that gets rid of the errors. Sadly, the server now panics immediately on startup (before any query executes):

[retention] 2016/05/16 16:40:58 Starting retention policy enforcement service with check interval of 10m0s
[udp] 2016/05/16 16:40:58 Started listening on UDP: :8089
[udp] 2016/05/16 16:40:58 Started listening on UDP: :8090
[run] 2016/05/16 16:40:58 Listening for signals
panic: interface conversion: tsm1.Value is *tsm1.FloatValue, not *tsm1.IntegerValue

goroutine 1852 [running]:
panic(0xb76bc0, 0xc936def740)
    /usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/influxdata/influxdb/tsdb/engine/tsm1.encodeIntegerBlock(0x0, 0x0, 0x0, 0xc95ec7c000, 0x2ff, 0x400, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:771 +0x72b
github.com/influxdata/influxdb/tsdb/engine/tsm1.Values.Encode(0xc95ec7c000, 0x2ff, 0x400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/encoding.go:195 +0x22c
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).chunk(0xc8ca12ef00, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:992 +0x4af
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).combine(0xc8ca12ef00, 0xc986cd9400, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:965 +0xd0a
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).merge(0xc8ca12ef00)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:845 +0x13f
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*tsmKeyIterator).Next(0xc8ca12ef00, 0xc83e5d0531)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:813 +0xbe5
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).write(0xc822aa8ab0, 0xc926539630, 0x4a, 0x7fe690183820, 0xc8ca12ef00, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:560 +0x53b
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).writeNewFiles(0xc822aa8ab0, 0x212, 0x5, 0x7fe690183820, 0xc8ca12ef00, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:507 +0x3a1
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).compact(0xc822aa8ab0, 0xc8222e2500, 0xc9a352dfc0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:473 +0x4fe
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Compactor).CompactFull(0xc822aa8ab0, 0xc9a352dfc0, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/compact.go:478 +0x71
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull.func1(0xc966b63f60, 0xc820124580, 0x0, 0xc9a352dfc0, 0x2, 0x2)
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:760 +0x4c6
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).compactTSMFull
    /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:778 +0x27e

New issue? Any suggestions?
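For context, the panic above is a failed Go interface type assertion: the compactor asserts that every value in a block is an integer value, but a float value slipped into the block. A minimal standalone sketch of that error class, using simplified hypothetical types (not the real tsm1 API):

```go
package main

import "fmt"

// Value loosely mimics the tsm1.Value interface (hypothetical simplification).
type Value interface{ val() interface{} }

type FloatValue struct{ v float64 }
type IntegerValue struct{ v int64 }

func (f FloatValue) val() interface{}   { return f.v }
func (i IntegerValue) val() interface{} { return i.v }

// sumIntegers panics on a mixed-type block, just as encodeIntegerBlock
// does when it receives a float value where an integer is expected.
func sumIntegers(vals []Value) int64 {
	var sum int64
	for _, v := range vals {
		// Single-value type assertion: panics if v is not an IntegerValue.
		sum += v.(IntegerValue).v
	}
	return sum
}

func main() {
	mixed := []Value{IntegerValue{1}, FloatValue{2.5}}
	defer func() {
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()
	sumIntegers(mixed)
}
```

Running this panics with an interface-conversion message of the same shape as the one in the log above, which suggests the shard data mixes value types under one series key.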

@jwilder
Contributor

jwilder commented May 16, 2016

That is a new issue. It looks similar to #6595 (comment)

Are there any log statements about full compactions further up?

@daviesalex
Contributor Author

Moving discussion to #6595... thanks for your help! I'm not sure whether the original issue (these 0-byte files getting in the way) requires a better resolution, so I'll leave it open for now.

At the very least, others hitting it will find this and know it's safe to just delete them...

@jwilder
Contributor

jwilder commented May 16, 2016

The 0 length tombstones should be ignored and not prevent the DB from starting. We'll keep this open until that is fixed.

@jwilder jwilder added this to the 1.0.0 milestone May 16, 2016
@daviesalex
Contributor Author

Thanks. Just to be clear, in this case, they did not prevent the DB from starting - I (incorrectly) linked them to the fact that most of our data was missing once it started, but that appears to be a separate problem.

What is odd is that the DB started with them (with the last 24 hours of data), but without them it panics immediately. I have no idea why that is.

A start on the latest stable is taking quite a bit longer, which makes me think it might end up with more data (or it's just less efficient at not starting with our data!). I'll report back shortly.

@jwilder jwilder self-assigned this May 19, 2016
jwilder added a commit that referenced this issue May 19, 2016
Due to a bug in TSM tombstone files, it was possible to create
empty tombstone files.  At startup, the TSM file would error out
and not load the TSM file.

Instead, treat it as an empty v1 file so the TSM file can load
correctly.

Fixes #6641
@jwilder jwilder mentioned this issue May 19, 2016
@timhallinflux timhallinflux modified the milestones: 1.0.0, 1.0.0 beta Dec 20, 2016