panic: BlockCount: error unpacking block: #8482
Comments
Influxdb conf file
Can you describe your storage setup? Are you using directly attached SSDs, or is there RAID/NFS/SAN or anything else in the mix? It appears that one of your TSM files has lost some of its contents. Since TSM files are immutable after being written, this usually hints at a filesystem/hardware issue. From the logs, it looks like it might be one of these 4 files:
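The reason corruption like this is detectable at all is that each TSM block is stored alongside a checksum, so bytes that change after the file was written fail verification on read. The following is a minimal illustrative sketch of that idea (not InfluxDB's actual code; InfluxDB uses a CRC32 per block, and the helper names here are made up):

```python
import zlib

def checksum(block: bytes) -> int:
    # CRC32 of the block contents, masked to an unsigned 32-bit value.
    return zlib.crc32(block) & 0xFFFFFFFF

def verify_block(block: bytes, stored_crc: int) -> bool:
    # A mismatch means the bytes changed after the checksum was written,
    # which is exactly what "error unpacking block" is reporting.
    return checksum(block) == stored_crc

data = b"example block contents"
crc = checksum(data)
print(verify_block(data, crc))                        # intact block passes
print(verify_block(b"exbmple block contents", crc))   # one flipped byte fails
```

Since TSM files are never rewritten in place, a checksum failure implies the corruption happened below InfluxDB, in the filesystem or hardware.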
Could you run
Storage is a 3 TB RAID 5 array consisting of physical disks (not SSD; SATA or older SCSI, I think, struggling to remember). It's possible there could be an issue with the RAID, I guess. Here is the output of the command: /var/lib/influxdb/data$ sudo influx_inspect verify -dir /var/lib/influxdb goroutine 1 [running]:
Unfortunately, I've seen several cases of RAIDed drives corrupting data files. Since the index (at the end of the file) is intact, but some of the data blocks are failing checksums, this is very likely a hardware/filesystem issue where the block data was corrupted after being written. It looks like some of the data in those files may be recoverable, but you will probably need to restore from backup or move the bad files out of the shard to get back up and running again.
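The "move the bad files out of the shard" workaround can be sketched as below. This is a hypothetical illustration only: the shard number, file name, and `quarantine` helper are made up for the example, and the demo runs against a throwaway temp directory rather than a real /var/lib/influxdb tree. The real files are the ones `influx_inspect verify` reports as failing.

```python
import shutil
import tempfile
from pathlib import Path

def quarantine(shard_dir: Path, bad_files: list, dest: Path) -> list:
    """Move the named TSM files out of the shard so InfluxDB can start
    without them. The data in the moved files is no longer served."""
    dest.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in bad_files:
        src = shard_dir / name
        if src.exists():
            shutil.move(str(src), str(dest / name))
            moved.append(name)
    return moved

# Demo with a stand-in for a shard directory like .../data/db/rp/179
tmp = Path(tempfile.mkdtemp())
shard = tmp / "179"
shard.mkdir()
(shard / "000000001-000000002.tsm").touch()
moved = quarantine(shard, ["000000001-000000002.tsm"], tmp / "quarantine")
print(moved)
```

Stop influxd before moving files, and keep the quarantined files around in case some of their blocks turn out to be recoverable.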
Ok, I moved all of 179 out of the directory and ran verify, and now I am getting this: influx_inspect verify -dir /var/lib/influxdb goroutine 1 [running]:
Ok, I just moved all the data files out of the data directory and started my service again, and so far so good. Thanks for the help!
I am experiencing a similar failure after a disk filled up. The result is an influxd restart loop: it attempts to compact, hits the corrupt file, and fails.
I have InfluxDB version 1.4.2, branch 1.4, commit 6d2685d. It would be nice if InfluxDB did not crash in this scenario, as this prevents it from accepting writes.
@twm -- are you still experiencing this issue?
I moved the corrupted file aside and InfluxDB was able to start successfully. That's about as much as I can expect from a single-host deployment, I think. It would have been nice if the failure message indicated that I could do this — I was not familiar with the on-disk layout at all, so I didn't realize that removing a file would simply drop the data within it until I found this issue. |
Closing since you were able to work around this issue.
Fixed in
I am running InfluxDB (I think the latest stable release) on Ubuntu 14.04 with Grafana and FastNetMon.
Lately InfluxDB seems to be crashing every day or every couple of days, though sometimes a reboot of the server fixes it.
If I manually start InfluxDB, it currently stops after about a minute.
The Graphite database is about 11 GB, so I suspect it is something to do with the database size and the query Grafana is trying to make, but I have no idea how to fix it.
See the log.