client.WriteSeries returns: Server returned (400): IO error: /opt/influxdb/shared/data/db/shard_db_v2/00190/MANIFEST-000006: No such file or directory #985
Conversation
this might also be useful
When I manually retry the same write later, it works fine. So maybe it uses another dir then, or it was a race condition between creating the dir and trying to use it?
Today I got another, slightly more exotic variant of this:
Nothing in particular in /var/log/messages or dmesg, and plenty of space and inodes available. Again, resuming my program where it left off (i.e. doing the same write that failed) seems to work fine.
Similar to #1009 and #1013. This is caused by concurrently closing a shard and opening it at the same time; that operation needs to be goroutine-safe. I'm not sure why the shards are being dropped, though. @Dieterbe, are you trying to write points in the past, or is the data collection lagging behind?
Yes, this is an import of old data, with timestamps anywhere between 2 years ago and now.
What's the retention and duration of those shards?
Not sure, I have recreated the db a couple of times in the meantime. I think I've usually kept the shard duration at 7d; retention was probably 365 or 730 days. (It's possible that some of the points being written have timestamps older than what the shard cares about.)
Cool, just wanted to make sure my guess makes sense.
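To illustrate the concurrent close/open race described a few comments above, here is a minimal sketch of a goroutine-safe shard registry; the types and method names are hypothetical, not InfluxDB's actual code:

```go
// Minimal sketch of a goroutine-safe shard registry, illustrating the
// concurrent close/open problem described above. Names are hypothetical,
// not InfluxDB's actual code.
package shards

import "sync"

type Shard struct {
	ID     uint32
	closed bool
}

type ShardStore struct {
	mu     sync.Mutex
	shards map[uint32]*Shard
}

func NewShardStore() *ShardStore {
	return &ShardStore{shards: make(map[uint32]*Shard)}
}

// GetOrCreateShard returns the open shard for id, creating it if needed.
// Holding the mutex prevents racing with a concurrent DropShard, so a write
// cannot be handed a shard while its files are being removed.
func (s *ShardStore) GetOrCreateShard(id uint32) *Shard {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sh, ok := s.shards[id]; ok && !sh.closed {
		return sh
	}
	sh := &Shard{ID: id}
	s.shards[id] = sh
	return sh
}

// DropShard closes and removes the shard under the same lock. Without a
// reference count (see the commit below), a writer that obtained the shard
// earlier can still hit "No such file or directory" once the files are gone.
func (s *ShardStore) DropShard(id uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if sh, ok := s.shards[id]; ok {
		sh.closed = true
		delete(s.shards, id)
		// remove the shard's on-disk directory here
	}
}
```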
* shard_datastore.go (Deleteshard): Check the reference count of the shard and mark it for deletion if there are still more references out there. Otherwise, delete the shard immediately. Also refactor the deletion code into deleteShard(); see below.
* shard_datastore.go (ReturnShard): Check to see if the shard is marked for deletion.
* shard_datastore.go (deleteShard): Refactor the code that used to be in Deleteshard into its own method. Use `closeShard` instead of doing the cleanup ourselves.
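A rough sketch of the reference-counting scheme this commit message describes; the types, fields, and method bodies below are assumptions for illustration, not the actual shard_datastore.go implementation:

```go
// Rough sketch of the reference-counting scheme described in the commit
// message above. Types, fields, and bodies are assumptions, not the actual
// shard_datastore.go code.
package shards

import "sync"

type countedShard struct {
	refCount          int // incremented by an acquire path (not shown), decremented by ReturnShard
	markedForDeletion bool
}

type Datastore struct {
	mu     sync.Mutex
	shards map[uint32]*countedShard
}

// ReturnShard is called when a caller is done with a shard. If the shard was
// marked for deletion while still in use, it is deleted once the last
// reference comes back.
func (d *Datastore) ReturnShard(id uint32) {
	d.mu.Lock()
	defer d.mu.Unlock()
	sh, ok := d.shards[id]
	if !ok {
		return
	}
	sh.refCount--
	if sh.refCount <= 0 && sh.markedForDeletion {
		d.deleteShard(id)
	}
}

// DeleteShard deletes the shard immediately when nobody holds a reference;
// otherwise it only marks the shard so the last ReturnShard cleans it up.
func (d *Datastore) DeleteShard(id uint32) {
	d.mu.Lock()
	defer d.mu.Unlock()
	sh, ok := d.shards[id]
	if !ok {
		return
	}
	if sh.refCount > 0 {
		sh.markedForDeletion = true
		return
	}
	d.deleteShard(id)
}

// deleteShard performs the actual cleanup; callers must hold d.mu.
func (d *Datastore) deleteShard(id uint32) {
	delete(d.shards, id)
	// close the underlying storage and remove the on-disk directory here
}
```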
Looks good to me.
lgtm
Background of the bug: prior to this patch we actually tried writing points that were older than the retention period of the shard. This caused a race condition when it came to writing points to a shard that's being dropped, which will happen frequently if the user is loading old data (by accident). This is demonstrated in the test in this commit. This bug was previously addressed in #985, but it turns out the fix for #985 wasn't enough. A user reported in #1078 that some shards are left behind and not deleted. It turns out that while the shard is being dropped, more write requests could come in and end up on line `cluster/shard.go:195`, which will cause the datastore to create a shard on disk that isn't tracked anywhere in the metadata. This shard will live forever and never get deleted. This fix addresses the issue by not writing old points at all; there are still some edge cases with the current implementation, but at least it's not as bad as current master. Close #1078
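The core of this fix is refusing points older than the shard space's retention period before they ever reach a shard. A minimal sketch of that check, with a hypothetical function name and signature:

```go
// Hypothetical sketch of the idea behind the fix: drop points that fall
// outside the retention window before they ever reach a shard, so no write
// targets a shard that is about to be (or has already been) dropped.
package shards

import "time"

// filterExpiredPoints keeps only timestamps newer than now minus the
// retention period. The function name and signature are illustrative.
func filterExpiredPoints(timestamps []time.Time, retention time.Duration, now time.Time) []time.Time {
	cutoff := now.Add(-retention)
	kept := make([]time.Time, 0, len(timestamps))
	for _, ts := range timestamps {
		if ts.After(cutoff) {
			kept = append(kept, ts)
		}
	}
	return kept
}
```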
Using https://github.com/vimeo/whisper-to-influxdb/, which invokes
influxClient.WriteSeriesWithTimePrecision(toCommit, client.Second)
to write a series called "servers.dfvimeostatsd1.diskspace.root.inodes_free" with 60643 records in (time, sequence_number, value) format to my graphite database, which I recreated from scratch yesterday after I upgraded.
I got this response:
My influxdb is 0.8.3 with debug logging enabled, but the log only contains messages matching (GraphiteServer committing|Executing leader loop|Dumping the cluster config|Testing if we should|Checking for shards to drop), no other messages. I also checked dmesg, no errors there; ditto for /var/log/messages, nothing useful there.
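For reference, the failing write path roughly corresponds to the following use of the 0.8-era Go client. Only `WriteSeriesWithTimePrecision` and `client.Second` appear in the report above; the constructor, config fields, and sample point are assumptions for illustration, not taken from whisper-to-influxdb:

```go
// Rough sketch of the failing write path using the 0.8-era Go client
// (github.com/influxdb/influxdb/client); details are assumptions, only
// WriteSeriesWithTimePrecision and client.Second come from the report.
package main

import (
	"log"

	"github.com/influxdb/influxdb/client"
)

func main() {
	c, err := client.NewClient(&client.ClientConfig{
		Host:     "localhost:8086",
		Username: "root",
		Password: "root",
		Database: "graphite",
	})
	if err != nil {
		log.Fatal(err)
	}

	// One of ~60k (time, sequence_number, value) records being imported;
	// the timestamp may be up to two years in the past.
	toCommit := []*client.Series{{
		Name:    "servers.dfvimeostatsd1.diskspace.root.inodes_free",
		Columns: []string{"time", "sequence_number", "value"},
		Points:  [][]interface{}{{1346025600, 1, 1000000.0}},
	}}

	// Timestamps are given in seconds, hence client.Second.
	if err := c.WriteSeriesWithTimePrecision(toCommit, client.Second); err != nil {
		// e.g. "Server returned (400): IO error: ... MANIFEST-000006: No such file or directory"
		log.Fatal(err)
	}
}
```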