DROP SERIES is terribly inefficient #592
Comments
I suppose we could remove the compactions from delete and drop completely. Then just have a separate command to force a compaction. Then you can do whatever you want and do the compactions in one go. Maybe that'll work better?
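To make the proposal above concrete, here is a minimal toy sketch of the two behaviours: compacting on every drop versus dropping freely and compacting once through a separate command. `ToyEngine`, `compact_on_drop`, and `drop_all` are hypothetical names invented for this illustration; nothing here is InfluxDB's or LevelDB's actual API, and the "compaction" is only a counter.

```python
class ToyEngine:
    """Toy storage engine that only counts compactions (illustration only)."""

    def __init__(self, compact_on_drop: bool):
        self.compact_on_drop = compact_on_drop  # assumed switch, not a real InfluxDB option
        self.series: dict[str, list[float]] = {}
        self.tombstoned = 0   # points deleted but not yet compacted away
        self.compactions = 0  # how many (expensive) compactions have run

    def write(self, name: str, value: float) -> None:
        self.series.setdefault(name, []).append(value)

    def drop_series(self, name: str) -> None:
        points = self.series.pop(name, [])
        self.tombstoned += len(points)
        if self.compact_on_drop:
            self.compact()

    def compact(self) -> None:
        """Stand-in for the separate 'force a compaction' command."""
        self.compactions += 1
        self.tombstoned = 0


def drop_all(engine: ToyEngine, names: list[str]) -> None:
    """Drop a batch of series; compaction policy is left to the engine."""
    for name in names:
        engine.drop_series(name)
```

With 24 series, as in the report, the eager engine runs 24 compactions while the deferred engine runs none until `compact()` is called once at the end of the batch.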
Does compaction run at any other time (such as at startup), or would it be entirely manual?
If compaction is skipped, does the rest of the "drop series" work still reclaim much space at all? Running compaction manually sounds like a good workaround to the problem, but it'd be better if InfluxDB users didn't have to worry about details of the underlying storage engine that they don't (need to) understand. Ideally, InfluxDB can put off compaction for a while (not do it on every "drop series" call) and estimate when is a good time to do it, probably depending on the number of points deleted since the last compaction operation (but still not too often: deleting many large series within a short amount of time should only make it trigger once).
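One way to read the heuristic above is as a threshold plus a quiet period: count the points deleted since the last compaction, and compact only once enough have accumulated and no drop has arrived for a while, so a burst of drops triggers a single compaction. The sketch below is a minimal illustration under those assumptions; `CompactionScheduler`, `points_threshold`, `quiet_period_s`, and the `compact` hook are all hypothetical names and default values, not anything InfluxDB implements.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CompactionScheduler:
    """Defers compaction until enough points are deleted and drops have gone quiet."""

    compact: Callable[[], None]                  # hypothetical hook that runs the real compaction
    points_threshold: int = 1_000_000            # assumed: deleted points needed before compacting
    quiet_period_s: float = 300.0                # assumed: seconds without drops before compacting
    clock: Callable[[], float] = time.monotonic  # injectable so the logic can be tested
    deleted_since_compaction: int = 0
    last_drop: float = 0.0

    def record_drop(self, points_deleted: int) -> None:
        """Account for a dropped series; never compacts by itself."""
        self.deleted_since_compaction += points_deleted
        self.last_drop = self.clock()

    def tick(self) -> bool:
        """Call periodically; runs one compaction when both conditions hold."""
        enough = self.deleted_since_compaction >= self.points_threshold
        quiet = self.clock() - self.last_drop >= self.quiet_period_s
        if not (enough and quiet):
            return False
        self.compact()
        self.deleted_since_compaction = 0
        return True
```

Something would call `tick()` on a timer. Dropping many large series in quick succession keeps resetting the quiet period, so the compaction runs once after the burst ends rather than once per drop.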
Yeah, that's a good idea. We could just trigger a compaction some amount of ...
On Fri, May 30, 2014 at 4:19 PM, Slavi Pantaleev wrote:
Marking this one as closed for now. In v0.8.0, drop series will be fast for the end user. It simply removes the series metadata and returns a response. It then deletes the series data in the background without calling compactions. If it's series data that you're regularly going to be dropping, consider using the shard spaces retention policy feature: http://influxdb.com/docs/v0.8/advanced_topics/sharding_and_storage.html
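The behaviour described above (remove the metadata, respond, delete the data in the background) can be sketched roughly as follows. This is a toy in-memory model, not the v0.8.0 implementation: `SeriesStore`, its methods, and the use of internal series ids are assumptions made for the illustration.

```python
import threading


class SeriesStore:
    """Toy store: dropping a series unlinks its metadata, then deletes data in the background."""

    def __init__(self):
        self._lock = threading.Lock()
        self._metadata = {}  # series name -> internal series id
        self._points = {}    # internal series id -> list of points
        self._next_id = 0

    def write(self, name, point):
        with self._lock:
            if name not in self._metadata:
                self._metadata[name] = self._next_id
                self._next_id += 1
            self._points.setdefault(self._metadata[name], []).append(point)

    def query(self, name):
        """Return a copy of the series' points, or None if the series does not exist."""
        with self._lock:
            sid = self._metadata.get(name)
            return None if sid is None else list(self._points.get(sid, []))

    def drop_series(self, name):
        """Remove the metadata and return right away; data deletion runs in a thread."""
        with self._lock:
            sid = self._metadata.pop(name, None)
        if sid is None:
            return None
        worker = threading.Thread(target=self._delete_points, args=(sid,), daemon=True)
        worker.start()
        return worker  # returned only so a caller can wait for the cleanup if it wants to

    def _delete_points(self, sid):
        # The slow part; note that no compaction is triggered here.
        with self._lock:
            self._points.pop(sid, None)
```

Because queries go through the metadata, the series is invisible as soon as `drop_series` returns, even if its points are still waiting to be deleted. A series recreated under the same name gets a new id, so it never sees the old data.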
Dropping many (small) time-series results in many compact operations, which cause a lot of I/O.
My last tests showed that dropping 24 series (a tiny part of my whole 550 MB database) takes about an hour to finish and results in 33 GB of I/O.
Does it really need to run a full compact operation on every drop series query? Maybe it should do it just once, for each "drop series batch"?
Looking at the code, it also seems to run a compaction for the full range, which may not be ideal, but I don't know that much about LevelDB or how InfluxDB uses it to comment on that.
From what I last heard, @jvshahid is working on improving the situation. This is just a place to track the issue.