
DROP SERIES is terribly inefficient #592

Closed
spantaleev opened this issue May 29, 2014 · 5 comments

@spantaleev

Dropping many (small) time series triggers many compaction operations, which cause a lot of I/O.

In my latest tests, dropping 24 series (a tiny part of my 550 MB database) took about an hour to finish and generated 33 GB of I/O.

Does it really need to run a full compaction on every DROP SERIES query? Maybe it could run just once per "drop series batch"?

Looking at the code, it also seems to compact the full key range, which may not be ideal, but I don't know enough about LevelDB, or how InfluxDB uses it, to comment on that.

Last I heard, @jvshahid is working on improving the situation. This issue is just a place to track it.

@pauldix
Member

pauldix commented May 30, 2014

I suppose we could remove compactions from delete and drop entirely and add a separate command to force a compaction. Then you could run whatever deletes and drops you need and compact once at the end. Maybe that would work better?

@freeeve

freeeve commented May 30, 2014

Would compaction run at any other time (on startup, for example), or would it be entirely manual?

@spantaleev
Author

Does skipping compaction mean that the rest of the "drop series" work reclaims little or no space?

Running compaction manually sounds like a good workaround, but it would be better if InfluxDB users didn't have to worry about details of the underlying storage engine that they don't (and shouldn't need to) understand.

Ideally, InfluxDB would postpone compaction for a while (rather than running it on every "drop series" call) and estimate a good time to run it, probably based on the number of points deleted since the last compaction. It still shouldn't run too often: deleting many large series within a short time should trigger it only once.

@pauldix
Member

pauldix commented May 31, 2014

Yeah, that's a good idea. We could trigger a compaction some amount of time after a drop or delete occurs, with each successive drop or delete pushing the timer back a little.


@pauldix
Member

pauldix commented Jul 14, 2014

Closing this one for now. In v0.8.0, DROP SERIES returns quickly to the end user: it removes only the series metadata and sends a response, then deletes the series data in the background without triggering compactions.

If you regularly drop series data, consider using the retention policy feature of shard spaces: http://influxdb.com/docs/v0.8/advanced_topics/sharding_and_storage.html

pauldix closed this as completed Jul 14, 2014