Dropping measurement used several GB disk space #8558
@phemmer thanks for the report. Can you confirm a couple of things:
Does the space change over time as future compactions complete? Have you "left" the system for a while to see if anything changes? Regardless, even if it does come down over time, an initial 23% increase isn't really acceptable.
Yes
I'll have to reproduce again to see. Will try to do later tonight.
During the removal I believe it did, but I can't recall for sure. However I dropped several measurements, and each time the behavior was the same, so I should be able to reproduce fairly easily to get these answers.
I left it for a few minutes and it did not change. However I bounced InfluxDB and after it started up, the usage came down.
Actually, to look at it a different way, that 23% was of the disk's total capacity. Since the original data size was 16 GB and it grew to 23 GB, that's a 44% increase.
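The arithmetic behind the reporter's two percentages (23% of disk capacity vs. ~44% of the original data size) can be checked directly:

```python
# Relative growth of the data set itself, per the sizes quoted above.
before_gb, after_gb = 16, 23
increase = (after_gb - before_gb) / before_gb
assert round(increase * 100) == 44  # 7/16 = 43.75%, i.e. ~44% of the original size
```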
@phemmer Deletes of measurements and series are actually pretty expensive in TSM (and other log-structured storage models). Because TSM files are immutable, a delete is actually a write of tombstone files; when a compaction occurs, the TSM files are rewritten to remove the deleted series. Tombstone files require space, and additional storage is needed during the compactions. Deletes of whole shards are cheap, though, because they remove whole files at once and do not require compactions or writes.

The reason your drop measurement incurred more disk usage is likely that the measurement expanded out to many series and you likely have many TSM files. There is a tombstone per TSM file, and these can get large if they expand out to many series keys. There are definitely some optimizations that could be made to the tombstone file format to reduce disk usage.

As a workaround when disk space is low, I'd suggest dropping shards or deleting small sets of series (and letting them compact) instead of dropping whole measurements to free up space.
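The tombstone-then-compact pattern described above can be sketched in a few lines. This is a toy illustration of the general log-structured-storage technique, not InfluxDB's actual TSM code; all class and method names here are hypothetical.

```python
class ImmutableSegment:
    """A write-once data file (loosely analogous to a TSM file).

    This is a hypothetical illustration of tombstone-based deletion,
    not InfluxDB's real implementation.
    """

    def __init__(self, entries):
        self.entries = dict(entries)   # series key -> values (immutable data)
        self.tombstones = set()        # deleted keys, stored *alongside* the data

    def delete(self, key):
        # A delete cannot modify the immutable data; it appends a tombstone,
        # which temporarily *costs* extra space rather than freeing any.
        if key in self.entries:
            self.tombstones.add(key)

    def compact(self):
        # Compaction rewrites the segment without the deleted keys,
        # which is when the space is finally reclaimed.
        live = {k: v for k, v in self.entries.items()
                if k not in self.tombstones}
        return ImmutableSegment(live)


seg = ImmutableSegment({"cpu,host=a": [1, 2], "mem,host=a": [3]})
seg.delete("cpu,host=a")
# After the delete, the data is all still there plus a tombstone:
assert len(seg.entries) == 2 and len(seg.tombstones) == 1
seg = seg.compact()
# Only after compaction is the deleted series actually gone:
assert list(seg.entries) == ["mem,host=a"] and not seg.tombstones
```

Dropping a whole shard, by contrast, corresponds to deleting the entire segment file outright, which needs no tombstones or rewrites.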
#8568 should reduce the size of tombstone files significantly, which should resolve this issue.
Bug report
System info: [Include InfluxDB version, operating system name, and other relevant details]
Version: e3918e0
OS: Linux
Steps to reproduce:
`drop measurement foo`
Expected behavior: [What you expected to happen]
Reduce disk usage
Actual behavior: [What actually happened]
Increased disk usage
Additional info: [Include gist of relevant config, logs, etc.]
Before running `drop measurement`: [disk usage graph]

After running `drop measurement`: [disk usage graph]

In this specific case I was trying to remove unneeded data to address performance issues, not disk usage issues. But if I were low on disk space and trying to remove data to get some back, this would be a big problem. I would not expect dropping a measurement to result in using *more* disk space, especially an additional 43% of the original size. If I were low on disk space, dropping a measurement in an attempt to get some space back could end up causing me to run out of space entirely.