storage: significant space remains in RaftLog after dropping table #26339
Ah, this has an easy explanation: the Raft log queue only truncates if there is at least 64KiB of data to delete. @benesch's merging of empty ranges would mitigate this issue. Or we could make the Raft log queue heuristic more aggressive after a …

PS: an empty range with an empty Raft log still consumes ~260 bytes on disk due to the various range-local keys (e.g. …).
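The 64KiB threshold mentioned above explains the symptom: a dropped table leaves many ranges that each hold somewhat less than 64KiB of Raft log, so none of them individually qualify for truncation. A minimal sketch of such a size-threshold heuristic (the function name and exact comparison are assumptions for illustration, not CockroachDB's actual code):

```go
package main

import "fmt"

// raftLogQueueThreshold models the 64KiB minimum described in the
// comment above: a truncation is only worth queueing if at least this
// many bytes of log would be deleted.
const raftLogQueueThreshold = 64 << 10 // 64KiB

// shouldTruncate is a hypothetical sketch, not the real heuristic.
func shouldTruncate(truncatableBytes int64) bool {
	return truncatableBytes >= raftLogQueueThreshold
}

func main() {
	// Ranges of a dropped table may each sit just under the threshold,
	// so none of them ever get truncated.
	fmt.Println(shouldTruncate(60 << 10)) // below threshold
	fmt.Println(shouldTruncate(128 << 10))
}
```

Under this model, thousands of such ranges can collectively pin hundreds of MiB of Raft log without any single range crossing the threshold.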
Or we could have …
This problem will go away for empty ranges once we have the merge queue. It might be relevant for "regular" ranges, though. I've wondered whether we should add a truncation criterion to the Raft log queue: if a range has been quiescent for a while (or insert your favorite criterion for this range being worth trimming the fat for), use a low value (3) here:

cockroach/pkg/storage/raft_log_queue.go, lines 38–41 at e808caf
I've noticed that truncations never really manage to empty out the log. This is because when, say, index 100 is completely replicated, we're not going to include 100 in the truncation but will only delete up to and including 99. And in performing the truncation we're adding an index to the Raft log ourselves, so the resulting log will have three elements. That's why any value lower than …

@bdarnell, is this a technical limitation of Raft ("the log can't be completely empty"), or could we in theory apply a truncation that results in a zero-length log?
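The off-by-one described above can be sketched with a toy model (indices and names here are illustrative, not CockroachDB's implementation): the fully replicated index itself is kept, and the truncation command appends a fresh entry, so the log can never shrink to zero.

```go
package main

import "fmt"

// truncate models the behavior in the comment above: with `committed`
// fully replicated, only entries up to committed-1 are deleted, and
// the truncation command itself lands as a new entry at committed+1.
// This is a hypothetical model, not the real raft_log_queue code.
func truncate(committed uint64) (newFirstIndex, lastIndex uint64) {
	truncateUpTo := committed - 1 // the replicated index itself survives
	newFirstIndex = truncateUpTo + 1
	lastIndex = committed + 1 // the truncation proposal appends an entry
	return newFirstIndex, lastIndex
}

func main() {
	first, last := truncate(100)
	// At least two entries always survive in this model; the thread
	// observes three in practice.
	fmt.Printf("log spans [%d, %d]\n", first, last)
}
```

This is why no truncation threshold, however low, empties the log under the scheme described.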
Perhaps there should be different truncation criteria for ranges discovered by the scanner vs. ranges preemptively added to the Raft log queue due to activity. The scanner could truncate quiescent ranges even if there isn't much to truncate, while preemptively added ranges would continue to use the existing logic, where they wait for a significant chunk of log to accumulate before truncation.
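The two-path proposal above could look roughly like this (a sketch under assumed names; `fromScanner`, `quiescent`, and the threshold handling are illustrative, not an actual CockroachDB API):

```go
package main

import "fmt"

const sizeThreshold = 64 << 10 // existing 64KiB accumulation threshold

// shouldTruncate is a hypothetical decision function: scanner-discovered
// quiescent ranges get trimmed aggressively, while actively written
// ranges keep the existing wait-for-64KiB behavior.
func shouldTruncate(fromScanner, quiescent bool, truncatableBytes int64) bool {
	if fromScanner && quiescent {
		return truncatableBytes > 0 // trim even tiny logs
	}
	return truncatableBytes >= sizeThreshold
}

func main() {
	fmt.Println(shouldTruncate(true, true, 4<<10))   // quiescent, via scanner
	fmt.Println(shouldTruncate(false, false, 4<<10)) // active range, small log
}
```

This would directly address the dropped-table case, where every range is quiescent and holds a sub-threshold log.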
It's not a fundamental limitation (we start with a zero-length log), but as long as we put the index to truncate in the command itself, it's a little tricky to know what Raft log position we're going to end up committing at. There's also a slightly higher risk of needing a Raft snapshot if the truncation deletes itself, since there's only one chance to broadcast the log entry before it gets deleted. It sounds like there are some off-by-one errors in there; we should easily be able to truncate the log down to a single entry instead of leaving 3 behind.
Mostly. After …
We can now truncate Raft logs locally, which means that any range, upon quiescing, can just clear out the remainder of the Raft log. (cc #36262)
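The local-truncation idea in the comment above can be sketched as follows. Everything here (the `replica` type, field names, and the safety condition) is a hypothetical illustration, not CockroachDB's implementation:

```go
package main

import "fmt"

// replica is a toy stand-in for a range replica; the real structure is
// far richer. `applied` counts log entries already applied everywhere.
type replica struct {
	quiescent bool
	log       []string // in-memory stand-in for Raft log entries
	applied   int
}

// maybeTruncateLocally sketches the idea: once truncation no longer
// requires a replicated command, a quiesced range (no in-flight
// traffic) can simply discard its fully applied log prefix.
func (r *replica) maybeTruncateLocally() {
	if r.quiescent && r.applied <= len(r.log) {
		r.log = r.log[r.applied:]
	}
}

func main() {
	r := &replica{quiescent: true, log: []string{"a", "b", "c"}, applied: 3}
	r.maybeTruncateLocally()
	fmt.Println(len(r.log)) // log fully cleared
}
```

The key difference from the earlier scheme is that no new log entry is appended to perform the truncation, so the log can actually reach zero length.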
In performing some basic experiments to verify space reclamation I discovered something interesting: significant space can remain in the RaftLog for ranges of a dropped table.
The setup: against a local 1-node cluster (created via `roachprod create local -n 1; roachprod start local`) I ran: …

[For fast testing purposes, at this point I stopped the cluster and packaged up the `~/local` directory into a 1.3GB tar.]

I then ran: …
After the compactions completed, only 1GiB out of the initial 1.3GiB had been deleted. A manual `cockroach debug compact` of the storage directory did not recover any additional space. The output of `cockroach debug keys --sizes` shows 8.3MiB of live keys and 301MiB of live values. 97% of this data is from RaftLog keys/values.

I'm not sure why these RaftLog entries are sticking around given that this is a 1-node cluster. Perhaps the sheer number of ranges prevented the Raft log queue from keeping up with truncations.
Cc @m-schneider as this might be related to the failure of the drop table roachtest.