server: use a common logic for system log gc #90789
Conversation
cc @rafiss

Force-pushed from a3e6360 to 327faf6.
Prior to this patch, we had two problems:

- The code to GC system.eventlog was not running on secondary tenants. This was the original motivation for the work in this commit.
- There were 2 separate implementations of GC, one for system.eventlog/rangelog and another one for system.web_sessions. This was unfortunate because the first one is generic and was meant to be reused. Its implementation is just better all around: it is actually able to remove all rows when there are more than 1000 entries to delete at a time (this was a known limitation of the web_session cleanup code), and it is optimized to avoid hitting tombstones when iterating.

This commit improves the situation by using the common, better implementation for both cases, and enabling it for secondary tenants.

We acknowledge that with the advent of row-level TTL in SQL, all this would be better served by converting these tables to use row-level TTL. However, the fix here (run GC in secondary tenants) needs to be backported, so we are shy to rely on row-level TTL, which didn't work as well in previous versions. The remainder of this work is described in cockroachdb#89453.

Release note (ops change): The cluster settings `server.web_session.purge.period` and `server.web_session.purge.max_deletions_per_cycle`, which were specific to the cleanup function for `system.web_sessions`, have been replaced by `server.log_gc.period` and `server.log_gc.max_deletions_per_cycle`, which apply to the cleanup function for `system.eventlog`, `system.rangelog`, and `system.web_sessions` equally.

Release note (ops change): The cluster setting `server.web_session.auto_logout.timeout` has been removed. It had never been effective.
Reviewed 6 of 11 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)
docs/generated/settings/settings.html
line 81 at r2 (raw file):
<tr><td><code>server.consistency_check.max_rate</code></td><td>byte size</td><td><code>8.0 MiB</code></td><td>the rate limit (bytes/sec) to use for consistency checks; used in conjunction with server.consistency_check.interval to control the frequency of consistency checks. Note that setting this too high can negatively impact performance.</td></tr>
<tr><td><code>server.eventlog.enabled</code></td><td>boolean</td><td><code>true</code></td><td>if set, logged notable events are also stored in the table system.eventlog</td></tr>
<tr><td><code>server.eventlog.ttl</code></td><td>duration</td><td><code>2160h0m0s</code></td><td>if nonzero, entries in system.eventlog older than this duration are deleted every 10m0s. Should not be lowered below 24 hours.</td></tr>
why is the 24h directive no longer relevant?
pkg/settings/registry.go
line 151 at r2 (raw file):
// removed as of 22.2.1
"sql.ttl.default_range_concurrency": {},
"server.web_session.purge.period": {},
I'm a bit worried about doing this because we just had an incident that was resolved by a customer setting this to a higher number than default. I'd hate for them to upgrade and have their problem resurface.
What are customers with custom values for these supposed to do? Is there any way to copy the value from a deprecated cluster setting?
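For an operator in that position, one manual workaround after upgrading would be to re-apply the old custom value under the consolidated setting this PR introduces. Below is a minimal sketch, not part of the PR itself: the connection string and the value 100 are placeholders, and the setting name is taken from this PR's release note.

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

func main() {
	// Placeholder connection string; point it at the upgraded cluster.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Re-apply the previously customized value (100 is purely illustrative)
	// under the consolidated setting name introduced by this PR.
	if _, err := db.Exec(
		`SET CLUSTER SETTING server.log_gc.max_deletions_per_cycle = 100`); err != nil {
		log.Fatal(err)
	}
}
```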
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @dhartunian, @herkolategan, and @renatolabs)
docs/generated/settings/settings.html
line 81 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
why is the 24h directive no longer relevant?
It was never relevant. The sentiment behind it is true of many other cluster settings: don't make things poll too frequently, or performance may suffer. Yet we don't put that advice everywhere.
pkg/settings/registry.go
line 151 at r2 (raw file):
we just had an incident
Do you have a link? I'd like to see the specifics. Maybe we'll find that the structure of the general-purpose GC code would avoid the original issue.
After all, nobody has complained to us about their `system.eventlog` GC process in a very long while.
Is there any way to copy the value from a deprecated cluster setting?
Yes, there's a way. We've done it before. But I'd rather avoid it if possible - the mechanism we have to do that is currently deprecated and we haven't yet designed the replacement.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)
pkg/settings/registry.go
line 151 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
we just had an incident
Do you have a link? I'd like to see the specifics. Maybe we'll find that the structure of the general-purpose GC code would avoid the original issue.
After all, nobody has complained to us about their `system.eventlog` GC process in a very long while.
Is there any way to copy the value from a deprecated cluster setting?
Yes, there's a way. We've done it before. But I'd rather avoid it if possible - the mechanism we have to do that is currently deprecated and we haven't yet designed the replacement.
Here's the issue https://github.com/cockroachlabs/support/issues/1812
Seems like we could support the custom LIMIT value for `web_sessions`, but it's more difficult to support a different tempo just for the `web_sessions` table.
What do you think is best here? A customer can always have a situation where they insert faster than our defaults are culling.
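A back-of-the-envelope way to state that risk (added here for illustration, not taken from the thread): if GC removes at most $N$ rows per purge period $P$ and rows are inserted at a sustained rate $r$, the backlog only shrinks when

$$ r < \frac{N}{P} $$

With the old behavior of at most 10 deletions per cycle discussed later in this thread, that bound is easy for a busy cluster to exceed.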
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @dhartunian, @herkolategan, and @renatolabs)
pkg/settings/registry.go
line 151 at r2 (raw file):
Previously, dhartunian (David Hartunian) wrote…
Here's the issue https://github.com/cockroachlabs/support/issues/1812
Seems like we could support the custom LIMIT value for `web_sessions`, but it's more difficult to support a different tempo just for the `web_sessions` table.
What do you think is best here? A customer can always have a situation where they insert faster than our defaults are culling.
I have investigated that support issue and I'm pretty confident that the code in this PR is fully immune to the original problem.
Recall that the original problem was a combination of 3 factors:
- the query was using `ORDER BY random()`. This was the source of the CPU disruption.
- the GC was only processing 10 rows at a time.
- the GC was only running once per use of the login endpoint or per purge period.
The last two points together were the reason why the GC was never able to "catch up".
The code in this PR is significantly different:
- it does not use `ORDER BY` - it uses the index on the timestamp column instead.
- it loops! It processes at most 1000 rows per batch, but it loops to process as many batches as necessary to advance the GC cursor to the most recent entries.
This last aspect in particular makes me believe we wouldn't even need to make the purge period / max deletions per cycle configurable. In fact, prior to this PR, they were not configurable for the GC of system.eventlog, which on large clusters also gets quite a lot of traffic.
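To make that loop concrete, here is a minimal sketch of the batched, cursor-driven GC described above. It is not the actual CockroachDB code: the connection string is a placeholder, the direct client-side access to `system.eventlog` is only for illustration (the real server runs this internally), and error handling is reduced to a minimum.

```go
package main

import (
	"database/sql"
	"log"
	"time"

	_ "github.com/lib/pq" // CockroachDB speaks the PostgreSQL wire protocol
)

func main() {
	// Placeholder connection string for illustration only.
	db, err := sql.Open("postgres",
		"postgresql://root@localhost:26257/defaultdb?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	const batchSize = 1000                      // mirrors the LIMIT in the query below
	cutoff := time.Now().Add(-2160 * time.Hour) // e.g. the server.eventlog.ttl default
	cursor := time.Time{}                       // lower bound, advanced after each batch

	for {
		// One bounded batch: no ORDER BY, a lower bound on the indexed
		// timestamp column so each iteration skips the tombstones left by
		// the previous one, and RETURNING so the cursor can advance.
		rows, err := db.Query(`
			DELETE FROM system.eventlog
			 WHERE "timestamp" > $1 AND "timestamp" < $2
			 LIMIT 1000
			 RETURNING "timestamp"`, cursor, cutoff)
		if err != nil {
			log.Fatal(err)
		}
		n := 0
		for rows.Next() {
			var ts time.Time
			if err := rows.Scan(&ts); err != nil {
				log.Fatal(err)
			}
			if ts.After(cursor) {
				cursor = ts
			}
			n++
		}
		if err := rows.Err(); err != nil {
			log.Fatal(err)
		}
		rows.Close()
		if n < batchSize {
			return // caught up: the last batch was not full
		}
	}
}
```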
What do you think?
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)
pkg/settings/registry.go
line 151 at r2 (raw file):
Previously, knz (Raphael 'kena' Poss) wrote…
I have investigated that support issue and I'm pretty confident that the code in this PR is fully immune to the original problem.
Recall that the original problem was a combination of 3 factors:
- the query was using `ORDER BY random()`. This was the source of the CPU disruption.
- the GC was only processing 10 rows at a time.
- the GC was only running once per use of the login endpoint or per purge period.
The last two points together were the reason why the GC was never able to "catch up".
The code in this PR is significantly different:
- it does not use `ORDER BY` - it uses the index on the timestamp column instead.
- it loops! It processes at most 1000 rows per batch, but it loops to process as many batches as necessary to advance the GC cursor to the most recent entries.
This last aspect in particular makes me believe we wouldn't even need to make the purge period / max deletions per cycle configurable. In fact, prior to this PR, they were not configurable for the GC of system.eventlog, which on large clusters also gets quite a lot of traffic.
What do you think?
SGTM. Thanks for the clarification. I forgot that the `ORDER BY` was likely the culprit of elevated CPU.

Cheers! bors r=dhartunian
Build succeeded.
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool.

error creating merge commit from c7ce656 to blathers/backport-release-22.1-90789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.1.x failed. See errors above.

error creating merge commit from c7ce656 to blathers/backport-release-22.2-90789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool.
Backport to branch 22.2.x failed. See errors above.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Fixes #90521.
Subsumes #90523.
Subsumes #88766.
Informs #89453.
Informs #90790.
Prior to this patch, we had two problems:

- The code to GC system.eventlog was not running on secondary tenants. This was the original motivation for the work in this commit.
- There were 2 separate implementations of GC, one for system.eventlog/rangelog and another one for system.web_sessions. This was unfortunate because the first one is generic and was meant to be reused. Its implementation is just better all around: it is actually able to remove all rows when there are more than 1000 entries to delete at a time (this was a known limitation of the web_session cleanup code), and it is optimized to avoid hitting tombstones when iterating.
This commit improves the situation by using the common, better implementation for both cases.
We acknowledge that with the advent of row-level TTL in SQL, all this would be better served by a conversion of these tables to use row-level TTL. However, the fix here (run GC in secondary tenants) needs to be backported, so we are shy to rely on the row-level TTL which didn't work as well in previous versions. The remainder of this work is described in #89453.
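To make the "common logic" idea concrete, here is a small sketch of how one shared GC loop can be parameterized across these tables. The type and field names, and the TTL values for `system.rangelog` and `system.web_sessions`, are illustrative assumptions rather than the PR's actual code; only the 2160h figure matches the `server.eventlog.ttl` default quoted earlier in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// gcTarget describes one system table handled by a shared GC loop: which
// table to scan, which indexed timestamp column to use as the cursor, and
// how old a row must be before it is deleted.
type gcTarget struct {
	table        string
	timestampCol string
	ttl          time.Duration
}

var gcTargets = []gcTarget{
	// 2160h matches the server.eventlog.ttl default shown earlier;
	// the other two TTLs and the web_sessions column are placeholders.
	{table: "system.eventlog", timestampCol: "timestamp", ttl: 2160 * time.Hour},
	{table: "system.rangelog", timestampCol: "timestamp", ttl: 720 * time.Hour},
	{table: "system.web_sessions", timestampCol: "expiresAt", ttl: 168 * time.Hour},
}

func main() {
	for _, t := range gcTargets {
		// In the real server each target would be handed to the single
		// batched loop sketched earlier in this thread.
		fmt.Printf("GC %s via %s, ttl=%s\n", t.table, t.timestampCol, t.ttl)
	}
}
```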
Release note (ops change): The cluster settings `server.web_session.purge.period` and `server.web_session.purge.max_deletions_per_cycle`, which were specific to the cleanup function for `system.web_sessions`, have been replaced by `server.log_gc.period` and `server.log_gc.max_deletions_per_cycle`, which apply to the cleanup function for `system.eventlog`, `system.rangelog`, and `system.web_sessions` equally.

Release note (ops change): The cluster setting `server.web_session.auto_logout.timeout` has been removed. It had never been effective.