server: use a common logic for system log gc #90789

Merged: 1 commit into cockroachdb:master on Nov 17, 2022

Conversation

@knz (Contributor) commented Oct 27, 2022

Fixes #90521.
Subsumes #90523.
Subsumes #88766.
Informs #89453.
Informs #90790.

Prior to this patch, we had two problems:

  • the code to GC system.eventlog was not running on secondary tenants.

    This was the original motivation for the work in this commit.

  • there were 2 separate implementations of GC, one for system.eventlog/rangelog and another one for system.web_sessions.

    This was unfortunate because the first one is generic and was meant to be reused. Its implementation is just better all around: it is actually able to remove all rows when there are more than 1000 entries to delete at a time (this was a known limitation of the web_session cleanup code), and is optimized to avoid hitting tombstones when iterating.

This commit improves the situation by using the common, better implementation for both cases.

We acknowledge that with the advent of row-level TTL in SQL, all this would be better served by a conversion of these tables to use row-level TTL. However, the fix here (run GC in secondary tenants) needs to be backported, so we are shy to rely on the row-level TTL which didn't work as well in previous versions. The remainder of this work is described in #89453.
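
For context, the row-level TTL conversion deferred to #89453 would look roughly like the following. This is a hypothetical sketch, not part of this PR: the table and interval are placeholders, and the system tables may need additional work before they can accept such a statement.

```sql
-- Hypothetical illustration only (deferred to #89453, not done in this PR):
-- let the storage layer expire rows instead of running a GC loop.
ALTER TABLE system.web_sessions SET (ttl_expire_after = '7 days');
```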

Release note (ops change): The cluster settings server.web_session.purge.period and server.web_session.purge.max_deletions_per_cycle, which were specific to the cleanup function for system.web_sessions, have been replaced by server.log_gc.period and server.log_gc.max_deletions_per_cycle which apply to the cleanup function for system.eventlog, system.rangelog and system.web_sessions equally.

Release note (ops change): The cluster setting server.web_session.auto_logout.timeout has been removed. It had never been effective.
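
For operators who had customized the old web_sessions settings, re-applying an equivalent configuration after upgrading would look roughly like this. The values are illustrative placeholders, not recommended defaults.

```sql
-- Illustrative only: re-apply a custom purge cadence via the new shared
-- settings; the old server.web_session.purge.* settings no longer have
-- any effect.
SET CLUSTER SETTING server.log_gc.period = '1h';
SET CLUSTER SETTING server.log_gc.max_deletions_per_cycle = 10000;
```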

@knz knz requested review from a team as code owners October 27, 2022 18:45
@knz knz requested a review from a team October 27, 2022 18:45
@knz knz requested review from a team as code owners October 27, 2022 18:45
@knz knz requested review from herkolategan and renatolabs and removed request for a team October 27, 2022 18:45
@cockroach-teamcity (Member)

This change is Reviewable

@knz knz changed the title server: use a commong logic for system log gc server: use a common logic for system log gc Oct 27, 2022
@knz (Contributor, Author) commented Oct 27, 2022

cc @rafiss

@knz knz force-pushed the 20221027-gc branch 3 times, most recently from a3e6360 to 327faf6 Compare October 27, 2022 19:12

Commit message:
Prior to this patch, we had two problems:

- the code to GC system.eventlog was not running on secondary tenants.

  This was the original motivation for the work in this commit.

- there were 2 separate implementations of GC, one for
  system.eventlog/rangelog and another one for system.web_sessions.

  This was unfortunate because the first one is generic and was meant
  to be reused. Its implementation is just better all around: it is
  actually able to remove all rows when there are more than 1000
  entries to delete at a time (this was a known limitation of the
  web_session cleanup code), and is optimized to avoid hitting
  tombstones when iterating.

This commit improves the situation by using the common, better
implementation for both cases, and enabling it for secondary tenants.

We acknowledge that with the advent of row-level TTL in SQL, all
this would be better served by a conversion of these tables to use
row-level TTL. However, the fix here (run GC in secondary tenants)
needs to be backported, so we are shy to rely on the row-level TTL
which didn't work as well in previous versions. The remainder of this
work is described in cockroachdb#89453.

Release note (ops change): The cluster settings
`server.web_session.purge.period` and
`server.web_session.purge.max_deletions_per_cycle`, which were
specific to the cleanup function for `system.web_sessions`, have been
replaced by `server.log_gc.period` and
`server.log_gc.max_deletions_per_cycle` which apply to the cleanup
function for `system.eventlog`, `system.rangelog` and
`system.web_sessions` equally.

Release note (ops change): The cluster setting
`server.web_session.auto_logout.timeout` has been removed. It had
never been effective.
@dhartunian (Collaborator) left a comment

Reviewed 6 of 11 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)


docs/generated/settings/settings.html line 81 at r2 (raw file):

<tr><td><code>server.consistency_check.max_rate</code></td><td>byte size</td><td><code>8.0 MiB</code></td><td>the rate limit (bytes/sec) to use for consistency checks; used in conjunction with server.consistency_check.interval to control the frequency of consistency checks. Note that setting this too high can negatively impact performance.</td></tr>
<tr><td><code>server.eventlog.enabled</code></td><td>boolean</td><td><code>true</code></td><td>if set, logged notable events are also stored in the table system.eventlog</td></tr>
<tr><td><code>server.eventlog.ttl</code></td><td>duration</td><td><code>2160h0m0s</code></td><td>if nonzero, entries in system.eventlog older than this duration are deleted every 10m0s. Should not be lowered below 24 hours.</td></tr>

why is the 24h directive no longer relevant?


pkg/settings/registry.go line 151 at r2 (raw file):

	// removed as of 22.2.1
	"sql.ttl.default_range_concurrency":                {},
	"server.web_session.purge.period":                  {},

I'm a bit worried about doing this because we just had an incident that was resolved by a customer setting this to a higher number than default. I'd hate for them to upgrade and have their problem resurface.

What are customers with custom values for these supposed to do? Is there any way to copy the value from a deprecated cluster setting?

@knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @dhartunian, @herkolategan, and @renatolabs)


docs/generated/settings/settings.html line 81 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

why is the 24h directive no longer relevant?

It was never relevant. The sentiment behind it is true of many other cluster settings: don't make things poll too frequently, or performance may suffer. Yet we don't put that advice everywhere.


pkg/settings/registry.go line 151 at r2 (raw file):

we just had an incident

Do you have a link? I'd like to see the specifics. Maybe we'll find that the structure of the general-purpose GC code would avoid the original issue.

After all, nobody has complained to us about their system.eventlog GC process in a very long while.

Is there any way to copy the value from a deprecated cluster setting?

Yes, there's a way. We've done it before. But I'd rather avoid it if possible - the mechanism we have to do that is currently deprecated and we haven't yet designed the replacement.

@dhartunian (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)


pkg/settings/registry.go line 151 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

we just had an incident

Do you have a link? I'd like to see the specifics. Maybe we'll find that the structure of the general-purpose GC code would avoid the original issue.

After all, nobody has complained to us about their system.eventlog GC process in a very long while.

Is there any way to copy the value from a deprecated cluster setting?

Yes, there's a way. We've done it before. But I'd rather avoid it if possible - the mechanism we have to do that is currently deprecated and we haven't yet designed the replacement.

Here's the issue https://github.com/cockroachlabs/support/issues/1812

Seems like we could support the custom LIMIT value for web_sessions, but it's more difficult to support a different tempo just for the web_sessions table.
What do you think is best here? A customer can always have a situation where they insert faster than our defaults are culling.

@knz (Contributor, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @dhartunian, @herkolategan, and @renatolabs)


pkg/settings/registry.go line 151 at r2 (raw file):

Previously, dhartunian (David Hartunian) wrote…

Here's the issue https://github.com/cockroachlabs/support/issues/1812

Seems like we could support the custom LIMIT value for web_sessions, but it's more difficult to support a different tempo just for the web_sessions table.
What do you think is best here? A customer can always have a situation where they insert faster than our defaults are culling.

I have investigated that support issue and I'm pretty confident that the code in this PR is fully immune to the original problem.

Recall that the original problem was a combination of 3 factors:

  • the query was using ORDER BY random(). This was the source of the CPU disruption.
  • the GC was only processing 10 rows at a time.
  • the GC was only running once per use of the login endpoint or per purge period.

The last two points together were the reason why the GC was never able to "catch up".

The code in this PR is significantly different:

  • it does not use ORDER BY - it uses the index on the timestamp column instead.
  • it loops! It processes at most 1000 rows per batch, and it keeps looping to process as many batches as necessary to advance the GC cursor to the most recent entries (see the sketch below).

This last aspect in particular makes me believe we wouldn't even need to make the purge period / max deletions per cycle configurable. In fact, prior to this PR, they were not configurable for the GC of system.eventlog, which on large clusters also gets quite a lot of traffic.

What do you think?
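
For illustration, here is a minimal sketch of the batched, index-driven deletion pattern described above. The exact query in the PR may differ; the $1/$2 placeholders and the 1000-row batch size are assumptions drawn from this comment.

```sql
-- One batch of the GC loop: delete up to 1000 expired rows, scanning the
-- timestamp index forward from a cursor instead of using ORDER BY random().
DELETE FROM system.eventlog
WHERE timestamp >= $1   -- cursor: largest timestamp returned by the previous batch
  AND timestamp < $2    -- cutoff: now() minus the configured TTL
ORDER BY timestamp
LIMIT 1000
RETURNING timestamp;
-- The caller loops, advancing the cursor, until a batch returns fewer than
-- 1000 rows; the advancing lower bound keeps later batches from re-scanning
-- the tombstones left by earlier ones.
```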

@dhartunian (Collaborator) left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @aadityasondhi, @abarganier, @herkolategan, @knz, and @renatolabs)


pkg/settings/registry.go line 151 at r2 (raw file):

Previously, knz (Raphael 'kena' Poss) wrote…

I have investigated that support issue and I'm pretty confident that the code in this PR is fully immune to the original problem.

Recall that the original problem was a combination of 3 factors:

  • the query was using ORDER BY random(). This was the source of the CPU disruption.
  • the GC was only processing 10 rows at a time.
  • the GC was only running once per use of the login endpoint or per purge period.

The last two points together were the reason why the GC was never able to "catch up".

The code in this PR is significantly different:

  • it does not use ORDER BY - it uses the index on the timestamp column instead.
  • it loops! It processes at most 1000 rows per batch, and it keeps looping to process as many batches as necessary to advance the GC cursor to the most recent entries.

This last aspect in particular makes me believe we wouldn't even need to make the purge period / max deletions per cycle configurable. In fact, prior to this PR, they were not configurable for the GC of system.eventlog, which on large clusters also gets quite a lot of traffic.

What do you think?

SGTM. Thanks for the clarification. I forgot that the ORDER BY was likely the culprit of elevated CPU.

@knz (Contributor, Author) commented Nov 17, 2022

Cheers!

bors r=dhartunian

@craig (bot) commented Nov 17, 2022

Build succeeded:

@craig craig bot merged commit e8d0366 into cockroachdb:master Nov 17, 2022
@blathers-crl (bot) commented Nov 17, 2022

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from c7ce656 to blathers/backport-release-22.1-90789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


error creating merge commit from c7ce656 to blathers/backport-release-22.2-90789: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.2.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Successfully merging this pull request may close these issues.

server: old data in system.eventlog is not deleted for secondary tenants