
sql: add bulkio.column_backfill.update_chunk_size_threshold_bytes #83544

Merged: 1 commit merged into cockroachdb:master from fix-83199 on Jul 1, 2022

Conversation

@postamar (Contributor) commented Jun 28, 2022

This commit adds a new cluster setting, defaulting to 10 MiB, which
complements the bulkio.column_backfill.batch_size setting by limiting
the size in bytes of the column backfiller's update batches.

Fixes #83199.

Release note (bug fix): Fixes a bug where ADD/DROP COLUMN with the legacy
schema changer could fail on tables with large rows due to exceeding the
raft command max size.
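
As a usage illustration (a sketch, not part of the PR: the statements use the final setting name from this PR's title, and the value shown is an arbitrary example), the new knob is tuned like any other cluster setting:

-- Lower the threshold to 1 MiB; the setting is a plain integer byte count.
SET CLUSTER SETTING bulkio.column_backfill.update_chunk_size_threshold_bytes = 1048576;

-- Inspect the current value.
SHOW CLUSTER SETTING bulkio.column_backfill.update_chunk_size_threshold_bytes;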

@cockroach-teamcity (Member): This change is Reviewable

@postamar force-pushed the fix-83199 branch 2 times, most recently from baccb3a to 103452c, on June 29, 2022 at 10:59
// an update batch is run at once when adding or removing columns.
var columnBackfillUpdateRunThresholdBytes = settings.RegisterIntSetting(
	settings.TenantWritable,
	"bulkio.column_backfill.update_run_threshold_bytes",
@postamar (author):
I've been struggling to name this, happy to take any suggestion.

Reviewer (Contributor):

I know this is hard... but I'm taking a shot: columnBackfillUpdateBatchThreshold?

@postamar (author):

That sounds like an improvement. I think we quite like this bytes suffix for these kinds of values, though; I see them a lot, so it must be some kind of style convention.

	settings.TenantWritable,
	"bulkio.column_backfill.update_run_threshold_bytes",
	"the batch size in bytes above which an update is immediately run when adding/removing columns",
	10<<20, /* 10 MiB */
@postamar (author):

I took this default value because that's also the default max size when scanning.

if updateRunThresholdBytes > 0 && b.ApproximateMutationBytes() >= int(updateRunThresholdBytes) {
	break
}
@postamar (author):

I found that a threshold was more convenient than an absolute max size, both because it was easier to implement and because I then wouldn't have to worry about batches with zero puts in the case where one put is just that big. What do we do then? Not update the row? It didn't make sense, so I used a threshold.
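
To illustrate the semantics, here is a minimal, self-contained Go sketch (invented names, not the backfiller's actual code): because the size check runs only after a row's mutations are added, every batch carries at least one row, and a row larger than the threshold is still written rather than stalling the backfill.

package main

import "fmt"

// chunkRows groups rows (represented here only by their byte sizes)
// into chunks, flushing a chunk once its accumulated size reaches the
// threshold. The check runs after a row is added, so no chunk is empty.
func chunkRows(rowSizes []int, thresholdBytes int) [][]int {
	var chunks [][]int
	var cur []int
	curBytes := 0
	for _, sz := range rowSizes {
		cur = append(cur, sz)
		curBytes += sz
		// Threshold, not hard max: a chunk may end up exceeding
		// thresholdBytes, but an oversized row never gets rejected.
		if thresholdBytes > 0 && curBytes >= thresholdBytes {
			chunks = append(chunks, cur)
			cur, curBytes = nil, 0
		}
	}
	if len(cur) > 0 {
		chunks = append(chunks, cur)
	}
	return chunks
}

func main() {
	// A 300-byte row against a 100-byte threshold still gets its own
	// chunk instead of being rejected by a hard-max check.
	fmt.Println(chunkRows([]int{300, 50, 20}, 100)) // [[300] [50 20]]
}

Under a hard max of 100 bytes, by contrast, the 300-byte row could never be batched at all, which is exactly the dead end described in the comment above.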

Reviewer (Contributor):

Sounds good to me.

Reviewer (Contributor):

Okay, so, in addition to the old 64 MiB limit, you put this inside the for loop so that if the to-be-put batch exceeds the threshold, then we break and flush, right?

@postamar (author):

Almost: it's in addition to the number-of-rows limit (default 200). The 64 MiB limit we've changed in customer support issues doesn't come into play here; that's a hard limit on the size of raft commands which, if exceeded, causes the schema change to be blocked. This new setting helps ensure we never reach that hard limit.
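
To make that layering concrete, here is a small self-contained Go sketch (invented function and constants; only the default values are taken from the discussion). The row-count limit and the byte threshold are the soft limits that shape each batch; the 64 MiB raft command max is a hard server-side ceiling that the threshold is meant to keep batches safely under. As a sanity check on the defaults, 10<<20 = 10 × 2^20 = 10,485,760 bytes, i.e. 10 MiB.

package main

import "fmt"

const (
	rowLimit           = 200      // bulkio.column_backfill.batch_size default
	thresholdBytes     = 10 << 20 // this PR's default: 10 MiB
	raftCommandMaxSize = 64 << 20 // hard raft limit, enforced elsewhere; context only
)

// buildBatch stops at whichever soft limit trips first: the row-count
// limit or the byte threshold.
func buildBatch(rows [][]byte) [][]byte {
	var batch [][]byte
	batchBytes := 0
	for i := 0; i < len(rows) && len(batch) < rowLimit; i++ {
		batch = append(batch, rows[i])
		batchBytes += len(rows[i])
		if batchBytes >= thresholdBytes {
			break // large rows filled the batch long before 200 rows did
		}
	}
	return batch
}

func main() {
	// Five 4 MiB rows: the byte threshold flushes after three rows
	// (12 MiB >= 10 MiB), far below the 200-row limit.
	rows := make([][]byte, 5)
	for i := range rows {
		rows[i] = make([]byte, 4<<20)
	}
	fmt.Println(len(buildBatch(rows))) // 3
}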

Reviewer (Contributor):

Yeah, my mistake, I meant the 200-row limit. Thanks for the explanation!

10

statement ok
ROLLBACK;
@postamar (author):

This subtest only exercises in-txn column backfills; I couldn't figure out how to retrieve the traces in the other case. Those traces are emitted, according to the debugger at least, but I couldn't collect them. Anyway, I figured that this would be enough.

Reviewer (Contributor):

Yeah, I like the tests you added as well.

Reviewer (Contributor):

Can we also have a "correctness" test? Namely, if we set this threshold to be small and add a column, we expect the backfill to use more than one batch and succeed, and we should be able to observe the newly added column with the correct default value.

@postamar (author):

I'm not sure it's that valuable, since we already arbitrarily break batches down into 200-row increments, but it also doesn't cost anything, so I'll add a SELECT.
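
A hypothetical sketch of what that correctness test could look like in the logic-test format (table, row sizes, and default value are all invented for illustration):

statement ok
SET CLUSTER SETTING bulkio.column_backfill.update_chunk_size_threshold_bytes = 100

statement ok
CREATE TABLE big (k INT PRIMARY KEY, v STRING)

statement ok
INSERT INTO big SELECT i, repeat('x', 1000) FROM generate_series(1, 500) AS g(i)

statement ok
ALTER TABLE big ADD COLUMN c INT NOT NULL DEFAULT 42

query I
SELECT count(*) FROM big WHERE c = 42
----
500

With roughly 1 KiB rows and a 100-byte threshold, each update batch flushes after a single row, so the backfill necessarily spans many batches, and the final SELECT confirms that every row observed the new default.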

205 test_serial_b_seq PUBLIC 204
204 test_serial PUBLIC NULL
207 test_serial_b_seq PUBLIC 206
206 test_serial PUBLIC NULL
@postamar (author):

These and the others all get bumped because my subtest non-transactionally increments the descriptor ID counter twice.

Reviewer (Contributor):

Why don't we put your subtest at the end of the file?

@postamar (author):

In a sane world I would; however, doing so often causes annoying merge conflicts when backporting.

	false, /*traceKV*/
	updateRunThresholdBytes,
	true, /*alsoCommit*/
	cb.flowCtx.TraceKV,
@postamar (author):

I figured I'd leave this change in, as the debugger assures me that this flag is set when SET tracing = on,kv is used.

@postamar postamar requested review from ajwerner, a team and rhu713 June 29, 2022 11:07
@postamar postamar marked this pull request as ready for review June 29, 2022 11:07
@postamar postamar requested a review from a team June 29, 2022 11:07
@postamar postamar requested a review from a team as a code owner June 29, 2022 11:07
@chengxiong-ruan (Contributor) left a comment:

LGTM



@ajwerner (Contributor) left a comment:

:lgtm:

Reviewed 5 of 7 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @postamar and @rhu713)


pkg/sql/backfill.go line 99 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

> That sounds like an improvement. I think we quite like this bytes suffix for these kinds of values, though; I see them a lot, so it must be some kind of style convention.

I prefer Batch to Run or Update, and I like the bytes suffix. The other term used in the code is Chunk, which I also like, maybe the most: columnBackfillChunkSizeThresholdBytes?

There was an incident once involving a bytes threshold and a customer assuming megabytes. It was bad.

@postamar changed the title from "sql: add bulkio.column_backfill.update_run_threshold_bytes cluster se…" to "sql: add bulkio.column_backfill.update_chunk_size_threshold_bytes" on Jun 30, 2022
@postamar (author):

bors r+

@postamar (author):

Thanks for the reviews.

@craig (bot) commented Jun 30, 2022:

Build failed (retrying...):

@craig (bot) commented Jul 1, 2022:

Build succeeded:

@craig craig bot merged commit e2767f9 into cockroachdb:master Jul 1, 2022
@blathers-crl (bot) commented Jul 1, 2022:

Encountered an error creating backports. Some common things that can go wrong:

  1. The backport branch might have already existed.
  2. There was a merge conflict.
  3. The backport branch contained merge commits.

You might need to create your backport manually using the backport tool.


error creating merge commit from 119dc67 to blathers/backport-release-21.2-83544: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 21.2.x failed. See errors above.


error creating merge commit from 119dc67 to blathers/backport-release-22.1-83544: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict []

you may need to manually resolve merge conflicts with the backport tool.

Backport to branch 22.1.x failed. See errors above.


🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@postamar (author) commented Jul 4, 2022:

We can't backport this PR as it modifies the backfiller spec protobuf.

@postamar postamar deleted the fix-83199 branch July 4, 2022 10:43
@postamar (author) commented Jul 5, 2022:

According to @ajwerner it's fine to backport. I'll do so shortly.

Successfully merging this pull request may close these issues.

sql: Limit batchsize in column backfill by bytes not by rows