-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: fix race caused by shared table annotations #35027
Conversation
b700773
to
ddacd90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, 5 of 5 files at r2, 6 of 6 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/norm/factory.go, line 171 at r3 (raw file):
// CopyAndRplaceDefault. Note that if it's constructing a non-leaf node, that // node's inputs //
This comment got messed up
pkg/sql/opt/testutils/build.go, line 1 at r3 (raw file):
// Copyright 2018 The Cockroach Authors.
[nit] 2019
ddacd90
to
7c27cda
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @rytaft)
pkg/sql/opt/norm/factory.go, line 171 at r3 (raw file):
Previously, rytaft wrote…
This comment got messed up
Done.
pkg/sql/opt/testutils/build.go, line 1 at r3 (raw file):
Previously, rytaft wrote…
[nit] 2019
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 12 files at r4, 4 of 5 files at r5, 6 of 6 files at r6.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)
pkg/sql/opt/norm/factory.go, line 172 at r6 (raw file):
// children returned by recursive calls to the replace callback. Note that if a // non-leaf replacement node is constructed, its inputs must be copied using // CopyAndReplaceDefault
[nit] missing period.
7c27cda
to
3f0aefb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):
// // See the TableAnnID comment for more details and a usage example. func NewTableAnnID(copyFn CopyTableAnnFn) TableAnnID {
How necessary is it to do the copying of annotations rather than allowing them to be re-derived? I bet most of the cost of creating stats is creating the stats objects, which has to happen even in the copy case. And the FD set is probably fairly quick to create as well.
I guess my default would be to just re-derive, unless one of our benchmarks showed a tempting improvement as a result of doing the copying. Can you run our bench_test.go
tests with and without copying enabled to see if it makes a diff?
pkg/sql/opt/norm/factory.go, line 169 at r7 (raw file):
// in the "from" subtree, and has the choice of constructing an arbitrary // replacement node, or delegating to the default behavior by calling // CopyAndRelaceDefault, which constructs a copy of the source operator using
NIT: sp: CopyAndReplace
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file): Previously, andy-kimball (Andy Kimball) wrote…
I will run some benchmarks. What's the downside of keeping the FD annotation though? (we don't need to copy it so it should avoid an allocation). |
Test that reproduces cockroachdb#34904. Writing the test exposed a problem in the `CopyAndReplace` API: there was no way to recursively copy a subtree when creating an internal node. The API was reverted to the first iteration Andy had in his PR. Release note: None
The TableMeta field is problematic: it needs to be fixed up when copying Metadata; and since it points into a slice that we append to, it may or may not be aliased with the corresponding entry in `tables`. Avoiding these complication by just storing the TabeID. The `md` backpointer is also removed and QualifiedAlias is moved to Metadata. Release note: None
3f0aefb
to
137fa5f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @rytaft)
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):
Previously, RaduBerinde wrote…
I will run some benchmarks. What's the downside of keeping the FD annotation though? (we don't need to copy it so it should avoid an allocation).
Great intuition - it looks like it's actually slower to copy it (note that we're still reusing the FD annotation in both cases):
Phases/kv-read/Parse 2.76ns ± 0% 2.78ns ± 1% ~ (p=0.171 n=4+4)
Phases/kv-read/OptBuild 7.04µs ± 3% 7.54µs ± 0% +7.17% (p=0.029 n=4+4)
Phases/kv-read/Normalize 7.05µs ± 2% 7.64µs ± 1% +8.39% (p=0.029 n=4+4)
Phases/kv-read/Explore 11.9µs ± 1% 12.6µs ± 1% +5.52% (p=0.029 n=4+4)
Phases/kv-read/ExecBuild 12.9µs ± 4% 13.8µs ± 2% +6.75% (p=0.029 n=4+4)
Phases/kv-read-no-prep/Parse 11.5µs ± 2% 11.2µs ± 1% -2.85% (p=0.029 n=4+4)
Phases/kv-read-no-prep/OptBuild 31.4µs ± 2% 31.9µs ± 2% ~ (p=0.200 n=4+4)
Phases/kv-read-no-prep/Normalize 31.0µs ± 2% 31.9µs ± 1% +2.90% (p=0.029 n=4+4)
Phases/kv-read-no-prep/Explore 39.3µs ± 3% 40.2µs ± 1% ~ (p=0.114 n=4+4)
Phases/kv-read-no-prep/ExecBuild 40.6µs ± 3% 40.9µs ± 2% ~ (p=0.886 n=4+4)
Phases/kv-read-const/Parse 2.78ns ± 0% 2.79ns ± 1% ~ (p=0.657 n=4+4)
Phases/kv-read-const/OptBuild 143ns ± 1% 144ns ± 0% ~ (p=0.657 n=4+4)
Phases/kv-read-const/Normalize 137ns ± 1% 140ns ± 3% ~ (p=0.086 n=4+4)
Phases/kv-read-const/Explore 149ns ± 1% 148ns ± 3% ~ (p=0.257 n=4+4)
Phases/kv-read-const/ExecBuild 654ns ± 1% 665ns ± 3% ~ (p=0.371 n=4+4)
Phases/tpcc-new-order/Parse 2.79ns ± 1% 2.78ns ± 0% ~ (p=0.971 n=4+4)
Phases/tpcc-new-order/OptBuild 12.9µs ± 1% 13.8µs ± 2% +7.30% (p=0.029 n=4+4)
Phases/tpcc-new-order/Normalize 12.9µs ± 0% 13.8µs ± 1% +6.73% (p=0.029 n=4+4)
Phases/tpcc-new-order/Explore 36.8µs ± 1% 38.8µs ± 2% +5.49% (p=0.029 n=4+4)
Phases/tpcc-new-order/ExecBuild 39.5µs ± 2% 42.2µs ± 1% +6.85% (p=0.029 n=4+4)
Phases/tpcc-delivery/Parse 2.77ns ± 0% 2.80ns ± 1% ~ (p=0.057 n=4+4)
Phases/tpcc-delivery/OptBuild 11.7µs ± 1% 12.5µs ± 2% +7.02% (p=0.029 n=4+4)
Phases/tpcc-delivery/Normalize 11.8µs ± 1% 12.9µs ± 2% +9.21% (p=0.029 n=4+4)
Phases/tpcc-delivery/Explore 24.7µs ± 2% 25.6µs ± 1% +3.66% (p=0.029 n=4+4)
Phases/tpcc-delivery/ExecBuild 26.8µs ± 1% 28.1µs ± 3% +4.74% (p=0.029 n=4+4)
Phases/tpcc-stock-level/Parse 2.77ns ± 0% 2.79ns ± 1% ~ (p=0.257 n=4+4)
Phases/tpcc-stock-level/OptBuild 49.1µs ± 0% 51.8µs ± 2% +5.57% (p=0.029 n=4+4)
Phases/tpcc-stock-level/Normalize 48.1µs ± 1% 52.9µs ± 9% +9.83% (p=0.029 n=4+4)
Phases/tpcc-stock-level/Explore 124µs ± 3% 138µs ± 5% +11.73% (p=0.029 n=4+4)
Phases/tpcc-stock-level/ExecBuild 136µs ± 1% 141µs ± 5% +3.92% (p=0.029 n=4+4)
Exec/kv-read/ExecPlan 143µs ± 3% 156µs ± 2% +8.94% (p=0.029 n=4+4)
Exec/kv-read/ExecOpt 141µs ± 1% 145µs ± 6% ~ (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecPlan 230µs ± 0% 227µs ± 3% ~ (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecOpt 223µs ± 1% 218µs ± 3% ~ (p=0.200 n=4+4)
Exec/kv-read-const/ExecPlan 144µs ± 2% 157µs ± 8% ~ (p=0.114 n=4+4)
Exec/kv-read-const/ExecOpt 119µs ± 1% 120µs ± 4% ~ (p=0.686 n=4+4)
Exec/tpcc-new-order/ExecPlan 205µs ± 1% 197µs ± 2% -4.24% (p=0.029 n=4+4)
Exec/tpcc-new-order/ExecOpt 202µs ± 3% 198µs ± 1% ~ (p=0.533 n=4+2)
I updated the change to only retain the FDs.
pkg/sql/opt/norm/factory.go, line 172 at r6 (raw file):
Previously, rytaft wrote…
[nit] missing period.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 11 files at r8, 4 of 5 files at r9, 4 of 4 files at r10.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):
Previously, RaduBerinde wrote…
Great intuition - it looks like it's actually slower to copy it (note that we're still reusing the FD annotation in both cases):
Phases/kv-read/Parse 2.76ns ± 0% 2.78ns ± 1% ~ (p=0.171 n=4+4) Phases/kv-read/OptBuild 7.04µs ± 3% 7.54µs ± 0% +7.17% (p=0.029 n=4+4) Phases/kv-read/Normalize 7.05µs ± 2% 7.64µs ± 1% +8.39% (p=0.029 n=4+4) Phases/kv-read/Explore 11.9µs ± 1% 12.6µs ± 1% +5.52% (p=0.029 n=4+4) Phases/kv-read/ExecBuild 12.9µs ± 4% 13.8µs ± 2% +6.75% (p=0.029 n=4+4) Phases/kv-read-no-prep/Parse 11.5µs ± 2% 11.2µs ± 1% -2.85% (p=0.029 n=4+4) Phases/kv-read-no-prep/OptBuild 31.4µs ± 2% 31.9µs ± 2% ~ (p=0.200 n=4+4) Phases/kv-read-no-prep/Normalize 31.0µs ± 2% 31.9µs ± 1% +2.90% (p=0.029 n=4+4) Phases/kv-read-no-prep/Explore 39.3µs ± 3% 40.2µs ± 1% ~ (p=0.114 n=4+4) Phases/kv-read-no-prep/ExecBuild 40.6µs ± 3% 40.9µs ± 2% ~ (p=0.886 n=4+4) Phases/kv-read-const/Parse 2.78ns ± 0% 2.79ns ± 1% ~ (p=0.657 n=4+4) Phases/kv-read-const/OptBuild 143ns ± 1% 144ns ± 0% ~ (p=0.657 n=4+4) Phases/kv-read-const/Normalize 137ns ± 1% 140ns ± 3% ~ (p=0.086 n=4+4) Phases/kv-read-const/Explore 149ns ± 1% 148ns ± 3% ~ (p=0.257 n=4+4) Phases/kv-read-const/ExecBuild 654ns ± 1% 665ns ± 3% ~ (p=0.371 n=4+4) Phases/tpcc-new-order/Parse 2.79ns ± 1% 2.78ns ± 0% ~ (p=0.971 n=4+4) Phases/tpcc-new-order/OptBuild 12.9µs ± 1% 13.8µs ± 2% +7.30% (p=0.029 n=4+4) Phases/tpcc-new-order/Normalize 12.9µs ± 0% 13.8µs ± 1% +6.73% (p=0.029 n=4+4) Phases/tpcc-new-order/Explore 36.8µs ± 1% 38.8µs ± 2% +5.49% (p=0.029 n=4+4) Phases/tpcc-new-order/ExecBuild 39.5µs ± 2% 42.2µs ± 1% +6.85% (p=0.029 n=4+4) Phases/tpcc-delivery/Parse 2.77ns ± 0% 2.80ns ± 1% ~ (p=0.057 n=4+4) Phases/tpcc-delivery/OptBuild 11.7µs ± 1% 12.5µs ± 2% +7.02% (p=0.029 n=4+4) Phases/tpcc-delivery/Normalize 11.8µs ± 1% 12.9µs ± 2% +9.21% (p=0.029 n=4+4) Phases/tpcc-delivery/Explore 24.7µs ± 2% 25.6µs ± 1% +3.66% (p=0.029 n=4+4) Phases/tpcc-delivery/ExecBuild 26.8µs ± 1% 28.1µs ± 3% +4.74% (p=0.029 n=4+4) Phases/tpcc-stock-level/Parse 2.77ns ± 0% 2.79ns ± 1% ~ (p=0.257 n=4+4) Phases/tpcc-stock-level/OptBuild 49.1µs ± 0% 51.8µs ± 2% +5.57% (p=0.029 n=4+4) Phases/tpcc-stock-level/Normalize 48.1µs ± 1% 52.9µs ± 9% +9.83% (p=0.029 n=4+4) Phases/tpcc-stock-level/Explore 124µs ± 3% 138µs ± 5% +11.73% (p=0.029 n=4+4) Phases/tpcc-stock-level/ExecBuild 136µs ± 1% 141µs ± 5% +3.92% (p=0.029 n=4+4) Exec/kv-read/ExecPlan 143µs ± 3% 156µs ± 2% +8.94% (p=0.029 n=4+4) Exec/kv-read/ExecOpt 141µs ± 1% 145µs ± 6% ~ (p=0.343 n=4+4) Exec/kv-read-no-prep/ExecPlan 230µs ± 0% 227µs ± 3% ~ (p=0.343 n=4+4) Exec/kv-read-no-prep/ExecOpt 223µs ± 1% 218µs ± 3% ~ (p=0.200 n=4+4) Exec/kv-read-const/ExecPlan 144µs ± 2% 157µs ± 8% ~ (p=0.114 n=4+4) Exec/kv-read-const/ExecOpt 119µs ± 1% 120µs ± 4% ~ (p=0.686 n=4+4) Exec/tpcc-new-order/ExecPlan 205µs ± 1% 197µs ± 2% -4.24% (p=0.029 n=4+4) Exec/tpcc-new-order/ExecOpt 202µs ± 3% 198µs ± 1% ~ (p=0.533 n=4+2)
I updated the change to only retain the FDs.
The downside of keeping the FD is more complex code. It's worth it if we see an actual speedup, but if not, it's just simpler or more maintainable to zero out the annotations instead. Do you see a difference in perf in you reuse the FD?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):
Previously, andy-kimball (Andy Kimball) wrote…
The downside of keeping the FD is more complex code. It's worth it if we see an actual speedup, but if not, it's just simpler or more maintainable to zero out the annotations instead. Do you see a difference in perf in you reuse the FD?
I kind of like that it forces any user of annotations to explicitly think about what should happen on a metadata copy. Anyway, the benchmark results have been inconsistent, I'll rerun on a gceworker and report back.
When we assign placeholders on a detached memo, we copy the metadata. However, this does a shallow copy of table annotations; this is problematic for the statistics annotation which is a mutable object. The fix is to register a copy function for each type of table annotation as part of `NewTableAnnID`. For statistics, we simply clear out the annotation because it can be recalculated as needed (and it turns out to be faster than copying it). Fixes cockroachdb#34904. Release note (bug fix): Fixed a crash related to cached plans.
137fa5f
to
c1401e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @rytaft)
pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):
Previously, RaduBerinde wrote…
I kind of like that it forces any user of annotations to explicitly think about what should happen on a metadata copy. Anyway, the benchmark results have been inconsistent, I'll rerun on a gceworker and report back.
I made quite a few runs on a gceworker and couldn't see a difference above the noise. If anything zeroing them out is somehow faster. I simplified the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 4 files at r11.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)
bors r+ |
35027: opt: fix race caused by shared table annotations r=RaduBerinde a=RaduBerinde #### opt: test reproducing detached memo race Test that reproduces #34904. Writing the test exposed a problem in the `CopyAndReplace` API: there was no way to recursively copy a subtree when creating an internal node. The API was reverted to the first iteration Andy had in his PR. Release note: None #### opt: replace ColumnMeta.TableMeta with a TableID The TableMeta field is problematic: it needs to be fixed up when copying Metadata; and since it points into a slice that we append to, it may or may not be aliased with the corresponding entry in `tables`. Avoiding these complication by just storing the TabeID. The `md` backpointer is also removed and QualifiedAlias is moved to Metadata. Release note: None #### opt: fix race caused by shared table annotations When we assign placeholders on a detached memo, we copy the metadata. However, this does a shallow copy of table annotations; this is problematic for the statistics annotation which is a mutable object. The fix is to register a copy function for each type of table annotation as part of `NewTableAnnID`. Fixes #34904. Release note (bug fix): Fixed a crash related to cached plans. Co-authored-by: Radu Berinde <[email protected]>
Build succeeded |
opt: test reproducing detached memo race
Test that reproduces #34904. Writing the test exposed a problem in the
CopyAndReplace
API: there was no way to recursively copy a subtreewhen creating an internal node. The API was reverted to the first
iteration Andy had in his PR.
Release note: None
opt: replace ColumnMeta.TableMeta with a TableID
The TableMeta field is problematic: it needs to be fixed up when
copying Metadata; and since it points into a slice that we append to,
it may or may not be aliased with the corresponding entry in
tables
.Avoiding these complication by just storing the TabeID. The
md
backpointer is also removed and QualifiedAlias is moved to Metadata.
Release note: None
opt: fix race caused by shared table annotations
When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.
The fix is to register a copy function for each type of table
annotation as part of
NewTableAnnID
.Fixes #34904.
Release note (bug fix): Fixed a crash related to cached plans.