Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

opt: fix race caused by shared table annotations #35027

Merged
merged 3 commits into from
Feb 21, 2019

Conversation

RaduBerinde
Copy link
Member

opt: test reproducing detached memo race

Test that reproduces #34904. Writing the test exposed a problem in the
CopyAndReplace API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None

opt: replace ColumnMeta.TableMeta with a TableID

The TableMeta field is problematic: it needs to be fixed up when
copying Metadata; and since it points into a slice that we append to,
it may or may not be aliased with the corresponding entry in tables.
Avoiding these complication by just storing the TabeID. The md
backpointer is also removed and QualifiedAlias is moved to Metadata.

Release note: None

opt: fix race caused by shared table annotations

When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of NewTableAnnID.

Fixes #34904.

Release note (bug fix): Fixed a crash related to cached plans.

@RaduBerinde RaduBerinde requested a review from a team as a code owner February 16, 2019 23:54
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r1, 5 of 5 files at r2, 6 of 6 files at r3.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @RaduBerinde)


pkg/sql/opt/norm/factory.go, line 171 at r3 (raw file):

// CopyAndRplaceDefault. Note that if it's constructing a non-leaf node, that
// node's inputs
//

This comment got messed up


pkg/sql/opt/testutils/build.go, line 1 at r3 (raw file):

// Copyright 2018 The Cockroach Authors.

[nit] 2019

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TFTR!

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @rytaft)


pkg/sql/opt/norm/factory.go, line 171 at r3 (raw file):

Previously, rytaft wrote…

This comment got messed up

Done.


pkg/sql/opt/testutils/build.go, line 1 at r3 (raw file):

Previously, rytaft wrote…

[nit] 2019

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 12 files at r4, 4 of 5 files at r5, 6 of 6 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @RaduBerinde)


pkg/sql/opt/norm/factory.go, line 172 at r6 (raw file):

// children returned by recursive calls to the replace callback. Note that if a
// non-leaf replacement node is constructed, its inputs must be copied using
// CopyAndReplaceDefault

[nit] missing period.

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde and @rytaft)


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

//
// See the TableAnnID comment for more details and a usage example.
func NewTableAnnID(copyFn CopyTableAnnFn) TableAnnID {

How necessary is it to do the copying of annotations rather than allowing them to be re-derived? I bet most of the cost of creating stats is creating the stats objects, which has to happen even in the copy case. And the FD set is probably fairly quick to create as well.

I guess my default would be to just re-derive, unless one of our benchmarks showed a tempting improvement as a result of doing the copying. Can you run our bench_test.go tests with and without copying enabled to see if it makes a diff?


pkg/sql/opt/norm/factory.go, line 169 at r7 (raw file):

// in the "from" subtree, and has the choice of constructing an arbitrary
// replacement node, or delegating to the default behavior by calling
// CopyAndRelaceDefault, which constructs a copy of the source operator using

NIT: sp: CopyAndReplace

@RaduBerinde
Copy link
Member Author


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

How necessary is it to do the copying of annotations rather than allowing them to be re-derived? I bet most of the cost of creating stats is creating the stats objects, which has to happen even in the copy case. And the FD set is probably fairly quick to create as well.

I guess my default would be to just re-derive, unless one of our benchmarks showed a tempting improvement as a result of doing the copying. Can you run our bench_test.go tests with and without copying enabled to see if it makes a diff?

I will run some benchmarks. What's the downside of keeping the FD annotation though? (we don't need to copy it so it should avoid an allocation).

Test that reproduces cockroachdb#34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None
The TableMeta field is problematic: it needs to be fixed up when
copying Metadata; and since it points into a slice that we append to,
it may or may not be aliased with the corresponding entry in `tables`.
Avoiding these complication by just storing the TabeID. The `md`
backpointer is also removed and QualifiedAlias is moved to Metadata.

Release note: None
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @rytaft)


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

Previously, RaduBerinde wrote…

I will run some benchmarks. What's the downside of keeping the FD annotation though? (we don't need to copy it so it should avoid an allocation).

Great intuition - it looks like it's actually slower to copy it (note that we're still reusing the FD annotation in both cases):

Phases/kv-read/Parse               2.76ns ± 0%  2.78ns ± 1%     ~     (p=0.171 n=4+4)
Phases/kv-read/OptBuild            7.04µs ± 3%  7.54µs ± 0%   +7.17%  (p=0.029 n=4+4)
Phases/kv-read/Normalize           7.05µs ± 2%  7.64µs ± 1%   +8.39%  (p=0.029 n=4+4)
Phases/kv-read/Explore             11.9µs ± 1%  12.6µs ± 1%   +5.52%  (p=0.029 n=4+4)
Phases/kv-read/ExecBuild           12.9µs ± 4%  13.8µs ± 2%   +6.75%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/Parse       11.5µs ± 2%  11.2µs ± 1%   -2.85%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/OptBuild    31.4µs ± 2%  31.9µs ± 2%     ~     (p=0.200 n=4+4)
Phases/kv-read-no-prep/Normalize   31.0µs ± 2%  31.9µs ± 1%   +2.90%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/Explore     39.3µs ± 3%  40.2µs ± 1%     ~     (p=0.114 n=4+4)
Phases/kv-read-no-prep/ExecBuild   40.6µs ± 3%  40.9µs ± 2%     ~     (p=0.886 n=4+4)
Phases/kv-read-const/Parse         2.78ns ± 0%  2.79ns ± 1%     ~     (p=0.657 n=4+4)
Phases/kv-read-const/OptBuild       143ns ± 1%   144ns ± 0%     ~     (p=0.657 n=4+4)
Phases/kv-read-const/Normalize      137ns ± 1%   140ns ± 3%     ~     (p=0.086 n=4+4)
Phases/kv-read-const/Explore        149ns ± 1%   148ns ± 3%     ~     (p=0.257 n=4+4)
Phases/kv-read-const/ExecBuild      654ns ± 1%   665ns ± 3%     ~     (p=0.371 n=4+4)
Phases/tpcc-new-order/Parse        2.79ns ± 1%  2.78ns ± 0%     ~     (p=0.971 n=4+4)
Phases/tpcc-new-order/OptBuild     12.9µs ± 1%  13.8µs ± 2%   +7.30%  (p=0.029 n=4+4)
Phases/tpcc-new-order/Normalize    12.9µs ± 0%  13.8µs ± 1%   +6.73%  (p=0.029 n=4+4)
Phases/tpcc-new-order/Explore      36.8µs ± 1%  38.8µs ± 2%   +5.49%  (p=0.029 n=4+4)
Phases/tpcc-new-order/ExecBuild    39.5µs ± 2%  42.2µs ± 1%   +6.85%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Parse         2.77ns ± 0%  2.80ns ± 1%     ~     (p=0.057 n=4+4)
Phases/tpcc-delivery/OptBuild      11.7µs ± 1%  12.5µs ± 2%   +7.02%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Normalize     11.8µs ± 1%  12.9µs ± 2%   +9.21%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Explore       24.7µs ± 2%  25.6µs ± 1%   +3.66%  (p=0.029 n=4+4)
Phases/tpcc-delivery/ExecBuild     26.8µs ± 1%  28.1µs ± 3%   +4.74%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Parse      2.77ns ± 0%  2.79ns ± 1%     ~     (p=0.257 n=4+4)
Phases/tpcc-stock-level/OptBuild   49.1µs ± 0%  51.8µs ± 2%   +5.57%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Normalize  48.1µs ± 1%  52.9µs ± 9%   +9.83%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Explore     124µs ± 3%   138µs ± 5%  +11.73%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/ExecBuild   136µs ± 1%   141µs ± 5%   +3.92%  (p=0.029 n=4+4)
Exec/kv-read/ExecPlan               143µs ± 3%   156µs ± 2%   +8.94%  (p=0.029 n=4+4)
Exec/kv-read/ExecOpt                141µs ± 1%   145µs ± 6%     ~     (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecPlan       230µs ± 0%   227µs ± 3%     ~     (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecOpt        223µs ± 1%   218µs ± 3%     ~     (p=0.200 n=4+4)
Exec/kv-read-const/ExecPlan         144µs ± 2%   157µs ± 8%     ~     (p=0.114 n=4+4)
Exec/kv-read-const/ExecOpt          119µs ± 1%   120µs ± 4%     ~     (p=0.686 n=4+4)
Exec/tpcc-new-order/ExecPlan        205µs ± 1%   197µs ± 2%   -4.24%  (p=0.029 n=4+4)
Exec/tpcc-new-order/ExecOpt         202µs ± 3%   198µs ± 1%     ~     (p=0.533 n=4+2)

I updated the change to only retain the FDs.


pkg/sql/opt/norm/factory.go, line 172 at r6 (raw file):

Previously, rytaft wrote…

[nit] missing period.

Done.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 11 files at r8, 4 of 5 files at r9, 4 of 4 files at r10.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @RaduBerinde)


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

Previously, RaduBerinde wrote…

Great intuition - it looks like it's actually slower to copy it (note that we're still reusing the FD annotation in both cases):

Phases/kv-read/Parse               2.76ns ± 0%  2.78ns ± 1%     ~     (p=0.171 n=4+4)
Phases/kv-read/OptBuild            7.04µs ± 3%  7.54µs ± 0%   +7.17%  (p=0.029 n=4+4)
Phases/kv-read/Normalize           7.05µs ± 2%  7.64µs ± 1%   +8.39%  (p=0.029 n=4+4)
Phases/kv-read/Explore             11.9µs ± 1%  12.6µs ± 1%   +5.52%  (p=0.029 n=4+4)
Phases/kv-read/ExecBuild           12.9µs ± 4%  13.8µs ± 2%   +6.75%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/Parse       11.5µs ± 2%  11.2µs ± 1%   -2.85%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/OptBuild    31.4µs ± 2%  31.9µs ± 2%     ~     (p=0.200 n=4+4)
Phases/kv-read-no-prep/Normalize   31.0µs ± 2%  31.9µs ± 1%   +2.90%  (p=0.029 n=4+4)
Phases/kv-read-no-prep/Explore     39.3µs ± 3%  40.2µs ± 1%     ~     (p=0.114 n=4+4)
Phases/kv-read-no-prep/ExecBuild   40.6µs ± 3%  40.9µs ± 2%     ~     (p=0.886 n=4+4)
Phases/kv-read-const/Parse         2.78ns ± 0%  2.79ns ± 1%     ~     (p=0.657 n=4+4)
Phases/kv-read-const/OptBuild       143ns ± 1%   144ns ± 0%     ~     (p=0.657 n=4+4)
Phases/kv-read-const/Normalize      137ns ± 1%   140ns ± 3%     ~     (p=0.086 n=4+4)
Phases/kv-read-const/Explore        149ns ± 1%   148ns ± 3%     ~     (p=0.257 n=4+4)
Phases/kv-read-const/ExecBuild      654ns ± 1%   665ns ± 3%     ~     (p=0.371 n=4+4)
Phases/tpcc-new-order/Parse        2.79ns ± 1%  2.78ns ± 0%     ~     (p=0.971 n=4+4)
Phases/tpcc-new-order/OptBuild     12.9µs ± 1%  13.8µs ± 2%   +7.30%  (p=0.029 n=4+4)
Phases/tpcc-new-order/Normalize    12.9µs ± 0%  13.8µs ± 1%   +6.73%  (p=0.029 n=4+4)
Phases/tpcc-new-order/Explore      36.8µs ± 1%  38.8µs ± 2%   +5.49%  (p=0.029 n=4+4)
Phases/tpcc-new-order/ExecBuild    39.5µs ± 2%  42.2µs ± 1%   +6.85%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Parse         2.77ns ± 0%  2.80ns ± 1%     ~     (p=0.057 n=4+4)
Phases/tpcc-delivery/OptBuild      11.7µs ± 1%  12.5µs ± 2%   +7.02%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Normalize     11.8µs ± 1%  12.9µs ± 2%   +9.21%  (p=0.029 n=4+4)
Phases/tpcc-delivery/Explore       24.7µs ± 2%  25.6µs ± 1%   +3.66%  (p=0.029 n=4+4)
Phases/tpcc-delivery/ExecBuild     26.8µs ± 1%  28.1µs ± 3%   +4.74%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Parse      2.77ns ± 0%  2.79ns ± 1%     ~     (p=0.257 n=4+4)
Phases/tpcc-stock-level/OptBuild   49.1µs ± 0%  51.8µs ± 2%   +5.57%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Normalize  48.1µs ± 1%  52.9µs ± 9%   +9.83%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/Explore     124µs ± 3%   138µs ± 5%  +11.73%  (p=0.029 n=4+4)
Phases/tpcc-stock-level/ExecBuild   136µs ± 1%   141µs ± 5%   +3.92%  (p=0.029 n=4+4)
Exec/kv-read/ExecPlan               143µs ± 3%   156µs ± 2%   +8.94%  (p=0.029 n=4+4)
Exec/kv-read/ExecOpt                141µs ± 1%   145µs ± 6%     ~     (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecPlan       230µs ± 0%   227µs ± 3%     ~     (p=0.343 n=4+4)
Exec/kv-read-no-prep/ExecOpt        223µs ± 1%   218µs ± 3%     ~     (p=0.200 n=4+4)
Exec/kv-read-const/ExecPlan         144µs ± 2%   157µs ± 8%     ~     (p=0.114 n=4+4)
Exec/kv-read-const/ExecOpt          119µs ± 1%   120µs ± 4%     ~     (p=0.686 n=4+4)
Exec/tpcc-new-order/ExecPlan        205µs ± 1%   197µs ± 2%   -4.24%  (p=0.029 n=4+4)
Exec/tpcc-new-order/ExecOpt         202µs ± 3%   198µs ± 1%     ~     (p=0.533 n=4+2)

I updated the change to only retain the FDs.

The downside of keeping the FD is more complex code. It's worth it if we see an actual speedup, but if not, it's just simpler or more maintainable to zero out the annotations instead. Do you see a difference in perf in you reuse the FD?

Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

The downside of keeping the FD is more complex code. It's worth it if we see an actual speedup, but if not, it's just simpler or more maintainable to zero out the annotations instead. Do you see a difference in perf in you reuse the FD?

I kind of like that it forces any user of annotations to explicitly think about what should happen on a metadata copy. Anyway, the benchmark results have been inconsistent, I'll rerun on a gceworker and report back.

When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`. For statistics, we simply clear
out the annotation because it can be recalculated as needed (and it
turns out to be faster than copying it).

Fixes cockroachdb#34904.

Release note (bug fix): Fixed a crash related to cached plans.
Copy link
Member Author

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball and @rytaft)


pkg/sql/opt/table_meta.go, line 205 at r7 (raw file):

Previously, RaduBerinde wrote…

I kind of like that it forces any user of annotations to explicitly think about what should happen on a metadata copy. Anyway, the benchmark results have been inconsistent, I'll rerun on a gceworker and report back.

I made quite a few runs on a gceworker and couldn't see a difference above the noise. If anything zeroing them out is somehow faster. I simplified the change.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r11.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)

@RaduBerinde
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Feb 21, 2019
35027: opt: fix race caused by shared table annotations r=RaduBerinde a=RaduBerinde

#### opt: test reproducing detached memo race

Test that reproduces #34904. Writing the test exposed a problem in the
`CopyAndReplace` API: there was no way to recursively copy a subtree
when creating an internal node. The API was reverted to the first
iteration Andy had in his PR.

Release note: None

#### opt: replace ColumnMeta.TableMeta with a TableID

The TableMeta field is problematic: it needs to be fixed up when
copying Metadata; and since it points into a slice that we append to,
it may or may not be aliased with the corresponding entry in `tables`.
Avoiding these complication by just storing the TabeID. The `md`
backpointer is also removed and QualifiedAlias is moved to Metadata.

Release note: None

#### opt: fix race caused by shared table annotations

When we assign placeholders on a detached memo, we copy the metadata.
However, this does a shallow copy of table annotations; this is
problematic for the statistics annotation which is a mutable object.

The fix is to register a copy function for each type of table
annotation as part of `NewTableAnnID`.

Fixes #34904.

Release note (bug fix): Fixed a crash related to cached plans.


Co-authored-by: Radu Berinde <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 21, 2019

Build succeeded

@craig craig bot merged commit c1401e0 into cockroachdb:master Feb 21, 2019
@RaduBerinde RaduBerinde deleted the fix-memo-race branch February 25, 2019 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: race in AssignPlaceholders
4 participants