Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: add ReType to resolveAndRequireType to fix vector engine panic #66885

Merged
merged 1 commit into from
Jun 28, 2021

Conversation

cucaroach
Copy link
Contributor

@cucaroach cucaroach commented Jun 25, 2021

Fixes #66708

The vector engine needs exact type coercion, specifically when piping
computed column values into downstream operators. Without this fix
the computed column was left as an int64 instead of cast back to the
required int16 type.

Exposed by sqlsmith, kudos to @michae2 for the reduce help

Release note: None

@cucaroach cucaroach requested review from RaduBerinde and a team June 25, 2021 15:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@cucaroach cucaroach linked an issue Jun 25, 2021 that may be closed by this pull request
@cucaroach
Copy link
Contributor Author

There's some CI failures here that bring to light the shortcomings of my "fix". @RaduBerinde whats the best way to bring those to your attention, should I just rewrite them and add them to the PR so we can discuss?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is the standard procedure - rewrite all tests so the changes can be reviewed.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @cucaroach)


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

			})
			// Paper over the problem, there's probably a better place to put this...
			if scalar.DataType() != tabCol.DatumType() {

I would call tree.ReType() on texpr before buildScalar instead.

@rytaft it feels that resolveAndRequireType should do this internally, what do you think?

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

Previously, RaduBerinde wrote…

I would call tree.ReType() on texpr before buildScalar instead.

@rytaft it feels that resolveAndRequireType should do this internally, what do you think?

That sounds better. resolveAndRequireType sure doesn't seem to be living up to the "Require" bit. I guess I could try that and see what the fallout looks like.

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

That sounds better. resolveAndRequireType sure doesn't seem to be living up to the "Require" bit. I guess I could try that and see what the fallout looks like.

FYI, with ReType instead of my fix the fallout is diminished.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I think it's a good idea to try adding ReType to resolveAndRequireType and see what else changes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

FYI, with ReType instead of my fix the fallout is diminished.

The != instead of using Identical() was likely causing some unneeded casts.

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFAL, I pushed the ReType down into resolveAndRequireType and updated the testdata that was affected. Seems okay? Not sure if any of these casts could/would hurt performance. At a high level I think ReType'ing doesn't belong in computed column specific code and it should be handled at a deeper level. At an even higher level I think our execution engine should have a "verifier" to ensure runtime mismatches like this can never happen, not sure what this would look like in practice but something to think about.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

Previously, RaduBerinde wrote…

The != instead of using Identical() was likely causing some unneeded casts.

👍

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm_strong: Yes, let's keep the Retype in resolveAndRequireType.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)

Fixes cockroachdb#66708

The vector engine needs exact type coercion, specifically when piping
computed column values into downstream operators.  Without this fix
the computed column was left as an int64 instead of cast back to the
required int16 type.

Exposed by sqlsmith, kudos to @michae2 for the review help

Release note: None
@cucaroach cucaroach marked this pull request as ready for review June 28, 2021 16:52
@cucaroach cucaroach changed the title sql: WIP: add cast to massage computed col type to desired sql: add ReType to resolveAndRequireType to fix vector engine panic Jun 28, 2021
@cucaroach cucaroach requested a review from a team June 28, 2021 16:57
@RaduBerinde
Copy link
Member

I think this should be backported; adding the labels.

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)


pkg/sql/opt/optbuilder/scope.go, line 461 at r5 (raw file):

		panic(err)
	}
	return tree.ReType(s.ensureNullType(texpr, desired), desired)

Is this okay or should I break it up into two lines?

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @cucaroach)


pkg/sql/opt/optbuilder/scope.go, line 461 at r5 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Is this okay or should I break it up into two lines?

We try to keep things at most 100 columns in most cases (see https://wiki.crdb.io/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines), other than that it's up to you.

Or "rule" is that we try not to comment too much on formatting nits in PRs; if something is very important, it should be made into a linting test and then it never needs to be discussed.

@cucaroach
Copy link
Contributor Author

I think this should be backported; adding the labels.

Near as I could tell this only affects 21.1 so removing 20.2 label (CRDB needs to support virtual computed columns).

@cucaroach
Copy link
Contributor Author

bors r+

@RaduBerinde
Copy link
Member

The lack of a cast could affect STORED columns as well.

@RaduBerinde
Copy link
Member

The lack of a cast could affect STORED columns as well.

I can't come up with a case with a STORED column that is problematic, so I agree we can leave out of 20.2.

@craig
Copy link
Contributor

craig bot commented Jun 28, 2021

Build succeeded:

@RaduBerinde
Copy link
Member


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

      │              └── q:37
      └── projections
           ├── crdb_internal.round_decimal_values((buyprice:23::DECIMAL - discount:25::DECIMAL)::DECIMAL(10,4), 4) [as=discountbuyprice_comp:53, outer=(23,25), immutable]

Hm, this is giving me second thoughts. If we cast to DECIMAL(10,4), are we missing some better rounding that round_decimal_values does? If not, does this mean we can just rely on casts and get rid of the special round_decimal_values stuff in optbuilder? CC @rytaft @andy-kimball

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained


pkg/sql/opt/exec/execbuilder/testdata/hash_sharded_index, line 31 at r5 (raw file):

        │ columns: (crdb_internal_a_shard_11_comp, column1)
        │ estimated row count: 2
        │ render crdb_internal_a_shard_11_comp: mod(fnv32(COALESCE(column1::STRING, '')), 11)::INT4

Are these changes right/okay? Isn't "int" in sql supposed to be a 64-bit int?


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

Previously, RaduBerinde wrote…

Hm, this is giving me second thoughts. If we cast to DECIMAL(10,4), are we missing some better rounding that round_decimal_values does? If not, does this mean we can just rely on casts and get rid of the special round_decimal_values stuff in optbuilder? CC @rytaft @andy-kimball

I'll look into it, it does seem redundant.

@RaduBerinde
Copy link
Member


pkg/sql/opt/exec/execbuilder/testdata/hash_sharded_index, line 31 at r5 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Are these changes right/okay? Isn't "int" in sql supposed to be a 64-bit int?

It's good, the hidden shard column is INT4 (we don't support BUCKET_COUNT > 2^31)

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r4, 6 of 6 files at r5.
Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained


pkg/sql/opt/optbuilder/scope.go, line 461 at r5 (raw file):

Previously, RaduBerinde wrote…

We try to keep things at most 100 columns in most cases (see https://wiki.crdb.io/wiki/spaces/CRDB/pages/181371303/Go+Golang+coding+guidelines), other than that it's up to you.

Or "rule" is that we try not to comment too much on formatting nits in PRs; if something is very important, it should be made into a linting test and then it never needs to be discussed.

Nice clean change!


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

I'll look into it, it does seem redundant.

Good question -- I'm not sure off the top of my head. @andy-kimball was the one who did the rounding work.


pkg/sql/opt/optbuilder/select.go, line 711 at r1 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

:+!:

Sorry for the delay -- resolveAndRequireType looks like the right place.

Copy link
Contributor Author

@cucaroach cucaroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

Good question -- I'm not sure off the top of my head. @andy-kimball was the one who did the rounding work.

Hmm, so this is a stored computed column and this is an update query on a unrelated field, I think this has no material affect at runtime. However simple update query on a one of the fields in the computed column reveals that my change does introduce an extra cast operator into the vec plans. Could that be a problem?

@rytaft
Copy link
Collaborator

rytaft commented Jun 30, 2021


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

Previously, cucaroach (Tommy Reilly) wrote…

Hmm, so this is a stored computed column and this is an update query on a unrelated field, I think this has no material affect at runtime. However simple update query on a one of the fields in the computed column reveals that my change does introduce an extra cast operator into the vec plans. Could that be a problem?

I don't think an extra cast operator is a big problem, but we could avoid it by adding a normalization rule similar to the existing EliminateCast rule that eliminates the round_decimal_values function if it's redundant.

But @RaduBerinde does bring up a good question about whether we could get rid of the round_decimal_values stuff altogether in the optbuilder... I experimented with commenting out that logic, and it does seem like it might still be necessary in general. But maybe we can get rid of the special case for computed columns, and then a normalization rule would be unnecessary?

A separate question is whether we should actually be using round_decimal_values instead of ReType in some cases inside resolveAndRequireType. Poking around and experimenting a bit, it seems like under the covers round_decimal_values is doing the same thing as the cast, so I think it's fine as is. But I'm not 100% sure...

@RaduBerinde
Copy link
Member


pkg/sql/opt/xform/testdata/external/trading-mutation, line 1605 at r5 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

I don't think an extra cast operator is a big problem, but we could avoid it by adding a normalization rule similar to the existing EliminateCast rule that eliminates the round_decimal_values function if it's redundant.

But @RaduBerinde does bring up a good question about whether we could get rid of the round_decimal_values stuff altogether in the optbuilder... I experimented with commenting out that logic, and it does seem like it might still be necessary in general. But maybe we can get rid of the special case for computed columns, and then a normalization rule would be unnecessary?

A separate question is whether we should actually be using round_decimal_values instead of ReType in some cases inside resolveAndRequireType. Poking around and experimenting a bit, it seems like under the covers round_decimal_values is doing the same thing as the cast, so I think it's fine as is. But I'm not 100% sure...

Filed #67083 to investigate more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

roachtest: sqlsmith/setup=rand-tables/setting=no-ddl failed
4 participants