Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

Closed
Azhng opened this issue Jul 2, 2021 · 22 comments · Fixed by #67865
Closed

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

Azhng opened this issue Jul 2, 2021 · 22 comments · Fixed by #67865
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@Azhng
Copy link
Contributor

Azhng commented Jul 2, 2021

Suppose I have a table with hash sharded PK and a secondary index as follow:

create table t (
  k1 int,
  k2 bytes,
  k3 int,
  k4 int,
  k5 int,
  v1 int,
  v2 int,
  primary key (k1, k2, k3, k4, k5) using hash with bucket_count = 8,
  index (k2, k1, k3, k4, k5)
);

Running the following query for the table t contains an unexpected index join:

root@127.0.0.1:26257/movr> explain select
  v1
from t
where k1 = 1 AND
 k2 = ''::bytes AND
 k3 = 3 AND
 k4 = 4 AND
 k5 = 5;
                                        info
------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 1
  │ table: t@primary
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 40 minutes ago)
        table: t@t_k2_k1_k3_k4_k5_idx
        spans: [/'\x'/1/3/4/5/0 - /'\x'/1/3/4/5/7]
(11 rows)

Time: 1ms total (execution 1ms / network 0ms)

However, if the column type for k2 is changed from BYTES to STRING, then the query plan will only scan the primary index:

root@127.0.0.1:26257/movr> explain select
  v1
from t
where k1 = 1 AND
 k2 = '' AND
 k3 = 3 AND
 k4 = 4 AND
 k5 = 5;

                    info
--------------------------------------------
  distribution: local
  vectorized: true

  • scan
    missing stats
    table: t@primary
    spans: [/4/1/''/3/4/5 - /4/1/''/3/4/5]
(7 rows)

Time: 1ms total (execution 1ms / network 0ms)

Epic: CRDB-7363

@Azhng Azhng added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 2, 2021
@blathers-crl blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 2, 2021
@ajwerner
Copy link
Contributor

ajwerner commented Jul 2, 2021

I'm pretty sure that the problem is that we treat that bytes->string cast as not immutable. I'm not totally sure why this is -- I think it's to deal with cases where the string type has a width.

That check is here:

volatility, ok := tree.LookupCastVolatility(input.DataType(), typ)

then
{from: types.BytesFamily, to: types.StringFamily, volatility: VolatilityStable},

The options are:

Rework the generated expression to not cast bytes but rather hash them directly (seems good) or to special case the cast logic to make it immutable if the string has infinite width (also seems good IIUC). The former seems safer and easier.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 2, 2021

Yeah so seems like this is a problem for the following types: BYTES, FLOAT, and TIMESTAMPTZ. It should be easy to add special logic to knock out the casts. For the bytes, we can just hash them without the stirng casts. For the floats and timestamptz's, we can cast them to decimal and integer respectively. It should take just a small amount of plumbing.

@RaduBerinde
Copy link
Member

RaduBerinde commented Jul 2, 2021

I'm pretty sure that the problem is that we treat that bytes->string cast as not immutable. I'm not totally sure why this is -- I think it's to deal with cases where the string type has a width.

It's because it depends on a knob (BytesEncodeFormat, controlled by bytea_output). See https://www.postgresql.org/docs/10/datatype-binary.html

Similarly, for FLOAT there is some precision setting (extra_float_digits).

TIMESTAMPTZ::STRING contains the current timezone. It can be fixed by doing (t AT TIME ZONE 'utc')::STRING.

By the way, the way to find out these things is to find the case in PerformCast and check if and how ctx is used in the conversion.

String conversion is a bad idea in other cases too: you can have decimals 1.0 and 1.00 and their strings are different but they are equal. Similar issue with collated strings.

Maybe we should extend fnv32 (or add some new variant) to be able to take any type that can be indexed, and hash its key encoding?

@RaduBerinde
Copy link
Member

Or, perhaps more elegant - have a key_encoding() function that converts any indexable type to BYTES (and allow fnv32() to operate on BYTES directly).

@ajwerner
Copy link
Contributor

ajwerner commented Jul 2, 2021

Maybe we should extend fnv32 (or add some new variant) to be able to take any type that can be indexed, and hash its key encoding?

Alternatively we could expose a function that turns any type into bytes using its key encoding. fnv32 takes bytes.

@ajwerner
Copy link
Contributor

ajwerner commented Jul 2, 2021

Jinx

@RaduBerinde
Copy link
Member

Great minds think alike (and apparently rarely sleep).

@blathers-crl blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 13, 2021
@rytaft
Copy link
Collaborator

rytaft commented Jul 13, 2021

Assigning this to schema for triage -- let us know if you need any help from SQL Queries here.

@ajwerner
Copy link
Contributor

The function we should use to encode the bytes is this one:

// EncodeTableKey encodes `val` into `b` and returns the new buffer.
// This is suitable to generate index/lookup keys in KV.
//
// The encoded value is guaranteed to be lexicographically sortable,
// but not guaranteed to be round-trippable during decoding: some
// values like decimals or collated strings have composite encoding
// where part of their value lies in the value part of the key/value
// pair.
//
// See also: docs/tech-notes/encoding.md, EncodeTableValue().
func EncodeTableKey(b []byte, val tree.Datum, dir encoding.Direction) ([]byte, error) {

@ajwerner
Copy link
Contributor

I took a stab at this by first adding the function and then trying to hook it up. What I discovered when testing is that we still don't properly propagate the functional dependencies for a bunch of types. The crux of the problem, I think can be summarized with the following example:

CREATE TABLE intkey (
    k INT4, shard INT4 AS (mod(fnv32((k::INTERVAL::STRING)), 8)) STORED,
    PRIMARY KEY (shard, k),
    CONSTRAINT c CHECK (shard IN (0, 1, 2, 3, 4, 5, 6, 7))
);
CREATE TABLE floatkey (
    k FLOAT4, shard INT4 AS (mod(fnv32((k::INTERVAL::STRING)), 8)) STORED,
    PRIMARY KEY (shard, k),
    CONSTRAINT c CHECK (shard IN (0, 1, 2, 3, 4, 5, 6, 7))
);

We use ::INTERVAL::STRING as a stand in for an immutable function that gives us a string value for both float4 and int4. I have confirmed that the volatility is immutable for these casts. Also, during planning, I added logging for all volatilities added into a VolatilitySet and none are below Immutable:

I210721 17:01:20.934709 1434 sql/opt/optbuilder/builder.go:262  [n1,client=127.0.0.1:49288,hostssl,user=root] 609  building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.934735 1434 sql/opt/optbuilder/builder.go:262  [n1,client=127.0.0.1:49288,hostssl,user=root] 610  building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.943019 1434 sql/opt/props/volatility.go:90  [-] 620  adding volatility leak-proof
I210721 17:01:20.943043 1434 sql/opt/props/volatility.go:90  [-] 621  adding volatility leak-proof
I210721 17:01:20.943094 1434 sql/sem/tree/casts.go:361  [-] 622  cast int to int4: immutable true
I210721 17:01:20.943109 1434 sql/opt/props/volatility.go:90  [-] 623  adding volatility immutable
I210721 17:01:20.943115 1434 sql/opt/props/volatility.go:90  [-] 624  adding volatility immutable
I210721 17:01:20.943120 1434 sql/opt/props/volatility.go:90  [-] 625  adding volatility leak-proof
I210721 17:01:20.943126 1434 sql/sem/tree/casts.go:361  [-] 626  cast interval to string: immutable true
I210721 17:01:20.943134 1434 sql/opt/props/volatility.go:90  [-] 627  adding volatility immutable
I210721 17:01:20.943139 1434 sql/sem/tree/casts.go:361  [-] 628  cast float4 to interval: immutable true
I210721 17:01:20.943145 1434 sql/opt/props/volatility.go:90  [-] 629  adding volatility immutable
I210721 17:01:20.943173 1434 sql/opt/props/volatility.go:90  [-] 630  adding volatility leak-proof
I210721 17:01:20.943202 1434 sql/opt/optbuilder/builder.go:278  [n1,client=127.0.0.1:49288,hostssl,user=root] 631  done building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.943235 1434 sql/opt/optbuilder/builder.go:305  [n1,client=127.0.0.1:49288,hostssl,user=root] 632  done building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1

Best guess of the moment is that it's something in

func (c *CustomFuncs) tryFoldComputedCol(
in
func (c *CustomFuncs) computedColFilters(

cc @mgartner

@RaduBerinde
Copy link
Member

What is the problem in that example? Do you have an example of a bad query plan?

We use ::INTERVAL::STRING as a stand in for an immutable function that gives us a string value for both float4 and int4

A bit off-topic, but this might change in a future release (we're trying to add support for some "interval style" setting, see #67792). Shouldn't we go down the encode_key() path instead of string casts?

@RaduBerinde
Copy link
Member

Oh, is the problem that the int version works as intended and float doesn't?

I think it may have to do with float being a "composite type". This is so silly and annoying that it hurts me to type it. Composite types are types where you can have non-identical values that are equal. Collated strings are the canonical example but easier to see are decimals (1.0 = 1.00 but they present differently). Float is composite because apparently there is such a thing as -0 which is equal to 0 but presents differently. Composite types restrict what the optimizer can do w.r.t replacing their values etc.

@ajwerner
Copy link
Contributor

A bit off-topic, but this might change in a future release (we're trying to add support for some "interval style" setting, see #67792). Shouldn't we go down the encode_key() path instead of string casts?

Yes, I only posted the ::INTERVAL thing because it exists in master today. See #67865 where I added key_encode and tried to adopt it. It hits the exact same problem.

Oh, is the problem that the int version works as intended and float doesn't?

Yes

@RaduBerinde
Copy link
Member

The optimizer has the concept of whether an expression is "composite-sensitive" or not. We could special case the key_encode function as a function that is not composite sensitive even if its input variable is composite.

https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/logical_props_builder.go#L2608

@mgartner
Copy link
Collaborator

mgartner commented Jul 21, 2021

It seems like the ::INTERVAL actually makes the expression not truly composite-sensitive.

select (-0):::float4, (0):::float4, (-0):::float4::interval, (0):::float4::interval;
  float4 | float4 | interval | interval
---------+--------+----------+-----------
      -0 |      0 | 00:00:00 | 00:00:00

If so, maybe we can adjust our composite-sensitive logic to understand that certain operators are not composite-sensitive even if their input is.

@ajwerner
Copy link
Contributor

Yeah, for sure, that was a distraction. I have what I need and am making good progress. Thanks for all the help!

@RaduBerinde
Copy link
Member

If so, maybe we can adjust our composite-sensitive logic to understand that certain operators are not composite-sensitive even if their input is.

Note that this only applies when the input is just a variable. In general, the input expression might truly take on unequal values (e.g. IF(f::STRING = '-0', 1, 2)).

@mgartner
Copy link
Collaborator

mgartner commented Jul 21, 2021

Note that this only applies when the input is just a variable. In general, the input expression might truly take on unequal values (e.g. IF(f::STRING = '-0', 1, 2)).

Isn't this example only composite sensitive because f::STRING is composite sensitive? An IF operator's composite sensitivity is determined by its input.

As a comparison, consider IF(f::INTERVAL = '00:00:00', 1, 2). I think this is not composite sensitive because INTERVAL cast's composite sensitivity is not determined by its input. Since all three inputs to IF are not composite sensitive, then the IF is not composite sensitive.

@ajwerner
Copy link
Contributor

I've added this logic and some plumbing to the relevant check and now my expression is no longer "sensitive". It still isn't getting used properly.

		// Some functions are insensitive to the composite nature of their
		// arguments. In this case, we want to allow composite variable references
		// to be treated insensitively but all other exprs should get the usual
		// treatment.
		//
		// This will deal with cases like:
		//
		//  crdb_internal.key_encode(f4)                          -- not sensitive
		//  crdb_internal.key_encode(IF(f4::STRING = '-0', 0, 1)) -- sensitive
		//
		if funcExpr, ok := e.(*FunctionExpr); ok && funcExpr.Properties.CompositeInsensitive {
			args := funcExpr.Args
			for i, n := 0, args.ChildCount(); i < n; i++ {
				if _, isVariable := args.Child(i).(*VariableExpr); isVariable {
					continue
				}
				if canBeSensitive(args.Child(i)) {
					return true
				}
			}
			return false
		}

@ajwerner
Copy link
Contributor

With @RaduBerinde's help narrowed it down to

if colinfo.HasCompositeKeyEncoding(colTyp) {

Will clean up my PR a bit. The root of the problem is that we never fold constant expressions for types that might be composite. Consider:

CREATE TABLE ti (k INT PRIMARY KEY);
EXPLAIN SELECT * FROM ti ta, ti tb WHERE ta.k = 1 AND tb.k = ta.k - 1;
           info
---------------------------
  distribution: local
  vectorized: true

  • cross join
  │
  ├── • scan
  │     missing stats
  │     table: ti@primary
  │     spans: [/1 - /1]
  │
  └── • scan
        missing stats
        table: ti@primary
        spans: [/0 - /0]

vs.

CREATE TABLE tf (k FLOAT4 PRIMARY KEY);
EXPLAIN SELECT * FROM tf ta, tf tb WHERE ta.k = 1 AND tb.k = ta.k - 1; 
                                         info
---------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • lookup join
  │ estimated row count: 1
  │ table: t@primary
  │ equality: (column7) = (k)
  │ equality cols are key
  │
  └── • render
      │ estimated row count: 1
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 3 seconds ago)
            table: t@primary
            spans: [/1.0 - /1.0]

@RaduBerinde
Copy link
Member

Isn't this example only composite sensitive because f::STRING is composite sensitive? An IF operator's composite sensitivity is determined by its input.

What I'm saying is that crdb_internal.key_encode(IF(f::STRING = '-0', 1, 2)) is composite sensitive, whereas crdb_internal.key_encode(f) is not (but also crdb_internal.key_encode(f+1)). So it's not enough to look only at the function in CanBeCompositeSensitive.

@ajwerner
Copy link
Contributor

but also crdb_internal.key_encode(f+1)

I didn't know how to deal with that one without more machinery.

@craig craig bot closed this as completed in d18da6c Aug 21, 2021
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead T-sql-queries SQL Queries Team labels May 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants