SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

Azhng · 2021-07-02T02:30:56Z

Suppose I have a table with hash sharded PK and a secondary index as follow:

create table t (
  k1 int,
  k2 bytes,
  k3 int,
  k4 int,
  k5 int,
  v1 int,
  v2 int,
  primary key (k1, k2, k3, k4, k5) using hash with bucket_count = 8,
  index (k2, k1, k3, k4, k5)
);

Running the following query for the table t contains an unexpected index join:

root@127.0.0.1:26257/movr> explain select
  v1
from t
where k1 = 1 AND
 k2 = ''::bytes AND
 k3 = 3 AND
 k4 = 4 AND
 k5 = 5;
                                        info
------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • index join
  │ estimated row count: 1
  │ table: t@primary
  │
  └── • scan
        estimated row count: 1 (100% of the table; stats collected 40 minutes ago)
        table: t@t_k2_k1_k3_k4_k5_idx
        spans: [/'\x'/1/3/4/5/0 - /'\x'/1/3/4/5/7]
(11 rows)

Time: 1ms total (execution 1ms / network 0ms)

However, if the column type for k2 is changed from BYTES to STRING, then the query plan will only scan the primary index:

root@127.0.0.1:26257/movr> explain select
  v1
from t
where k1 = 1 AND
 k2 = '' AND
 k3 = 3 AND
 k4 = 4 AND
 k5 = 5;

                    info
--------------------------------------------
  distribution: local
  vectorized: true

  • scan
    missing stats
    table: t@primary
    spans: [/4/1/''/3/4/5 - /4/1/''/3/4/5]
(7 rows)

Time: 1ms total (execution 1ms / network 0ms)

Epic: CRDB-7363

The text was updated successfully, but these errors were encountered:

ajwerner · 2021-07-02T02:43:41Z

I'm pretty sure that the problem is that we treat that bytes->string cast as not immutable. I'm not totally sure why this is -- I think it's to deal with cases where the string type has a width.

That check is here:

cockroach/pkg/sql/opt/norm/fold_constants_funcs.go

Line 359 in 9a11535

volatility, ok := tree.LookupCastVolatility(input.DataType(), typ)

then

cockroach/pkg/sql/sem/tree/casts.go

Line 176 in 9a11535

    
           {from: types.BytesFamily, to: types.StringFamily, volatility: VolatilityStable},

The options are:

Rework the generated expression to not cast bytes but rather hash them directly (seems good) or to special case the cast logic to make it immutable if the string has infinite width (also seems good IIUC). The former seems safer and easier.

ajwerner · 2021-07-02T02:52:33Z

Yeah so seems like this is a problem for the following types: BYTES, FLOAT, and TIMESTAMPTZ. It should be easy to add special logic to knock out the casts. For the bytes, we can just hash them without the stirng casts. For the floats and timestamptz's, we can cast them to decimal and integer respectively. It should take just a small amount of plumbing.

RaduBerinde · 2021-07-02T04:00:26Z

I'm pretty sure that the problem is that we treat that bytes->string cast as not immutable. I'm not totally sure why this is -- I think it's to deal with cases where the string type has a width.

It's because it depends on a knob (BytesEncodeFormat, controlled by bytea_output). See https://www.postgresql.org/docs/10/datatype-binary.html

Similarly, for FLOAT there is some precision setting (extra_float_digits).

TIMESTAMPTZ::STRING contains the current timezone. It can be fixed by doing (t AT TIME ZONE 'utc')::STRING.

By the way, the way to find out these things is to find the case in PerformCast and check if and how ctx is used in the conversion.

String conversion is a bad idea in other cases too: you can have decimals 1.0 and 1.00 and their strings are different but they are equal. Similar issue with collated strings.

Maybe we should extend fnv32 (or add some new variant) to be able to take any type that can be indexed, and hash its key encoding?

RaduBerinde · 2021-07-02T04:07:10Z

Or, perhaps more elegant - have a key_encoding() function that converts any indexable type to BYTES (and allow fnv32() to operate on BYTES directly).

ajwerner · 2021-07-02T04:07:19Z

Maybe we should extend fnv32 (or add some new variant) to be able to take any type that can be indexed, and hash its key encoding?

Alternatively we could expose a function that turns any type into bytes using its key encoding. fnv32 takes bytes.

ajwerner · 2021-07-02T04:07:42Z

Jinx

RaduBerinde · 2021-07-02T04:07:49Z

Great minds think alike (and apparently rarely sleep).

rytaft · 2021-07-13T18:50:34Z

Assigning this to schema for triage -- let us know if you need any help from SQL Queries here.

ajwerner · 2021-07-20T14:05:38Z

The function we should use to encode the bytes is this one:

cockroach/pkg/sql/rowenc/column_type_encoding.go

Lines 42 to 52 in 2a91e74

    
           // EncodeTableKey encodes `val` into `b` and returns the new buffer. 
        
           // This is suitable to generate index/lookup keys in KV. 
        
           // 
        
           // The encoded value is guaranteed to be lexicographically sortable, 
        
           // but not guaranteed to be round-trippable during decoding: some 
        
           // values like decimals or collated strings have composite encoding 
        
           // where part of their value lies in the value part of the key/value 
        
           // pair. 
        
           // 
        
           // See also: docs/tech-notes/encoding.md, EncodeTableValue(). 
        
           func EncodeTableKey(b []byte, val tree.Datum, dir encoding.Direction) ([]byte, error) {

ajwerner · 2021-07-21T17:04:41Z

I took a stab at this by first adding the function and then trying to hook it up. What I discovered when testing is that we still don't properly propagate the functional dependencies for a bunch of types. The crux of the problem, I think can be summarized with the following example:

CREATE TABLE intkey (
    k INT4, shard INT4 AS (mod(fnv32((k::INTERVAL::STRING)), 8)) STORED,
    PRIMARY KEY (shard, k),
    CONSTRAINT c CHECK (shard IN (0, 1, 2, 3, 4, 5, 6, 7))
);
CREATE TABLE floatkey (
    k FLOAT4, shard INT4 AS (mod(fnv32((k::INTERVAL::STRING)), 8)) STORED,
    PRIMARY KEY (shard, k),
    CONSTRAINT c CHECK (shard IN (0, 1, 2, 3, 4, 5, 6, 7))
);

We use ::INTERVAL::STRING as a stand in for an immutable function that gives us a string value for both float4 and int4. I have confirmed that the volatility is immutable for these casts. Also, during planning, I added logging for all volatilities added into a VolatilitySet and none are below Immutable:

I210721 17:01:20.934709 1434 sql/opt/optbuilder/builder.go:262  [n1,client=127.0.0.1:49288,hostssl,user=root] 609  building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.934735 1434 sql/opt/optbuilder/builder.go:262  [n1,client=127.0.0.1:49288,hostssl,user=root] 610  building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.943019 1434 sql/opt/props/volatility.go:90  [-] 620  adding volatility leak-proof
I210721 17:01:20.943043 1434 sql/opt/props/volatility.go:90  [-] 621  adding volatility leak-proof
I210721 17:01:20.943094 1434 sql/sem/tree/casts.go:361  [-] 622  cast int to int4: immutable true
I210721 17:01:20.943109 1434 sql/opt/props/volatility.go:90  [-] 623  adding volatility immutable
I210721 17:01:20.943115 1434 sql/opt/props/volatility.go:90  [-] 624  adding volatility immutable
I210721 17:01:20.943120 1434 sql/opt/props/volatility.go:90  [-] 625  adding volatility leak-proof
I210721 17:01:20.943126 1434 sql/sem/tree/casts.go:361  [-] 626  cast interval to string: immutable true
I210721 17:01:20.943134 1434 sql/opt/props/volatility.go:90  [-] 627  adding volatility immutable
I210721 17:01:20.943139 1434 sql/sem/tree/casts.go:361  [-] 628  cast float4 to interval: immutable true
I210721 17:01:20.943145 1434 sql/opt/props/volatility.go:90  [-] 629  adding volatility immutable
I210721 17:01:20.943173 1434 sql/opt/props/volatility.go:90  [-] 630  adding volatility leak-proof
I210721 17:01:20.943202 1434 sql/opt/optbuilder/builder.go:278  [n1,client=127.0.0.1:49288,hostssl,user=root] 631  done building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1
I210721 17:01:20.943235 1434 sql/opt/optbuilder/builder.go:305  [n1,client=127.0.0.1:49288,hostssl,user=root] 632  done building stmt EXPLAIN (VERBOSE) SELECT * FROM floatkey WHERE k = 1

Best guess of the moment is that it's something in

cockroach/pkg/sql/opt/xform/select_funcs.go

Line 393 in b4fad78

func (c *CustomFuncs) tryFoldComputedCol(

in

cockroach/pkg/sql/opt/xform/general_funcs.go

Line 240 in b4fad78

func (c *CustomFuncs) computedColFilters(

cc @mgartner

RaduBerinde · 2021-07-21T18:28:46Z

What is the problem in that example? Do you have an example of a bad query plan?

We use ::INTERVAL::STRING as a stand in for an immutable function that gives us a string value for both float4 and int4

A bit off-topic, but this might change in a future release (we're trying to add support for some "interval style" setting, see #67792). Shouldn't we go down the encode_key() path instead of string casts?

RaduBerinde · 2021-07-21T18:32:40Z

Oh, is the problem that the int version works as intended and float doesn't?

I think it may have to do with float being a "composite type". This is so silly and annoying that it hurts me to type it. Composite types are types where you can have non-identical values that are equal. Collated strings are the canonical example but easier to see are decimals (1.0 = 1.00 but they present differently). Float is composite because apparently there is such a thing as -0 which is equal to 0 but presents differently. Composite types restrict what the optimizer can do w.r.t replacing their values etc.

ajwerner · 2021-07-21T18:57:30Z

A bit off-topic, but this might change in a future release (we're trying to add support for some "interval style" setting, see #67792). Shouldn't we go down the encode_key() path instead of string casts?

Yes, I only posted the ::INTERVAL thing because it exists in master today. See #67865 where I added key_encode and tried to adopt it. It hits the exact same problem.

Oh, is the problem that the int version works as intended and float doesn't?

Yes

RaduBerinde · 2021-07-21T19:00:25Z

The optimizer has the concept of whether an expression is "composite-sensitive" or not. We could special case the key_encode function as a function that is not composite sensitive even if its input variable is composite.

https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/opt/memo/logical_props_builder.go#L2608

mgartner · 2021-07-21T20:21:21Z

It seems like the ::INTERVAL actually makes the expression not truly composite-sensitive.

select (-0):::float4, (0):::float4, (-0):::float4::interval, (0):::float4::interval;
  float4 | float4 | interval | interval
---------+--------+----------+-----------
      -0 |      0 | 00:00:00 | 00:00:00

If so, maybe we can adjust our composite-sensitive logic to understand that certain operators are not composite-sensitive even if their input is.

ajwerner · 2021-07-21T20:28:50Z

Yeah, for sure, that was a distraction. I have what I need and am making good progress. Thanks for all the help!

RaduBerinde · 2021-07-21T20:32:01Z

If so, maybe we can adjust our composite-sensitive logic to understand that certain operators are not composite-sensitive even if their input is.

Note that this only applies when the input is just a variable. In general, the input expression might truly take on unequal values (e.g. IF(f::STRING = '-0', 1, 2)).

mgartner · 2021-07-21T21:21:45Z

Note that this only applies when the input is just a variable. In general, the input expression might truly take on unequal values (e.g. IF(f::STRING = '-0', 1, 2)).

Isn't this example only composite sensitive because f::STRING is composite sensitive? An IF operator's composite sensitivity is determined by its input.

As a comparison, consider IF(f::INTERVAL = '00:00:00', 1, 2). I think this is not composite sensitive because INTERVAL cast's composite sensitivity is not determined by its input. Since all three inputs to IF are not composite sensitive, then the IF is not composite sensitive.

ajwerner · 2021-07-21T21:38:08Z

I've added this logic and some plumbing to the relevant check and now my expression is no longer "sensitive". It still isn't getting used properly.

		// Some functions are insensitive to the composite nature of their
		// arguments. In this case, we want to allow composite variable references
		// to be treated insensitively but all other exprs should get the usual
		// treatment.
		//
		// This will deal with cases like:
		//
		//  crdb_internal.key_encode(f4)                          -- not sensitive
		//  crdb_internal.key_encode(IF(f4::STRING = '-0', 0, 1)) -- sensitive
		//
		if funcExpr, ok := e.(*FunctionExpr); ok && funcExpr.Properties.CompositeInsensitive {
			args := funcExpr.Args
			for i, n := 0, args.ChildCount(); i < n; i++ {
				if _, isVariable := args.Child(i).(*VariableExpr); isVariable {
					continue
				}
				if canBeSensitive(args.Child(i)) {
					return true
				}
			}
			return false
		}

ajwerner · 2021-07-21T22:05:14Z

With @RaduBerinde's help narrowed it down to

cockroach/pkg/sql/opt/xform/general_funcs.go

Line 317 in 6cdb10c

if colinfo.HasCompositeKeyEncoding(colTyp) {

Will clean up my PR a bit. The root of the problem is that we never fold constant expressions for types that might be composite. Consider:

CREATE TABLE ti (k INT PRIMARY KEY);
EXPLAIN SELECT * FROM ti ta, ti tb WHERE ta.k = 1 AND tb.k = ta.k - 1;
           info
---------------------------
  distribution: local
  vectorized: true

  • cross join
  │
  ├── • scan
  │     missing stats
  │     table: ti@primary
  │     spans: [/1 - /1]
  │
  └── • scan
        missing stats
        table: ti@primary
        spans: [/0 - /0]

vs.

CREATE TABLE tf (k FLOAT4 PRIMARY KEY);
EXPLAIN SELECT * FROM tf ta, tf tb WHERE ta.k = 1 AND tb.k = ta.k - 1; 
                                         info
---------------------------------------------------------------------------------------
  distribution: full
  vectorized: true

  • lookup join
  │ estimated row count: 1
  │ table: t@primary
  │ equality: (column7) = (k)
  │ equality cols are key
  │
  └── • render
      │ estimated row count: 1
      │
      └── • scan
            estimated row count: 1 (100% of the table; stats collected 3 seconds ago)
            table: t@primary
            spans: [/1.0 - /1.0]

RaduBerinde · 2021-07-21T22:08:00Z

Isn't this example only composite sensitive because f::STRING is composite sensitive? An IF operator's composite sensitivity is determined by its input.

What I'm saying is that crdb_internal.key_encode(IF(f::STRING = '-0', 1, 2)) is composite sensitive, whereas crdb_internal.key_encode(f) is not (but also crdb_internal.key_encode(f+1)). So it's not enough to look only at the function in CanBeCompositeSensitive.

ajwerner · 2021-07-21T22:12:31Z

but also crdb_internal.key_encode(f+1)

I didn't know how to deal with that one without more machinery.

Azhng added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jul 2, 2021

blathers-crl bot added the T-sql-queries SQL Queries Team label Jul 2, 2021

rytaft assigned mgartner Jul 13, 2021

blathers-crl bot added the T-sql-schema-deprecated Use T-sql-foundations instead label Jul 13, 2021

rytaft unassigned mgartner Jul 13, 2021

ajwerner mentioned this issue Jul 21, 2021

sql/sem/builtins: add crdb_internal.datums_to_bytes, use in hash sharding #67865

Merged

craig bot closed this as completed in d18da6c Aug 21, 2021

mgartner mentioned this issue Apr 12, 2022

sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

Open

exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead T-sql-queries SQL Queries Team labels May 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

Azhng commented Jul 2, 2021 •

edited by ajwerner

Loading

ajwerner commented Jul 2, 2021

ajwerner commented Jul 2, 2021

RaduBerinde commented Jul 2, 2021 •

edited

Loading

RaduBerinde commented Jul 2, 2021

ajwerner commented Jul 2, 2021

ajwerner commented Jul 2, 2021

RaduBerinde commented Jul 2, 2021

rytaft commented Jul 13, 2021

ajwerner commented Jul 20, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

mgartner commented Jul 21, 2021 •

edited

Loading

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

mgartner commented Jul 21, 2021 •

edited

Loading

ajwerner commented Jul 21, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

ajwerner commented Jul 21, 2021

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170

Comments

Azhng commented Jul 2, 2021 • edited by ajwerner Loading

ajwerner commented Jul 2, 2021

ajwerner commented Jul 2, 2021

RaduBerinde commented Jul 2, 2021 • edited Loading

RaduBerinde commented Jul 2, 2021

ajwerner commented Jul 2, 2021

ajwerner commented Jul 2, 2021

RaduBerinde commented Jul 2, 2021

rytaft commented Jul 13, 2021

ajwerner commented Jul 20, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

mgartner commented Jul 21, 2021 • edited Loading

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

mgartner commented Jul 21, 2021 • edited Loading

ajwerner commented Jul 21, 2021

ajwerner commented Jul 21, 2021

RaduBerinde commented Jul 21, 2021

ajwerner commented Jul 21, 2021

Azhng commented Jul 2, 2021 •

edited by ajwerner

Loading

RaduBerinde commented Jul 2, 2021 •

edited

Loading

mgartner commented Jul 21, 2021 •

edited

Loading

mgartner commented Jul 21, 2021 •

edited

Loading