sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

michae2 · 2022-04-06T22:10:55Z

It appears that the INTERVAL type is not using the same comparison for SQL equality as for aggregation. These queries illustrate the problem:

SELECT INTERVAL '1 day';
SELECT INTERVAL '24 hours';
SELECT INTERVAL '1 day' = INTERVAL '24 hours';
SELECT DISTINCT i FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i);
SELECT i, COUNT(*) FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i) GROUP BY i;
CREATE TABLE t (i INTERVAL PRIMARY KEY);
INSERT INTO t VALUES (INTERVAL '1 day');
INSERT INTO t VALUES (INTERVAL '24 hours');

In PostgreSQL, the two interval values are equal, group together in the same bucket, and count as the same key during insertion. PostgreSQL 14.2:

michae2=# SELECT INTERVAL '1 day';
 interval
----------
 1 day
(1 row)

michae2=# SELECT INTERVAL '24 hours';
 interval
----------
 24:00:00
(1 row)

michae2=# SELECT INTERVAL '1 day' = INTERVAL '24 hours';
 ?column?
----------
 t
(1 row)

michae2=# SELECT DISTINCT i FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i);
   i
-------
 1 day
(1 row)

michae2=# SELECT i, COUNT(*) FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i) GROUP BY i;
   i   | count
-------+-------
 1 day |     2
(1 row)

michae2=# CREATE TABLE t (i INTERVAL PRIMARY KEY);
CREATE TABLE

michae2=# INSERT INTO t VALUES (INTERVAL '1 day');
INSERT 0 1

michae2=# INSERT INTO t VALUES (INTERVAL '24 hours');
2022-04-06 15:07:42.453 PDT [27581] ERROR:  duplicate key value violates unique constraint "t_pkey"
2022-04-06 15:07:42.453 PDT [27581] DETAIL:  Key (i)=(24:00:00) already exists.
2022-04-06 15:07:42.453 PDT [27581] STATEMENT:  INSERT INTO t VALUES (INTERVAL '24 hours');
ERROR:  duplicate key value violates unique constraint "t_pkey"
DETAIL:  Key (i)=(24:00:00) already exists.

But in CockroachDB, the two interval values are equal, but do not group together in the same bucket or count as the same key during insertion. CockroachDB v22.1.0-beta.1, same for both vectorized engine and row-based engine:

[email protected]:26257/defaultdb> SELECT INTERVAL '1 day';
SELECT INTERVAL '24 hours';
SELECT INTERVAL '1 day' = INTERVAL '24 hours';
SELECT DISTINCT i FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i);
SELECT i, COUNT(*) FROM (VALUES (INTERVAL '1 day'), (INTERVAL '24 hours')) v(i) GROUP BY i;
  interval
------------
  1 day
(1 row)


Time: 0ms total (execution 0ms / network 0ms)

  interval
------------
  24:00:00
(1 row)


Time: 0ms total (execution 0ms / network 0ms)

  ?column?
------------
    true
(1 row)


Time: 0ms total (execution 0ms / network 0ms)

     i
------------
  1 day
  24:00:00
(2 rows)


Time: 0ms total (execution 0ms / network 0ms)

     i     | count
-----------+--------
  1 day    |     1
  24:00:00 |     1
(2 rows)


Time: 1ms total (execution 1ms / network 0ms)

[email protected]:26257/defaultdb> CREATE TABLE t (i INTERVAL PRIMARY KEY);
CREATE TABLE


Time: 5ms total (execution 5ms / network 0ms)

[email protected]:26257/defaultdb> INSERT INTO t VALUES (INTERVAL '1 day');
INSERT 1


Time: 7ms total (execution 7ms / network 0ms)

[email protected]:26257/defaultdb> INSERT INTO t VALUES (INTERVAL '24 hours');
INSERT 1


Time: 1ms total (execution 1ms / network 0ms)

Jira issue: CRDB-14903

Epic: CRDB-20062

The text was updated successfully, but these errors were encountered:

rafiss · 2022-04-07T21:41:45Z

Nice find! I wonder if the explanation of the problem in #57876 is related to this bug.

mgartner · 2022-04-12T15:03:34Z

This made me think of composite sensitivity:

cockroach/pkg/sql/opt/memo/logical_props_builder.go

Lines 2657 to 2681 in 8a77fe5

    
           // CanBeCompositeSensitive returns true if a scalar expression could return 
        
           // logically different results because of non-logical differences in outer 
        
           // columns with composite type. 
        
           // 
        
           // Composite values are values that contain more information than the logical 
        
           // value (i.e. the key encoding). Examples are decimals (1.0 = 1.00) and 
        
           // collated strings ('foo' COLLATE en_u_ks_level1 = 'FOO' COLLATE 
        
           // en_u_ks_level1). 
        
           // 
        
           // An example of a composite-sensitive expression is `d::string`, where d is a 
        
           // DECIMAL. 
        
           // 
        
           // A formal definition: 
        
           //   Let (c1,c2,...) be the outer columns of the scalar expression. Let 
        
           //   f(x1,x2,..) be the result of the scalar expression for the given outer 
        
           //   column values. The expression is composite insensitive if, for any two 
        
           //   sets of values (x1,x2,...) and (y1,y2,...) 
        
           //      (x1=y1 AND x2=y2 AND ...) => f(x1,x2,...) = f(y1,y2,...) 
        
           // 
        
           //   Note that this doesn't mean that the final results are always *identical* 
        
           //   just that they are logically equal. 
        
           // 
        
           // This property is used to determine when a scalar expression can be copied, 
        
           // with outer column variable references changed to refer to other columns that 
        
           // are known to be equal to the original columns.

SELECT from a table with hash sharded PK has unexpected index join in query plan #67170 (comment)

And now I'm afraid that there's a number of other related bugs:

The optimizer thinks an expression like i::STRING where i is an INTERVAL is composite insensitive, when it is actually composite sensitive. I'm still trying to work out what the implications of this are - I'll try to produce an incorrect result based on this.
crdb_internal.datums_to_bytes returns different values for '1 day'::INTERVAL and '24 hours'::INTERVAL. This causes incorrect query results for tables with hash-sharded indexes on INTERVAL columns. cc @chengxiong-ruan @ajwerner

Example of (2):

defaultdb> CREATE TABLE t (k INT PRIMARY KEY, i INTERVAL, INDEX (i) USING HASH WITH bucket_count=8);
CREATE TABLE

defaultdb> INSERT INTO t VALUES (1, '24 hours');
INSERT 1

defaultdb> SELECT * FROM t WHERE i = '1 day'::INTERVAL;
  k | i
----+----
(0 rows)

defaultdb> SELECT * FROM t@primary WHERE i = '1 day'::INTERVAL;

  k |    i
----+-----------
  1 | 24:00:00
(1 row)

mgartner · 2022-04-12T15:17:31Z

The problem in (2) above is actually much broader than hash-sharded indexes. It looks like you can get incorrect results with any index on an interval column. On v21.2.7:

defaultdb> CREATE TABLE t (k INT PRIMARY KEY, i INTERVAL, INDEX (i));
CREATE TABLE

defaultdb> INSERT INTO t VALUES (1, '24 hours');
INSERT 1

defaultdb> SELECT * FROM t WHERE i = '1 day'::INTERVAL;
  k | i
----+----
(0 rows)

defaultdb> SELECT * FROM t@primary WHERE i = '1 day'::INTERVAL;

  k |    i
----+-----------
  1 | 24:00:00
(1 row)

michae2 · 2022-04-12T22:16:15Z

More examples of different values that should both compare equal and also bucket together:

'1 MONTH 2 DAYS'::INTERVAL
'32 DAYS'::INTERVAL

'2 MONTHS'::INTERVAL
'60 DAYS'::INTERVAL
'86400 MINUTES'::INTERVAL

INTERVAL '1 year'
INTERVAL '518400 minutes'

(Anyone who has seen Rent might be scratching their head at that last one, but this bit from the postgres docs explains it: "Fractional parts of units greater than months are truncated to be an integer number of months, e.g. '1.5 years' becomes '1 year 6 mons'. Fractional parts of weeks and days are computed to be an integer number of days and microseconds, assuming 30 days per month and 24 hours per day, e.g., '1.75 months' becomes 1 mon 22 days 12:00:00.")

michae2 · 2022-04-12T22:31:49Z

Based on EncodeDurationAscending these problems should only happen when using "improper" intervals in two forms:

seconds + minutes + hours >= 24 hours
seconds + minutes + hours + days >= 30 days

The improper interval doesn't necessarily have to be stored to get incorrect results, it could simply be used as a constant in a query. For example, this gives incorrect results:

[email protected]:26257/defaultdb> CREATE TABLE t (i INTERVAL PRIMARY KEY);
CREATE TABLE


Time: 8ms total (execution 8ms / network 0ms)

[email protected]:26257/defaultdb> INSERT INTO t VALUES ('1 day'::INTERVAL);
INSERT 1


Time: 6ms total (execution 5ms / network 0ms)

[email protected]:26257/defaultdb> SELECT * FROM t WHERE i = '24 hours'::INTERVAL;
  i
-----
(0 rows)


Time: 1ms total (execution 1ms / network 0ms)

mgartner · 2022-05-05T21:49:20Z

A related problem: #80999

otan · 2022-05-24T21:10:01Z

Another related problem: #81577

rail · 2022-05-25T12:39:41Z

Manually synced with Jira

otan mentioned this issue May 23, 2022

Inconsistency with negative intervals #81577

Open

jlinder added the sync-me-3 label May 24, 2022

exalate-issue-sync bot removed the T-sql-queries SQL Queries Team label May 24, 2022

jlinder removed the sync-me-3 label May 27, 2022

mgartner mentioned this issue Jul 8, 2022

sql: fix INTERVAL type representation and encoding #84078

Open

mgartner added this to SQL Queries Jul 24, 2023

mgartner moved this to 23.2 Release in SQL Queries Jul 24, 2023

mgartner moved this from 23.2 Release to 24.1 Release in SQL Queries Aug 3, 2023

yuzefovich added T-sql-queries SQL Queries Team T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) labels Nov 21, 2023

mgartner moved this from 24.1 Release to 24.2 Release in SQL Queries Nov 28, 2023

mgartner moved this from 24.2 Release to Bugs to Fix in SQL Queries Mar 28, 2024

mgartner added the S-2 Medium-high impact: many users impacted, risks of availability and difficult-to-fix data errors label Mar 28, 2024

github-project-automation bot added this to SQL Foundations Aug 28, 2024

github-project-automation bot moved this to Triage in SQL Foundations Aug 28, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

michae2 commented Apr 6, 2022 •

edited by jlinder

Loading

rafiss commented Apr 7, 2022

mgartner commented Apr 12, 2022 •

edited

Loading

mgartner commented Apr 12, 2022

michae2 commented Apr 12, 2022

michae2 commented Apr 12, 2022

mgartner commented May 5, 2022

otan commented May 24, 2022

rail commented May 25, 2022

sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

sql: interval type not using same comparison as sql equality for aggregation or insertion #79549

Comments

michae2 commented Apr 6, 2022 • edited by jlinder Loading

rafiss commented Apr 7, 2022

mgartner commented Apr 12, 2022 • edited Loading

Example of (2):

mgartner commented Apr 12, 2022

michae2 commented Apr 12, 2022

michae2 commented Apr 12, 2022

mgartner commented May 5, 2022

otan commented May 24, 2022

rail commented May 25, 2022

michae2 commented Apr 6, 2022 •

edited by jlinder

Loading

mgartner commented Apr 12, 2022 •

edited

Loading