sql: the conversion rules for numeric types to interval are not the same as string to interval #57876
Labels
A-sql-builtins
SQL built-in functions and semantics thereof.
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
E-starter
Might be suitable for a starter project for new employees or team members.
T-sql-foundations
SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Found by @petermattis:
What is happening here:
'2592000'::interval
uses the postgres string to interval conversion rules, which state that a number of seconds converted to interval should not expand to days / months and keep the interval to the hour unit.2592000::interval
usesint -> interval
cast, which is a crdb-specific extension, not recognized by postgres. In Peter's words “so we could fix it”Or can we?
As the person who originally implemented this, I (@knz) 100% agree this was an oversight and should be corrected. The reason why this misdesign exists is that the casts appeared “easy” to implement: take the numeric value, convert it to a
duration.Duration
, then stick that in an interval.The numeric to
duration.Duration
conversion functions were already there.I had not appreciated that these conversion functions were expanding the seconds into days/months. Was that intentional? I don't know. (However, see below.)
How to fix this?
Before we go into details, we need to establish some clear goals:
the conversion behavior must be homogeneous between all numeric types (
int
,decimal
,float
). For example, the same integer value like 2592000 should not yield different intervals depending on whether it's converted to decimal or float first.both conversion directions must remain consistent (they are today). So for example, converting 2592000 to interval and then back to int should yield the same value.
(note however that we're OK if a starting value in interval form, converted to a numeric type, then back to interval, yields a different interval. That is possible and acceptable because the interval type contains more information than a numeric type, so we can expect data loss.)
Can we achieve this?
duration.FromInt64
/AsInt64
are only used by this cast. So we can change that to achieve alignment without fear of side effects.duration.FromBigInt64
/AsBigInt64
is similar.however,
duration.FromFloat64
/AsFloat64
are odd.These are used in multiple places:
AsFloat64
is used when converting from interval withextract(epoch from <interval>)
(this is a postgres standard feature)percentile_cont
, an aggregate function that reports fractions between input values in an aggregation bucket (also a standard pg feature)FromFloat64
is used to generate the interval values forSHOW LAST QUERY STATISTICS
I think case (4) will benefit from the proposed adjustment, because arguably SHOW LAST QUERY STATISTICS wants an interval expressed in hours/minutes/seconds and doesn't care about days/months.
However, I will recommend making sure we have ample testing for cases (2) and (3), with values that exceed a month, and ensure we don't diverge from postgres for those. (Maybe they are already incompatible with pg, I don't know? If they are, then they will also benefit from the fix.)
Of course, we can't let
duration.FromFloat64
do "something different" fromduration.FromInt64
because then the rules are not homogeneous any more.cc @otan for triage.
The text was updated successfully, but these errors were encountered: