-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CALCITE-4918] Add a VARIANT data type #3947
base: main
Are you sure you want to change the base?
Conversation
Can you move (or at least copy) that description to the Jira case? Jira cases are our feature specifications. Minimal features thatI would like to see specified:
|
5759a66
to
44e3d35
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good.
One useful addition would be a variant.iq
quidem test. With extensive comments, it could read like a tutorial, showing all the things you can do with variants. As a new language feature, specific to Calcite (albeit based on Snowflake) I think it needs better documentation than we usually provide.
import static java.util.Objects.requireNonNull; | ||
|
||
/** | ||
* This class represents the type of a SQL expression at runtime. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove 'This class represents'
* a dynamically-typed value, and needs this kind of information. | ||
* We cannot use the very similar RelDataType type since it carries extra | ||
* baggage, like the type system, which is not available at runtime. */ | ||
public abstract class RuntimeTypeInformation { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get the impression that RuntimeTypeInformation
is connected with either the enumerable engine, or as part of the Java API (which is used to define Java UDFs). If it is either of those, I don't think it belongs in (a subpackage of) the util package.
} | ||
|
||
/** | ||
* Create and return an expression that creates a runtime type that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Create/Creates/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and generally, please follow the best practice of using third-person indicative for method javadoc.
/** This class is the runtime support for values of the VARIANT SQL type. */ | ||
public class Variant { | ||
/** Actual value. */ | ||
final @Nullable Object value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Making value
nullable probably complicates a lot of code. (If the experience with RexLiteral
is anything to go by.) Could you have a subclass for null variants (or some other solution)?
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
package org.apache.calcite.util; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The util
package feels like the wrong place (for the reasons I described in rtti
).
How about org.apache.calcite.runtime
? The classes in that package are intended for use by generated code, rather than user code. But it has a similar goal to have few dependencies.
(DateString
, TimeString
, TimestampString
should have been there too.)
@julianhyde I have hopefully implemented your suggestions. The structure is much better this way. |
Implicitly cast is an important part of VARIANT type, will it be supported in this PR? edit: I just noticed you mention this in Jira. |
Yes, let's leave implicit casts for later. |
site/_docs/reference.md
Outdated
| Type | Description | Example type | ||
|:-------- |:---------------------------|:--------------- | ||
| ANY | The union of all types | | ||
| Type | Description | Example type | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our markdown style is to not have a |
after the last column. Can you remove it.
(Intellij does not respect that style, so its edits need to be reverted.)
Also revert the changes in spacing, which become spurious diffs.
@mihaibudiu Thanks for all your edits. Much improved. +1 after you fix the cosmetic issues in reference.md. |
I will leave this PR open for a few more days in case people want to comment. Otherwise I have tagged it with "merge soon". |
78bff8d
to
a35f642
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mihaibudiu I reviewed the comments and left some minor comment. I think this PR can be merged.
@@ -1722,6 +1722,68 @@ void testCastToBoolean(CastType castType, SqlOperatorFixture f) { | |||
f.checkNull("cast(null as row(f0 varchar, f1 varchar))"); | |||
} | |||
|
|||
/** Test cases for | |||
* <a href="https://issues.apache.org/jira/browse/CALCITE-4918"> | |||
* [CALCITE-4918] Add a VARIANT data type</a>. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a redundant space
|
||
import org.checkerframework.checker.nullness.qual.Nullable; | ||
|
||
/** The VARIANT type has its own notion of null, which is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe adding a line break would be better.
af0ff63
to
5827923
Compare
2ee9768
to
9b88b5b
Compare
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Signed-off-by: Mihai Budiu <[email protected]>
Quality Gate passedIssues Measures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really a complicated PR. We may be able to fill in some optimizations about VARIANT in some future PRs.
Some example:
CAST(e as VARIANT ) is not null
to e is not null
.
CAST(cast(e as variant) as int)
to cast(e as int)
@@ -866,6 +868,8 @@ void populate1() { | |||
defineMethod(TRUNC_BIG_QUERY, BuiltInMethod.STRUNCATE.method, NullPolicy.STRICT); | |||
defineMethod(TRUNCATE, BuiltInMethod.STRUNCATE.method, NullPolicy.STRICT); | |||
defineMethod(LOG1P, BuiltInMethod.LOG1P.method, NullPolicy.STRICT); | |||
defineMethod(TYPEOF, BuiltInMethod.TYPEOF.method, NullPolicy.STRICT); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TYPEOF
is a new function in CALCITE. Can we use a split Jira issue to support it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VARIANT without TYPEOF is of not much use.
} | ||
|
||
@Override public String getTypeString() { | ||
switch (this.typeName) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we merge some types? Most of the names of the types here have not changed.
/** | ||
* The type of a SQL expression at runtime. | ||
* Normally SQL is a statically-typed language, and there is no need for | ||
* runtime-type information. However, the VARIANT data type is actually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an extra space before However
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's how I learned to type, emacs encourages this forfmatting for end of sentence.
.columnType("VARIANT NOT NULL"); | ||
expr("cast(TIMESTAMP '2024-09-01 00:00:00' as variant)") | ||
.columnType("VARIANT NOT NULL"); | ||
expr("cast(ARRAY[1,2,3] AS VARIANT)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we convert cast(ARRAY[1,2,3, null] AS VARIANT
, What's the columnType?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to rework a bit this PR to bring it more in line with what Snowflake does.
In this case converting an ARRAY to a variant produces a VARIANT whose value is an array of variant. So the elements are all recursively converted to VARIANT. Same thing is true for MAP.
@@ -1239,14 +1240,66 @@ Note: | |||
| MAP | Collection of keys mapped to values | Example: map < int, varchar > | |||
| MULTISET | Unordered collection that may contain duplicates | Example: int multiset | |||
| ARRAY | Ordered, contiguous collection that may contain duplicates | Example: varchar(10) array | |||
| CURSOR | Cursor over the result of executing a query | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to remove this |
? ANY and UNKNOWN also end with |
;
Looks good. Thanks for adding |
We are actually using this feature extensively in our Rust-based runtime, but our runtime implementation is slightly different. I plan to make the Java implementation a bit closer. This is a very powerful feature. Once you have variant you can do many things which are very difficult otherwise, including recursive data types (like JSON). One other thing we use VARIANT for is error reporting. You can have a dynamic error type (a VARIANT MAP) which can include any other type, including any SQL record inside! |
I agree with your remark that variants are difficult to use without TYPEOF. We could also add something like this (apologies for the crappy syntax, hopefully you get the idea):
I believe Java has something like this and maybe some SQL dialects do too. |
This PR introduces a
VARIANT
SQL data type, based on the design in Snowflake: https://docs.snowflake.com/en/sql-reference/data-types-semistructured.VARIANT
is a much better base to build support for JSON that the existing functions in Calcite. All the existing Calcite JSON functions assume that JSON is stored as an unparsed string.VARIANT
can represent any JSON document in a "binary" form (and much more than JSON!)The PR is divided into two commits:
Turns out that the validator support is almost trivial. Most of the work is in the runtime, where a new kind of value must be introduced, Variant, which carries runtime type information. For this purpose a new class hierarchy has been introduced, rooted at
RuntimeTypeInformation
.Currently the runtime type information carries precision and scale, but I am not sure that these are actually necessary. I will study this a bit more and may amend this in a subsequent commit.
There are no functions supporting
VARIANT
at this point, butVARIANTS
can still be created using casts, and accessed using variant.field, variant[index], and variant['key'].Subsequent pull requests are expected to add more functions. In particular, functions to parse and unparse JSON into variants.