Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecates absent types and removes associated planner logic #1463

Merged
merged 5 commits into from
May 28, 2024

Conversation

johnedquinn
Copy link
Member

@johnedquinn johnedquinn commented May 15, 2024

Description

  • As discussed with the rest of the PartiQL maintainers, the null and missing values have leaked into our type semantics -- causing issues when it comes to function resolution, unsafe casts, and dynamic functions.
  • This PR deprecates NullType and MissingType and removes their usage in the planner. The implication is that ALL types may contain the null and missing values. See the CHANGELOG for all deprecated APIs.
  • In this PR, literal NULL and MISSING values are typed as ANY (dynamic). In the future, we may want to pursue the "unknown" type that PostgreSQL internally uses.
  • Smaller changes include:
    • I deprecated AnyOfType's primary constructor. We've been plagued for too long by forgetting to use .flatten() .
    • The majority of this diff is me getting rid of "null" and "missing" in the testFixtures.

Other Information

  • Updated Unreleased Section in CHANGELOG: YES
  • Any backward-incompatible changes? YES
    • Not an API breaking change, but a behavioral one.
  • Any new external dependencies? NO
  • Do your changes comply with the Contributing Guidelines
    and Code Style Guidelines? YES

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@@ -2144,15 +2136,13 @@ class PlanTyperTestsPorted {
) FROM <<
{ 'a': { 'b': 1 } },
{ 'a': { 'b': 'hello' } },
{ 'a': NULL },
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated this only because the literal NULL's type is unknown. Which leads to the inferred type to be ANY when used as an argument to a literal struct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure why the empty struct in the input binding tuples doesn't cause ANY for the type. Perhaps that binding tuple is filtered out?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there's a test showing this but if there's not, could still be helpful to include a test showing that the inferred type for a path expression on a literal struct with a NULL/MISSING value is ANY. E.g. { 'a': NULL }.a -> ANY and { 'a': MISSING }.a -> ANY

)
)
),
SuccessTestCase(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is deleted now since ALL types are nullable.

@johnedquinn johnedquinn force-pushed the main-deprecate-null-missing branch 3 times, most recently from 27e5a55 to d56ed09 Compare May 16, 2024 18:30
Removes all logic regarding absent types in planner
@johnedquinn johnedquinn marked this pull request as ready for review May 16, 2024 18:35
Copy link
Member

@alancai98 alancai98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work in removing the absent types from the static type modeling. I think it helps simplify a lot of the typing and test fixture involving typing logic quite a bit. comments were mostly minor though I did have some questions about some behavioral changes in tests.

@@ -28,8 +28,25 @@ Thank you to all who have contributed!
### Added

### Changed
- **Behavioral change**: The planner now does NOT support the NullType and MissingType variants of StaticType. The logic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for us to consider when releasing this change is that it's a pretty big behavioral change w/ the typing of plans. We should consider cutting a new major version release rather than a minor version release.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this as well. We have many customers who assert on current typing judgements, we can't make any typing changes without breaking them.

@@ -635,26 +637,23 @@ class PlanTyperTestsPorted {
SuccessTestCase(
name = "BITWISE_AND_NULL_OPERAND",
query = "1 & NULL",
expected = StaticType.NULL,
expected = unionOf(INT4, INT8, INT),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, curious why this will now give a union of these int types? Similarly for the below case w/ MISSING.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline -- but it has to do with the fact that a literal NULL has an unknown type, which gets interpreted as ANY for our type inference. Therefore, the applicable functions for bitwise_and are only for the integer types that correspond to the LHS being an int_32.

Before this PR, 1 & NULL would be constant folded in the typer, which is wrong, because it doesn't actually give insight to the type being returned (just the value). This change narrows down the type. Though, in the future (or, if you'd like in this PR), we can attempt to find the highest precedence function when the argument is the literal NULL/MISSING value. This is kind of what PostgreSQL does. As it would result in more in-depth logic changes, I'd prefer a follow-up to this PR.

Copy link
Member

@alancai98 alancai98 May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks for providing some context. Just to clarify the difference of behavior, previously, we would always collapse the types to the unknown type. For example:

  • CASE WHEN ... THEN NULL END -> type null
  • t_int32 & NULL -> type null

But since we get rid of the unknown types in this PR, the behavior the above would get changed to:

  • CASE WHEN ... THEN NULL END -> ANY -- only know that NULL gets returned but don't know anything about the type being returned
  • t_int32 & NULL -> union(int32, int64, int) -- only know that NULL gets returned and we have some information on the type being returned based on the resolvable functions

I'm still wondering if for the second case (t_int32 & NULL) we should still declare the type as NULL though. Consider the following valid query:

(t_int32 & NULL) || t_string

This would previously type the LHS of the concat to type null causing the entire expression to have an output type of null.

However with this PR, since the LHS of the concat would have type union(int32, int64, int) and the RHS of the concat would have type string, this results in an unresolved function for concat(<union(int4, int8, int)>, <string>).

Perhaps we shouldn't narrow the type further if we know an expression always returns NULL/MISSING and just output ANY?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming concatenation of non-textual types isn't allowed, then the unresolved function is the expected behavior. The current behavior of returning the NULL type is a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess what I'm saying here is that the null value is always typed when it comes to static analysis. The only scenario in which the type is unknown is when the literal NULL/MISSING keywords are used. From SQL:1999:

Since the null value is in every data type, the data type of the null value implied by the
keyword NULL cannot be inferred; hence NULL can be used to denote the null value only in
certain contexts, rather than everywhere that a literal is permitted.

To get around this, PostgreSQL represents this scenario with a distinct unknown type. They temporarily mark the null literal as unknown, and when it is input into a function, the type is narrowed to a specific type. We currently do not have the concept of "unknown", though, due to our dynamic nature, ANY gets us very close.

So, consider your example: (t_int32 & NULL) || t_string. In PostgreSQL, NULL would be typed as unknown. When using the binary and operator, the type is clarified and is labeled as an integer. Therefore, the signature looks like: & (int32, int32) -> int32. Then, the input to the concatenation is int32 and varchar. Since PostgreSQL allows for implicit coercion of int to text, it gets resolved.

While I'm in favor of using the unknown type in the future, ANY works as follows: typing the literal NULL results in ANY. The binary and operator has several relevant overloads, namely resulting in int32, int64, int. Therefore, with dynamic dispatch, those three types are the only types able to be returned. Assuming we do not allow concatenation of non-textual types, then we have enough knowledge to fail compilation and return the unresolved function representing concatenation.

@johnedquinn johnedquinn requested a review from RCHowell May 20, 2024 22:40
@@ -28,8 +28,25 @@ Thank you to all who have contributed!
### Added

### Changed
- **Behavioral change**: The planner now does NOT support the NullType and MissingType variants of StaticType. The logic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this as well. We have many customers who assert on current typing judgements, we can't make any typing changes without breaking them.

Comment on lines -306 to -311
val schema = when (node.type) {
Rel.Op.Join.Type.INNER -> l + r
Rel.Op.Join.Type.LEFT -> l + r.pad()
Rel.Op.Join.Type.RIGHT -> l.pad() + r
Rel.Op.Join.Type.FULL -> l.pad() + r.pad()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you able to remove the padding?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All types are considered nullable, therefore, there isn't a need to pad the types with the NULL type.

Comment on lines +516 to +517
handleAlwaysMissing()
return rex(ANY, rexOpPathKey(root, key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we be returning a rex with rexOpMissing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the v1 branch we can do that. Main doesn't have that concept yet.

// Check Root Type
if (!root.type.mayBeType<StructType>()) {
handleAlwaysMissing()
return rex(ANY, rexOpErr("Symbol lookup may only occur on structs, not ${root.type}."))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly we should have the rexOpMissing here — which is effectively an error node that we handle at runtime based upon evaluation mode.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +550 to +551
handleAlwaysMissing()
return rex(ANY, Rex.Op.Path.Symbol(root, node.key))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing node again

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adds TODO

Simplifies typing of index operator

Updates comment
@johnedquinn johnedquinn requested a review from RCHowell May 23, 2024 00:09
@johnedquinn johnedquinn merged commit 8402db6 into partiql:main May 28, 2024
7 checks passed
@johnedquinn johnedquinn deleted the main-deprecate-null-missing branch May 28, 2024 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants