Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type check with the information from RapidsMeta #2888

Merged
merged 4 commits into from
Jul 14, 2021

Conversation

sperlingxx
Copy link
Collaborator

Signed-off-by: sperlingxx [email protected]

Currently, during type tagging, we get the type information directly from what we have wrapped in the RapidsMeta. It works well for most of expressions and plans. But for some special instances, such as TypedImperativeAggregate, we need to do type conversion based on the raw type. Taking TypedImperativeAggregate as an example, the GPU implementation introduces a different kind of internal aggregation buffer, whose data type is not same as the CPU counterpart.

Therefore, in this PR, we modified the RapidsMetas and type checks, to do type check with the type information from the corresponding methods of RapidsMeta rather than raw data types of wrapped instances.

@sperlingxx sperlingxx requested a review from revans2 July 8, 2021 10:55
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think this is okay. My main problem is that it is not obvious to the end user that we changed the type. We have already been bitten by this with decimal and with timestamps where we do not explain to the user that decimal is disabled or that timestamps are disabled because the time zone is not UTC. I would like to avoid the same problem and ideally tag the types that we changed with a reason why we changed them.

Signed-off-by: sperlingxx <[email protected]>
@sperlingxx
Copy link
Collaborator Author

Generally I think this is okay. My main problem is that it is not obvious to the end user that we changed the type. We have already been bitten by this with decimal and with timestamps where we do not explain to the user that decimal is disabled or that timestamps are disabled because the time zone is not UTC. I would like to avoid the same problem and ideally tag the types that we changed with a reason why we changed them.

Hi @revans2, I involved the reasons of type change via introducing a new class DataTypeMeta. It serves as a metadata, but it doesn't inherit RapidsMeta.

revans2
revans2 previously approved these changes Jul 9, 2021
Copy link
Collaborator

@revans2 revans2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conceptually it looks fine. I would like to see how it ends up being used in practice, but that can wait for the next PR.

@sameerz sameerz added the task Work required that improves the product but is not user facing label Jul 9, 2021
@sperlingxx
Copy link
Collaborator Author

build

@sperlingxx
Copy link
Collaborator Author

build

Copy link
Collaborator

@gerashegalov gerashegalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sperlingxx sperlingxx merged commit 1a09322 into NVIDIA:branch-21.08 Jul 14, 2021
@sperlingxx sperlingxx deleted the type_check_ex_for_dt branch July 14, 2021 01:51
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Jul 16, 2021
Fixes NVIDIA#2939.

When the type-checking for expressions in `RapidsMeta` was changed in NVIDIA#2888 (i.e. commit 1a09322),
`OffsetWindowFunctionMeta` was modified to include `input`, `offset`, and `default` expressions among
its `childExprs`.
Unfortunately, the commit neglected to replicate that change in `Spark311Shims` `OffsetWindowFunctionMeta`.

The error manifests as a failure to do type-checks for `LEAD()`/`LAG()` window functions, with the error:
`java.lang.AssertionError: assertion failed: Lead expected at least 3 but found 0`.

This commit should remedy the situation, and allow `LEAD()`/`LAG()` to function correctly.
mythrocks added a commit to mythrocks/spark-rapids that referenced this pull request Jul 16, 2021
Fixes NVIDIA#2939.

When the type-checking for expressions in `RapidsMeta` was changed in NVIDIA#2888 (i.e. commit 1a09322),
`OffsetWindowFunctionMeta` was modified to include `input`, `offset`, and `default` expressions among
its `childExprs`.
Unfortunately, the commit neglected to replicate that change in `Spark311Shims` `OffsetWindowFunctionMeta`.

The error manifests as a failure to do type-checks for `LEAD()`/`LAG()` window functions, with the error:
`java.lang.AssertionError: assertion failed: Lead expected at least 3 but found 0`.

This commit should remedy the situation, and allow `LEAD()`/`LAG()` to function correctly.

Signed-off-by: Mithun RK <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
task Work required that improves the product but is not user facing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants