Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add dropColumns as TableOperations / TableSpec #3474

Merged
merged 6 commits into from
Mar 10, 2023

Conversation

devinrsmith
Copy link
Member

This is in support of #3473, where for adapting implementation purposes, it's cleaner and easier to use dropColumns.

This is in support of deephaven#3473, where for adapting implementation purposes, it's cleaner and easier to use dropColumns
@devinrsmith devinrsmith added this to the Mar 2023 milestone Mar 1, 2023
@devinrsmith devinrsmith requested a review from rcaudy March 1, 2023 19:55
@devinrsmith devinrsmith self-assigned this Mar 1, 2023
@devinrsmith
Copy link
Member Author

For a bit more context, the SQL query:

my_time = time_table("00:00:01").update_view(
    ["I=ii%10==0?null:ii", "B=ii%11==0?null:ii%3==0"]
)
SELECT 
  * 
FROM 
  my_time 
ORDER BY "B", "Timestamp" DESC

produces the logical plan

LogicalSort(sort0=[$2], sort1=[$0], dir0=[ASC], dir1=[DESC])
  LogicalProject(Timestamp=[$0], I=[$1], B=[$2])
    LogicalTableScan(table=[[my_time]])

notice that the hierarchy produced has sort being executed after the project, yet still refers to the columns by reference.

With the DHC implementation, I'd like to adapt all of the tables into "reference" space, execute the plan, and then remove the extra references.

In this implementation, instead of using a view for the projection, I'm able to use an update_view so that the sort can still proceed in reference space, and then finally by dropping the references as the last step.

digraph {
"op_0" ["label"="ticketTable(scan/my_time)"]
"op_1" ["label"="view(ref_0=Timestamp,ref_1=I,ref_2=B)"]
"op_2" ["label"="updateView(Timestamp=ref_0,I=ref_1,B=ref_2)"]
"op_3" ["label"="sort([ASCENDING(ref_2),DESCENDING(ref_0)])"]
"op_4" ["label"="dropColumns([ref_0,ref_1,ref_2])"]
"op_1" -> "op_0"
"op_2" -> "op_1"
"op_3" -> "op_2"
"op_4" -> "op_3"
}

@devinrsmith
Copy link
Member Author

I'd potentially like to follow this up with rename_columns.

@devinrsmith devinrsmith requested a review from rcaudy March 7, 2023 17:58
@devinrsmith
Copy link
Member Author

So - my current SQL implementation does not rely on dropColumns anymore. That said, it might be nice to get these improvements over the line regardless?

Copy link
Member

@rcaudy rcaudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caught up to head, still need to discuss open issue.

@devinrsmith devinrsmith requested a review from rcaudy March 8, 2023 19:41
@devinrsmith devinrsmith merged commit c146b6c into deephaven:main Mar 10, 2023
@devinrsmith devinrsmith deleted the drop-columns branch March 10, 2023 17:11
@github-actions github-actions bot locked and limited conversation to collaborators Mar 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants