WIP: df patched upgrade to 2024-03-05, requiring new DF fixes #3
… dictionaries (apache#9679)
* Add test for multiple count distincts on a dictionary
* Fix accumulator merge bug
* Fix cleanup code
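The commit above does not say what the accumulator merge bug was. As general background only, here is a minimal sketch of the invariant a distinct-count accumulator has to keep when partial states are merged: the merge must union the sets of seen values, not add up the partial counts. The `DistinctCount` type and its methods are hypothetical and are not DataFusion's `Accumulator` API.

```rust
use std::collections::HashSet;

/// Hypothetical partial state for COUNT(DISTINCT col).
/// This is an illustration, not DataFusion's accumulator.
#[derive(Default)]
struct DistinctCount {
    seen: HashSet<String>,
}

impl DistinctCount {
    /// Record a batch of values seen by this partition.
    fn update(&mut self, values: &[&str]) {
        for v in values {
            self.seen.insert(v.to_string());
        }
    }

    /// Merge another partition's state by taking the set union.
    fn merge(&mut self, other: &DistinctCount) {
        self.seen.extend(other.seen.iter().cloned());
    }

    /// Final result: the number of distinct values seen.
    fn evaluate(&self) -> usize {
        self.seen.len()
    }
}

fn main() {
    let mut p1 = DistinctCount::default();
    p1.update(&["a", "b"]);
    let mut p2 = DistinctCount::default();
    p2.update(&["b", "c"]);

    // Adding partial counts double-counts "b".
    let naive_sum = p1.evaluate() + p2.evaluate();

    p1.merge(&p2);
    println!("merged = {}, naive sum = {}", p1.evaluate(), naive_sum);
}
```

With several `COUNT(DISTINCT..)` aggregates in one query, each aggregate would need its own state of this kind, so that values from one column never leak into another column's set.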
…r common subexpr elimination optimization (apache#9685)
* test(9678): reproducer of short-circuiting causing expr elimination to error
* fix(9678): populate visited stack for short-circuited expressions, during the common-expr elimination optimization
* test(9678): reproducer for optimizer error (in common_subexpr_eliminate), as seen in other test case
* chore: extract id_array into abstraction, to make it more clear the relationship between the two visitors
* refactor: tweak the fix and make code more explicit (JumpMark, node_to_identifier)
* fix: get the series_number and curr_id with the correct self.current_idx, before the various incr/decr
* chore: remove unneeded conditional check (already done earlier), and add code comments
* Refine documentation in common_subexpr_eliminate.rs
* chore: cleanup -- fix 1 doc comment and consolidate common-expr-elimination test with other expr test
---------
Co-authored-by: Andrew Lamb
… not always stay in sync with the updated TreeNode traversal
…, while keeping the (stack-popped) symbol used for alias.
This failure is due to the new rust version -- it was fixed upstream in apache#9725 (discussion https://github.com/apache/arrow-datafusion/pull/9725/files#r1534400422)
Thanks @wiedld
I think we should file a ticket upstream in DataFusion about this issue (even if we don't yet have a reproducer)
I looked at the changes in d59a8de and 049bf09 -- they look (very) nice to me, but I am really not an expert in this code. If all the tests pass, the only issue I can foresee is that it might slow down performance somehow, but we can do those tests as part of the datafusion PR
I think we need to fix them upstream so we can get a full CI run (this branch looks like it is failing CI due to necessary fixes for clippy in later releases)
Thus what I suggest is:
- File a ticket upstream (even if you don't have a precise reproducer, simply file a ticket with whatever description we have)
- Start making a PR upstream (the trick will be coming up with a self-contained reproducer)
All in all, great work 🕵️♀️ 🎸
…expr_set, while keeping the (stack-popped) symbol used for alias." This reverts commit 049bf09.
…ing does not always stay in sync with the updated TreeNode traversal" This reverts commit d59a8de.
…re-find the correct expression during re-write. (apache#9871)
* test(9870): reproducer of error with jumping traversal patterns in common-expr-elimination traversals
* refactor: remove the IdArray ordered idx, since the idx ordering does not always stay in sync with the updated TreeNode traversal
* refactor: use the only reproducible key (expr_identifer) for expr_set, while keeping the (stack-popped) symbol used for alias.
* refactor: encapsulate most of the logic within ExprSet, and delineate the expr_identifier from the alias symbol
* test(9870): demonstrate that the sqllogictests are now passing
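The idea named in these commits, keying the expression set by a reproducible identifier rather than by a traversal index, can be sketched in isolation. The following is a toy illustration under stated assumptions: the `Expr` enum, `expr_identifier`, `count_subexprs`, and `common_subexprs` are hypothetical names written for this example and are not the DataFusion types or the code in apache#9871.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical; not DataFusion's `Expr`).
#[derive(Debug, Clone)]
enum Expr {
    Column(String),
    Literal(i64),
    Binary(Box<Expr>, char, Box<Expr>),
}

/// Identifier derived only from the expression's content, so it is the
/// same no matter in which order or how often the tree is traversed.
fn expr_identifier(e: &Expr) -> String {
    match e {
        Expr::Column(name) => name.clone(),
        Expr::Literal(v) => v.to_string(),
        Expr::Binary(l, op, r) => {
            format!("({} {} {})", expr_identifier(l), op, expr_identifier(r))
        }
    }
}

/// Count every non-leaf subexpression, keyed by its identifier.
fn count_subexprs(e: &Expr, counts: &mut HashMap<String, usize>) {
    if let Expr::Binary(l, _, r) = e {
        count_subexprs(l, counts);
        count_subexprs(r, counts);
        *counts.entry(expr_identifier(e)).or_insert(0) += 1;
    }
}

/// Identifiers that occur more than once across all expressions, sorted.
fn common_subexprs(exprs: &[Expr]) -> Vec<String> {
    let mut counts = HashMap::new();
    for e in exprs {
        count_subexprs(e, &mut counts);
    }
    let mut common: Vec<String> = counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(id, _)| id)
        .collect();
    common.sort();
    common
}

fn main() {
    let a_plus_b = Expr::Binary(
        Box::new(Expr::Column("a".into())),
        '+',
        Box::new(Expr::Column("b".into())),
    );
    // (a + b) * 2  and  1 - (a + b) share the subexpression (a + b).
    let e1 = Expr::Binary(Box::new(a_plus_b.clone()), '*', Box::new(Expr::Literal(2)));
    let e2 = Expr::Binary(Box::new(Expr::Literal(1)), '-', Box::new(a_plus_b));
    println!("{:?}", common_subexprs(&[e1, e2]));
}
```

Because the lookup key depends only on the expression itself, a later rewrite pass that skips or jumps over nodes can still re-find the same entry, which an index tied to visit order cannot guarantee.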
… exist on main, but do exist at 2024-03-05
* fix: Remove supported coalesce types
* Use comparison_coercion
* Fix test
* Fix
* Add comment
* More
* fix
I looked at the CI failures of 581e747 and they appear related to either clippy on the examples or the examples needing to be updated for a new version of chrono: https://github.com/wiedld/arrow-datafusion/actions/runs/8510669631/job/23310350970 Thus I conclude that the code on this branch is passing CI and is defect free
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.
No longer needed. Closing.
Below is edited, since a patch was merged into main
What's (was) in this branch:
When testing against iox, we have been finding patches needed in DF. This is a branch of datafusion as of EOD 2024-03-05, with the needed patches layered on top.
Starting at datafusion main branch commit from March 5th 2024:
Then we added these commits:
- COUNT(DISTINCT..) aggregates on dictionaries (Fix incorrect results with multiple COUNT(DISTINCT..) aggregates on dictionaries apache/datafusion#9679), merged into datafusion on 2024-03-19.
And a new patch, based upon a newly found bug:
10463#issuecomment-2024334683).
This new patch (above), merged into DF main on 2024-03-31, was written against a main branch that no longer had 2 methods which still existed at 2024-03-05. Therefore, those two methods were also patched (just for this 2024-03-05 branch):
The new DF patch (merged into main) also included a test using coalesce. This test relies upon a bug fix merged into main on March 7th (and not available on this 2024-03-05 branch). Added that patch too:
add the clippy build fix