Error when Z Ordering a larger dataset #1459
Labels: bug (Something isn't working)
Comments
Got it. That's likely from this line, which indicates that we have a binary column that is too large for StringArray. Maybe we can use large types for this? Or else interleave the results in chunks.
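For context, a rough illustration of that limit, using pyarrow here as an assumption rather than the Rust arrow crate the comment refers to: `StringArray` stores 32-bit offsets, so a single array tops out around 2 GiB of string/binary data, while the "large" variants use 64-bit offsets.

```python
import pyarrow as pa

# string/binary arrays use 32-bit offsets, so one array can hold at most
# ~2 GiB of data; the large_* variants use 64-bit offsets instead.
small = pa.array(["a", "bb", "ccc"], type=pa.string())  # i32 offsets
large = small.cast(pa.large_string())                   # i64 offsets

print(small.type, large.type)  # -> string large_string
```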
Here are a few rows of the data:
Here are the dtypes:
Here's the code I ran. I should have included these details in the initial bug report.
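The script itself isn't reproduced above; a minimal sketch of the kind of call involved, assuming the Python binding's `optimize.z_order` method and hypothetical table path and column names, looks like this:

```python
from deltalake import DeltaTable

# Hypothetical path and column; the original notebook used the h2o groupby data.
dt = DeltaTable("./h2o_groupby_1e9")  # the ~50 GB dataset that failed
dt.optimize.z_order(["id1"])          # Z Order on a single column
```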
wjones127 added a commit that referenced this issue on Jul 3, 2023:
…1461)

# Description

Fixes the base implementation so that it doesn't materialize the entire result in one record batch. It will still require materializing the full input for each partition in memory. This is mostly a problem for unpartitioned tables, since that means materializing the entire table in memory.

Adds a new datafusion-based implementation enabled by the `datafusion` feature. In theory, this should support spilling to disk.

# Related Issue(s)

- closes #1459
- closes #1460
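As a small illustration of the distinction (pyarrow only, not the code from the change itself): collecting a result into one record batch keeps the whole result, including its 32-bit offset buffers, in memory at once, whereas streaming bounded batches keeps every individual array small.

```python
import pyarrow as pa

# Illustration only: one giant batch vs. bounded chunks. Sizes are made up.
table = pa.table({"id": list(range(1_000_000)),
                  "val": ["x"] * 1_000_000})

# Materializing everything as a single RecordBatch means one set of buffers
# must cover the entire result.
one_batch = table.combine_chunks().to_batches()[0]

# Streaming bounded batches keeps each array small, so no single offset
# buffer has to span the whole table.
for batch in table.to_batches(max_chunksize=64_000):
    _ = batch.num_rows  # per-batch work (e.g. writing a row group) goes here
```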
Environment
Delta-rs version: 0.10.0
Binding: Python
Environment:
Bug
What happened: The Z Order command worked on the 5 GB h2o groupby dataset (1e8), but errors out on the 50 GB dataset (1e9).
What you expected to happen: I expected the Z Ordering to work
How to reproduce it: This notebook shows the computations working well on the 1e8 dataset, but erroring out on the 1e9 dataset.
More details: I'm Z Ordering on a single column. Here's the error message: