Parquet writer column_size() should return a size_t #12870
Wanted to get this out fast for testing. I still have to add a unit test.
/ok to test
Thanks for the quick PR, @etseidl!
LGTM 👍
Thanks, @etseidl. I'll test this out today and get back to you.
/merge
Sorry for the bug, @galipremsagar.
Since CI hates me, should I push the unit test I wrote for this? @vuule
How big/slow is the test? My concern is that it requires a lot of device memory to execute, so it would not run on all GPUs.
Well, it writes 300M int64s, so 2.4 GB. It takes about 1.5 s on my A6000. nsys reports 24 GB of memory used 😮
FWIW, that's the size of the pool, so peak memory use could be a lot lower. Still, I have an 8 GB GPU, and it's unlikely to be enough for a 2.4 GB file. So let's leave this one without a test; the fix looks very safe anyway.
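The test-size question above comes down to simple arithmetic: a fixed-width column's byte total stops fitting in a signed 32-bit integer once rows × element size exceeds 2,147,483,647, so any test that exercises the bug must write more than 2 GiB. A minimal sketch that checks this at compile time follows; total_bytes and min_rows_to_overflow are hypothetical helpers for illustration, not libcudf functions.

```cpp
#include <cstdint>
#include <limits>

// Largest byte count a signed 32-bit integer can hold (2,147,483,647).
constexpr std::int64_t max_int32_bytes = std::numeric_limits<std::int32_t>::max();

// Hypothetical helper: bytes occupied by `rows` fixed-width elements,
// computed in 64-bit arithmetic so this calculation itself cannot overflow.
constexpr std::int64_t total_bytes(std::int64_t rows, std::int64_t element_size)
{
  return rows * element_size;
}

// Hypothetical helper: smallest row count whose byte total exceeds int32 max.
constexpr std::int64_t min_rows_to_overflow(std::int64_t element_size)
{
  return max_int32_bytes / element_size + 1;
}

// 300M int64 values (the test discussed above) is 2.4e9 bytes, past the limit.
static_assert(total_bytes(300'000'000, sizeof(std::int64_t)) > max_int32_bytes);
// For int64 columns, about 268M rows already suffice to overflow.
static_assert(min_rows_to_overflow(sizeof(std::int64_t)) == 268'435'456);
```

Even the minimal overflowing case writes just over 2 GiB of column data, which is why the memory concern for a unit test applies regardless of the exact row count chosen.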
/merge
Description
Fixes #12867.
Bug introduced in #12685. A calculation of total bytes in a column was returned in a 32-bit size_type rather than a 64-bit size_t, leading to overflow for tables with many millions of rows.
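A minimal sketch of the failure mode described above, assuming a simplified byte count for a fixed-width column. column_size_32 and column_size_64 are hypothetical stand-ins, not the actual Parquet writer code from #12685 or this PR, and size_type is aliased locally to int32_t to mirror cudf::size_type.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: mirrors libcudf's 32-bit size_type.
using size_type = std::int32_t;

// Hypothetical buggy shape: the byte total is computed in 64 bits but then
// narrowed to a 32-bit return type, so totals above 2,147,483,647 come back wrong.
constexpr size_type column_size_32(size_type num_rows, std::size_t element_size)
{
  return static_cast<size_type>(static_cast<std::size_t>(num_rows) * element_size);
}

// Hypothetical fixed shape: keep the byte total in size_t end to end.
constexpr std::size_t column_size_64(size_type num_rows, std::size_t element_size)
{
  return static_cast<std::size_t>(num_rows) * element_size;
}
```

Note that 300M rows still fits comfortably in a 32-bit row count; only the byte total (2.4e9 for int64 data) exceeds it. That is why widening the return type to size_t is enough, and the row count itself can stay a size_type.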