-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closes #3334, #3351: Simplify server side string code and added fixed length #3335
Conversation
Noting that this PR should be merged after #3333 and rebased on top of that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I'm misreading or it's just the variable name is not aligning with what I think it should be, but I don't understand how the segments ends up being what we want in the fixed length case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good! thanks ben
Annotations --- - (#25140) introduced shared-memory bypass behavior that caused several perf. regressions, (#25307) helped resolve that somewhat by reverting some behavior. Arkouda annotations: - (Bears-R-Us/arkouda#3323) reverted a prior PR that caused a perf regression w/ dataframe indexing - (Bears-R-Us/arkouda#3335) and (Bears-R-Us/arkouda#3368) PR (and the fix) that incorrectly changed how the number of bytes were calculated for string IO benchmark.
In an effort to simplify the Parquet string reading code
to better understand the performance implications of
changes to that code, this PR switches back to the simpler
way of doing things and also adds a "fixed_len" arg on
the client side to skip byte calculation in cases where
the size of each string is known at runtime and
consistent in the entire file.
Additionally, the IO benchmark is updated to properly
handle strings for byte calculation.
Closes #3334
Fixes #3351