Closes #3334, #3351: Simplify server side string code and added fixed length #3335

Merged: 6 commits, Jun 25, 2024

@bmcdonald3 bmcdonald3 commented Jun 14, 2024

To simplify the Parquet string reading code, and to make the
performance implications of changes to that code easier to
understand, this PR switches back to the simpler
way of doing things. It also adds a "fixed_len" arg on
the client side to skip the byte calculation in cases where
the size of each string is known at runtime and is
consistent across the entire file.
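
As a rough illustration of why a fixed length lets the byte calculation be skipped, here is a minimal Python sketch. It is not the code from this PR or from `src/ParquetMsg.chpl`; the function names and the `bytes_per_string` parameter are hypothetical, and whether that value includes a terminator byte is left to the caller as an assumption.

```python
from typing import List


def segments_by_scanning(byte_lengths: List[int]) -> List[int]:
    """General case: every string's byte length must be known first,
    and each offset is the running sum of the lengths before it."""
    offsets = []
    total = 0
    for n in byte_lengths:
        offsets.append(total)
        total += n
    return offsets


def segments_fixed_len(num_strings: int, bytes_per_string: int) -> List[int]:
    """Fixed-length case (hypothetical helper): no per-string lengths are
    needed, because the offset of string i is i * bytes_per_string."""
    return [i * bytes_per_string for i in range(num_strings)]


# Three strings that each occupy 4 bytes give the same offsets either way.
print(segments_by_scanning([4, 4, 4]))
print(segments_fixed_len(3, 4))
```

The second function never inspects the data, which is the saving the fixed-length path is after: the pass that computes per-string sizes can be dropped when the caller already knows the size.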

Additionally, the IO benchmark is updated to properly
handle strings for byte calculation.
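
The difference between counting bytes for numeric and string data can be sketched as follows. This is an illustrative Python example under stated assumptions, not the benchmark code touched by this PR: the helper names are hypothetical, and counting UTF-8 encoded bytes without any terminator is an assumption about what "bytes" should mean.

```python
from typing import Sequence


def numeric_nbytes(num_elems: int, itemsize: int) -> int:
    """Numeric arrays: every element has the same size in bytes."""
    return num_elems * itemsize


def string_nbytes(strings: Sequence[str]) -> int:
    """Strings: sizes vary, so sum the encoded length of each value
    (assumption: UTF-8, no terminator bytes counted)."""
    return sum(len(s.encode("utf-8")) for s in strings)


def throughput_gib_per_s(nbytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GiB/s."""
    return nbytes / seconds / 2**30


print(numeric_nbytes(10, 8))          # ten 8-byte integers
print(string_nbytes(["abc", "de"]))   # 3 + 2 encoded bytes
```

Multiplying the element count by a fixed item size would misreport throughput for strings, since the element count says nothing about how many bytes were actually read or written.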

Closes #3334
Fixes #3351

@bmcdonald3

Noting that this PR should be rebased on top of #3333 and merged after it.

@bmcdonald3 bmcdonald3 marked this pull request as ready for review June 24, 2024 21:59
@stress-tess stress-tess self-requested a review June 25, 2024 01:14
@stress-tess stress-tess changed the title Closes #3334: Simplify server side string code and added fixed length Closes #3334, #3351: Simplify server side string code and added fixed length Jun 25, 2024
@stress-tess stress-tess left a comment

Maybe I'm misreading, or the variable name just isn't aligning with what I think it should be, but I don't understand how segments ends up being what we want in the fixed-length case.

src/ParquetMsg.chpl (resolved)
@stress-tess stress-tess left a comment

looks good! thanks ben

@stress-tess stress-tess added this pull request to the merge queue Jun 25, 2024
Merged via the queue into Bears-R-Us:master with commit 5223273 Jun 25, 2024
16 checks passed
stonea added a commit to chapel-lang/chapel that referenced this pull request Jul 3, 2024
Annotations ---

- (#25140) introduced shared-memory bypass behavior that caused several
perf. regressions; (#25307) helped resolve that somewhat by reverting
some behavior.

Arkouda annotations:

- (Bears-R-Us/arkouda#3323) reverted a prior PR that caused a perf
regression w/ dataframe indexing

- (Bears-R-Us/arkouda#3335) and (Bears-R-Us/arkouda#3368):
a PR (and its fix) that incorrectly changed how the number of bytes
was calculated for the string IO benchmark.
Successfully merging this pull request may close these issues.

Multi-batch read test failures
Simplify server side string code and added fixed length
2 participants