forked from JuliaIO/Parquet.jl
Faster column reader #1
Merged
The reader does not interpret any logical types. For example, timestamps stored as `INT96` values are represented by default as Julia `Int128` values (the next larger type available). Logical type information is also often absent from the schema. We could provide additional methods to interpret such fields; they can be applied to the values after they are read. As of now, this PR adds methods for timestamps (`logical_timestamp`) and strings (`logical_string`):

```julia
julia> for v in values
           println(logical_string(v.date_string_col), ", ", logical_timestamp(v.timestamp_col))
       end
04/01/09, 2009-04-01T12:00:00
04/01/09, 2009-04-01T12:01:00
```
After discussions in JuliaIO#49, add an optional `offset` keyword parameter to `logical_timestamp`, through which a `Dates.Period` instance can be passed to be added to each timestamp.
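To illustrate how an `Int128` holding an `INT96` timestamp might be turned into a `DateTime`, and where an `offset` could be useful, here is a minimal sketch. It is not the code from this PR: `sketch_timestamp` is a hypothetical name, and the layout (low 64 bits as nanoseconds within the day, high 32 bits as a Julian day number) is an assumption based on the commonly used `INT96` timestamp convention.

```julia
using Dates

# Hypothetical helper, NOT Parquet.jl's logical_timestamp.
# Assumed layout: low 64 bits = nanoseconds within the day,
# high 32 bits = Julian day number.
function sketch_timestamp(v::Int128; offset::Dates.Period=Dates.Second(0))
    nanos = (v & typemax(UInt64)) % Int64   # low 64 bits
    jday  = (v >> 64) % Int32               # high 32 bits
    # julian2datetime counts days from noon, so an integer day lands on 12:00.
    base = Dates.julian2datetime(jday)
    # DateTime has millisecond precision, so sub-millisecond nanos are dropped.
    return base + Dates.Millisecond(nanos ÷ 1_000_000) + offset
end

# Julian day 2440588 with one minute (60e9 ns) elapsed in the day.
raw = (Int128(2440588) << 64) | Int128(60_000_000_000)

noon_based     = sketch_timestamp(raw)
midnight_based = sketch_timestamp(raw; offset=Dates.Hour(-12))
println(noon_based, " vs ", midnight_based)
```

Under these assumptions, passing `offset=Dates.Hour(-12)` shifts the noon-based result to a midnight-based one; any other `Dates.Period` can be passed the same way.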
add methods to interpret some logical types
Update README.md
updated thrift definitions
added zstd read support
added ability to read missing values; test added
Updated tests to include test files for zstd compression. Originally created by @ldsands in JuliaIO#41 and available at https://github.com/JuliaIO/parquet-compatibility now.
update tests to add tests for zstd
purge protobuf and thrift conversion of parquet schemas in preparation for moving to a named tuple representation.
The `ParFile` reader now accepts an optional `map_logical_types` keyword argument: `ParFile(path; map_logical_types) => ParFile`

`map_logical_types` can be one of:
- `false`: no mapping is done (default)
- `true`: default mappings are attempted on all columns (bytearray => String, int96 => DateTime)
- A user-supplied dict mapping column names to a tuple of type and a converter function
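The third option can be pictured with a small self-contained sketch. It does not use Parquet.jl: the key type (plain `String` column names), the column names, and the `apply_mapping` helper are all assumptions made for illustration, and the real reader may expect a different dict shape.

```julia
# Hypothetical mapping in the shape the text describes:
# column name => (target type, converter function).
const MAPPING = Dict{String,Tuple{DataType,Function}}(
    "date_string_col" => (String, v -> String(copy(v))),
    "id"              => (Int64,  v -> Int64(v)),
)

# Illustrative stand-in for a per-row conversion step;
# not Parquet.jl's internal code.
function apply_mapping(row::NamedTuple, mapping)
    converted = map(keys(row)) do name
        value = row[name]
        entry = get(mapping, String(name), nothing)
        # Columns without an entry pass through unchanged.
        entry === nothing ? value : convert(entry[1], entry[2](value))
    end
    return NamedTuple{keys(row)}(converted)
end

row = (date_string_col = Vector{UInt8}("04/01/09"), id = Int32(7), flag = true)
out = apply_mapping(row, MAPPING)
println(out)
```

Keeping the target type next to the converter lets a reader know the column's element type up front, which is presumably why the dict carries both.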
Install TagBot as a GitHub Action
named tuple iterator, fixes for nested structures and column name handling
instead of relying on the directory of the package, because relying on directory makes Parquet.jl static compilation unfriendly.
- Update Thrift dependency to v0.7
- Dependency on ProtoBuf.jl was unnecessary as the method being used from there was already available in Parquet.jl. Dropped it.
update Thrift dependency, drop ProtoBuf dependency
- correct a few 32-bit tests that were failing
- correct appveyor status link in README to point to JuliaIO/Parquet.jl

- fix condition for missing column values when a row cannot be located in a column chunk
- a few performance improvements
No description provided.