Faster column reader #1

Merged
merged 68 commits into from
May 19, 2020

xiaodaigh
Owner

No description provided.

JuliaTagBot and others added 30 commits February 8, 2020 20:29
The reader does not interpret any logical types. For example, timestamps stored as `INT96` values are represented by default as Julia `Int128` values (the next larger type available). Logical type information is also often absent from the schema.

We could provide additional methods to interpret such fields. They can be applied on the values after they are read.

As of now this PR adds methods for timestamp (`logical_timestamp`) and strings (`logical_string`):

```julia
julia> for v in values
        println(logical_string(v.date_string_col), ", ", logical_timestamp(v.timestamp_col))
       end
04/01/09, 2009-04-01T12:00:00
04/01/09, 2009-04-01T12:01:00
```
After discussions in JuliaIO#49, add an optional `offset` keyword parameter to `logical_timestamp`, through which a `Dates.Period` instance can be passed to be added to each timestamp.
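To illustrate what interpreting an `INT96` timestamp with an offset can look like, here is a minimal standalone sketch. It assumes the common Parquet convention that the low 64 bits of the value hold nanoseconds within the day and the next 32 bits hold the Julian day number. The name `int96_to_datetime` is hypothetical, and this is not the `logical_timestamp` implementation from this PR.

```julia
using Dates

# Julian day number of 1970-01-01 (the Unix epoch).
const JULIAN_DAY_UNIX_EPOCH = 2_440_588

# Hypothetical helper, not part of Parquet.jl.
# Assumed layout: low 64 bits = nanoseconds within the day,
# next 32 bits = Julian day number.
function int96_to_datetime(v::Int128; offset::Dates.Period = Dates.Second(0))
    nanos = Int64(v & typemax(UInt64))
    julian_day = Int64((v >> 64) & typemax(UInt32))
    # DateTime has millisecond resolution, so sub-millisecond precision is dropped.
    return DateTime(1970, 1, 1) + Day(julian_day - JULIAN_DAY_UNIX_EPOCH) +
           Millisecond(nanos ÷ 1_000_000) + offset
end

# Julian day 2454923 is 2009-04-01; 43_200_000_000_000 ns is 12 hours.
raw = (Int128(2_454_923) << 64) | Int128(43_200_000_000_000)
int96_to_datetime(raw)                           # 2009-04-01T12:00:00
int96_to_datetime(raw; offset = Dates.Hour(-5))  # 2009-04-01T07:00:00
```

Passing the offset as a `Dates.Period` keeps the conversion itself timezone-agnostic; the caller decides how, if at all, to shift the result.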
add methods to interpret some logical types
added ability to read missing values; test added
Updated tests to include test files for zstd compression.
Originally created by @ldsands in JuliaIO#41 and available at https://github.com/JuliaIO/parquet-compatibility now.
update tests to add tests for zstd
purge protobuf and thrift conversion of parquet schemas in preparation for moving to a named tuple representation.
xiaodaigh and others added 29 commits May 16, 2020 15:26
The `ParFile` reader now accepts an optional `map_logical_types` keyword argument.

`ParFile(path; map_logical_types) => ParFile`

`map_logical_types` can be one of:

- `false`: no mapping is done (default)
- `true`: default mappings are attempted on all columns (bytearray => String, int96 => DateTime)
- a user-supplied dict mapping column names to a tuple of a type and a converter function
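As a rough illustration of the dict form, the sketch below applies a column-name-to-`(type, converter)` mapping to raw column values. It is standalone and does not call Parquet.jl. The plain string keys, the helper `apply_mapping`, and the converter `bytes_to_string` are all assumptions made for the example, and missing values are not handled.

```julia
# Hypothetical converter; raw values are assumed to arrive as byte vectors.
# `copy` is used because `String(b)` would otherwise take ownership of `b`.
bytes_to_string(b::Vector{UInt8}) = String(copy(b))

# Column name => (target type, converter function).
mapping = Dict(
    "date_string_col" => (String, bytes_to_string),
)

# Hypothetical helper: convert one raw column if a mapping exists for it,
# otherwise return the raw values unchanged.
function apply_mapping(mapping::Dict, name::AbstractString, raw::Vector)
    haskey(mapping, name) || return raw
    T, convert_fn = mapping[name]
    return T[convert_fn(x) for x in raw]
end

raw_col = [Vector{UInt8}("04/01/09"), Vector{UInt8}("04/02/09")]
apply_mapping(mapping, "date_string_col", raw_col)  # ["04/01/09", "04/02/09"]
```

Carrying the target type alongside the converter lets the result be built as a concretely typed vector rather than a `Vector{Any}`.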
Install TagBot as a GitHub Action
named tuple iterator, fixes for nested structures and column name handling
instead of relying on the directory of the package, because relying on the directory makes Parquet.jl unfriendly to static compilation.
- Update Thrift dependency to v0.7
- Dependency on ProtoBuf.jl was unnecessary as the method being used from there was already available in Parquet.jl. Dropped it.
update Thrift dependency, drop ProtoBuf dependency
- correct a few 32-bit tests that were failing
- correct appveyor status link in README to point to JuliaIO/Parquet.jl
- fix condition for missing column values when a row cannot be located in a column chunk
- a few performance improvements
@xiaodaigh xiaodaigh merged commit 8617390 into master May 19, 2020
3 participants