DateTime writer support #108
Comments
Hmm, I think you can start by looking at the writer.jl file. It would also be good to link the relevant DateTime page from the Parquet format docs. Can you write some Parquet files with datetime columns in Python or R and provide a few simple ones for testing?
Okay, I was able to create some example files using the Python script below.
Link to the Apache Parquet docs about the Date / Time Logical Types: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
I'm not sure how soon I'll be able to dig into this, but I'll leave this info here for myself or anyone else who wants to take a crack at it in the meantime. It looks like there is also some work to be done to support reading a few of the date / time types.
Python script for generating Parquet files with datetime columns:
import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
if __name__ == "__main__":
    weeks = pd.date_range(start="2000-01-01", periods=26, freq="W")
    hours = pd.date_range(start="2000-01-01", periods=26, freq="H")
    data = pd.DataFrame(
        {
            "ns": list(weeks),
            "ms": list(weeks),
            "us": list(weeks),
            "date": weeks.date,
            "time": hours.time,
        }
    )
    schema = pa.schema(
        [
            pa.field("ns", pa.timestamp("ns")),
            pa.field("ms", pa.timestamp("ms")),
            pa.field("us", pa.timestamp("us")),
            pa.field("date", pa.date64()),
            pa.field("time", pa.time64("ns")),
        ]
    )
    table = pa.Table.from_pandas(data, schema=schema)
    pq.write_table(table, "example-v1.parquet")
    pq.write_table(table, "example-v2.parquet", version="2.0")
    v1_file = pq.ParquetFile('example-v1.parquet')
    v2_file = pq.ParquetFile('example-v2.parquet')
    print(v1_file.schema)
    print(v2_file.schema)
Output:
Being able to write datetimes is crucial for a lot of data science applications. I'm still forced to call Python for this, which makes it very hard to scale any time series solution in Julia that has to interact with other components that speak Parquet.
I wonder if the Parquet2.jl implementation solves this?
I know that you mention in the README that a few types are not supported by the writer as of now including date-like types. I didn't see any issues referencing this so I wanted to add one to track its status.
Is this planned to be supported soon? If not, what needs to happen in order to support this? I might be able to create a PR at some point.
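For anyone picking this up, here is a rough sketch of the values a writer would need to produce, going by the Parquet Logical Types spec: DATE is an int32 count of days since the Unix epoch, TIMESTAMP is an int64 count of milliseconds, microseconds, or nanoseconds since the epoch, and TIME is a count since midnight (int32 for milliseconds, int64 for microseconds and nanoseconds). The function names are made up for illustration and are not part of Parquet.jl or pyarrow. It is written in Python with only the standard library to match the script in this thread, and it assumes naive datetimes can be counted as if they were UTC.

```python
from datetime import date, datetime, time, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DT = datetime(1970, 1, 1, tzinfo=timezone.utc)
UNITS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def encode_date(d: date) -> int:
    """DATE logical type: days since the Unix epoch (physical type int32)."""
    return (d - EPOCH_DATE).days


def encode_timestamp(dt: datetime, unit: str) -> int:
    """TIMESTAMP logical type: count of `unit` since the epoch (int64).

    Assumption: a naive datetime is counted as if it were UTC.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - EPOCH_DT
    # Integer arithmetic only, to avoid float rounding on large values.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * UNITS_PER_SECOND[unit] // 1_000_000


def encode_time(t: time, unit: str) -> int:
    """TIME logical type: count of `unit` since midnight."""
    micros = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond
    return micros * UNITS_PER_SECOND[unit] // 1_000_000


if __name__ == "__main__":
    print(encode_date(date(2000, 1, 1)))
    print(encode_timestamp(datetime(2000, 1, 1), "ms"))
    print(encode_time(time(1, 0), "ns"))
```

Python's `datetime` only carries microsecond precision, so the "ns" results here are always multiples of 1000. The writer would also need to set the matching logical type annotation in the column schema, which this sketch does not cover.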