Make parquet support optional #7653
Comments
The same thing probably applies to CSV and JSON support.
If someone wanted a fun software engineering challenge that is well specified, I think this one is pretty reasonable. It isn't a great "new to Rust programming" project, but it is a good "new to DataFusion" project for someone already familiar with Rust.
The arrow-rs dependency may make this a bit tricky, as they don't have a parquet feature flag.
Parquet depends on arrow-rs, not the other way round?
Sorry, I misunderstood the issue linked. Please disregard.
Maybe this issue needs reopening, because it no longer compiles without default features. It fails with the following error:
error[E0412]: cannot find type `ParquetSink` in this scope
--> datafusion/proto/src/physical_plan/mod.rs:962:32
|
962 | let data_sink: ParquetSink = sink
| ^^^^^^^^^^^ not found in this scope
|
help: consider importing one of these items
|
18 + use crate::protobuf::ParquetSink;
@fudini -- thanks for the report. Can you provide a command to reproduce this error?
Sure, it's:
Thank you -- filed #8844
Is your feature request related to a problem or challenge?
DataFusion aspires to be a modular query engine, and not all users need support for parquet.
The parquet crate has a non-trivial number of dependencies (some of which prevent compiling DataFusion to WASM -- see #7652).
Also, there have been reports like #2042 where some of the native dependencies, like zstd, cause build issues.
Describe the solution you'd like
I would like to make parquet support optional, the same way avro support is.
Avro is marked as optional: https://github.com/apache/arrow-datafusion/blob/5f38135d5d21160d6b1ef7213578dd5eddfa4f95/datafusion/core/Cargo.toml#L53
It would be great to mark parquet as optional too: https://github.com/apache/arrow-datafusion/blob/5f38135d5d21160d6b1ef7213578dd5eddfa4f95/datafusion/core/Cargo.toml#L82
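To make the idea concrete, here is a minimal sketch of how an optional feature can be confined to one module so that callers need no `cfg` attributes of their own. It is not DataFusion's actual code: the module name `parquet_support`, the functions `format_name` and `supported_formats`, and the Cargo.toml entries in the comment are all hypothetical and only illustrate the pattern.

```rust
// Assumed (hypothetical) Cargo.toml entries for this sketch:
//
//   [dependencies]
//   parquet = { workspace = true, optional = true }
//
//   [features]
//   default = ["parquet"]
//   parquet = ["dep:parquet"]

/// Real implementation, compiled only when the `parquet` feature is enabled.
#[cfg(feature = "parquet")]
mod parquet_support {
    /// Hypothetical helper: reports the format this module provides.
    pub fn format_name() -> Option<&'static str> {
        Some("parquet")
    }
}

/// Stub with the same API, compiled when the feature is disabled.
#[cfg(not(feature = "parquet"))]
mod parquet_support {
    /// Hypothetical helper: no Parquet support in this build.
    pub fn format_name() -> Option<&'static str> {
        None
    }
}

/// Lists the file formats compiled into this build.
/// Note that this caller contains no `cfg` attributes at all.
pub fn supported_formats() -> Vec<&'static str> {
    let mut formats = vec!["csv", "json"];
    // `Option` is iterable, so this adds zero or one entry.
    formats.extend(parquet_support::format_name());
    formats
}
```

With this layout, the feature check appears only at the module boundary. A build with `--no-default-features` would get the stub, and code outside the module compiles unchanged either way.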
In order to make this work, we would likely need to encapsulate the parquet code in a more modular fashion (rather than sprinkling #[cfg(feature = "parquet")] all over the code).
Describe alternatives you've considered
No response
Additional context
No response