As mentioned by @djouallah in this PR, there are some queries where Parquet outperforms Delta Lake for DataFusion.
I mentioned in the thread that the data for a given query can be optimally distributed in a Parquet file but poorly distributed in a Delta table, which might cause these differences.
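To make the data-distribution point concrete, here is a toy sketch in plain Python. It is not DataFusion or delta-rs code. It assumes the reader skips chunks (think row groups or files) based on per-chunk min/max statistics, and the helper names `chunk_stats` and `chunks_to_scan` are made up for illustration. It compares how many chunks a range filter has to touch when the same values are laid out sorted vs. shuffled.

```python
import random


def chunk_stats(values, chunk_size):
    """Split values into fixed-size chunks and record (min, max) per chunk."""
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    return [(min(c), max(c)) for c in chunks]


def chunks_to_scan(stats, lo, hi):
    """Count chunks whose [min, max] range overlaps the filter lo <= x <= hi."""
    return sum(1 for cmin, cmax in stats if cmax >= lo and cmin <= hi)


random.seed(0)
values = list(range(10_000))

# Layout A: values written in sorted order, so each chunk covers a narrow range.
sorted_stats = chunk_stats(values, 1_000)

# Layout B: same values, written in arbitrary order, so every chunk covers a wide range.
shuffled = values[:]
random.shuffle(shuffled)
shuffled_stats = chunk_stats(shuffled, 1_000)

sorted_hits = chunks_to_scan(sorted_stats, 2_500, 2_600)
shuffled_hits = chunks_to_scan(shuffled_stats, 2_500, 2_600)
print(f"sorted layout scans {sorted_hits} chunk(s), shuffled layout scans {shuffled_hits}")
```

The data and the filter are identical in both cases; only the layout changes how much can be skipped. That is the kind of difference a benchmark would need to control for when comparing a single Parquet file against a Delta table.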
In any case, I think it would be useful to have some benchmarks that show how the same queries perform on a Parquet file vs. a Delta Lake table. The TPC-H queries in this notebook seem like a reasonable starting point.
Some benchmarks showing some realistic end-to-end query patterns would be cool too, for example:
Convert a CSV file to Parquet / Delta Lake
Delete some rows
Upsert some data
Run a query
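A rough sketch of what a harness for the steps above could look like. It only uses the standard library, and the four step functions are in-memory stand-ins operating on a dict, not real Parquet or Delta Lake operations. `run_benchmark` and the step names are hypothetical; in a real benchmark each step would be replaced with the corresponding writer, delete, merge and query calls for the format being tested.

```python
import time
from typing import Callable, Dict, List, Tuple


def run_benchmark(steps: List[Tuple[str, Callable[[], object]]]) -> Dict[str, float]:
    """Run each named step in order and return wall-clock seconds per step."""
    timings: Dict[str, float] = {}
    for name, step in steps:
        start = time.perf_counter()
        step()
        timings[name] = time.perf_counter() - start
    return timings


# In-memory stand-in for a table (assumption: real steps would hit Parquet / Delta).
table: Dict[int, str] = {}


def convert() -> None:
    # Stand-in for "convert a CSV file to Parquet / Delta Lake".
    table.update({i: f"row-{i}" for i in range(1000)})


def delete_rows() -> None:
    # Stand-in for "delete some rows": drop every key divisible by 10.
    for key in [k for k in table if k % 10 == 0]:
        del table[key]


def upsert() -> None:
    # Stand-in for "upsert some data": update existing keys, insert new ones.
    table.update({i: f"updated-{i}" for i in range(990, 1010)})


def query() -> int:
    # Stand-in for "run a query": count the upserted rows.
    return sum(1 for v in table.values() if v.startswith("updated"))


steps = [
    ("convert", convert),
    ("delete", delete_rows),
    ("upsert", upsert),
    ("query", query),
]
timings = run_benchmark(steps)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
```

Keeping the steps as plain callables means the same harness could be run once with Parquet-backed steps and once with Delta-backed steps, so the per-step timings line up for comparison.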
Now I have a possibly more useful use case: if you compare GlareDB, which uses DataFusion and delta-rs, vs. DataFusion with a dataset generated by delta-rs Python, there is a non-trivial difference.
@djouallah can you try out polars-deltalake and share if you see improvements there?