Include DuckDB and use it for previewing tabular data (parquet, csv, json, ...) #8447

Open
3 tasks done
keen85 opened this issue Feb 3, 2025 · 0 comments
Labels
💡 feature request New feature or request

keen85 commented Feb 3, 2025

Preflight Checklist

Problem

Current preview functionality for data files (parquet, csv, json, ...) is limited:

  • preview only works for small files
  • data is downloaded to the client
  • preview has problems with specific data types (e.g. timestamps in parquet)

Desired Solution

DuckDB is a lightweight analytical engine. It can query a variety of file types and has an extension that lets it work with Azure Storage directly. Using DuckDB to preview data from Azure Storage would have several benefits:

  • less traffic: DuckDB does not need to download the entire file; it reads only the parts it needs
  • DuckDB is already able to handle even complex data types
  • integrating DuckDB into Azure Storage Explorer would, in theory, also allow users to write and execute more sophisticated queries than SELECT * FROM blob LIMIT 100
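
To make the idea concrete, here is a minimal sketch of how a preview query could be built for a given file. The helper name `preview_sql`, the `READERS` mapping, and the example URL are hypothetical; the sketch only assumes DuckDB's `read_parquet`, `read_csv_auto`, and `read_json_auto` table functions. It is not how Azure Storage Explorer works today.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical mapping from file extension to a DuckDB table function.
READERS = {
    ".parquet": "read_parquet",
    ".csv": "read_csv_auto",
    ".json": "read_json_auto",
}


def preview_sql(url: str, limit: int = 100) -> str:
    """Build a 'first N rows' query for a data file URL (illustrative only)."""
    # Look only at the path so a query string (e.g. a SAS token) is ignored.
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    reader = READERS.get(suffix)
    if reader is None:
        raise ValueError(f"no preview reader for {suffix!r}")
    # Escape single quotes so the URL is a valid SQL string literal.
    quoted = url.replace("'", "''")
    return f"SELECT * FROM {reader}('{quoted}') LIMIT {int(limit)}"


# Example with a made-up container and file name:
print(preview_sql("az://container/data.parquet"))
```

The resulting string could then be passed to a DuckDB connection. A user-written query would simply replace the generated `SELECT *` statement.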

Alternatives and Workarounds

As a workaround, users can install DuckDB manually, use Azure Storage Explorer to obtain a blob URL, then open the DuckDB CLI and use that URL to query the file in Azure Storage.
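
For illustration, the same workaround can be sketched with DuckDB's Python package instead of the CLI. This assumes the blob URL is an HTTPS URL that DuckDB's `httpfs` extension can read (for example one carrying a SAS token) and that the file is Parquet. The function name `query_blob`, its parameters, and the placeholder URL are hypothetical.

```python
def query_blob(con, blob_url, sql_template="SELECT * FROM {src} LIMIT 100"):
    """Run a query against a remote Parquet blob (illustrative sketch).

    `con` is expected to be a DuckDB connection, e.g. from duckdb.connect().
    """
    # httpfs lets DuckDB read files over HTTP(S).
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    # Escape single quotes so the URL is a valid SQL string literal.
    src = "read_parquet('" + blob_url.replace("'", "''") + "')"
    return con.execute(sql_template.format(src=src)).fetchall()


# Usage (requires `pip install duckdb`; the URL is a placeholder):
#   import duckdb
#   con = duckdb.connect()
#   rows = query_blob(con, "https://<account>.blob.core.windows.net/<container>/<file>.parquet?<sas-token>")
```

Passing a different `sql_template`, such as `"SELECT count(*) FROM {src}"`, runs something other than a plain preview.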

Additional Context

@keen85 keen85 added the 💡 feature request New feature or request label Feb 3, 2025