[Feature Request][Spark][WIP] Metadata only queries - Umbrella issue #2589

Open
2 of 8 tasks
felipepessoto opened this issue Jan 31, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@felipepessoto
Contributor

felipepessoto commented Jan 31, 2024

Feature request

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Overview

This is the umbrella issue for metadata only queries.

Motivation

Answering queries from metadata alone, without reading the data files, can yield large performance improvements, up to 100x in some cases.

Further details

Delta Log provides the following information:

Stats (optional):

  • numRecords
  • minValues
  • maxValues
  • nullCount

PartitionValues:

  • The actual value
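
As a rough illustration of how these fields can answer a query without touching data files, the sketch below sums `numRecords` over a couple of hand-written `add` actions. The sample paths and values are made up, and `count_from_metadata` is a hypothetical helper, not Delta's implementation. Because stats are optional, it returns `None` as soon as one file lacks them, which signals that a regular scan is needed.

```python
import json

# Made-up sample `add` actions; paths and values are illustrative only.
# In the Delta log, `stats` is a JSON-encoded string and may be absent.
ADD_ACTIONS = [
    {
        "path": "part=1/file-a.parquet",
        "partitionValues": {"part": "1"},
        "stats": json.dumps(
            {"numRecords": 3, "minValues": {"x": 1}, "maxValues": {"x": 9}}
        ),
    },
    {
        "path": "part=2/file-b.parquet",
        "partitionValues": {"part": "2"},
        "stats": json.dumps(
            {"numRecords": 7, "minValues": {"x": -2}, "maxValues": {"x": 4}}
        ),
    },
]


def count_from_metadata(add_actions):
    """Hypothetical helper: COUNT(*) from stats, or None if it cannot be answered."""
    total = 0
    for action in add_actions:
        raw_stats = action.get("stats")
        if raw_stats is None:
            return None  # stats are optional; the caller must scan the data instead
        num_records = json.loads(raw_stats).get("numRecords")
        if num_records is None:
            return None
        total += num_records
    return total


print(count_from_metadata(ADD_ACTIONS))  # 10
```

Returning `None` rather than a partial sum keeps the result exact: a metadata-only answer is only valid when every file in the snapshot contributes its stats.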

Project Plan

| ID | Task description | Issue | PR | Status | Author |
|----|------------------|-------|----|--------|--------|
| 1 | SELECT COUNT(*) FROM Table | #1192 | #1377 | Done | @felipepessoto |
| 2 | SELECT MIN(X), MAX(X) FROM Table | #2092 | #1525 | Done | @felipepessoto |
| 3 | SELECT COUNT(*), MIN(X), MAX(X) FROM table WHERE partition_column = 1 | #1916 | #3345 | Pending | @7mming7 |
| 4 | SELECT partition_column, COUNT(*) FROM table GROUP BY partition_column | | | | |
| 5 | SELECT DISTINCT partition_column FROM table | | | | |
| 6 | Support MIN/MAX on tables with Deletion Vectors | | | | |
| 7 | Support nested columns (nested leaf-level) | | | | |
| 8 | Refactor existing MIN/MAX code. Details in the comment below | | | | |
| 9 | | | | | |
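
To make the query shapes in the table above more concrete, here is a minimal sketch of how rows 2 to 5 could be answered from per-file metadata. It is not the code from the linked PRs: `make_file`, `load_stats`, `count_min_max` and `count_by_partition` are hypothetical names, the files are made up, and the handling of deletion vectors (always fall back) is a deliberately conservative assumption.

```python
import json


def make_file(part, num_records, min_x, max_x, deletion_vector=None):
    """Build a made-up `add` action: partition column `part`, data column `x`."""
    action = {
        "partitionValues": {"part": part},  # partition values are stored as strings
        "stats": json.dumps(
            {"numRecords": num_records, "minValues": {"x": min_x}, "maxValues": {"x": max_x}}
        ),
    }
    if deletion_vector is not None:
        action["deletionVector"] = deletion_vector
    return action


FILES = [make_file("1", 3, 1, 9), make_file("1", 2, 4, 5), make_file("2", 5, -2, 7)]


def load_stats(action):
    """Return parsed stats, or None when metadata alone should not be trusted."""
    if action.get("deletionVector") is not None:
        return None  # assumption: stats may describe deleted rows, so fall back
    raw = action.get("stats")
    if raw is None:
        return None  # stats are optional
    stats = json.loads(raw)
    return stats if "numRecords" in stats else None


def count_min_max(files, column, partition_filter=None):
    """Rows 2 and 3: (COUNT(*), MIN, MAX), optionally with partition equality filters."""
    wanted = partition_filter or {}
    count, lo, hi = 0, None, None
    for action in files:
        values = action["partitionValues"]
        if any(values.get(k) != v for k, v in wanted.items()):
            continue  # file belongs to another partition
        stats = load_stats(action)
        if stats is None:
            return None
        if stats["numRecords"] == 0:
            continue
        file_min = stats.get("minValues", {}).get(column)
        file_max = stats.get("maxValues", {}).get(column)
        if file_min is None or file_max is None:
            return None  # conservative: bounds missing for this column
        count += stats["numRecords"]
        lo = file_min if lo is None else min(lo, file_min)
        hi = file_max if hi is None else max(hi, file_max)
    return count, lo, hi


def count_by_partition(files, partition_column):
    """Row 4: COUNT(*) per partition value. Its keys also answer row 5 (DISTINCT)."""
    counts = {}
    for action in files:
        stats = load_stats(action)
        if stats is None:
            return None
        if stats["numRecords"] == 0:
            continue  # an empty file must not introduce a partition value
        key = action["partitionValues"][partition_column]
        counts[key] = counts.get(key, 0) + stats["numRecords"]
    return counts


print(count_min_max(FILES, "x"))                 # (10, -2, 9)
print(count_min_max(FILES, "x", {"part": "1"}))  # (5, 1, 9)
print(count_by_partition(FILES, "part"))         # {'1': 5, '2': 5}
```

Every function returns `None` instead of a partial answer when a file cannot be handled, so a caller can fall back to a normal scan. A real implementation of row 6 would need something more precise than the blanket deletion-vector fallback used here.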

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.
@felipepessoto felipepessoto added the enhancement New feature or request label Jan 31, 2024
@ItaiYaffeAkamai

Thanks, @felipepessoto, for unifying all those under a single umbrella issue - I think it's really beneficial!

@felipepessoto
Contributor Author

In #1525 we left two comments open:

#1525 (comment)

#1525 (comment)

@kmate

kmate commented Feb 8, 2025

@felipepessoto I see that there's an attempt at #1916 that has been open for a while (> 6 months?): #3345. Could we add that to the table above?

Also, if nobody is working on 5, I might take a stab at it; I'm just a bit concerned that my changes might conflict slightly with the open PR mentioned above.
