
[QUESTION] Getting max value for partitioned column based on metadata #1774

Open
rrazy-akamai opened this issue May 18, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@rrazy-akamai

Question

Hi,
We have a table partitioned by a ts column (a Long timestamp), stored on Azure Blob Storage.
The directory structure has .../ts=/

When I try to find the max value of ts, this type of query causes Spark to read all the data files instead of relying on the partition metadata, which is VERY costly in time and resources.

I know Databricks has an optimization for this - our code used to run there, and the performance of this query was good.
However, I could not find any implementation that works here.
I would appreciate any input on how to do this right!

Thanks.
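(Editor's note: as a stopgap while the engine-level optimization is unavailable, the max partition value can be derived from the storage listing alone, since Hive-style partition directories encode the value in their names. The sketch below is a minimal illustration, not a Delta Lake API; `max_partition_value` and the sample blob paths are hypothetical, and in practice the path list would come from an Azure Blob Storage listing call.)

```python
import re

def max_partition_value(paths, column="ts"):
    """Extract the maximum value of a Hive-style partition column
    (e.g. .../ts=1684300000/...) from a list of file paths.
    Only directory names are inspected; no data files are read."""
    pattern = re.compile(rf"{re.escape(column)}=(\d+)")
    values = [int(m.group(1)) for p in paths if (m := pattern.search(p))]
    if not values:
        raise ValueError(f"no partitions found for column {column!r}")
    return max(values)

# Hypothetical blob paths, as returned by a storage listing
paths = [
    "container/table/ts=1684300000/part-0000.snappy.parquet",
    "container/table/ts=1684386400/part-0001.snappy.parquet",
    "container/table/ts=1684214000/part-0002.snappy.parquet",
]
print(max_partition_value(paths))  # → 1684386400
```

Note that for a Delta table this listing-based approach can overcount: directories may contain files that were logically removed from the table but not yet vacuumed, so the true source of truth is the transaction log, not the file system.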

@rrazy-akamai rrazy-akamai added the bug Something isn't working label May 18, 2023
@felipepessoto
Contributor

felipepessoto commented May 18, 2023

This PR should help; it has been waiting for review for a while: #1525
