
[QUESTION] Getting max value for partitioned column based on metadata #1774

Open
rrazy-akamai opened this issue May 18, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@rrazy-akamai

Question

Hi,
We have a table partitioned by a ts column (a Long timestamp), stored on Azure Blob Storage.
The directory structure has .../ts=/

When I try to find the max value of ts, this type of query causes Spark to read all the data files instead of relying on the partition metadata, which is VERY costly in time and resources.

I know Databricks has an optimization for this - our code used to run there, and the performance of this query was good.
However, I could not find any implementation that works here.
I would appreciate any input on how to do this right!

Thanks.
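(Editor's note: as a stopgap while the engine-level optimization is unavailable, the max partition value can be derived from the storage listing alone, since Hive-style partition directories encode the value in their names. The sketch below is a minimal illustration, not a Delta Lake API; `max_partition_value` and the sample blob paths are hypothetical, and in practice the path list would come from an Azure Blob Storage listing call.)

```python
import re

def max_partition_value(paths, column="ts"):
    """Extract the maximum value of a Hive-style partition column
    (e.g. .../ts=1684300000/...) from a list of file paths.
    Only directory names are inspected; no data files are read."""
    pattern = re.compile(rf"{re.escape(column)}=(\d+)")
    values = [int(m.group(1)) for p in paths if (m := pattern.search(p))]
    if not values:
        raise ValueError(f"no partitions found for column {column!r}")
    return max(values)

# Hypothetical blob paths, as returned by a storage listing
paths = [
    "container/table/ts=1684300000/part-0000.snappy.parquet",
    "container/table/ts=1684386400/part-0001.snappy.parquet",
    "container/table/ts=1684214000/part-0002.snappy.parquet",
]
print(max_partition_value(paths))  # → 1684386400
```

Note that for a Delta table this listing-based approach can overcount: directories may contain files that were logically removed from the table but not yet vacuumed, so the true source of truth is the transaction log, not the file system.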

@rrazy-akamai rrazy-akamai added the bug Something isn't working label May 18, 2023
@felipepessoto
Contributor

felipepessoto commented May 18, 2023

This PR should help; it has been waiting for review for a while: #1525
