
Add support for partition pruning in Delta checkpoint iterator #19588

Merged (1 commit, Nov 16, 2023)

Conversation

@ebyhr (Member) commented Oct 31, 2023

Release notes

(x) Release notes are required, with the following suggested text:

# Delta Lake
* Improve performance when reading large checkpoint files on partitioned tables. ({issue}`issuenumber`)

@cla-bot added the cla-signed label Oct 31, 2023
@github-actions added the docs and delta-lake labels Oct 31, 2023
@ebyhr self-assigned this Oct 31, 2023
@ebyhr force-pushed the ebi/delta-part-values-parsed branch 3 times, most recently from a4106b2 to 7c9ac69 on October 31, 2023 21:39
TupleDomain.withColumnDomains(ImmutableMap.of(intPartField, singleValue(BIGINT, 10L), stringPartField, singleValue(VARCHAR, utf8Slice("part1")))));
List<DeltaLakeTransactionLogEntry> entries = ImmutableList.copyOf(checkpointEntryIterator);

assertThat(entries).hasSize(2);
@findinpath (Contributor) commented Nov 6, 2023

Shouldn't we have only 1 entry here?
Probably this relates to https://github.com/trinodb/trino/pull/19588/files/7c9ac692875bdb08827aa1dc9f7beac63a9874d4#r1383331077
We should also check that a reduced number of entries has actually been read from the Parquet file:

        assertThat(checkpointEntryIterator.getCompletedPositions().orElseThrow()).isEqualTo(....);

A Contributor commented:

When doing buildAddEntry, check whether the partitionValues / partitionValues_parsed match the partitionConstraint, and return null if they don't match:

Map<DeltaLakeColumnHandle, Domain> enforcedDomains = enforcedPartitionConstraint.getDomains().orElseThrow();
if (!partitionMatchesPredicate(addAction.getCanonicalPartitionValues(), enforcedDomains)) {
    return null;
}

@ebyhr (Member, Author) commented Nov 8, 2023

Just rebased on master.

{
    try {
        if (isCheckpointPartitionFilterEnabled(session) && !partitionConstraint.isAll()) {
A Member commented:

Perhaps remove && !partitionConstraint.isAll().

I think the new code path should eventually replace the old cache-based approach, so we can use isCheckpointPartitionFilterEnabled as an algorithm-selecting toggle.
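A minimal sketch of the suggested shape (the method names below are hypothetical, not code from this PR):

    // Hypothetical sketch: the session property alone selects the algorithm.
    // When partitionConstraint.isAll(), pruning degenerates to a no-op,
    // so no extra guard is needed on the new path.
    if (isCheckpointPartitionFilterEnabled(session)) {
        return readCheckpointWithPartitionPruning(session, partitionConstraint); // hypothetical name
    }
    return readCheckpointViaActiveFileCache(session); // hypothetical name for the old path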

@ebyhr (Member, Author) commented Nov 9, 2023

CI hit #19602

@ebyhr marked this pull request as ready for review on November 9, 2023 08:11
@@ -431,13 +458,16 @@ private DeltaLakeTransactionLogEntry buildAddEntry(ConnectorSession session, Blo
statsFieldIndex = 5;
}

Optional<DeltaLakeParquetFileStatistics> parsedStats = Optional.ofNullable(getRowField(addEntryRow, statsFieldIndex + 1)).map(this::parseStatisticsFromParquet);
boolean partitionValuesParsedExists = addEntryRow.getUnderlyingFieldBlock(statsFieldIndex + 1) instanceof RowBlock && // partitionValues_parsed
A Member commented:

Do we need to check this for every position? Seems like we should know this per file based on the Parquet file metadata (maybe it's possible to use io.trino.plugin.hive.ReaderPageSource#getReaderColumns).

@ebyhr (Member, Author) commented Nov 13, 2023

Agreed on using Parquet metadata, though getReaderColumns returns an empty list in this case. Sent another PR: #19727
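For illustration, a per-file check against the Parquet schema could look roughly like this (a sketch using parquet-mr types; how the MessageType is obtained here is assumed):

    import org.apache.parquet.schema.MessageType;

    // Sketch: decide once per checkpoint file whether add.partitionValues_parsed
    // exists, by inspecting the Parquet schema, instead of probing the block
    // type at every position.
    private static boolean hasPartitionValuesParsed(MessageType fileSchema)
    {
        return fileSchema.containsField("add")
                && fileSchema.getType("add").asGroupType().containsField("partitionValues_parsed");
    }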

if (entry.getAdd() != null) {
    if (partitionConstraint.isAll() ||
            partitionMatchesPredicate(entry.getAdd().getCanonicalPartitionValues(), partitionConstraint.getDomains().orElseThrow())) {
        nextEntries.add(entry);
    }
}
A Member commented:

While this may help in reducing the number of DeltaLakeTransactionLogEntry, doing the filtering after materialising all channels on each position of a page means that we can't benefit from lazy loading of blocks.
Ideally we should filter directly on the relevant block channels and skip to the next position, without decoding the remaining channels, when the predicate does not match. But this can be looked at as a follow-up.
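As a rough illustration of that follow-up idea (the channel index and the predicate helper below are assumptions, not code from this PR):

    // Sketch: evaluate the partition predicate against the add-entry channel
    // first, so positions that don't match never force the remaining lazy
    // blocks to load.
    Block addBlock = page.getBlock(addEntryChannel); // loads only this channel
    for (int position = 0; position < page.getPositionCount(); position++) {
        if (!positionMatchesPartitionPredicate(addBlock, position, partitionConstraint)) {
            continue; // the other channels for this position are never decoded
        }
        nextEntries.add(buildAddEntry(session, addBlock, position));
    }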

A Contributor commented:

Correct.
The partition matching check should be done directly in io.trino.plugin.deltalake.transactionlog.checkpoint.CheckpointEntryIterator#buildAddEntry.

If we know that we have the field partitionValues_parsed (see https://github.com/trinodb/trino/pull/19588/files#r1389691135), maybe we should do this check right away after:

log.debug("Building add entry from %s pagePosition %d", block, pagePosition);
if (block.isNull(pagePosition)) {
return null;
}

Optional: a word about using entry.getAdd().getCanonicalPartitionValues().
We have the partitionValues_parsed at hand. We could avoid deserializing the stringified partition values and use the "parsed" values directly. On the other hand, we don't actually use the parsed partition values anywhere else. Did you intentionally refrain from reading the parsed partition values in favor of the stringified partition values?
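Putting the two suggestions together, the early exit in buildAddEntry might look like this (the partition-value extraction helper is hypothetical):

    log.debug("Building add entry from %s pagePosition %d", block, pagePosition);
    if (block.isNull(pagePosition)) {
        return null;
    }
    // Suggested addition: prune before materializing stats and the remaining
    // fields. extractCanonicalPartitionValues(...) stands in for however the
    // stringified or parsed partition values would be read at this position.
    Map<DeltaLakeColumnHandle, Domain> enforcedDomains = partitionConstraint.getDomains().orElseThrow();
    if (!partitionMatchesPredicate(extractCanonicalPartitionValues(addEntryRow), enforcedDomains)) {
        return null;
    }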

@ebyhr force-pushed the ebi/delta-part-values-parsed branch from e73edbc to c0494e2 on November 13, 2023 02:05
@@ -3518,7 +3518,8 @@ private OptionalLong executeDelete(ConnectorSession session, ConnectorTableHandl
private List<AddFileEntry> getAddFileEntriesMatchingEnforcedPartitionConstraint(ConnectorSession session, DeltaLakeTableHandle tableHandle)
{
TableSnapshot tableSnapshot = getSnapshot(session, tableHandle);
List<AddFileEntry> validDataFiles = transactionLogAccess.getActiveFiles(tableSnapshot, tableHandle.getMetadataEntry(), tableHandle.getProtocolEntry(), session);
// TODO Consider passing DeltaLakeTableHandle.getEnforcedPartitionConstraint to getActiveFiles method
List<AddFileEntry> validDataFiles = transactionLogAccess.getActiveFiles(tableSnapshot, TupleDomain.all(), tableHandle.getMetadataEntry(), tableHandle.getProtocolEntry(), session);
A Member commented:

Why TODO? Why not do it right away?

@ebyhr (Member, Author) commented:

I just wanted to focus on the SELECT path in this PR. Going to handle it in a follow-up PR.

    return addFileEntryStream.collect(toImmutableList());
}
return addFileEntryStream
        .filter(addAction -> partitionMatchesPredicate(addAction.getCanonicalPartitionValues(), partitionConstraint.getDomains().orElseThrow()))
A Member commented:

The callers (e.g. the split source) will likely repeat this work, so it's partially wasted. It is still useful, though, because it allows us to materialize a shorter list.

I think this wouldn't be needed here if we could return a Stream/Iterator instead of a List.
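A sketch of that alternative shape (the signature change and the loader below are hypothetical):

    // Hypothetical: return a lazily-filtered Stream instead of a materialized
    // List, so callers such as the split source decide when (and whether) to
    // collect, and downstream filtering is not repeated.
    public Stream<AddFileEntry> getActiveFiles(
            TableSnapshot tableSnapshot,
            TupleDomain<DeltaLakeColumnHandle> partitionConstraint,
            MetadataEntry metadataEntry,
            ProtocolEntry protocolEntry,
            ConnectorSession session)
    {
        Stream<AddFileEntry> addFileEntryStream = loadActiveFiles(tableSnapshot, metadataEntry, protocolEntry, session); // hypothetical loader
        if (partitionConstraint.isAll()) {
            return addFileEntryStream;
        }
        return addFileEntryStream.filter(addAction ->
                partitionMatchesPredicate(addAction.getCanonicalPartitionValues(), partitionConstraint.getDomains().orElseThrow()));
    }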

@@ -112,7 +114,7 @@ public RowType getMetadataEntryType()
return metadataEntryType;
}

public RowType getAddEntryType(MetadataEntry metadataEntry, ProtocolEntry protocolEntry, boolean requireWriteStatsAsJson, boolean requireWriteStatsAsStruct)
public RowType getAddEntryType(MetadataEntry metadataEntry, ProtocolEntry protocolEntry, boolean requireWriteStatsAsJson, boolean requireWriteStatsAsStruct, boolean requirePartitionValuesParsed)
A Member commented:

require... or use...?

We want to use the partitionValues_parsed field if it is present, but we don't require that it exists (we don't fail when it doesn't), right?

@@ -156,6 +158,15 @@ public RowType getAddEntryType(MetadataEntry metadataEntry, ProtocolEntry protoc
if (requireWriteStatsAsJson) {
addFields.add(RowType.field("stats", VARCHAR));
}
if (requirePartitionValuesParsed) {
List<DeltaLakeColumnHandle> partitionColumns = extractPartitionColumns(metadataEntry, protocolEntry, typeManager);
A Member commented:

The set of partitioning columns may change, probably only through the CREATE OR REPLACE TABLE operation. In such a case, we shouldn't need to read the old checkpoint file at all, but I don't know whether that is actually the case.

@@ -111,7 +111,8 @@ public void write(CheckpointEntries entries, TrinoOutputFile outputFile)
RowType metadataEntryType = checkpointSchemaManager.getMetadataEntryType();
RowType protocolEntryType = checkpointSchemaManager.getProtocolEntryType(protocolEntry.getReaderFeatures().isPresent(), protocolEntry.getWriterFeatures().isPresent());
RowType txnEntryType = checkpointSchemaManager.getTxnEntryType();
RowType addEntryType = checkpointSchemaManager.getAddEntryType(entries.getMetadataEntry(), entries.getProtocolEntry(), writeStatsAsJson, writeStatsAsStruct);
// TODO https://github.com/trinodb/trino/issues/19586 Add support for writing 'partitionValues_parsed' field
A Member commented:

(#19586)

@@ -124,6 +124,10 @@ values. Typical usage does not require you to configure them.
* - `delta.checkpoint-row-statistics-writing.enabled`
- Enable writing row statistics to checkpoint files.
- `true`
* - `delta.checkpoint-filtering.enabled`
A Contributor commented:

Could you please add test coverage to TestDeltaLakeFileOperations with the checkpoint_filtering_enabled session property enabled, to add more transparency regarding the consequences of this change?
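Roughly the kind of test being asked for, sketched against TestDeltaLakeFileOperations conventions (the catalog name, table name, and the assertion helper's exact signature are assumptions):

    @Test
    public void testCheckpointFilteringFileOperations()
    {
        // Assumption: "delta_lake" is the catalog name used by this test class.
        Session session = Session.builder(getSession())
                .setCatalogSessionProperty("delta_lake", "checkpoint_filtering_enabled", "true")
                .build();
        // With checkpoint filtering enabled, a partition-constrained query
        // should show fewer file accesses than the default path; the expected
        // (reduced) multiset of operations would be enumerated here.
        assertFileSystemAccesses(
                session,
                "SELECT * FROM partitioned_table WHERE part = 10",
                ImmutableMultiset.<FileOperation>builder()
                        // expected operations enumerated here
                        .build());
    }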
