
Delta Lake MERGE/UPDATE/DELETE on Databricks should trigger optimized write and auto compaction #10417

Open
jlowe opened this issue Feb 13, 2024 · 5 comments
Assignees
Labels
feature request New feature or request

Comments

@jlowe
Member

jlowe commented Feb 13, 2024

https://docs.databricks.com/en/delta/tune-file-size.html states that Delta Lake MERGE, UPDATE, and DELETE operations will always trigger optimized write and auto compaction behavior as of 10.4 LTS, and this cannot be disabled. The RAPIDS Accelerator forms of these operations should mimic this behavior.
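As context for the requested behavior, the optimized write and auto compaction features referenced above are controlled on Databricks by session-level confs documented in the linked tune-file-size page. A minimal, illustrative invocation (the job file name is hypothetical; on 10.4 LTS+ these behaviors apply to MERGE/UPDATE/DELETE regardless of these settings):

```shell
# Config fragment, for illustration only: enable optimized write and
# auto compaction session-wide for a Delta Lake job on Databricks.
# On DBR 10.4 LTS+, MERGE/UPDATE/DELETE trigger these behaviors
# unconditionally; these confs affect other write paths.
spark-submit \
  --conf spark.databricks.delta.optimizeWrite.enabled=true \
  --conf spark.databricks.delta.autoCompact.enabled=true \
  your_delta_job.py   # hypothetical job file
```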

@jlowe jlowe added feature request New feature or request ? - Needs Triage Need team to review and classify and removed feature request New feature or request labels Feb 13, 2024
@mattahrens mattahrens added feature request New feature or request and removed ? - Needs Triage Need team to review and classify labels Feb 13, 2024
@jlowe
Member Author

jlowe commented Feb 13, 2024

Note that this should also remove the repartition by partition key for partitioned tables when writing a MERGE, since we're going to turn around and repartition for the optimized write anyway.

@andygrove andygrove added the ? - Needs Triage Need team to review and classify label Apr 1, 2024
@andygrove andygrove removed their assignment Apr 1, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Apr 2, 2024
@jlowe
Member Author

jlowe commented Apr 2, 2024

Note that for MERGE the user can specify spark.databricks.delta.merge.repartitionBeforeWrite.enabled=false to avoid repartitioning by the partition key when merging into a small number of partitions, which would otherwise funnel all of the write data through just a few tasks. This is not exactly semantically equivalent to optimized write and auto compaction, but it can avoid the terrible write performance in that partitioned-write case.
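The workaround above can be sketched as a job-level conf; this is a config fragment under the assumption the job runs on Databricks, and the job file name is hypothetical:

```shell
# Config fragment, for illustration only: disable the pre-write
# repartition for MERGE so that a merge touching only a few partitions
# is not funneled through a small number of tasks.
spark-submit \
  --conf spark.databricks.delta.merge.repartitionBeforeWrite.enabled=false \
  merge_job.py   # hypothetical job file
```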

@liurenjie1024
Collaborator

Hi @jlowe, Delta OSS has added support for optimized write: delta-io/delta#2145. I think we can always enable optimized write after porting this?

@jlowe
Member Author

jlowe commented Apr 18, 2024

This is a Databricks-specific behavior per the doc linked above, not a behavior in OSS Delta Lake, at least for the versions of OSS Delta Lake that we support. There are already separate issues tracking the OSS versions of optimized write and auto compact, see #10397 and #10398, respectively, but I do not see them as relevant to this issue. We already support optimized write and auto compact on Databricks.

@liurenjie1024 liurenjie1024 self-assigned this Apr 19, 2024
@liurenjie1024
Collaborator

I'll take this.
