Commit

fix(deps): update dependency io.delta:delta-sharing-spark_2.13 to v3.2.0 (#281)

[![Mend
Renovate](https://app.renovatebot.com/images/banner.svg)](https://renovatebot.com)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [io.delta:delta-sharing-spark_2.13](https://delta.io/)
([source](https://togithub.com/delta-io/delta)) | `3.1.0` -> `3.2.0` |
[![age](https://developer.mend.io/api/mc/badges/age/maven/io.delta:delta-sharing-spark_2.13/3.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![adoption](https://developer.mend.io/api/mc/badges/adoption/maven/io.delta:delta-sharing-spark_2.13/3.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![passing](https://developer.mend.io/api/mc/badges/compatibility/maven/io.delta:delta-sharing-spark_2.13/3.1.0/3.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|
[![confidence](https://developer.mend.io/api/mc/badges/confidence/maven/io.delta:delta-sharing-spark_2.13/3.1.0/3.2.0?slim=true)](https://docs.renovatebot.com/merge-confidence/)
|

---

### Release Notes

<details>
<summary>delta-io/delta (io.delta:delta-sharing-spark_2.13)</summary>

### [`v3.2.0`](https://togithub.com/delta-io/delta/releases/tag/v3.2.0):
Delta Lake 3.2.0

We are excited to announce the release of Delta Lake 3.2.0! This release
includes several exciting new features.

#### Highlights

- [Support for Liquid
clustering](https://togithub.com/delta-io/delta/commit/4456a122929b834e5c2652f99cc64ff8a71f4113)
to reduce write amplification using incremental clustering.
- Preview [support for Type
Widening](https://togithub.com/delta-io/delta/commit/9b3fa0a1a05e51b38cec083afb41226beb399b0f)
to allow users to change the type of columns without having to rewrite
data.
- Preview
[support](https://togithub.com/delta-io/delta/commit/902830369662f5a84e987b3a97e23f916da104ca)
for [Apache Hudi](https://hudi.apache.org/) in Delta UniForm tables.

#### Delta Spark

Delta Spark 3.2.0 is built on [Apache Spark™
3.5](https://spark.apache.org/releases/spark-release-3-5-0.html).
Similar to Apache Spark, we have released Maven artifacts for both Scala
2.12 and Scala 2.13.

-   Documentation: <https://docs.delta.io/3.2.0/index.html>
- API documentation:
<https://docs.delta.io/3.2.0/delta-apidoc.html#delta-spark>
- Maven artifacts:
[delta-spark\_2.12](https://repo1.maven.org/maven2/io/delta/delta-spark\_2.12/3.2.0/),
[delta-spark\_2.13](https://repo1.maven.org/maven2/io/delta/delta-spark\_2.13/3.2.0/),
[delta-contribs\_2.12](https://repo1.maven.org/maven2/io/delta/delta-contribs\_2.12/3.2.0/),
[delta-contribs\_2.13](https://repo1.maven.org/maven2/io/delta/delta-contribs\_2.13/3.2.0/),
[delta-storage](https://repo1.maven.org/maven2/io/delta/delta-storage/3.2.0/),
[delta-storage-s3-dynamodb](https://repo1.maven.org/maven2/io/delta/delta-storage-s3-dynamodb/3.2.0/),
[delta-iceberg\_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg\_2.12/3.2.0/),
[delta-iceberg\_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg\_2.13/3.2.0/)
-   Python artifacts: https://pypi.org/project/delta-spark/3.2.0/

The key features of this release are:

- [Support for Liquid
clustering](https://togithub.com/delta-io/delta/issues/1874): This
allows for [incremental
clustering](https://togithub.com/delta-io/delta/commit/4456a122929b834e5c2652f99cc64ff8a71f4113)
based on ZCubes and reduces the write amplification by not touching
files already well clustered (i.e., files in stable ZCubes). Users can
now use the [ALTER TABLE CLUSTER
BY](https://togithub.com/delta-io/delta/commit/6f4e05197) syntax to
change clustering columns and use the DESCRIBE DETAIL command to check
the clustering columns. In addition, Delta Spark now supports DeltaTable
`clusterBy` API in both Python and Scala to allow creating clustered
tables using DeltaTable API. See the
[documentation](https://docs.delta.io/3.2.0/delta-clustering.html) and
[examples](https://togithub.com/delta-io/delta/blob/branch-3.2/examples/scala/src/main/scala/example/Clustering.scala)
for more information, and see the sketch after this list for a quick tour of the new SQL surface.
- Preview [support for Type
Widening](https://togithub.com/delta-io/delta/commit/9b3fa0a1a05e51b38cec083afb41226beb399b0f):
Delta Spark can now change the type of a column from `byte` to `short`
to `integer` using the [ALTER TABLE t CHANGE COLUMN col TYPE
type](https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-alter-table.html#alter-or-change-column)
command or with schema evolution during MERGE and INSERT operations. The
table remains readable by Delta 3.2 readers without requiring the data
to be rewritten. For compatibility with older versions, a rewrite of the
data can be triggered using the `ALTER TABLE t DROP FEATURE
'typeWidening-preview'` command.
- Note that this feature is in preview and that tables created with this
preview feature enabled may not be compatible with future Delta Spark
releases.
- [Support for Vacuum
Inventory](https://togithub.com/delta-io/delta/commit/7d41fb7bbf63af33ad228007dd6ba3800b4efe81):
Delta Spark now extends the VACUUM SQL command to allow users to specify
an inventory table in a VACUUM command. When an inventory table is
provided, VACUUM will consider the files listed there instead of doing
the full listing of the table directory, which can be time consuming for
very large tables. See the docs
[here](https://docs.delta.io/3.2.0/delta-utility.html#inventory-table).
- [Support for Vacuum Writer Protocol
Check](https://togithub.com/delta-io/delta/commit/2e197f130765d91f201b6b649f30190a44304b29):
Delta Spark now supports the `vacuumProtocolCheck` ReaderWriter feature,
which ensures consistent application of reader and writer protocol
checks during `VACUUM` operations, addressing potential protocol
discrepancies and mitigating the risk of data corruption due to skipped
writer checks.
- Preview [support for In-Commit
Timestamps](https://togithub.com/delta-io/delta/commit/b15a2c97432c8892f986c1526ceb2c3f63ed5d2c):
When enabled, this [preview
feature](https://togithub.com/delta-io/delta/issues/2532) persists
monotonically increasing timestamps within Delta commits, ensuring they
are not affected by file operations. Time travel queries will then yield
consistent results, even if the table directory is relocated.
- Note that this feature is in preview and that tables created with this
preview feature enabled may not be compatible with future Delta Spark
releases.
- Deletion Vectors Read Performance Improvements: Two improvements were
introduced to DVs in Delta 3.2.
- [Removing broadcasting of DV information to
executors](https://togithub.com/delta-io/delta/commit/be7183bef85feaebfc928d5f291c5a90246cde87):
This improves stability by reducing driver memory consumption and
preventing potential driver OOMs on very large Delta tables (1 TB+), and
it improves performance on small Delta tables by avoiding a fixed
broadcasting overhead.
- [Supporting predicate pushdown and splitting in scans with
DVs](https://togithub.com/delta-io/delta/pull/2982): This improves the
performance of DV reads for filtered queries through predicate pushdown
and scan splitting, yielding a 2x performance improvement on average.
- [Support for Row
Tracking](https://togithub.com/delta-io/delta/commit/23b7c17628c21881fbefd04db11a31c973205d95):
Delta Spark can now write to tables that maintain information that
allows identifying rows across multiple versions of a Delta table. Delta
Spark can now also access this tracking information using the two
metadata fields `_metadata.row_id` and `_metadata.row_commit_version`.
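
Several of the features above surface as plain SQL, so they can be tried from any Delta Spark 3.2 session. Below is a minimal, hedged Scala sketch (not taken from these notes): the table and column names are invented, and the `delta.enableTypeWidening` and `delta.enableRowTracking` table properties are assumptions inferred from the feature names rather than commands quoted above.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch against Delta Spark 3.2 on Spark 3.5. Table and column names
// are illustrative; the two TBLPROPERTIES keys are assumptions, not quoted
// from the release notes.
val spark = SparkSession.builder()
  .appName("delta-3-2-tour")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
  .getOrCreate()

// Liquid clustering: create a clustered table, then change the clustering columns.
spark.sql("""CREATE TABLE events (event_id BIGINT, level SMALLINT, ts TIMESTAMP)
             USING DELTA CLUSTER BY (event_id)""")
spark.sql("ALTER TABLE events CLUSTER BY (ts)")            // new ALTER TABLE ... CLUSTER BY syntax
spark.sql("DESCRIBE DETAIL events").show(truncate = false) // clustering columns are reported here

// Type widening (preview): widen SMALLINT -> INT in place, without rewriting data.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableTypeWidening' = 'true')") // assumed key
spark.sql("ALTER TABLE events CHANGE COLUMN level TYPE INT")

// Row tracking: query the per-row metadata fields named above.
spark.sql("ALTER TABLE events SET TBLPROPERTIES ('delta.enableRowTracking' = 'true')")  // assumed key
spark.sql("SELECT _metadata.row_id, _metadata.row_commit_version, event_id FROM events").show()
```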

Other notable changes include:

- [Delta
Sharing](https://togithub.com/delta-io/delta/commit/8b4b6cce7071046da3d6d3fda4b85120a7445771):
reduce the minimum RPC interval in Delta Sharing streaming from 30
seconds to 10 seconds
- [Improve](https://togithub.com/delta-io/delta/commit/bba0e94f0) the
performance of write operations by skipping collecting commit stats
- [New SQL
configurations](https://togithub.com/delta-io/delta/commit/3f0496ba3) to
specify the Delta log cache size
(`spark.databricks.delta.delta.log.cacheSize`) and retention duration
(`spark.databricks.delta.delta.log.cacheRetentionMinutes`); a sketch of
setting them follows this list
- [Fix](https://togithub.com/delta-io/delta/commit/8db9617b5) bug in
plan validation due to inconsistent field metadata in MERGE
- [Improved](https://togithub.com/delta-io/delta/commit/ef751d236)
metrics during VACUUM for better visibility
- Hive Metastore schema sync: The truncation threshold for schemas with
long fields is now [user
configurable](https://togithub.com/delta-io/delta/commit/3c09d95a34b71fff20cb23753c65af95da5cb48f)
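
For the Delta log cache settings called out above, here is a hedged sketch of wiring them in at session-build time; the two keys are the ones quoted in that item, while the values and the assumption that they can be set like ordinary Spark configs via `SparkSession.builder` are mine.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative values only; the two spark.databricks.delta.delta.log.* keys
// are those named in the release note above.
val spark = SparkSession.builder()
  .appName("delta-log-cache-tuning")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog",
    "org.apache.spark.sql.delta.catalog.DeltaSparkSessionCatalog")
  .config("spark.databricks.delta.delta.log.cacheSize", "1000")            // assumed to be an entry count
  .config("spark.databricks.delta.delta.log.cacheRetentionMinutes", "120") // retention, in minutes
  .getOrCreate()
```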

#### Delta Universal Format (UniForm)

-   Documentation: <https://docs.delta.io/3.2.0/delta-uniform.html>
- Maven artifacts:
[delta-iceberg\_2.12](https://repo1.maven.org/maven2/io/delta/delta-iceberg\_2.12/3.2.0/),
[delta-iceberg\_2.13](https://repo1.maven.org/maven2/io/delta/delta-iceberg\_2.13/3.2.0/),
[delta-hudi\_2.12](https://repo1.maven.org/maven2/io/delta/delta-hudi\_2.12/3.2.0/),
[delta-hudi\_2.13](https://repo1.maven.org/maven2/io/delta/delta-hudi\_2.13/3.2.0/)

Hudi is now
[supported](https://togithub.com/delta-io/delta/commit/902830369662f5a84e987b3a97e23f916da104ca)
by Delta Universal Format (UniForm) in addition to Iceberg. Writing to a
Delta UniForm table can generate Hudi metadata alongside the Delta
metadata. This feature was contributed by XTable.

Create a UniForm-enabled table that automatically generates Hudi metadata
using the following command:

```sql
CREATE TABLE T (c1 INT) USING DELTA TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'hudi');
```

See the documentation
[here](https://docs.delta.io/3.2.0/delta-uniform.html) for more details.

Other notable changes include:

- [Throw](https://togithub.com/delta-io/delta/commit/726165608) a better
error if Iceberg conversion fails during initial sync
- [Fix](https://togithub.com/delta-io/delta/commit/79a0581bd) a bug in
Delta Universal Format to support correct table overwrites

#### Delta Kernel

- API documentation:
<https://docs.delta.io/3.2.0/api/java/kernel/index.html>
- Maven artifacts:
[delta-kernel-api](https://repo1.maven.org/maven2/io/delta/delta-kernel-api/3.2.0/),
[delta-kernel-defaults](https://repo1.maven.org/maven2/io/delta/delta-kernel-defaults/3.2.0/)

The Delta Kernel project is a set of Java libraries
([Rust](https://togithub.com/delta-incubator/delta-kernel-rs) will be
coming soon!) for building Delta connectors that can read (and, soon,
write to) Delta tables without the need to understand the [Delta
protocol
details](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md).
In this release, we improved the read support to make it
production-ready by adding numerous performance improvements, additional
functionality, and improved protocol support.

- Support for time travel. Now you can read a table snapshot at a
[version
id](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/Table.html#getSnapshotAsOfVersion-io.delta.kernel.engine.Engine-long-)
or snapshot at a
[timestamp](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/Table.html#getSnapshotAsOfTimestamp-io.delta.kernel.engine.Engine-long-);
a short sketch follows this list.

-   Improved Delta protocol support.
- [Support](https://togithub.com/delta-io/delta/pull/2826) for reading
tables with [`checkpoint
v2`](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#v2-checkpoint-table-feature).
- Support for reading tables with `timestamp` partition type data
column.
- [Support](https://togithub.com/delta-io/delta/pull/2855) for reading
tables with column data type
[`timestamp_ntz`](https://togithub.com/delta-io/delta/blob/master/PROTOCOL.md#timestamp-without-timezone-timestampntz).

- Improved table metadata read performance and reliability on very large
tables with millions of files
- Improved [checkpoint reading
latency](https://togithub.com/delta-io/delta/pull/2872) by pushing the
partition predicate to the checkpoint Parquet reader to minimize the
number of checkpoint files read.
- Improved state reconstruction latency by
[using](https://togithub.com/delta-io/delta/pull/2770) `LogStore`s from the
[`delta-storage`](https://togithub.com/delta-io/delta/blob/master/storage/src/main/java/io/delta/storage/LogStore.java)
module for faster `listFrom` calls.
- [Retry](https://togithub.com/delta-io/delta/pull/2812) loading the
`_last_checkpoint` checkpoint in case of transient failures. Loading the
last checkpoint info from this file helps construct the Delta table
state faster.
- [Optimization](https://togithub.com/delta-io/delta/pull/2817) to
minimize the number of listing calls to object store when trying to find
a last checkpoint at or before a version.

-   Other notable changes include:
- [Support](https://togithub.com/delta-io/delta/pull/2651) for `IS_NULL`
expression. Now the `Predicate` passed to Kernel
[`ScanBuilder`](https://docs.delta.io/3.2.0/api/java/kernel/io/delta/kernel/ScanBuilder.html#withFilter-io.delta.kernel.engine.Engine-io.delta.kernel.expressions.Predicate-)
can include `IS_NULL` predicates.
- [Support](https://togithub.com/delta-io/delta/pull/2701) for custom
`ParquetHandler` implementations to read multiple Parquet files in
parallel. The default implementation reads one file at a time, but
connectors can implement their own `ParquetHandler` to read Parquet
files in parallel.
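
The time-travel entry points at the top of this list boil down to two calls. Here is a hedged Scala sketch against the `delta-kernel-api` and `delta-kernel-defaults` 3.2.0 artifacts; it assumes `Table.forPath`, `DefaultEngine.create`, and `Snapshot.getSchema(Engine)` as published in the 3.2 Javadoc, and the path, version, and timestamp are placeholders.

```scala
import org.apache.hadoop.conf.Configuration
import io.delta.kernel.Table
import io.delta.kernel.defaults.engine.DefaultEngine

// Placeholders throughout; requires delta-kernel-api and delta-kernel-defaults
// 3.2.0 (plus a Hadoop client for Configuration) on the classpath.
val engine = DefaultEngine.create(new Configuration())
val table  = Table.forPath(engine, "/tmp/delta/my-table")

// One snapshot pinned to a version id, one pinned to a timestamp (epoch millis).
val snapshotAtV5 = table.getSnapshotAsOfVersion(engine, 5L)
val snapshotAtTs = table.getSnapshotAsOfTimestamp(engine, 1715558400000L)

// Assumed accessor from the 3.2 Javadoc: Snapshot.getSchema(Engine).
println(snapshotAtV5.getSchema(engine))
```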

In this release we also added a **preview** version of APIs that allow
connectors to:

- Create tables
- Insert data into tables. Current support is for blind appends only.
- Insert data using idempotent writes.

The above functionality is available for both partitioned and
unpartitioned tables. Refer to the
[examples](https://togithub.com/delta-io/delta/tree/branch-3.2/kernel/examples/kernel-examples/src/main/java/io/delta/kernel/examples)
for sample connector code that creates tables and blind-appends data to
them. We are still developing and evolving these APIs; please give them
a try and send us feedback.

For more information, refer to:

- [User
guide](https://togithub.com/delta-io/delta/blob/branch-3.2/kernel/USER_GUIDE.md)
on the step-by-step process of using Kernel in a standalone Java program or
in a distributed processing connector.
- [Slides](https://docs.google.com/presentation/d/1PGSSuJ8ndghucSF9GpYgCi9oeRpWolFyehjQbPh92-U/edit)
explaining the rationale behind Kernel and the API design.
- Example [Java
programs](https://togithub.com/delta-io/delta/tree/branch-3.2/kernel/examples/table-reader/src/main/java/io/delta/kernel/examples)
that illustrate how to read Delta tables using the Kernel APIs.
- Table and default Engine API Java
[documentation](https://docs.delta.io/3.2.0/api/java/kernel/index.html)
- [Migration
guide](https://togithub.com/delta-io/delta/blob/master/kernel/USER_GUIDE.md#migration-from-delta-lake-version-310-to-320)
to upgrade your connector to use the 3.2.0 APIs

#### Credits

Adam Binford, Ala Luszczak, Allison Portis, Ami Oka, Andreas
Chatzistergiou, Arun Ravi M V, Babatunde Micheal Okutubo, Bo Gao, Carmen
Kwan, Chirag Singh, Chloe Xia, Christos Stavrakakis, Costas Zarifis,
Daniel Tenedorio, Davin Tjong, Dhruv Arya, Felipe Pessoto, Fred Storage
Liu, Fredrik Klauss, Gabriel Russo, Hao Jiang, Hyukjin Kwon, Ian
Streeter, Jason Teoh, Jiaheng Tang, Jing Zhan, Jintian Liang, Johan
Lasperas, Jonas Irgens Kylling, Juliusz Sompolski, Kaiqi Jin, Lars
Kroll, Lin Zhou, Miles Cole, Nick Lanham, Ole Sasse, Paddy Xu, Prakhar
Jain, Rachel Bushrian, Rajesh Parangi, Renan Tomazoni Pinzon, Sabir
Akhadov, Scott Sandre, Simon Dahlbacka, Sumeet Varma, Tai Le, Tathagata
Das, Thang Long Vu, Tim Brown, Tom van Bussel, Venki Korukanti, Wei Luo,
Wenchen Fan, Xupeng Li, Yousof Hosny, Gene Pang, Jintao Shen, Kam Cheung
Ting, panbingkun, ram-seek, Sabir Akhadov, sokolat, tangjiafu

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR has been generated by [Mend
Renovate](https://www.mend.io/free-developer-tools/renovate/). View
repository job log
[here](https://developer.mend.io/github/agile-lab-dev/whitefox).


Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
renovate[bot] authored May 13, 2024
1 parent b6fb871 commit 5fbcba0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion whitefox-platform/build.gradle.kts
```diff
@@ -19,7 +19,7 @@ dependencies {
     api("org.apache.hadoop:hadoop-client-api:3.4.0")
     api("org.apache.hadoop:hadoop-client-runtime:3.4.0")
     api("io.delta:delta-standalone_2.13:3.2.0")
-    api("io.delta:delta-sharing-spark_2.13:3.1.0")
+    api("io.delta:delta-sharing-spark_2.13:3.2.0")
     api("org.apache.spark:spark-sql_2.13:3.5.1")
     api("org.apache.iceberg:iceberg-api:1.5.2")
     api("org.apache.iceberg:iceberg-core:1.5.2")
```
