Docs: Add examples for DataFrame branch writes (#10644)
anuragmantri authored Jul 17, 2024
1 parent 5a562bb commit 319f29e
Showing 1 changed file with 25 additions and 6 deletions.
31 changes: 25 additions & 6 deletions docs/docs/spark-writes.md
@@ -195,16 +195,19 @@ WHERE EXISTS (SELECT oid FROM prod.db.returned_orders WHERE t1.oid = oid)
For more complex row-level updates based on incoming data, see the section on `MERGE INTO`.

## Writing to Branches
Branch writes can be performed via SQL by providing a branch identifier, `branch_yourBranch`, in the operation.
Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
Note that a WAP branch and a branch identifier cannot both be specified.
Also, the branch must exist before performing the write.
The operation does **not** create the branch if it does not exist.
For more information on branches, please refer to [branches](branching.md).

The branch must exist before performing the write. Operations do **not** create the branch if it does not exist.
A branch can be created using [Spark DDL](spark-ddl.md#branching-and-tagging-ddl).
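
Because a write does not create a missing branch, it can help to issue the branch DDL before the first write. The following is a minimal sketch, not part of the original docs: `BranchDdl` and `createBranchSql` are hypothetical names, and the statement assumes that the Iceberg Spark SQL extensions are enabled and that the `CREATE BRANCH IF NOT EXISTS` form described in the Spark DDL page is available in your Iceberg version.

```scala
object BranchDdl {
  // Hypothetical helper: builds the DDL statement that creates `branch` on
  // `table` unless it already exists. The branch name is quoted with backticks.
  def createBranchSql(table: String, branch: String): String =
    s"ALTER TABLE $table CREATE BRANCH IF NOT EXISTS `$branch`"
}

// Usage, assuming an active SparkSession named `spark`:
// spark.sql(BranchDdl.createBranchSql("prod.db.table", "audit"))
```

Keeping the statement in a small function makes it easy to run the same DDL from a job's setup step before any branch write.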

!!! info
    Note: When writing to a branch, the current schema of the table will be used for validation.

### Via SQL

Branch writes can be performed by providing a branch identifier, `branch_yourBranch`, in the operation.

Branch writes can also be performed as part of a write-audit-publish (WAP) workflow by specifying the `spark.wap.branch` config.
Note that a WAP branch and a branch identifier cannot both be specified.

```sql
-- INSERT (1, 'a') and (2, 'b') into the audit branch.
@@ -228,6 +231,22 @@ SET spark.wap.branch = audit-branch
INSERT INTO prod.db.table VALUES (3, 'c');
```

### Via DataFrames

Branch writes via DataFrames can be performed by providing a branch identifier, `branch_yourBranch`, in the operation.

```scala
// To insert into `audit` branch
val data: DataFrame = ...
data.writeTo("prod.db.table.branch_audit").append()
```

```scala
// To overwrite `audit` branch
val data: DataFrame = ...
data.writeTo("prod.db.table.branch_audit").overwritePartitions()
```
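
When the branch name is only known at runtime, the identifier passed to `writeTo` can be assembled from the table name and the branch name. This is a minimal sketch under that assumption: `BranchWrites` and `branchIdentifier` are hypothetical names, not part of Iceberg or Spark, and the helper only applies the `branch_` prefix convention shown above. It does not quote or validate branch names.

```scala
object BranchWrites {
  // Hypothetical helper: appends the `branch_<name>` suffix to a table
  // identifier, e.g. "prod.db.table" + "audit" -> "prod.db.table.branch_audit".
  def branchIdentifier(table: String, branch: String): String =
    s"$table.branch_$branch"
}

// Usage, assuming `data` is an existing DataFrame and the branch already exists:
// data.writeTo(BranchWrites.branchIdentifier("prod.db.table", "audit")).append()
```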

## Writing with DataFrames

Spark 3 introduced the new `DataFrameWriterV2` API for writing to tables using data frames. The v2 API is recommended for several reasons: