Commit

adding info related to oss unity catalog to the docs
sagarlakshmipathy authored and the-other-tim-brown committed Jun 17, 2024
1 parent 5bc6c16 commit df51515
Showing 1 changed file with 28 additions and 6 deletions.
34 changes: 28 additions & 6 deletions website/docs/unity-catalog.md
@@ -4,9 +4,9 @@ title: "Unity Catalog"
---

# Syncing to Unity Catalog
- This document walks through the steps to register an Apache XTable™ (Incubating) synced Delta table in Unity Catalog on Databricks.
+ This document walks through the steps to register an Apache XTable™ (Incubating) synced Delta table in Unity Catalog on Databricks and in open-source Unity Catalog.

- ## Pre-requisites
+ ## Pre-requisites (for Databricks Unity Catalog)
1. Source table(s) (Hudi/Iceberg) already written to external storage locations like S3/GCS/ADLS.
If you don't have a source table written in S3/GCS/ADLS,
you can follow the steps in [this](/docs/hms) tutorial to set it up.
@@ -19,6 +19,12 @@ This document walks through the steps to register an Apache XTable™ (Incubatin
5. Clone the Apache XTable™ (Incubating) [repository](https://github.com/apache/incubator-xtable) and create the
`xtable-utilities-0.1.0-SNAPSHOT-bundled.jar` by following the steps on the [Installation page](/docs/setup)
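
Step 5 can be scripted along these lines. This is a minimal sketch, not the official setup procedure: it assumes `git`, Maven and a JDK are on `PATH`, that `mvn install -DskipTests` is the build command described on the Installation page, and that the jar keeps the `0.1.0-SNAPSHOT` name used above. `build_xtable_jar` is a hypothetical helper name.

```shell
# Hypothetical helper: clone Apache XTable and build the bundled utilities jar.
# Assumes git, Maven and a JDK are on PATH. Maven logs go to stderr so that
# stdout carries only the path of the built jar.
build_xtable_jar() {
  local dest="$1"
  local jar="$dest/xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar"
  if [ -z "$dest" ]; then
    echo "usage: build_xtable_jar <destination-dir>" >&2
    return 1
  fi
  # Clone only when the directory is missing, so reruns just rebuild.
  if [ ! -d "$dest" ]; then
    git clone https://github.com/apache/incubator-xtable.git "$dest" || return 1
  fi
  # Skip tests to shorten the build; drop -DskipTests to run them.
  (cd "$dest" && mvn install -DskipTests >&2) || return 1
  if [ -f "$jar" ]; then
    echo "$jar"
  else
    echo "bundled jar not found at $jar" >&2
    return 1
  fi
}
```

For example, `JAR=$(build_xtable_jar ~/incubator-xtable)` leaves the jar path in `JAR` for the sync step.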

## Pre-requisites (for open-source Unity Catalog)
1. Source table(s) (Hudi/Iceberg) already written to external storage locations like S3/GCS/ADLS or to the local file system.
In this guide, we use the local file system.
If you use S3/GCS/ADLS instead, you must also configure the properties for that cloud object store, as described in the Unity Catalog [server documentation](https://github.com/unitycatalog/unitycatalog/blob/main/docs/server.md).
2. Clone the [Unity Catalog repository](https://github.com/unitycatalog/unitycatalog) and build the project by following the [prerequisites](https://github.com/unitycatalog/unitycatalog?tab=readme-ov-file#prerequisites) in its README.
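
Step 2 can be scripted roughly as follows. This is a sketch that assumes `git` and a JDK are installed and that the project builds with the `build/sbt package` command from the Unity Catalog README; check the linked README if the build command has changed. `clone_and_build_uc` is a hypothetical helper name.

```shell
# Hypothetical helper: clone the open-source Unity Catalog repository into
# the given directory and build it with the sbt launcher shipped in the repo.
# Assumes git and a JDK are on PATH.
clone_and_build_uc() {
  local dest="$1"
  if [ -z "$dest" ]; then
    echo "usage: clone_and_build_uc <destination-dir>" >&2
    return 1
  fi
  # Clone only if the directory does not exist yet, so reruns are cheap.
  if [ ! -d "$dest" ]; then
    git clone https://github.com/unitycatalog/unitycatalog.git "$dest" || return 1
  fi
  # Build from the repository root; build/sbt is the launcher in the repo.
  (cd "$dest" && build/sbt package)
}
```

For example, `clone_and_build_uc ~/unitycatalog`. The server is then started from that directory with `bin/start-uc-server`, as shown in the Unity Catalog quickstart.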

## Steps
### Running sync
Create `my_config.yaml` in the cloned Apache XTable™ (Incubating) directory.
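
For the local, open-source Unity Catalog walkthrough, the config could be generated as in the sketch below. This is an assumption rather than the exact file from this guide: it requests Delta as the only target and writes a single dataset entry. The `write_xtable_config` helper and the example path are hypothetical, and a partitioned Hudi source would additionally need a `partitionSpec` entry.

```shell
# Hypothetical helper: write a minimal XTable dataset config.
# Arguments: output file, source format (HUDI or ICEBERG), table base path,
# table name. Only DELTA is requested as a target because that is the format
# registered in Unity Catalog in this guide.
write_xtable_config() {
  if [ "$#" -ne 4 ]; then
    echo "usage: write_xtable_config <out> <HUDI|ICEBERG> <base-path> <name>" >&2
    return 1
  fi
  local out="$1" source_format="$2" base_path="$3" table_name="$4"
  cat > "$out" <<EOF
sourceFormat: ${source_format}
targetFormats:
  - DELTA
datasets:
  -
    tableBasePath: ${base_path}
    tableName: ${table_name}
EOF
}
```

For example, `write_xtable_config my_config.yaml HUDI file:///tmp/delta-dataset/people people`, after which the sync is run with `java -jar xtable-utilities/target/xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig my_config.yaml`.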
@@ -50,8 +56,8 @@ At this point, if you check your bucket path, you will be able to see `_delta_lo
00000000000000000000.json which contains the logs that help query engines interpret the source table as a Delta table.
:::
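
For a table on the local file system, this check can be scripted. The sketch below only inspects local paths and looks for the first Delta commit file; `has_delta_log` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the first Delta commit file exists under the
# table's base path, i.e. <base>/_delta_log/00000000000000000000.json.
# Local paths only; for S3/GCS/ADLS use the cloud provider's CLI instead.
has_delta_log() {
  local base_path="${1%/}"   # drop one trailing slash, if any
  local first_commit="$base_path/_delta_log/00000000000000000000.json"
  if [ -f "$first_commit" ]; then
    echo "found $first_commit"
  else
    echo "missing $first_commit" >&2
    return 1
  fi
}
```

For example, `has_delta_log /tmp/delta-dataset/people` before registering that path in Unity Catalog.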

- ### Register the target table in Unity Catalog
- In your Databricks workspace, under SQL editor, run the following queries.
+ ### Register the target table in Databricks Unity Catalog
+ After completing the pre-requisites for Databricks Unity Catalog above, open the SQL editor in your Databricks workspace and run the following queries.

```sql md title="SQL"
CREATE CATALOG xtable;
@@ -75,8 +81,24 @@ You can now see the created delta table in **Unity Catalog** under **Catalog** a
SELECT * FROM xtable.synced_delta_schema.<table_name>;
```

### Register the target table in open-source Unity Catalog using the CLI
After completing the pre-requisites for open-source Unity Catalog above, start the Unity Catalog (UC) server in your terminal by following the [quickstart](https://github.com/unitycatalog/unitycatalog/tree/main?tab=readme-ov-file#quickstart---hello-uc).

In a different terminal, run the following command from the root of the cloned Unity Catalog repository to register the target table in Unity Catalog.

```shell md title="shell"
bin/uc table create --full_name unity.default.people \
  --columns "id INT, name STRING, age INT, city STRING, create_ts STRING" \
  --storage_location /tmp/delta-dataset/people
```
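
If you register several tables, the command above can be wrapped in a helper that refuses paths without a Delta log and then confirms the registration with `bin/uc table get`. This is a sketch that assumes it is run from the root of the Unity Catalog checkout (so `bin/uc` resolves) and that the column list matches the source table's schema; `register_delta_table` is a hypothetical name.

```shell
# Hypothetical helper around the open-source Unity Catalog CLI.
# Run from the root of the Unity Catalog checkout so that bin/uc resolves.
# Arguments: full name (catalog.schema.table), column spec, storage location.
register_delta_table() {
  local full_name="$1" columns="$2" location="$3"
  # Refuse to register a path that has no Delta log yet (sync not run).
  if [ ! -d "$location/_delta_log" ]; then
    echo "no _delta_log under $location; run the XTable sync first" >&2
    return 1
  fi
  bin/uc table create --full_name "$full_name" \
    --columns "$columns" --storage_location "$location" || return 1
  # Print the stored table metadata as a quick confirmation.
  bin/uc table get --full_name "$full_name"
}
```

For the table in this guide: `register_delta_table unity.default.people "id INT, name STRING, age INT, city STRING, create_ts STRING" /tmp/delta-dataset/people`. Checking for `_delta_log` first avoids registering a location that XTable has not synced yet.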

### Validating the results
You can now read the table registered in Unity Catalog with the following command.

```shell md title="shell"
bin/uc table read --full_name unity.default.people
```
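
To check from a script that the table is visible in the `unity.default` schema before reading it, you could filter the output of `bin/uc table list`. This is a rough sketch: it greps the CLI's printed output for the table name as a whole word, which assumes the name appears verbatim in that output. `uc_has_table` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if TABLE appears in the listing of CATALOG.SCHEMA.
# Run from the Unity Catalog checkout root. This is a loose text match on the
# CLI output, not a structured lookup.
uc_has_table() {
  local catalog="$1" schema="$2" table="$3"
  bin/uc table list --catalog "$catalog" --schema "$schema" | grep -qw -- "$table"
}
```

For example, `uc_has_table unity default people && bin/uc table read --full_name unity.default.people`.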

## Conclusion
In this guide, we saw how to:
1. sync a source table to create metadata for the desired target table formats using Apache XTable™ (Incubating)
- 2. catalog the data in Delta format in Unity Catalog on Databricks
- 3. query the Delta table using Databricks SQL editor
+ 2. catalog the data in Delta format in Unity Catalog on Databricks and in open-source Unity Catalog
+ 3. query the Delta table using the Databricks SQL editor and the open-source Unity Catalog CLI
