Skip to content

Commit

Permalink
fix readme
Browse files Browse the repository at this point in the history
  • Loading branch information
renardeinside committed Jan 13, 2025
1 parent dde2508 commit 69fd794
Show file tree
Hide file tree
Showing 24 changed files with 2,961 additions and 2,331 deletions.
6 changes: 6 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,9 @@ solacc:
hatch run python tests/integration/source_code/solacc.py

.PHONY: all clean dev lint fmt test integration coverage known solacc

docs-install:
yarn --cwd docs/ucx install

docs-serve:
yarn --cwd docs/ucx start
2,235 changes: 19 additions & 2,216 deletions README.md

Large diffs are not rendered by default.

6 changes: 2 additions & 4 deletions docs/ucx/docs/dev/contributing.mdx
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
---
sidebar_position: 4
---

# Contributing to UCX

# Developer Guide

## First Principles

Expand Down
13 changes: 6 additions & 7 deletions docs/ucx/docs/dev/docs_authoring.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -35,21 +35,20 @@ npm install --global yarn
To set up the documentation locally, follow these steps:

1. Clone the UCX repository
2. Navigate to the `docs/ucx` directory:
2. Run:

```bash
cd docs/ucx
make docs-install
```

3. Start the development server:
## Running the Documentation Locally

To run the documentation locally, use the following command:

```bash
yarn start
make docs-serve
```

4. Open your browser and navigate to `http://localhost:3000/` to view the documentation site.


## Adding images

To add images to your documentation, place the image files in the `static/img` directory.
Expand Down
234 changes: 234 additions & 0 deletions docs/ucx/docs/installation/cross_workspace.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,234 @@
# Cross-workspace installations

When installing UCX across multiple workspaces, administrators need to keep UCX configurations in sync.
UCX will prompt you to select an account profile that has been defined in `~/.databrickscfg`. If you don't have one,
authenticate your machine with:

* `databricks auth login --host https://accounts.cloud.databricks.com/` (AWS)
* `databricks auth login --host https://accounts.azuredatabricks.net/` (Azure)

Ask your Databricks Account admin to run the [`sync-workspace-info` command](#sync-workspace-info-command) to sync the
workspace information with the UCX installations. Once the workspace information is synced, you can run the
[`create-table-mapping` command](#create-table-mapping-command) to align your tables with the Unity Catalog.

[[back to top](#databricks-labs-ucx)]

## `sync-workspace-info` command

```text
databricks --profile ACCOUNTS labs ucx sync-workspace-info
14:07:07 INFO [databricks.sdk] Using Azure CLI authentication with AAD tokens
14:07:07 INFO [d.labs.ucx] Account ID: ...
14:07:10 INFO [d.l.blueprint.parallel][finding_ucx_installations_16] finding ucx installations 10/88, rps: 16.415/sec
14:07:10 INFO [d.l.blueprint.parallel][finding_ucx_installations_0] finding ucx installations 20/88, rps: 32.110/sec
14:07:11 INFO [d.l.blueprint.parallel][finding_ucx_installations_18] finding ucx installations 30/88, rps: 39.786/sec
...
```

> Requires Databricks Account Administrator privileges. Use `--profile` to select the Databricks cli profile configured
> with access to the Databricks account console (with endpoint "https://accounts.cloud.databricks.com/"
> or "https://accounts.azuredatabricks.net").
This command uploads the workspace config to all workspaces in the account where `ucx` is installed. This command is
necessary to create an immutable default catalog mapping for [table migration](#Table-Migration) process and is the prerequisite
for [`create-table-mapping` command](#create-table-mapping-command).

If you cannot get account administrator privileges in reasonable time, you can take the risk and
run [`manual-workspace-info` command](#manual-workspace-info-command) to enter Databricks Workspace IDs and Databricks
Workspace names.

[[back to top](#databricks-labs-ucx)]

## `manual-workspace-info` command

```text
$ databricks labs ucx manual-workspace-info
14:20:36 WARN [d.l.ucx.account] You are strongly recommended to run "databricks labs ucx sync-workspace-info" by account admin,
... otherwise there is a significant risk of inconsistencies between different workspaces. This command will overwrite all UCX
... installations on this given workspace. Result may be consistent only within https://adb-987654321.10.azuredatabricks.net
Workspace name for 987654321 (default: workspace-987654321): labs-workspace
Next workspace id (default: stop): 12345
Workspace name for 12345 (default: workspace-12345): other-workspace
Next workspace id (default: stop):
14:21:19 INFO [d.l.blueprint.parallel][finding_ucx_installations_11] finding ucx installations 10/89, rps: 24.577/sec
14:21:19 INFO [d.l.blueprint.parallel][finding_ucx_installations_15] finding ucx installations 20/89, rps: 48.305/sec
...
14:21:20 INFO [d.l.ucx.account] Synchronised workspace id mapping for installations on current workspace
```

This command is only supposed to be run if the [`sync-workspace-info` command](#sync-workspace-info-command) cannot be
run. It prompts the user to enter the required information manually and creates the workspace info. This command is
useful for workspace administrators who are unable to use the `sync-workspace-info` command, because they are not
Databricks Account Administrators. It can also be used to manually create the workspace info in a new workspace.

[[back to top](#databricks-labs-ucx)]

## `create-account-groups` command

```text
$ databricks labs ucx create-account-groups [--workspace-ids 123,456,789]
```

**Requires Databricks Account Administrator privileges.** This command creates account-level groups if a workspace local
group is not present in the account. It crawls all workspaces configured in `--workspace-ids` flag, then creates
account level groups if a WS local group is not present in the account. If `--workspace-ids` flag is not specified, UCX
will create account groups for all workspaces configured in the account.

The following scenarios are supported, if a group X:
- Exist in workspaces A,B,C, and it has same members in there, it will be created in the account
- Exist in workspaces A,B but not in C, it will be created in the account
- Exist in workspaces A,B,C. It has same members in A,B, but not in C. Then, X and C_X will be created in the account

This command is useful for the setups, that don't have SCIM provisioning in place.

Once you're done with this command, proceed to the [group migration workflow](#group-migration-workflow).

[[back to top](#databricks-labs-ucx)]

## `validate-groups-membership` command

```text
$ databricks labs ucx validate-groups-membership
...
14:30:36 INFO [d.l.u.workspace_access.groups] Found 483 account groups
14:30:36 INFO [d.l.u.workspace_access.groups] No group listing provided, all matching groups will be migrated
14:30:36 INFO [d.l.u.workspace_access.groups] There are no groups with different membership between account and workspace
Workspace Group Name Members Count Account Group Name Members Count Difference
```

This command validates the groups to see if the groups at the account level and workspace level have different membership.
This command is useful for administrators who want to ensure that the groups have the correct membership. It can also be
used to debug issues related to group membership. See [group migration](docs/local-group-migration.md) and
[group migration](#group-migration-workflow) for more details.

Valid group membership is important to ensure users has correct access after legacy table ACL is migrated in [table migration process](#Table-Migration)

[[back to top](#databricks-labs-ucx)]

## `validate-table-locations` command

```text
$ databricks labs ucx validate-table-locations [--workspace-ids 123,456,789]
...
11:39:36 WARN [d.l.u.account.aggregate] Workspace 99999999 does not have UCX installed
11:39:37 WARN [d.l.u.account.aggregate] Overlapping table locations: 123456789:hive_metastore.database.table and 987654321:hive_metastore.database.table
11:39:37 WARN [d.l.u.account.aggregate] Overlapping table locations: 123456789:hive_metastore.database.table and 123456789:hive_metastore.another_database.table
```

This command validates the table locations by checking for overlapping table locations in the workspace and across
workspaces. Unity catalog does not allow overlapping table locations, also not between tables in different catalogs.
Overlapping table locations need to be resolved by the user before running the table migration.

Options to resolve tables with overlapping locations are:
- Move one table and [skip](#skip-command) the other(s).
- Duplicate the tables by copying the data into a managed table and [skip](#skip-command) the original tables.

Considerations when resolving tables with overlapping locations are:
- Migrate the tables one workspace at a time:
- Let later migrated workspaces read tables from the earlier migrated workspace catalogs.
- [Move](#move-command) tables between schemas and catalogs when it fits the data management model.
- The tables might have different:
- Metadata, like:
- Column schema (names, types, order)
- Description
- Tags
- ACLs

[[back to top](#databricks-labs-ucx)]

## `cluster-remap` command

```text
$ databricks labs ucx cluster-remap
21:29:38 INFO [d.labs.ucx] Remapping the Clusters to UC
Cluster Name Cluster Id
Field Eng Shared UC LTS Cluster 0601-182128-dcbte59m
Shared Autoscaling Americas cluster 0329-145545-rugby794
```
```text
Please provide the cluster id's as comma separated value from the above list (default: <ALL>):
```

Once you're done with the [code migration](#code-migration-commands), you can run this command to remap the clusters to UC enabled.

This command will remap the cluster to uc enabled one. When we run this command it will list all the clusters
and its id's and asks to provide the cluster id's as comma separated value which has to be remapped, by default it will take all cluster ids.
Once we provide the cluster id's it will update these clusters to UC enabled.Back up of the existing cluster
config will be stored in backup folder inside the installed location(backup/clusters/cluster_id.json) as a json file.This will help
to revert the cluster remapping.

You can revert the cluster remapping using the [`revert-cluster-remap` command](#revert-cluster-remap-command).

[[back to top](#databricks-labs-ucx)]

## `revert-cluster-remap` command

```text
$ databricks labs ucx revert-cluster-remap
21:31:29 INFO [d.labs.ucx] Reverting the Remapping of the Clusters from UC
21:31:33 INFO [d.labs.ucx] 0301-055912-4ske39iq
21:31:33 INFO [d.labs.ucx] 0306-121015-v1llqff6
Please provide the cluster id's as comma separated value from the above list (default: <ALL>):
```

If a customer want's to revert the cluster remap done using the [`cluster-remap` command](#cluster-remap-command) they can use this command to revert
its configuration from UC to original one.It will iterate through the list of clusters from the backup folder and reverts the
cluster configurations to original one.This will also ask the user to provide the list of clusters that has to be reverted as a prompt.
By default, it will revert all the clusters present in the backup folder

[[back to top](#databricks-labs-ucx)]

## `upload` command

```text
$ databricks labs ucx upload --file <file_path> --run-as-collection True
21:31:29 WARNING [d.labs.ucx] The schema of CSV files is NOT validated, ensure it is correct
21:31:29 INFO [d.labs.ucx] Finished uploading: <file_path>
```

Upload a file to a single workspace (`--run-as-collection False`) or a collection of workspaces
(`--run-as-collection True`). This command is especially useful when uploading the same file to multiple workspaces.

## `download` command

```text
$ databricks labs ucx download --file <file_path> --run-as-collection True
21:31:29 INFO [d.labs.ucx] Finished downloading: <file_path>
```

Download a csv file from a single workspace (`--run-as-collection False`) or a collection of workspaces
(`--run-as-collection True`). This command is especially useful when downloading the same file from multiple workspaces.

## `join-collection` command

```text
$ databricks labs ucx join-collection --workspace-ids <comma seperate list of workspace ids> --profile <account-profile>
```

`join-collection` command joins 2 or more workspaces into a collection. This helps in running supported cli commands as a collection
`join-collection` command updates config.yml file on each workspace ucx installation with installed_workspace_ids attribute.
In order to run `join-collectioon` command a user should:
- be an Account admin on the Databricks account
- be a Workspace admin on all the workspaces to be joined as a collection) or a collection of workspaces
- have installed UCX on the workspace
The `join-collection` command will fail and throw an error msg if the above conditions are not met.

## collection eligible command

Once `join-collection` command is run, it allows user to run multiple cli commands as a collection. The following cli commands
are eligible to be run as a collection. User can run the below commands as collection by passing an additional flag `--run-as-collection=True`
- `ensure-assessment-run`
- `create-table-mapping`
- `principal-prefix-access`
- `migrate-credentials`
- `create-uber-principal`
- `create-missing-principals`
- `validate-external-location`
- `migrate-locations`
- `create-catalog-schemas`
- `migrate-tables`
- `migrate-acls`
- `migrate-dbsql-dashboards`
- `validate-group-membership`
Ex: `databricks labs ucx ensure-assessment-run --run-as-collection=True`

Loading

0 comments on commit 69fd794

Please sign in to comment.