Use table ownership heuristics for migrating ACLs #3075

Closed
nfx wants to merge 6 commits into main from feat/table-ownership-acls

Conversation

nfx (Collaborator) commented Oct 24, 2024

This PR prepends OWN grants when migrating ACLs for tables. For the detailed logic behind the ownership heuristics, see #3066.
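
A minimal sketch of the idea, with hypothetical names (Grant, with_owner_grant) rather than the actual ucx API; the real ownership heuristics and grant-migration code live in #3066 and this PR's diff:

```python
from dataclasses import dataclass


@dataclass
class Grant:
    """One ACL entry to apply to a migrated table (illustrative only)."""

    principal: str
    action_type: str  # e.g. "OWN", "SELECT", "MODIFY"
    catalog: str
    database: str
    table: str


def with_owner_grant(
    owner: str,
    grants: list[Grant],
    *,
    catalog: str,
    database: str,
    table: str,
) -> list[Grant]:
    # Prepend an OWN grant for the owner chosen by the ownership heuristics,
    # so ownership is set before the remaining ACLs are applied.
    own = Grant(principal=owner, action_type="OWN", catalog=catalog, database=database, table=table)
    return [own, *grants]


# Example: the inferred owner gets OWN first, followed by the migrated grants.
# with_owner_grant("eng-team", existing_grants, catalog="main", database="sales", table="orders")
```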

nfx requested a review from FastLee on October 24, 2024 at 18:44
nfx requested a review from a team as a code owner on October 24, 2024 at 18:44
Base automatically changed from feat/table-ownership to main October 24, 2024 18:46
@nfx nfx force-pushed the feat/table-ownership-acls branch from 144451e to db0ba07 Compare October 24, 2024 18:52

github-actions bot commented Oct 24, 2024

❌ 51/55 passed, 4 failed, 6 skipped, 3h48m40s total

❌ test_migration_job_ext_hms[regular]: AssertionError: dummy_tf4sn not found in dummy_cxmua.migrate_cczy7 (24m26.317s)
AssertionError: dummy_tf4sn not found in dummy_cxmua.migrate_cczy7
assert False
[gw7] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
14:20 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.4u0E/config.yml) doesn't exist.
14:20 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
14:20 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
14:20 INFO [databricks.labs.ucx.install] Fetching installations...
14:20 WARNING [databricks.labs.ucx.install] Existing installation at /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.4u0E is corrupted. Skipping...
14:20 INFO [databricks.labs.ucx.installer.policy] Setting up an external metastore
14:20 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
14:20 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:20 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:20 INFO [databricks.labs.ucx.install] Installing UCX v0.47.1+3820241030142031
14:20 INFO [databricks.labs.ucx.install] Creating ucx schemas...
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
14:20 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
14:21 INFO [databricks.labs.ucx.install] Creating dashboards...
14:21 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
14:21 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
14:21 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
14:21 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
14:21 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress/main...
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:21 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.4u0E/README for the next steps.
14:21 DEBUG [databricks.labs.ucx.installer.workflows] starting migrate-tables job: https://DATABRICKS_HOST#job/871037108711256
14:21 INFO [databricks.labs.ucx.installer.workflows] Started migrate-tables job: https://DATABRICKS_HOST#job/871037108711256/runs/866956941296398
14:21 DEBUG [databricks.labs.ucx.installer.workflows] Waiting for completion of migrate-tables job: https://DATABRICKS_HOST#job/871037108711256/runs/866956941296398
14:28 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 866956941296398 with state: RunResultState.SUCCESS
14:28 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 866956941296398 duration: 0:06:47.742000 (2024-10-30 14:21:54.302000+00:00 thru 2024-10-30 14:28:42.044000+00:00)
14:28 DEBUG [databricks.labs.ucx.installer.workflows] Validating migrate-tables workflow: https://DATABRICKS_HOST#job/871037108711256
14:28 INFO [databricks.labs.ucx.install] Deleting UCX v0.47.1+3820241030142031 from https://DATABRICKS_HOST
14:28 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_s452o
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=970398122737112, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=281844928007737, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=89571543541052, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=480412001792090, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=983454309767304, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=858664293608887, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=641960411808380, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=506146823846814, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=269913969083426, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=367134087545975, as it is no longer needed
14:28 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=502569858542196, as it is no longer needed
14:29 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=871037108711256, as it is no longer needed
14:29 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=578351794140601, as it is no longer needed
14:29 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:29 INFO [databricks.labs.ucx.install] Deleting secret scope
14:29 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw7] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_table_migration_job_publishes_remaining_tables[regular]: AssertionError: assert 'hive_metasto...z.dummy_tpvv3' == 'hive_metasto...z.dummy_t5m93' (18m52.492s)
AssertionError: assert 'hive_metasto...z.dummy_tpvv3' == 'hive_metasto...z.dummy_t5m93'
  
  - hive_metastore.migrate_25azz.dummy_t5m93
  ?                                     ^^^
  + hive_metastore.migrate_25azz.dummy_tpvv3
  ?                                     ^^^
[gw6] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
14:13 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.exEf/config.yml) doesn't exist.
14:13 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
14:13 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
14:13 INFO [databricks.labs.ucx.install] Fetching installations...
14:13 WARNING [databricks.labs.ucx.install] Existing installation at /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.exEf is corrupted. Skipping...
14:13 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
14:13 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:19 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:19 INFO [databricks.labs.ucx.install] Installing UCX v0.47.1+3820241030141918
14:19 INFO [databricks.labs.ucx.install] Creating ucx schemas...
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
14:19 INFO [databricks.labs.ucx.install] Creating dashboards...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress/main...
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.exEf/README for the next steps.
14:19 DEBUG [databricks.labs.ucx.installer.workflows] starting migrate-tables job: https://DATABRICKS_HOST#job/701326484073868
14:19 INFO [databricks.labs.ucx.installer.workflows] Started migrate-tables job: https://DATABRICKS_HOST#job/701326484073868/runs/291861562660232
14:19 DEBUG [databricks.labs.ucx.installer.workflows] Waiting for completion of migrate-tables job: https://DATABRICKS_HOST#job/701326484073868/runs/291861562660232
14:31 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 291861562660232 with state: RunResultState.SUCCESS
14:31 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 291861562660232 duration: 0:11:43.723000 (2024-10-30 14:19:46.991000+00:00 thru 2024-10-30 14:31:30.714000+00:00)
14:31 DEBUG [databricks.labs.ucx.installer.workflows] Validating migrate-tables workflow: https://DATABRICKS_HOST#job/701326484073868
14:31 INFO [databricks.labs.ucx.install] Deleting UCX v0.47.1+3820241030141918 from https://DATABRICKS_HOST
14:31 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_sthcn
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=85151483145099, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=58680813733377, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=209519110720635, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=566527140132225, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=423124116175411, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=701326484073868, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=776375999097149, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=548967063429000, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=211548950671214, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=143076843372831, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=121295930415316, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=702441070066079, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=942554530570065, as it is no longer needed
14:31 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:31 INFO [databricks.labs.ucx.install] Deleting secret scope
14:31 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw6] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_table_migration_job_refreshes_migration_status[regular-migrate-tables]: AssertionError: No destination schema found for TableType.VIEW hive_metastore.migrate_ctgmz.dummy_tx11n (19m28.216s)
AssertionError: No destination schema found for TableType.VIEW hive_metastore.migrate_ctgmz.dummy_tx11n
  No destination schema found for TableType.VIEW hive_metastore.migrate_ctgmz.dummy_t8oru given migration statuses Row(src_schema='migrate_ctgmz', src_table='dummy_tjxgh', dst_catalog='dummy_c0ii2', dst_schema='migrate_ctgmz', dst_table='dummy_tjxgh', update_ts='1730298642.86562')
  Row(src_schema='migrate_ctgmz', src_table='dummy_tjofj', dst_catalog='dummy_c0ii2', dst_schema='migrate_ctgmz', dst_table='dummy_tjofj', update_ts='1730298642.86562')
  Row(src_schema='migrate_ctgmz', src_table='dummy_tmk1m', dst_catalog='dummy_c0ii2', dst_schema='migrate_ctgmz', dst_table='dummy_tmk1m', update_ts='1730298642.86562')
  Row(src_schema='migrate_ctgmz', src_table='dummy_tx11n', dst_catalog=None, dst_schema=None, dst_table=None, update_ts='1730298642.86562')
  Row(src_schema='migrate_ctgmz', src_table='dummy_t8oru', dst_catalog=None, dst_schema=None, dst_table=None, update_ts='1730298642.86562')
assert 2 == 0
 +  where 2 = len(['No destination schema found for TableType.VIEW hive_metastore.migrate_ctgmz.dummy_tx11n', 'No destination schema found for TableType.VIEW hive_metastore.migrate_ctgmz.dummy_t8oru'])
[gw0] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
14:13 DEBUG [databricks.labs.ucx.install] Cannot find previous installation: Path (/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.il1F/config.yml) doesn't exist.
14:13 INFO [databricks.labs.ucx.install] Please answer a couple of questions to configure Unity Catalog migration
14:13 INFO [databricks.labs.ucx.installer.hms_lineage] HMS Lineage feature creates one system table named system.hms_to_uc_migration.table_access and helps in your migration process from HMS to UC by allowing you to programmatically query HMS lineage data.
14:13 INFO [databricks.labs.ucx.install] Fetching installations...
14:13 WARNING [databricks.labs.ucx.install] Existing installation at /Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.il1F is corrupted. Skipping...
14:13 INFO [databricks.labs.ucx.installer.policy] Creating UCX cluster policy.
14:13 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:19 DEBUG [tests.integration.conftest] Waiting for clusters to start...
14:19 INFO [databricks.labs.ucx.install] Installing UCX v0.47.1+3820241030141920
14:19 INFO [databricks.labs.ucx.install] Creating ucx schemas...
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=remove-workspace-local-backup-groups
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-tables-ctas
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-data-reconciliation
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-tables-in-mounts-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=failing
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-external-hiveserde-tables-in-place-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=assessment
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migrate-groups
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=validate-groups-permissions
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=migration-progress-experimental
14:19 INFO [databricks.labs.ucx.installer.workflows] Creating new job configuration for step=scan-tables-in-mounts-experimental
14:19 INFO [databricks.labs.ucx.install] Creating dashboards...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/views...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration...
14:19 DEBUG [databricks.labs.ucx.install] Reading step folder /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/interactive...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/estimates...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/main...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/assessment/CLOUD_ENV...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/groups...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/migration/main...
14:19 INFO [databricks.labs.ucx.install] Creating dashboard in /home/runner/work/ucx/ucx/src/databricks/labs/ucx/queries/progress/main...
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.installer.mixins] Fetching warehouse_id from a config
14:19 INFO [databricks.labs.ucx.install] Installation completed successfully! Please refer to the https://DATABRICKS_HOST/#workspace/Users/0a330eb5-dd51-4d97-b6e4-c474356b1d5d/.il1F/README for the next steps.
14:19 DEBUG [databricks.labs.ucx.installer.workflows] starting migrate-tables job: https://DATABRICKS_HOST#job/1029686360687216
14:19 INFO [databricks.labs.ucx.installer.workflows] Started migrate-tables job: https://DATABRICKS_HOST#job/1029686360687216/runs/786935881612413
14:19 DEBUG [databricks.labs.ucx.installer.workflows] Waiting for completion of migrate-tables job: https://DATABRICKS_HOST#job/1029686360687216/runs/786935881612413
14:31 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 786935881612413 with state: RunResultState.SUCCESS
14:31 INFO [databricks.labs.ucx.installer.workflows] Completed migrate-tables job run 786935881612413 duration: 0:11:54.148000 (2024-10-30 14:19:44.833000+00:00 thru 2024-10-30 14:31:38.981000+00:00)
14:31 INFO [databricks.labs.ucx.install] Deleting UCX v0.47.1+3820241030141920 from https://DATABRICKS_HOST
14:31 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_ssfsc
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1029686360687216, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=543864415396850, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=524043854802748, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1012423696482560, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=609830252260667, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=561081538334835, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=22675904851804, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=13222867042424, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1115396416401228, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1628184222154, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=596666979070458, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=79197171572356, as it is no longer needed
14:31 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=463178768284303, as it is no longer needed
14:31 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:31 INFO [databricks.labs.ucx.install] Deleting secret scope
14:31 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw0] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python
❌ test_running_real_migration_progress_job: TimeoutError: timed out after 0:20:00: (27m20.463s)
... (skipped 1094367 bytes)
ing grants inventory
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_s5105`.`grants`
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.grants] crawling new set of snapshot data for grants
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.tables] fetching tables inventory
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_s5105`.`tables`
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.udfs] fetching udfs inventory
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SELECT * FROM `hive_metastore`.`dummy_s5105`.`udfs`
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.udfs] crawling new set of snapshot data for udfs
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][execute] USE CATALOG `hive_metastore`;
14:39 DEBUG [databricks.labs.ucx.hive_metastore.udfs:crawl_grants] [hive_metastore.dummy_syiem] listing udfs
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW USER FUNCTIONS FROM `hive_metastore`.`dummy_syiem`;
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.udfs] found 0 new records for udfs
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][execute] CREATE TABLE IF NOT EXISTS hive_metastore.dummy_s5105.udfs (catalog STRING NOT NULL, database ST... (290 more bytes)
14:39 DEBUG [databricks.labs.blueprint.parallel:crawl_grants] Starting 5 tasks in 8 threads
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON CATALOG `hive_metastore`
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON ANY FILE 
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON ANONYMOUS FUNCTION 
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON DATABASE `hive_metastore`.`dummy_syiem`
14:39 DEBUG [databricks.labs.lsql.backends:crawl_grants] [spark][fetch] SHOW GRANTS ON TABLE `hive_metastore`.`dummy_syiem`.`dummy_twfeu`
14:39 ERROR [databricks.labs.ucx.hive_metastore.grants:crawl_grants] Couldn't fetch grants for object ANY FILE : An error occurred while calling o400.sql.
: com.databricks.common.client.DatabricksServiceHttpClientException: TEMPORARILY_UNAVAILABLE: The service at /api/2.0/sql-acl/get-permissions is taking too long to process your request. Please try again later or try a faster operation. [TraceId: 00-9fa47890b951e3a40080b9364dd67370-db5deb8b51632dfe-00]
	at com.databricks.common.client.DatabricksServiceHttpClientException.copy(DBHttpClient.scala:1557)
	at com.databricks.common.client.RawDBHttpClient.getResponseBody(DBHttpClient.scala:1399)
	at com.databricks.common.client.RawDBHttpClient.$anonfun$httpRequestInternal$1(DBHttpClient.scala:1344)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:527)
	at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:631)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:651)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:48)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:276)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:272)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:46)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:43)
	at com.databricks.common.client.RawDBHttpClient.withAttributionContext(DBHttpClient.scala:645)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:95)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:76)
	at com.databricks.common.client.RawDBHttpClient.withAttributionTags(DBHttpClient.scala:645)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:626)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:536)
	at com.databricks.common.client.RawDBHttpClient.recordOperationWithResultTags(DBHttpClient.scala:645)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:528)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:496)
	at com.databricks.common.client.RawDBHttpClient.recordOperation(DBHttpClient.scala:645)
	at com.databricks.common.client.RawDBHttpClient.httpRequestInternal(DBHttpClient.scala:1306)
	at com.databricks.common.client.RawDBHttpClient.entityEnclosingRequestInternal(DBHttpClient.scala:1292)
	at com.databricks.common.client.RawDBHttpClient.getInternal(DBHttpClient.scala:1238)
	at com.databricks.common.client.RawDBHttpClient.get(DBHttpClient.scala:739)
	at com.databricks.common.client.RawDBHttpClient.getWithHeaders(DBHttpClient.scala:781)
	at com.databricks.common.client.RawDBHttpClient.get(DBHttpClient.scala:705)
	at com.databricks.common.client.RawDBHttpClient.get(DBHttpClient.scala:716)
	at com.databricks.spark.sql.acl.client.DriverToWebappSqlAclClient.$anonfun$getPermissions$1(DriverToWebappSqlAclClient.scala:76)
	at com.databricks.common.client.DBHttpClient$.retryWithDeadline(DBHttpClient.scala:408)
	at com.databricks.spark.sql.acl.client.DriverToWebappSqlAclClient.reliably(DriverToWebappSqlAclClient.scala:130)
	at com.databricks.spark.sql.acl.client.DriverToWebappSqlAclClient.getPermissions(DriverToWebappSqlAclClient.scala:76)
	at com.databricks.spark.sql.acl.client.SqlAclClientWrapper.getPermissions(SparkSqlAclClient.scala:87)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.databricks.sql.acl.ReflectionBackedAclClient.$anonfun$getPermissions$1(ReflectionBackedAclClient.scala:119)
	at com.databricks.sql.acl.ReflectionBackedAclClient.stripReflectionException(ReflectionBackedAclClient.scala:97)
	at com.databricks.sql.acl.ReflectionBackedAclClient.getPermissions(ReflectionBackedAclClient.scala:117)
	at com.databricks.sql.acl.ShowPermissionsCommand.$anonfun$run$8(commands.scala:245)
	at scala.Option.map(Option.scala:230)
	at com.databricks.sql.acl.AclCommand.mapIfExists(commands.scala:78)
	at com.databricks.sql.acl.AclCommand.mapIfExists$(commands.scala:75)
	at com.databricks.sql.acl.ShowPermissionsCommand.mapIfExists(commands.scala:226)
	at com.databricks.sql.acl.ShowPermissionsCommand.run(commands.scala:244)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$2(commands.scala:84)
	at org.apache.spark.sql.execution.SparkPlan.runCommandWithAetherOff(SparkPlan.scala:180)
	at org.apache.spark.sql.execution.SparkPlan.runCommandInAetherOrSpark(SparkPlan.scala:191)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.$anonfun$sideEffectResult$1(commands.scala:84)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:81)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:80)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:94)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$5(QueryExecution.scala:385)
	at com.databricks.util.LexicalThreadLocal$Handle.runWith(LexicalThreadLocal.scala:63)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$4(QueryExecution.scala:385)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:195)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$3(QueryExecution.scala:385)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$10(SQLExecution.scala:455)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:793)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId0$1(SQLExecution.scala:333)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1184)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId0(SQLExecution.scala:204)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:730)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:381)
	at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:1177)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:377)
	at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:327)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:374)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:349)
	at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:505)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:85)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:505)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:379)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:375)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:40)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:481)
	at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:349)
	at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:436)
	at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:349)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:286)
	at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:283)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:343)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:131)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1184)
	at org.apache.spark.sql.SparkSession.$anonfun$withActiveAndFrameProfiler$1(SparkSession.scala:1191)
	at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:94)
	at org.apache.spark.sql.SparkSession.withActiveAndFrameProfiler(SparkSession.scala:1191)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:122)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:903)
	at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withMainTracker(QueryPlanningTracker.scala:188)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:891)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:1184)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:891)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:926)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:397)
	at py4j.Gateway.invoke(Gateway.java:306)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:199)
	at py4j.ClientServerConnection.run(ClientServerConnection.java:119)
	at java.lang.Thread.run(Thread.java:750)

14:39 INFO [databricks.labs.blueprint.parallel:crawl_grants] listing grants for hive_metastore 5/5, rps: 0.008/sec
14:39 INFO [databricks.labs.blueprint.parallel:crawl_grants] Finished 'listing grants for hive_metastore' tasks: 100% results available (5/5). Took 0:11:03.374449
14:39 DEBUG [databricks.labs.ucx.framework.crawlers:crawl_grants] [hive_metastore.dummy_s5105.grants] found 2 new records for grants
14:39 INFO [databricks.labs.ucx.installer.workflows] ------ END REMOTE LOGS (SO FAR) -----
14:39 INFO [databricks.labs.ucx.install] Deleting UCX v0.47.1+3820241030141917 from https://DATABRICKS_HOST
14:39 INFO [databricks.labs.ucx.install] Deleting inventory database dummy_s5105
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=935396207992581, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=355802195297901, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=870595227864687, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=147997973921447, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=737939400820718, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=430487333523370, as it is no longer needed
14:39 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=691685576833354, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=689554654308162, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=1054485756162167, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=428290770254466, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=460936475145852, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=807325223786771, as it is no longer needed
14:40 INFO [databricks.labs.ucx.installer.workflows] Removing job_id=793768199717038, as it is no longer needed
14:40 INFO [databricks.labs.ucx.install] Deleting cluster policy
14:40 INFO [databricks.labs.ucx.install] Deleting secret scope
14:40 INFO [databricks.labs.ucx.install] UnInstalling UCX complete
[gw9] linux -- Python 3.10.15 /home/runner/work/ucx/ucx/.venv/bin/python

Running from acceptance #7141

@nfx nfx marked this pull request as draft October 25, 2024 12:15
@nfx
Collaborator Author

nfx commented Oct 25, 2024

fails integration tests

@nfx
Collaborator Author

nfx commented Oct 25, 2024

need a local scheduler to improve debuggability:

import logging

from databricks.labs.ucx.contexts.workflow_task import RuntimeContext
from databricks.labs.ucx.runtime import Workflows

logger = logging.getLogger(__name__)


class InProcessDeployedWorkflows:
    def __init__(self, ctx: RuntimeContext):
        self._workflows = {workflow.name: workflow for workflow in Workflows.definitions()}
        self._ctx = ctx

    def run_workflow(self, step: str, **_):
        workflow = self._workflows[step]
        tasks = {task.name: task for task in workflow.tasks()}
        # Kahn's algorithm: count unresolved dependencies per task and index its dependents
        incoming = {name: len(task.depends_on) for name, task in tasks.items()}
        dependents: dict[str, list[str]] = {name: [] for name in tasks}
        for name, task in tasks.items():
            task.workflow = workflow.name
            for dep in task.depends_on:
                dependents[dep].append(name)
        queue = [name for name, count in incoming.items() if count == 0]
        while queue:
            name = queue.pop(0)
            logger.info(f"running task: {name}")
            fn = getattr(workflow, name)
            fn(self._ctx)
            # a finished task unblocks the tasks that depend on it
            for dependent in dependents[name]:
                incoming[dependent] -= 1
                if incoming[dependent] == 0:
                    queue.append(dependent)

    def relay_logs(self, step: str):
        pass  # noop
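
A minimal usage sketch for the helper above, assuming a test fixture named `runtime_ctx` that yields a `RuntimeContext` (the fixture name is an assumption, not existing UCX test infrastructure):

def test_assessment_runs_in_process(runtime_ctx):
    # run every task of the deployed workflow in this process, in dependency order
    workflows = InProcessDeployedWorkflows(runtime_ctx)
    workflows.run_workflow("assessment")
    # no log relaying needed: tasks already logged through the local logger
    workflows.relay_logs("assessment")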

@JCZuurmond JCZuurmond self-requested a review October 25, 2024 13:02
nfx added 6 commits October 30, 2024 10:02
… rules:

- If a table is owned by a principal in the grants table, then that principal is the owner.
- If a table is written to by a query, then the owner of that query is the owner of the table.
- If a table is written to by a notebook or file, then the owner of the path is the owner of the table.
This PR prepends `OWN` grants when migrating ACLs for tables. For detailed logic on ownership heuristics, see #3066
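
A rough, self-contained sketch of the fallback chain these rules describe (the record shapes and function below are simplified assumptions; the actual heuristics live in #3066):

from dataclasses import dataclass


@dataclass
class GrantRecord:      # simplified stand-in for a crawled grant
    principal: str
    action_type: str
    database: str
    table: str


@dataclass
class WriteRecord:      # simplified stand-in for "a query / notebook / file writes to this table"
    database: str
    table: str
    owner: str          # owner of the query, or of the notebook/file path


def resolve_owner(database, table, grants, query_writes, path_writes, fallback_admin):
    # 1. an explicit OWN grant in the grants inventory wins
    for g in grants:
        if g.action_type == "OWN" and (g.database, g.table) == (database, table):
            return g.principal
    # 2. otherwise, the owner of a query that writes to the table
    for w in query_writes:
        if (w.database, w.table) == (database, table):
            return w.owner
    # 3. otherwise, the owner of a notebook or file that writes to the table
    for w in path_writes:
        if (w.database, w.table) == (database, table):
            return w.owner
    # no heuristic matched: fall back to a workspace administrator
    return fallback_admin
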
@FastLee FastLee force-pushed the feat/table-ownership-acls branch from db0ba07 to 398f4e4 Compare October 30, 2024 14:03
@FastLee FastLee marked this pull request as ready for review October 30, 2024 14:03
@cached_property
def migrate_grants(self) -> MigrateGrants:
    # owner grants have to come first
Member

Why?

Collaborator Author

because of match_grants returning the first matched grant

Member

Please add that to the comment
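
A toy illustration of the first-match behaviour being discussed (the `Grant` shape and loader functions here are simplified assumptions, not the UCX code): because matching stops at the first hit, the loader registered first decides which grant wins, so the ownership loader must be registered ahead of the legacy grants crawler.

from collections.abc import Callable, Iterable
from dataclasses import dataclass


@dataclass
class Grant:                                   # illustrative stand-in for a grant record
    principal: str
    action_type: str
    table: str


def first_matching_grant(table: str, loaders: list[Callable[[], Iterable[Grant]]]) -> Grant | None:
    # scan loaders in registration order and stop at the first grant for the table
    for load in loaders:
        for grant in load():
            if grant.table == table:
                return grant
    return None


def ownership_loader() -> Iterable[Grant]:     # grants derived from the ownership heuristics
    yield Grant("data_eng_group", "OWN", "sales.orders")


def legacy_acl_loader() -> Iterable[Grant]:    # grants crawled from the legacy metastore
    yield Grant("analysts", "SELECT", "sales.orders")


# OWN wins only because ownership_loader is registered before legacy_acl_loader
print(first_matching_grant("sales.orders", [ownership_loader, legacy_acl_loader]))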

grant_loaders: list[Callable[[], Iterable[Grant]]] = [
    self.table_ownership_grant_loader.load,
    self.grants_crawler.snapshot,
Member

Should the OWN grants in this crawler be ignored?

@FastLee
Contributor

FastLee commented Oct 31, 2024

Created a separate PR to handle.
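
For context, a tiny sketch of what ignoring crawled OWN grants could mean, reusing the illustrative `Grant` shape from the sketch above (an assumption only; the follow-up PR mentioned in the comment above defines the real behaviour):

def without_own_grants(grants: list[Grant]) -> list[Grant]:
    # drop OWN records coming from the crawled snapshot so that only the
    # heuristics-based ownership loader contributes OWN grants
    return [g for g in grants if g.action_type != "OWN"]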

@nfx nfx marked this pull request as draft November 8, 2024 15:33
github-merge-queue bot pushed a commit that referenced this pull request Nov 18, 2024
replaces #3075 
close #2890

---------

Co-authored-by: Serge Smertin <[email protected]>
gueniai added a commit that referenced this pull request Dec 2, 2024
* Added `assign-owner-group` command ([#3111](#3111)). The Databricks Labs Unity Catalog Exporter (UCX) tool now includes a new `assign-owner-group` command, allowing users to assign an owner group to the workspace. This group will be designated as the owner for all migrated tables and views, providing better control and organization of resources. The command can be executed in the context of a specific workspace or across multiple workspaces. The implementation includes new classes, methods, and attributes in various files, such as `cli.py`, `config.py`, and `groups.py`, enhancing ownership management functionality. The `assign-owner-group` command replaces the functionality of issue [#3075](#3075) and addresses issue [#2890](#2890), ensuring proper schema ownership and handling of crawled grants. Developers should be aware that running the `migrate-tables` workflow will result in assigning a new owner group for the Hive Metastore instance in the workspace installation.
* Added `opencensus` to known list ([#3052](#3052)). In this release, we have added OpenCensus to the list of known libraries in our configuration file. OpenCensus is a popular set of tools for distributed tracing and monitoring, and its inclusion in our system will enhance support and integration for users who utilize this tool. This change does not affect existing functionality, but instead adds a new entry in the configuration file for OpenCensus. This enhancement will allow our library to better recognize and work with OpenCensus, enabling improved performance and functionality for our users.
* Added default owner group selection to the installer ([#3370](#3370)). A new class, AccountGroupLookup, has been added to the AccountGroupLookup module to select the default owner group during the installer process, addressing previous issue [#3111](#3111). This class uses the workspace_client to determine the owner group, and a pick_owner_group method to prompt the user for a selection if necessary. The ownership selection process has been improved with the addition of a check in the installer's `_static_owner` method to determine if the current user is part of the default owner group. The GroupManager class has been updated to use the new AccountGroupLookup class and its methods, `pick_owner_group` and `validate_owner_group`. A new variable, `default_owner_group`, is introduced in the ConfigureGroups class to configure groups during installation based on user input. The installer now includes a unit test, "test_configure_with_default_owner_group", to demonstrate how it sets expected workspace configuration values when a default owner group is specified during installation.
* Added handling for non UTF-8 encoded notebook error explicitly ([#3376](#3376)). A new enhancement has been implemented to address the issue of non-UTF-8 encoded notebooks failing to load by introducing explicit error handling for this case. A UnicodeDecodeError exception is now caught and logged as a warning, while the notebook is skipped and returned as None. This change is implemented in the load_dependency method in the loaders.py file, which is a part of the assessment workflow. Additionally, a new unit test has been added to verify the behavior of this change, and the assessment workflow has been updated accordingly. The new test function in test_loaders.py checks for different types of exceptions, specifically PermissionError and UnicodeDecodeError, ensuring that the system can handle notebooks with non-UTF-8 encoding gracefully. This enhancement resolves issue [#3374](#3374), thereby improving the overall robustness of the application.
* Added migration progress documentation ([#3333](#3333)). In this release, we have updated the `migration-progress-experimental` workflow to track the migration progress of a subset of inventory tables related to workspace resources being migrated to Unity Catalog (UCX). The workflow updates the inventory tables and tracks the migration progress in the UCX catalog tables. To use this workflow, users must attach a UC metastore to the workspace, create a UCX catalog, and ensure that the assessment job has run successfully. The `Migration Progress` section in the documentation has been updated with a new markdown file that provides details about the migration progress, including a migration progress dashboard and an experimental migration progress workflow that generates historical records of inventory objects relevant to the migration progress. These records are stored in the UCX UC catalog, which contains a historical table with information about the object type, object ID, data, failures, owner, and UCX version. The migration process also tracks dangling Hive or workspace objects that are not referenced by business resources, and the progress is persisted in the UCX UC catalog, allowing for cross-workspace tracking of migration progress.
* Added note about running assessment once ([#3398](#3398)). In this release, we have introduced an update to the UCX assessment workflow, which will now only be executed once and will not update existing results in repeated runs. To accommodate this change, we have updated the README file with a note clarifying that the assessment workflow is a one-time process. Additionally, we have provided instructions on how to update the inventory and findings by uninstalling and reinstalling the UCX. This will ensure that the inventory and findings for a workspace are up-to-date and accurate. We recommend that software engineers take note of this change and follow the updated instructions when using the UCX assessment workflow.
* Allowing skipping TACLs migration during table migration ([#3384](#3384)). A new optional flag, "skip_tacl_migration", has been added to the configuration file, providing users with more flexibility during migration. This flag allows users to control whether or not to skip the Table Access Control Language (TACL) migration during table migrations. It can be set when creating catalogs and schemas, as well as when migrating tables or using the `migrate_grants` method in `application.py`. Additionally, the `install.py` file now includes a new variable, `skip_tacl_migration`, which can be set to `True` during the installation process to skip TACL migration. New test cases have been added to verify the functionality of skipping TACL migration during grants management and table migration. These changes enhance the flexibility of the system for users managing table migrations and TACL operations in their infrastructure, addressing issues [#3384](#3384) and [#3042](#3042).
* Bump `databricks-sdk` and `databricks-labs-lsql` dependencies ([#3332](#3332)). In this update, the `databricks-sdk` and `databricks-labs-lsql` dependencies are upgraded to versions 0.38 and 0.14.0, respectively. The `databricks-sdk` update addresses conflicts, bug fixes, and introduces new API additions and changes, notably impacting methods like `create()`, `execute_message_query()`, and others in workspace-level services. While `databricks-labs-lsql` updates ensure compatibility, its changelog and specific commits are not provided. This pull request also includes ignore conditions for the `databricks-sdk` dependency to prevent future Dependabot requests. It is strongly advised to rigorously test these updates to avoid any compatibility issues or breaking changes with the existing codebase. This pull request mirrors another ([#3329](#3329)), resolving integration CI issues that prevented the original from merging.
* Explain failures when cluster encounters Py4J error ([#3318](#3318)). In this release, we have made significant improvements to the error handling mechanism in our open-source library. Specifically, we have addressed issue [#3318](#3318), which involved handling failures when the cluster encounters Py4J errors in the `databricks/labs/ucx/hive_metastore/tables.py` file. We have added code to raise noisy failures instead of swallowing the error with a warning when a Py4J error occurs. The functions `_all_databases()` and `_list_tables()` have been updated to check if the error message contains "py4j.security.Py4JSecurityException", and if so, log an error message with instructions to update or reinstall UCX. If the error message does not contain "py4j.security.Py4JSecurityException", the functions log a warning message and return an empty list. These changes also resolve the linked issue [#3271](#3271). The functionality has been thoroughly tested and verified on the labs environment. These improvements provide more informative error messages and enhance the overall reliability of our library.
* Rearranged job summary dashboard columns and make job_name clickable ([#3311](#3311)). In this update, the job summary dashboard columns have been improved and the need for the `30_3_job_details.sql` file, which contained a SQL query for selecting job details from the `inventory.jobs` table, has been eliminated. The dashboard columns have been rearranged, and the `job_name` column is now clickable, providing easy access to job details via the corresponding job ID. The changes include modifying the dashboard widget and adding new methods for making the `job_name` column clickable and linking it to the job ID. Additionally, the column titles have been updated to display more relevant information. These improvements have been manually tested and verified in a labs environment.
* Refactor refreshing of migration-status information for tables, eliminate another redundant refresh ([#3270](#3270)). This pull request refactors the way table records are enriched with migration-status information during encoding for the history log in the `migration-progress-experimental` workflow. It ensures that the refresh of migration-status information is explicit and under the control of the workflow, addressing a previously expressed intent. A redundant refresh of migration-status information has been eliminated and additional unit test coverage has been added to the `migration-progress-experimental` workflow. The changes include modifying the existing workflow, adding new methods for refreshing table migration status without updating the history log, and splitting the crawl and update-history-log tasks into three steps. The `TableMigrationStatusRefresher` class has been introduced to obtain the migration status of a table, and new tests have been added to ensure correctness, making the `migration-progress-experimental` workflow more efficient and reliable.
* Safe read files in more places ([#3394](#3394)). This release introduces significant improvements to file handling, addressing issue [#3386](#3386). A new function, `safe_read_text`, has been implemented for safe reading of files, catching and handling exceptions and returning None if reading fails. This function is utilized in the `is_a_notebook` function and replaces the existing `read_text` method in specific locations, enhancing error handling and robustness. The `databricks labs ucx lint-local-code` command and the `assessment` workflow have been updated accordingly. Additionally, new test files and methods have been added under the `tests/integration/source_code` directory to ensure comprehensive testing of file handling, including handling of unsupported file types, encoding checks, and ignorable files.
* Track `DirectFsAccess` on `JobsProgressEncoder` ([#3375](#3375)). In this release, the open-source library has been updated with new features related to tracking Direct File System Access (DirectFsAccess) in the JobsProgressEncoder. This change includes the addition of a new `_direct_fs_accesses` method, which detects direct filesystem access by code used in a job and generates corresponding failure messages. The DirectFsAccessCrawler object is used to crawl and track file system access for directories and queries, providing more detailed tracking and encoding of job progress. Additionally, new methods `make_job` and `make_dashboard` have been added to create instances of Job and Dashboard, respectively, and new unit and integration tests have been added to ensure the proper functionality of the updated code. These changes improve the functionality of JobsProgressEncoder by providing more comprehensive job progress information, making the code more modular and maintainable for easier management of jobs and dashboards. This release resolves issue [#3059](#3059) and enhances the tracking and encoding of job progress in the system, ensuring more comprehensive and accurate reporting of job status and issues.
* Track `UsedTables` on `TableProgressEncoder` ([#3373](#3373)). In this release, the tracking of `UsedTables` has been implemented on the `TableProgressEncoder` in the `tables_progress` function, addressing issue [#3061](#3061). The workflow `migration-progress-experimental` has been updated to incorporate this change. New objects, `self.used_tables_crawler_for_paths` and `self.used_tables_crawler_for_queries`, have been added as instances of a class responsible for crawling used tables. A `full_name` property has been introduced as a read-only attribute for a source code class, providing a more convenient way of accessing and manipulating the full name of the source code object. A new integration test for the `TableProgressEncoder` component has also been added, specifically testing table failure scenarios. The `TableProgressEncoder` class has been updated to track `UsedTables` using the `UsedTablesCrawler` class, and a new class, `UsedTable`, has been introduced to represent the catalog, schema, and table name of a table. Two new unit tests have been added to ensure the correct functionality of this feature.
@gueniai gueniai mentioned this pull request Dec 2, 2024
gueniai added a commit that referenced this pull request Dec 2, 2024
@JCZuurmond JCZuurmond assigned JCZuurmond and FastLee and unassigned JCZuurmond Dec 6, 2024
@FastLee FastLee closed this Dec 10, 2024