Hive Metastore on multiple workspaces may point to the same assets. We need to dedupe upgrades. #335
Comments
Thinking of an incremental approach here:
This ticket does not explicitly mention the scenario where workspace 1 uses an instance profile that has only read permission on a table, while workspace 2 uses a different instance profile that has write permission on the same table. When we migrate a table to UC on the first workspace using the Glue metastore, we need to make sure that permissions are gathered across all workspaces.
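A minimal sketch of what "gathering permissions across all workspaces" could mean, assuming each workspace scan yields (workspace, instance profile, table, privilege) tuples. The function name, the tuple layout, and the `READ`/`WRITE` labels are hypothetical and are not part of UCX:

```python
from collections import defaultdict


def merge_table_privileges(observations):
    """Union privileges per (table, instance profile) across all workspaces.

    `observations` is a hypothetical iterable of
    (workspace, instance_profile, table, privilege) tuples.
    """
    merged = defaultdict(set)
    for _workspace, profile, table, privilege in observations:
        # The workspace is deliberately ignored in the key, so a privilege
        # seen in any workspace is kept for the shared table.
        merged[(table, profile)].add(privilege)
    return {key: sorted(privileges) for key, privileges in merged.items()}


# Hypothetical scan results for the scenario described above.
observations = [
    ("workspace1", "profile-read", "db1.tbl1", "READ"),
    ("workspace2", "profile-write", "db1.tbl1", "READ"),
    ("workspace2", "profile-write", "db1.tbl1", "WRITE"),
]
merged = merge_table_privileges(observations)
```

With this shape, migrating the table from workspace 1 alone would still see the write permission that only workspace 2 grants.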
Start with a federated query for a (new) validate command. A dashboard might come later.
This is already implemented for external locations in #2341.
Verify whether the command works for an external Hive metastore; if so, close the issue.
Waiting for a demo environment so that we can test this without disrupting other people's work.
Depends on #3563.
Is there an existing issue for this?
Problem statement
Need to handle duplication of credentials & prefixes across different workspaces
Proposed Solution
Additional Context
Requires:
#910
For tables, there also needs to be a report on table/db inconsistency, like:
A: db1.tbl1, db1.tbl3
B: db1.tbl2
The team(s) driving the UC migration within the account would then make a decision after reviewing the report for some time (as an Excel spreadsheet). By the way, we can split the UCX installation across different Azure subscriptions, and every installation would just focus on defining the target catalog mapping per database. But here are unanswered questions:
two workspaces, same dbs, all different tables and columns (all managed tables, effectively)
two workspaces, same dbs, 90% same tables, 10% are different tables
two workspaces, two different dbs
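The table/db inconsistency report mentioned above could be sketched roughly as follows. This assumes `A` and `B` in the example are workspaces and that each workspace's inventory is available as a set of `db.table` names; the function name and the report layout are hypothetical:

```python
def table_presence_report(inventories):
    """Map every table to the workspaces where it is present or missing.

    `inventories` is a hypothetical dict: workspace name -> set of "db.table".
    """
    all_tables = set().union(*inventories.values())
    report = {}
    for table in sorted(all_tables):
        present = sorted(ws for ws, tables in inventories.items() if table in tables)
        missing = sorted(ws for ws in inventories if ws not in present)
        report[table] = {"present_in": present, "missing_from": missing}
    return report


# The example from the issue: A has tbl1 and tbl3, B has tbl2.
inventories = {
    "A": {"db1.tbl1", "db1.tbl3"},
    "B": {"db1.tbl2"},
}
report = table_presence_report(inventories)

# Tables that are not present in every workspace need a human decision.
inconsistent = [table for table, row in report.items() if row["missing_from"]]
```

A flat structure like this exports easily to a spreadsheet for review. It compares table names only, so it does not cover the column, order, or type differences raised as open questions.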
We can technically support both db_to_catalog and workspace_to_catalog, even at the same time, but db_to_catalog will override workspace_to_catalog. We also need default_catalog_for_workspace if workspace_to_catalog is set (the default catalog for all workspaces is set per metastore).
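One way to read this precedence is sketched below. The names db_to_catalog, workspace_to_catalog and default_catalog_for_workspace come from the text, but representing them as plain dicts, and using default_catalog_for_workspace as the final per-workspace fallback, are assumptions rather than a settled design:

```python
def resolve_catalog(
    workspace,
    database,
    db_to_catalog,
    workspace_to_catalog,
    default_catalog_for_workspace,
):
    """Pick the target catalog for a database seen in a given workspace.

    Assumed precedence: db_to_catalog, then workspace_to_catalog,
    then default_catalog_for_workspace. Returns None if nothing matches.
    """
    if database in db_to_catalog:
        # A database-level mapping overrides the workspace-level one.
        return db_to_catalog[database]
    if workspace in workspace_to_catalog:
        return workspace_to_catalog[workspace]
    return default_catalog_for_workspace.get(workspace)


# Hypothetical mappings for illustration only.
db_to_catalog = {"db1": "finance"}
workspace_to_catalog = {"ws1": "ws1_catalog"}
default_catalog_for_workspace = {"ws1": "main", "ws2": "main"}

overridden = resolve_catalog("ws1", "db1", db_to_catalog, workspace_to_catalog, default_catalog_for_workspace)
per_workspace = resolve_catalog("ws1", "db2", db_to_catalog, workspace_to_catalog, default_catalog_for_workspace)
fallback = resolve_catalog("ws2", "db2", db_to_catalog, workspace_to_catalog, default_catalog_for_workspace)
```

A table-level override, if added, would slot in as an earlier check in the same chain.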
We can also do another override for tables, but we have unanswered questions:
What if it is the same db, same workspace, and same table, but with different columns/order/types? Ignore it and keep it in the Hive metastore? And then rerun the scan for tables and grants?
What if a catalog/database/table was deleted from HMS and/or UC during the migration?
Speaking of metastores, in the beginning there needs to be a workspace_to_metastore mapping with default_metastore_for_workspace. Can we come up with a good default mapping here? Coarse or fine grained? Select between the two? Ask for inline input? How many conflicts do we expect, to justify the need to create and support a custom mapping?
The last very important question is what future-proof configuration format we might need for this mapping.