[CT-660] [Feature] Add Grant SQL to Global Project #5263
Comments
Copying from Slack message sent a few weeks ago: On some databases, On some databases, I think there are two approaches we can take:
Why would we prefer 2 over 1? Here's what I've heard from customers:
I think it's acceptable to pursue option 1 for now if it is indeed simpler. I anticipate we'll want to implement methods/macros for |
@jtcohen6 Note from convo with Gerda and Matt. We think we want grants after creation -- after the after_run method -- but these grants should run before post hooks are invoked. That sound right? |
@VersusFacit I agree that we should run grants right after the model is created/replaced, to minimize downtime for downstream queriers! Important to call out that "post hooks," as users define them, are run within the materialization here: dbt-core/core/dbt/include/global_project/macros/materializations/models/table/table.sql Line 58 in 0d8e061
dbt-core/core/dbt/include/global_project/macros/materializations/models/table/table.sql Line 68 in 0d8e061
added revocation logic to implementation block |
Summary of conversation just now "Call stack":
Default behavior

What should the default do? Correctest by default: revoke all grants, then add all the configured ones. This will always return the correct results, even if it requires an extra step where not strictly needed. On databases that allow us to do it, we should "batch" together those two statements so that they share the same connection/session/transaction.

Pseudo code:

{% macro get_revoke_all_sql(relation) %}
revoke all on {{ relation.type }} {{ relation }}
{% endmacro %}
{% macro get_grant_sql(relation, grant_config) %}
{% for privilege in grant_config.keys() %}
{% set recipients = grant_config[privilege] %}
grant {{ privilege }} on {{ relation.type }} {{ relation }} to {{ recipients | join(', ') }}
{% endfor %}
{% endmacro %}
{% macro apply_grants(relation, grant_config, revoke=True) %}
{% if grant_config %}
{% call statement('grants') %}
{{ get_revoke_all_sql(relation) if revoke else "" }}
{{ get_grant_sql(relation, grant_config) }}
{% endcall %}
{% endif %}
{% endmacro %}

If the user hasn't configured grants AT ALL, I assume they don't want us messing with them! If they do want dbt to manage grants on that table, they could specify something like:
dbt would still revoke all grants from the table, and apply no new ones. If users want much more advanced revocation that calculates diffs dynamically, I'd be excited to see what they come up with! It will be totally possible to customize / reimplement

Thinking through different materialization types

Incremental models (partial refresh), seeds, snapshots: To revoke or not to revoke?
I lean toward option 1, which has us revoking all before reapplying. Optimize for correct access over risk of milliseconds of downtime. That means, even on adapters which are

Thinking through different adapters

As a special case, on Postgres/Redshift/Snowflake, IFF we're completely replacing an object, we don't strictly need the

BigQuery + Spark/Databricks copy over grants during

Fuller gloss on Snowflake
feedback from the above comment is now reflected in the description. |
@nathaniel-may Mea culpa, I clearly didn't do my full research ahead of time. Glad we're figuring it all out now, rather than mid-implementation!

This works (PostgreSQL), where other is another role:

revoke all on table dbt_jcohen.whatever from other;

These do not:

revoke all on table dbt_jcohen.whatever;
revoke all on table dbt_jcohen.whatever from all;

The only way to know how to go here is by first running a metadata query to show who else has grants on the table:

select distinct grantee
from information_schema.role_table_grants
where table_schema = 'dbt_jcohen'
and table_name = 'whatever'
and grantee != current_role; -- postgres offers us current_role, but we could also use {{ target.user }}

(This is PostgreSQL, which queries the

While many databases do support

The good/bad news is, it means we do need to take a slightly cleverer approach, where we first inspect the object to see the grants / grantees on it, and then revoke grants if granted to not-the-current-user. There are still two forms this could take:
So I think we will need three macros:
The default case of

Example

Imagine a case where we'd previously over-granted on a model, and have since trimmed down its grant config to just the necessary

-- models/whatever.sql
{{ config(
    materialized = 'incremental',
    grants = {
        'select': ['other', 'another'],
        'insert': ['other', 'another'],
    }
) }}
select ...

So dbt previously ran (= as mocked by me in
But now, its config is just:

-- models/whatever.sql
{{ config(materialized = 'incremental', grants = {'select': ['other']}) }}

Putting that flow together in PostgreSQL:
Or, better yet ("sophisticated" approach):
Caching?

Eventually, for truly optimal performance, we could think about including grant information for each dbt-controlled resource in the caching queries that dbt runs at the start, so that we're not running
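Returning to the over-granted example above: a minimal Python sketch of the "inspect first, then diff" idea, not the macros themselves. It assumes the metadata query also returns the privilege for each grantee (PostgreSQL's information_schema.role_table_grants has a privilege_type column), and the helper names rows_to_grants and diff_grants are hypothetical, invented here for illustration.

```python
from collections import defaultdict


def rows_to_grants(rows):
    """Group (grantee, privilege) rows into {privilege: set of grantees}."""
    grants = defaultdict(set)
    for grantee, privilege in rows:
        grants[privilege.lower()].add(grantee)
    return dict(grants)


def diff_grants(current, desired):
    """Return (to_revoke, to_grant), each as {privilege: sorted grantees}."""
    to_revoke, to_grant = {}, {}
    for privilege in sorted(set(current) | set(desired)):
        have = set(current.get(privilege, ()))
        want = set(desired.get(privilege, ()))
        if have - want:
            to_revoke[privilege] = sorted(have - want)
        if want - have:
            to_grant[privilege] = sorted(want - have)
    return to_revoke, to_grant


# Hypothetical rows, as a metadata query might return them after the
# first (over-granted) run; the current role is assumed to be filtered out.
rows = [
    ("other", "SELECT"),
    ("another", "SELECT"),
    ("other", "INSERT"),
    ("another", "INSERT"),
]
current = rows_to_grants(rows)

# The trimmed-down config from the second version of the model.
desired = {"select": ["other"]}

to_revoke, to_grant = diff_grants(current, desired)
```

Under these assumptions, only the surplus privileges end up in to_revoke, and to_grant is empty because other already holds select, so nothing needs to be re-granted.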
made minor changes to |
Update

We have open PRs in
Description
This is the second in a two ticket series. #5189 is the first ticket. They will be merged sequentially, but both are required for this feature to be exposed to users.
Today users often configure a post_hook to grant permissions on models, seeds, and snapshots:

These two tickets aim to make it easier to specify grants by allowing them to be configured directly in dbt_project.yml as well as in source files:
These grant configs will not necessarily look the same for each and every warehouse. The logic to generate the SQL from these configs can be overridden by adapters.
Some complexity gets added when you consider that you cannot simply revoke all privileges from a table and apply the configured ones. This is because revoking all may also revoke dbt's ownership of the table, and because some warehouses require explicitly naming the grants to be removed. This implementation reads the grants first, then constructs a minimal revoke and grant statement from the resulting diff.
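As a rough illustration of the last step, here is a minimal Python sketch that turns an already-computed diff into statements of the form grant <privilege> on <relation.type> <relation> to <recipients>, with the matching revoke form. The function names render_statements and build_grant_sql are hypothetical, and the real logic is expected to live in Jinja macros that adapters can override; this only shows the shape of the output.

```python
def render_statements(verb, relation_type, relation, grant_config):
    """Render one statement per privilege; verb is 'grant' or 'revoke'."""
    preposition = "to" if verb == "grant" else "from"
    statements = []
    for privilege, recipients in grant_config.items():
        if not recipients:
            continue  # nothing to do for this privilege
        statements.append(
            f"{verb} {privilege} on {relation_type} {relation} "
            f"{preposition} {', '.join(recipients)}"
        )
    return statements


def build_grant_sql(relation_type, relation, to_revoke, to_grant):
    """Revokes come first, then grants, so the end state matches the config."""
    return (
        render_statements("revoke", relation_type, relation, to_revoke)
        + render_statements("grant", relation_type, relation, to_grant)
    )


# Hypothetical diff: two recipients lose insert, one gains select.
statements = build_grant_sql(
    "table",
    "dbt_jcohen.whatever",
    to_revoke={"insert": ["other", "another"]},
    to_grant={"select": ["other"]},
)
```

Because each revoke names its privilege and recipients explicitly, this form avoids a blanket "revoke all", which is the case the paragraph above warns about.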
Implementation
get_show_grant_sql(relation: Relation) -> str that retrieves grant information for that model. The default implementation should return a string in the form "show grants on table dbt_jcohen.whatever"

get_grant_sql(relation: Relation, grant_config: dict) -> str that creates the warehouse-specific SQL from the grant portion of the config. Since it will be common, the default implementation should return grant <privilege> on <relation.type> <relation> to <recipients>. Warehouses that deviate from this can override. Use the dispatch pattern (dbt docs) with a default__ to enable this. The macro signature here is designed to accept different shapes of grant configs, so that each adapter can use whatever best fits the warehouse's permissions system.

get_revoke_sql that takes a relation and a grant config dict and returns a string in the form f"revoke {grant_config.privilege} on {relation} from {grant_config.recipients}". The grant config will contain multiple priv-recipient mappings.

apply_grants which takes 3 parameters: revoke: Bool, relation: Relation, grant_config: dict and returns None. It calls get_show_grant_sql to determine what grants are currently applied, and uses the grant config to determine which grants need to be revoked and which need to be granted. If the revoke param is True, get_revoke_sql is called, then get_grant_sql with a new dictionary representing the diff to apply the grants (see persist_docs for an example of a similar implementation). This should be overridable by adapters, so use the dispatch pattern (dbt docs) with a default__ to enable this. For a complete example see @jtcohen6's comment below.

apply_grants(..., revoke=True) in all materializations. This includes incremental models, seeds, and snapshots even though it will usually not be necessary. Users can override if they come across a special case where they need to.

Adapters
Links for your convenience. These do not need to be completed to close out the ticket.