-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for BigQuery "Authorized Views" #1718
Comments
Hey @heisencoder - so many of the feature requests you've suggested would be such no-brainers if BQ's operational SQL support was a little bit more complete :D I'm really dismayed to see that this functionality requires API requests - that makes it so much harder to hook into dbt's sql compilation pipeline. I'm (temporarily) slapping a #wontfix on here, insofar as there's no way for us to implement this until dbt has some sort of first-class notion of datasets/schemas in a yaml file somewhere. I'm still not particularly excited about supporting that, but I am very open to discussing it further. In the meantime, I think my recommendation would be to manage these dataset permissions manually outside the scope of dbt. Let me know what you think! |
I'm a little sad that this can't be supported now, but can investigate ways to add non-dbt extensions to add authorized views and dataset/schema documentation. That said, I feel like adding a field to the schema.yml file for dataset/schema documentation would itself be a useful feature for the documentation server, and after this is in place, it would be straightforward to add a hook for specifying authorized views. |
Yep! Totally agree. I think for databases like BQ that are api-driven, we should figure out a good approach for exposing hooks into the api via the Whereas on Redshift, a hook like this might look like:
we should instead be able to do something like:
or similar. We'll need to be really careful with an approach like this. When dbt evaluates this jinja code, it will invoke an api call. We want that to happen at run-time (after the model has been built), not at parse-time! I don't want to pollute the |
I hadn't thought about the adapter.addAccessEntry route, but this makes sense as a possible approach. Maybe the implementation could be via a new bigquery macro that only runs the adapter hook if {% macro addAccessEntry(/* authorized view parameters */) -%}
{%- if execute: -%}
{{ adapter.addAccessEntry(/*authorized view parameters*/) }}
{%- endif -%}
{%- endmacro %} |
I'm interested in adding this functionality -- if you can nudge me in the right direction I'll be happy to put a pull request together |
Hi d-swift, Drew or Jake can correct me, but I think it would be roughly like this: include/bigquery/macros: adapters/bigquery/impl.py: class BigQueryAdapter:
...
@available
def addAccessEntry(self, ...):
conn = self.connections.get_thread_connection()
client = conn.handle
# See https://cloud.google.com/bigquery/docs/share-access-views There's also documentation updates and testing... But, before you start, make sure Drew will accept the pull request. |
Oof - I missed these updates! Thanks @heisencoder - this is super spot-on. My thinking has changed somewhat since making this comment above:
Things like this are pretty well suited for Python, and I don't want to shoehorn them into SQL/jinja if they don't actually belong there. For my part, I wish BigQuery would release DDL to create authorized views in SQL. It looks like the magic happens with:
So, we're going to need:
So, I'd add a method to the BigQuery
This method would implement the snippet shown above to add an access entry to the source dataset which authorizes selection from the newly created view. I'd add this logic at the bottom of the
The one big challenge here is that If anyone knows a PM over at Google, kindly inform them of the existence of |
Is adding support for authorized views on the roadmap? Need this feature for my company's dbt project so I'm happy to contribute on the solution if needed |
hey @azhard - this feature is not currently on the roadmap, and we don't intend to pick it up ourselves any time soon! It is a good feature though, and we'd be super happy to work with you on a PR that adds this functionality to dbt. Let us know how we can help! |
Awesome I'll give it a shot then! In terms of the question you asked about whether dbt should automatically derive the I have a few thoughts on the matter:
Based on that my thinking is auto-infer but throw an error if more than one |
Also separate question, I'm struggling to access the sources from |
Hey @drewbanin any thoughts on the above? |
hey @azhard - thanks for following up on this one! I just re-read the BQ docs for authorized views. This is an unfortunately implemented feature IMO, and I'm struggling to think through how dbt could work well in this environment. I think we should break this feature down into two parts:
My general thinking here is that step 1, granting reader permissions to a group on a dataset, is really only something that needs to happen once for a given group+dataset. It would be a little silly to add the same group as a reader on a dataset every single time dbt built a view. Opposite to that, dbt must try to add an access entry to a source dataset every time it creates a view. At a high-level, I can imagine this working with a run-operation + a model config. That might look like:
If we wanted to sugar this up, we could add a new field to the spec for the
The For the model config, I'm picturing:
I think that being explicit about the Referencing sources in a model's configSources are "captured" by dbt's jinja parser whenever a
Then So, this is a pretty involved issue, and it's made a lot gnarlier by BigQuery's implementation. In an ideal world, you could run a simple |
@drewbanin super helpful info thanks! Just pushed a PR, I decided to keep the implementation generic as there's a few ways that access_entries can be used |
Describe the feature
Add support for BigQuery "Authorized Views".
See https://cloud.google.com/bigquery/docs/share-access-views
Many BigQuery users use authorized views as a way to provide access to users for a subset of data in a database without having to grant access to the whole database.
For this to work, we'd need a way to describe these authorizations on a dataset (i.e., schema in non-BigQuery adapters). This could possibly be done via extensions to the yml file formats, either in the schema.yml or dbt_project.yml files. Since this is a dataset property (not a model property), it can't be described in the post_hooks section for the models.
See also #1714, which runs into a similar problem for documenting a dataset. That feature also needs a way to specify metadata on a dataset/schema.
The text was updated successfully, but these errors were encountered: