Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pass schema in credentials to Postgresql #1476

Merged
merged 9 commits into from
Jun 12, 2019

Conversation

buremba
Copy link
Contributor

@buremba buremba commented May 21, 2019

There is a property called schema in Postgresql credentials but it looks like DBT doesn't actually use it probably because Psycopg2 doesn't provide a native way to pass schema to Postgresql.

It's a bit tricky but there's actually a way to do that via options of libpq. I believe that we should support this feature since we have the schema parameter in Postgresql configuration.

@drewbanin
Copy link
Contributor

Hi @buremba - thanks for making this PR! Can you tell me more about why you're interested in setting the search path for the Postgres connection? This isn't something I've ever had to do before, and I'm curious what motivated this change on your end.

@buremba
Copy link
Contributor Author

buremba commented May 21, 2019

Hey @drewbanin, thanks for the answer. We use DBT within our product, the users often write SQL queries without referencing the actual schema since we have the concept of default schema and most of them actually use single schema. Requiring the schema in all table references is verbose in our case.

@drewbanin
Copy link
Contributor

hey @buremba - thanks for the additional information. Cool to hear that you're using dbt in your product! While the change you're describing sounds reasonable, I don't think dbt's connection logic is necessarily an appropriate place for this configuration.

Is it instead possible to use a pre-hook to run:

set search_path to '<search path>'

or can you maybe configure user accounts with:

ALTER ROLE <user> SET search_path = '<search path>'

@buremba
Copy link
Contributor Author

buremba commented May 21, 2019

@drewbanin both solutions that you suggested work in practice but I believe that they're kinda workaround considering dbt already provides a way to set the schema via credentials.

Is there any other place that you use the schema in this case? If not, I think that we need to remove that parameter in order to avoid confusion if we're not going to merge this PR. However; since the other adapters such as Snowflake already support this feature using the method in this PR, it would be much better to support it in PG & Redshift for convenience.

@beckjake
Copy link
Contributor

@buremba We use the schema in dbt all over the place - if you don't specify an override schema to your dbt models, we use the one assigned in the credentials.

@buremba
Copy link
Contributor Author

buremba commented May 21, 2019

@beckjake, could you please point me where the schema of PostgresCredentials being used in the codebase?

Let me explain the problem again:
We use both Snowflake and Postgresql and use schema parameter in both adapters. When we execute a simple query such as SELECT count(*) my_table in Snowflake, dbt automatically execute the query in the schema that I defined since it passes schema to Snowflake.
However; that's not the case in Postgresql, SELECT count(*) my_table references public.my_table no matter what I use as schema in my credentials.

@drewbanin
Copy link
Contributor

drewbanin commented May 21, 2019

@buremba the schema config in a profiles.yml target fundamentally configures the destination schema that dbt builds models into. This is the case on every database that dbt supports. I would advise against removing it from the target config, as that would cause dbt to no longer work.

I'm open to discussing this feature request, and the appropriate place to do that is in an issue. Please open a new issue indicating that you have a feature request to supply a value for the search_path in a Postgres/Redshift connection. This issue template does a good job of ensuring that feature requests are well defined. I will be more than happy to discuss the implementation details with you in that issue.

beckjake, could you please point where the schema of PostgrestCredential being used in the codebase?

Don't do that, man. Jake's written tens of thousands of lines of code in the dbt codebase - that doesn't mean that he's always right, but it does mean that you should give some thought to his feedback in a PR comment. In this case though, he is very right!

It sounds to me like you are using dbt in a specific way inside of your product. I am very happy to hear that! I also need you to understand that this is not the most common use-case for dbt, and the code that you've supplied in this PR would absolutely break people's dbt projects on Postgres/Redshift. I am happy to explain to you why this is the case, and I am also happy to suggest alternative approaches that will help you accomplish your goals while still ensuring that we don't break dbt projects for no reason.

I look forward to following up with you on your feature request. Closing.

@drewbanin drewbanin closed this May 21, 2019
@buremba
Copy link
Contributor Author

buremba commented May 21, 2019

@drewbanin @beckjake, sorry for the misunderstanding. I really thought that I was missing something so that's why I explained the case after that question.

@drewbanin
Copy link
Contributor

Discussed in #1477

@buremba
Copy link
Contributor Author

buremba commented May 22, 2019

Hey @drewbanin, it looks like the tests are failing even though I was able to make it work in my local environment. Somehow, it doesn't recognize the search_path property of PostgresCredentials even though it's already there, maybe it's related to a dependency issue? Do you mind taking a look at it, I'm kinda confused?

@drewbanin
Copy link
Contributor

Hmmm, yeah, this test failure is pretty perplexing. I'll check it out!

@beckjake
Copy link
Contributor

beckjake commented May 23, 2019

If the value isn't required, which it's not (as it shouldn't be!) you shouldn't use .search_path to look it up. You'll want to do something like:

search_path = credentials.get('search_path')
if search_path:
   ...

Instead of the current:

if credentials.search_path is not None and credentials.search_path != ''

Which will only work if the target has a search_path set.

@drewbanin
Copy link
Contributor

Just kicked off a build of the integration tests

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM when the tests pass. @beckjake do you have anything to add?

.gitignore Outdated
@@ -76,3 +76,6 @@ target/

# Vim
*.sw*

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you remove this? you can create a .gitignore locally without including it in this repo's .gitignore. Instructions here: https://help.github.com/en/articles/ignoring-files#create-a-global-gitignore

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were a couple of flake8 (style guide) issues here: https://circleci.com/gh/fishtown-analytics/dbt/12915?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link

Should be pretty quick fixups - just make these lines < 80 chars and should be good to go

@@ -105,6 +108,12 @@ def open(cls, connection):
if keepalives_idle:
kwargs['keepalives_idle'] = keepalives_idle

# psycopg2 doesn't support search_path officially, see https://github.com/psycopg/psycopg2/issues/465
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins/postgres/dbt/adapters/postgres/connections.py:111:80: E501 line too long (109 > 79 characters)

# psycopg2 doesn't support search_path officially, see https://github.com/psycopg/psycopg2/issues/465
search_path = credentials.get('search_path')
if search_path is not None and search_path != '':
# see https://www.postgresql.org/docs/9.5/libpq-connect.html#LIBPQ-CONNECT-OPTIONS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins/postgres/dbt/adapters/postgres/connections.py:114:80: E501 line too long (94 > 79 characters)

search_path = credentials.get('search_path')
if search_path is not None and search_path != '':
# see https://www.postgresql.org/docs/9.5/libpq-connect.html#LIBPQ-CONNECT-OPTIONS
kwargs['options'] = '-c search_path={}'.format(search_path.replace(' ', '\\ '))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins/postgres/dbt/adapters/postgres/connections.py:115:80: E501 line too long (91 > 79 characters)

@@ -75,7 +78,7 @@ def type(self):
return 'redshift'

def _connection_keys(self):
return ('host', 'port', 'user', 'database', 'schema', 'method')
return ('host', 'port', 'user', 'database', 'schema', 'method', 'search_path')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plugins/redshift/dbt/adapters/redshift/connections.py:81:80: E501 line too long (86 > 79 characters)

@beckjake
Copy link
Contributor

This looks great to me once the lint issues are resolved

@buremba
Copy link
Contributor Author

buremba commented May 27, 2019

It looks like the test failures are not related to the changes done in this PR. Can someone review the PR again, please?

@drewbanin
Copy link
Contributor

Just kicked off the tests again - looks like there's one remaining flake8 issue:

plugins/redshift/dbt/adapters/redshift/connections.py:130:21: E131 continuation line unaligned for hanging indent

@buremba
Copy link
Contributor Author

buremba commented May 28, 2019

Looks like PyCharm is not fully compatible with PEP8 as Format Code caused another PEP8 issue. 😬
Is there any way for me to kick off the tests for a PR so that I don't take your time?

@beckjake
Copy link
Contributor

You should be able to just run flake8 --select=E,W,F --ignore=W504 core/dbt plugins/*/dbt after installing dbt's dev_requirements.txt

@buremba
Copy link
Contributor Author

buremba commented May 28, 2019

@beckjake yep, that's what I did after seeing it intox.ini. However; your CI environment is doing much more than that (.circleci/config.yml) so it would be great it's triggered automatically.

@beckjake
Copy link
Contributor

CircleCI is just running flake8 via tox, as specified here - so that flake8 command should be it.

@buremba
Copy link
Contributor Author

buremba commented May 28, 2019

I assumed that the jobs for integration tests in that file are also active, aren't they? I will be running flake8 before pushing commits to the PRs but if I can be sure that all the tests are passing just fine before requesting a review, it would be much more efficient for both of us IMO.

@beckjake
Copy link
Contributor

Oh, yes, that will only test the flake8 part. The azure pipeline tests will run for unit and postgres integration tests without us interacting, I think if those + local flake8 tests pass the easiest thing is to wait for us to run them - running the integration tests locally requires a snowflake/redshift/bigquery account, which is why we have to kick those off manually: we don't want to expose authentication information to pull requests without verifying the code.

For this change, I think if your postgres + unit + flake8 tests all pass, you are probably going to be ok!

@buremba
Copy link
Contributor Author

buremba commented Jun 3, 2019

FYI, my last commit fixes the PEP8 Code Style issue so this PR should be good to go.

@drewbanin
Copy link
Contributor

Thanks for the update @buremba - this is going to get merged for 0.14.0!

We're working through some issues with our CI setup. Fear not though, this is definitely making it into our next release :)

@beckjake beckjake merged commit d760229 into dbt-labs:dev/wilt-chamberlain Jun 12, 2019
@beckjake
Copy link
Contributor

This looks good to me, thanks @buremba!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants