Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do case-insensitive schema comparisons to test for schema existance #1663

Merged
merged 2 commits into from
Aug 6, 2019

Conversation

edmundyan
Copy link
Contributor

@edmundyan edmundyan commented Aug 5, 2019

Resolves #1651

Looked through the unit tests and didn't find anything for dbt.task classes. Happy to help or contribute if you feel that's needed :)

Edit:
closes #913

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment here, but this is looking really close!

required_schemas_lowered = set(map(lower_tuple, required_schemas))
existing_schemas_lowered = set(map(lower_tuple, existing_schemas))

for database, schema in (required_schemas_lowered - existing_schemas_lowered):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing worth pointing out here: this is going to call create_schema with the lower-cased versions of database and schema. The adapter.create_schema method will quote these identifiers as configured in the dbt_project.yml file, so I think that given:

quoting:
  database: True
  schema: True

and a target configured with:

database: MY_DATABASE
schema: MY_SCHEMA

dbt would errantly create a schema at "my_database"."my_schema", which would be incorrect.

I think that means that we want to generate some sort of mapping between the lower-cased schema tuples and their untransformed counterparts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh interesting, I didn't know about the quoting option. It feels like if someone specified quoting=True, dbt should actually be doing all db or schema checks with case-sensitivity? However, there are a lot of functions out there that are lower()ing things by default and that feels like a harder fix.

Would another solution be to do a schema check before creating it? The quote_policy is used in the check, but unsure how the internals work. Although I would be worried about projects with lots of required schemas.

if adapter.check_schema_exists(db, schema):
    adapter.create_schema(db, schema)

Anyway, I updated the PR to create the schema with the original model/target values. One remaining bug is that if your project and your target have different cases, this will call the create_schema function twice (because of that extra add() for the snowflake use {schema} bug)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels like if someone specified quoting=True, dbt should actually be doing all db or schema checks with case-sensitivity?

Yeah, I think that's generally right, but it can be pretty tricky to do exactly the right thing in every circumstance. Snowflake supports a quoted_identifier_ignore_case parameter, which is all the proof I need that it would be impossible for dbt to aways be exactly right on every database it supports :). Instead, we disallow dbt to operate on two databases/schemas that differ only in casing - I don't think there's a good use case for that, so dbt just fails hard if it finds that that's the case - it's usually a misconfiguration!

Would another solution be to do a schema check before creating it?

Yeah! That's a great idea! I like the way that you implemented this.

One remaining bug is that if your project and your target have different cases ... this will call the create_schema function twice

Really good point! I took a quick scan through the codebase, and I don't think dbt runs use schema ... anymore, so we could conceivably remove this line of code!! I'd need to think a lot harder about that, but if it turns out that we can remove this line, then we'd also be fixing #913 :D

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I spent some more time verifying that we don't run the use schema... query anywhere. Let's remove it!! Are you comfortable making that change in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed!

@edmundyan edmundyan force-pushed the ey_create_schemas_lowered branch from f03f9cc to 265f6d3 Compare August 5, 2019 18:13
@drewbanin drewbanin added the bug Something isn't working label Aug 6, 2019
@edmundyan
Copy link
Contributor Author

Redshift integration tests are breaking:

E           DatabaseException: Database Error
E             table 7958425 dropped by concurrent transaction

Is this a race condition in the tests?

@drewbanin
Copy link
Contributor

@edmundyan it is in fact a race condition in Redshift :(

That's one for another issue though! Let me re-run that particular test.

Copy link
Contributor

@drewbanin drewbanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @edmundyan!

@drewbanin drewbanin merged commit 3e3c69e into dbt-labs:dev/0.14.1 Aug 6, 2019
edmundyan pushed a commit to edmundyan/dbt that referenced this pull request Aug 6, 2019
…ered

Do case-insensitive schema comparisons to test for schema existance
@edmundyan edmundyan deleted the ey_create_schemas_lowered branch August 6, 2019 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
2 participants