Do case-insensitive schema comparisons to test for schema existance #1663

edmundyan · 2019-08-05T16:15:37Z

Resolves #1651

Looked through the unit tests and didn't find anything for dbt.task classes. Happy to help or contribute if you feel that's needed :)

Edit:
closes #913

drewbanin

One comment here, but this is looking really close!

drewbanin · 2019-08-05T16:50:52Z

core/dbt/task/runnable.py

+        required_schemas_lowered = set(map(lower_tuple, required_schemas))
+        existing_schemas_lowered = set(map(lower_tuple, existing_schemas))
+
+        for database, schema in (required_schemas_lowered - existing_schemas_lowered):


One thing worth pointing out here: this is going to call create_schema with the lower-cased versions of database and schema. The adapter.create_schema method will quote these identifiers as configured in the dbt_project.yml file, so I think that given:

quoting: database: True schema: True

and a target configured with:

database: MY_DATABASE schema: MY_SCHEMA

dbt would errantly create a schema at "my_database"."my_schema", which would be incorrect.

I think that means that we want to generate some sort of mapping between the lower-cased schema tuples and their untransformed counterparts.

Oh interesting, I didn't know about the quoting option. It feels like if someone specified quoting=True, dbt should actually be doing all db or schema checks with case-sensitivity? However, there are a lot of functions out there that are lower()ing things by default and that feels like a harder fix.

Would another solution be to do a schema check before creating it? The quote_policy is used in the check, but unsure how the internals work. Although I would be worried about projects with lots of required schemas.

if adapter.check_schema_exists(db, schema): adapter.create_schema(db, schema)

Anyway, I updated the PR to create the schema with the original model/target values. One remaining bug is that if your project and your target have different cases, this will call the create_schema function twice (because of that extra add() for the snowflake use {schema} bug)

It feels like if someone specified quoting=True, dbt should actually be doing all db or schema checks with case-sensitivity?

Yeah, I think that's generally right, but it can be pretty tricky to do exactly the right thing in every circumstance. Snowflake supports a quoted_identifier_ignore_case parameter, which is all the proof I need that it would be impossible for dbt to aways be exactly right on every database it supports :). Instead, we disallow dbt to operate on two databases/schemas that differ only in casing - I don't think there's a good use case for that, so dbt just fails hard if it finds that that's the case - it's usually a misconfiguration!

Would another solution be to do a schema check before creating it?

Yeah! That's a great idea! I like the way that you implemented this.

One remaining bug is that if your project and your target have different cases ... this will call the create_schema function twice

Really good point! I took a quick scan through the codebase, and I don't think dbt runs use schema ... anymore, so we could conceivably remove this line of code!! I'd need to think a lot harder about that, but if it turns out that we can remove this line, then we'd also be fixing #913 :D

Ok - I spent some more time verifying that we don't run the use schema... query anywhere. Let's remove it!! Are you comfortable making that change in this PR?

edmundyan · 2019-08-06T15:41:29Z

Redshift integration tests are breaking:

E           DatabaseException: Database Error
E             table 7958425 dropped by concurrent transaction

Is this a race condition in the tests?

drewbanin · 2019-08-06T15:42:48Z

@edmundyan it is in fact a race condition in Redshift :(

That's one for another issue though! Let me re-run that particular test.

drewbanin

Nice work @edmundyan!

…ered Do case-insensitive schema comparisons to test for schema existance

drewbanin requested changes Aug 5, 2019

View reviewed changes

Do case-insensitive schema comparison before trying to create a schema

265f6d3

edmundyan force-pushed the ey_create_schemas_lowered branch from f03f9cc to 265f6d3 Compare August 5, 2019 18:13

drewbanin added the bug Something isn't working label Aug 6, 2019

Remove profile schema as we no longer run 'use schema'

e867cfa

drewbanin approved these changes Aug 6, 2019

View reviewed changes

drewbanin merged commit 3e3c69e into dbt-labs:dev/0.14.1 Aug 6, 2019

edmundyan pushed a commit to edmundyan/dbt that referenced this pull request Aug 6, 2019

Merge pull request dbt-labs#1663 from edmundyan/ey_create_schemas_low…

8d0f8bb

…ered Do case-insensitive schema comparisons to test for schema existance

edmundyan deleted the ey_create_schemas_lowered branch August 6, 2019 19:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do case-insensitive schema comparisons to test for schema existance #1663

Do case-insensitive schema comparisons to test for schema existance #1663

edmundyan commented Aug 5, 2019 •

edited by drewbanin

Loading

drewbanin left a comment

drewbanin Aug 5, 2019

edmundyan Aug 5, 2019

drewbanin Aug 6, 2019

drewbanin Aug 6, 2019

edmundyan Aug 6, 2019

edmundyan commented Aug 6, 2019

drewbanin commented Aug 6, 2019

drewbanin left a comment

Do case-insensitive schema comparisons to test for schema existance #1663

Do case-insensitive schema comparisons to test for schema existance #1663

Conversation

edmundyan commented Aug 5, 2019 • edited by drewbanin Loading

drewbanin left a comment

Choose a reason for hiding this comment

drewbanin Aug 5, 2019

Choose a reason for hiding this comment

edmundyan Aug 5, 2019

Choose a reason for hiding this comment

drewbanin Aug 6, 2019

Choose a reason for hiding this comment

drewbanin Aug 6, 2019

Choose a reason for hiding this comment

edmundyan Aug 6, 2019

Choose a reason for hiding this comment

edmundyan commented Aug 6, 2019

drewbanin commented Aug 6, 2019

drewbanin left a comment

Choose a reason for hiding this comment

edmundyan commented Aug 5, 2019 •

edited by drewbanin

Loading