Fix run repeatability/caching issues (#1138, #1139, #1140) #1144

beckjake · 2018-11-19T16:59:19Z

This fixes a number of issues discovered in 0.12.1. Two are sinter-only issues, and one is a kind of egregious (but in practice mostly benign) bug around case comparisons that almost entirely manifests in snowflake.

I had a heck of a time reproducing user reports, and the new integration tests have to really dig into the cache a bit more than I'd like, but I do believe this fixes #1140 and the tests will catch any regressions.

Basically, cache dict keys are now always lowercased, and access to fields from the inner Relation are also lowercased, but the underlying Relation representation is not.

One thing that's a bit weird is that added nodes now show up in list_relations as their inserted text (though with proper quoting fields). dbt seems to handle that well and the tests are okay, but it is a potential source of issues for things that inspect list_relations output. I don't see a way around that without trying to guess what the db would do to your relation names, which we discussed in slack and decided is not a great plan.

… init time

drewbanin

I tested this out a bunch on Snowflake and Redshift. Everything worked as expected, and the logs (with --log-cache-events) all checked out.

Just for my understanding, you mentioned:

One thing that's a bit weird is that added nodes now show up in list_relations as their inserted text (though with proper quoting fields).

Can you give me an example of what that would look like in practice? I think it's ok, but just want to confirm.

beckjake · 2018-11-20T01:53:06Z

Sure! The easy example is on snowflake so I'll use that. Imagine you have models/model.sql and your schema is analytics. In snowflake, with quoting set to False and assuming you haven't set the server-side flag that defaults things to lowercase, the name of the relation will become ANALYTICS.MODEL on the database end when it's created.

Calling _list_relations() would then come back with schema="ANALYTICS", identifier="MODEL" with quoting set to True for schema/identifier on the resulting relation.

With this cache, when you call list_relations() and have a cache hit, you'll get schema="analytics", identifier="model" and quoting set to False. The old behavior is more precise, but the new behavior is as accurate as we can be without going out to the database and actually re-querying.

drewbanin · 2018-11-20T01:56:21Z

ok, thanks! That's really helpful. I think dbt has to work this way. It would be wholly unreasonable for us to predict the exact casing that Snowflake will pick. The user might set quoted_identifier_ignore_case to TRUE, for example, and checking before each cache entry would defeat the purpose of caching. This is great, 🚢 it

Jacob Beck added 7 commits November 19, 2018 09:51

set cache logger propagation during invocation time instead of at log…

6dd04b1

… init time

on dbt invocation, reset the adapters (to reset the cache)

7940b71

fix schema cache casing bugs

d4c3fb8

stop dropping renames...

cfd2d60

Add integration tests with interesting model/case/quoting settings

eb50b83

fix casing expectations in unit tests

7d3bf03

lowercase the cache's view of schema/identifier, fix drop

e82361c

beckjake requested a review from drewbanin November 19, 2018 17:00

drewbanin approved these changes Nov 20, 2018

View reviewed changes

beckjake merged commit b5aab26 into dev/grace-kelly Nov 20, 2018

beckjake deleted the fix/run-repeatability-issues branch November 20, 2018 02:04

This was referenced Nov 20, 2018

dbt should reset adapters in handle_and_check #1138

Closed

Turn off cache logging when --log-cache-events is not set #1139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix run repeatability/caching issues (#1138, #1139, #1140) #1144

Fix run repeatability/caching issues (#1138, #1139, #1140) #1144

beckjake commented Nov 19, 2018

drewbanin left a comment

beckjake commented Nov 20, 2018 •

edited

Loading

drewbanin commented Nov 20, 2018 •

edited

Loading

Fix run repeatability/caching issues (#1138, #1139, #1140) #1144

Fix run repeatability/caching issues (#1138, #1139, #1140) #1144

Conversation

beckjake commented Nov 19, 2018

drewbanin left a comment

Choose a reason for hiding this comment

beckjake commented Nov 20, 2018 • edited Loading

drewbanin commented Nov 20, 2018 • edited Loading

beckjake commented Nov 20, 2018 •

edited

Loading

drewbanin commented Nov 20, 2018 •

edited

Loading