Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use relative_path to determine whether big seeds are modified #2927

Closed
jtcohen6 opened this issue Dec 2, 2020 · 2 comments · Fixed by #2939
Closed

Use relative_path to determine whether big seeds are modified #2927

jtcohen6 opened this issue Dec 2, 2020 · 2 comments · Fixed by #2939
Labels
bug Something isn't working state Stateful selection (state:modified, defer)

Comments

@jtcohen6
Copy link
Contributor

jtcohen6 commented Dec 2, 2020

Describe the bug

It is possible for state:modified to produce different behavior on dbt CLI vs. dbt Cloud. Why?

When we compare big (>1 MB) seed files—too big to efficiently hash contents—we instead store + compare a hash of the file path. (The operating principle: If it's a massive seed file, unless it's been renamed or moved around, we're just going to assume it's unchanged!) Today, that looks like:

https://github.com/fishtown-analytics/dbt/blob/34869fc2a2a354a18a232e21315c3901aafab0b6/core/dbt/contracts/files.py#L156-L161

Instead, we should use the relative_path, which handles the fact that, in deployment, files are regularly copied/cloned around and ultimately mounted from who-knows-where in S3.

This should be a one-line change, and it will require updating some tests.

Steps To Reproduce

Create a big seed file. Run dbt seed -s state:modified from dbt Cloud. It should always run, despite being unchanged and unmoved.

Expected behavior

Frankly, we don't recommend folks use dbt seed to load anything larger than Very Small Data, but we should still do our best to produce consistent behavior when they do.

The output of dbt --version:
v0.18.0 or v0.18.1

@jtcohen6 jtcohen6 added bug Something isn't working state Stateful selection (state:modified, defer) labels Dec 2, 2020
@jtcohen6 jtcohen6 added this to the Kiyoshi Kuromiya milestone Dec 2, 2020
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Dec 3, 2020

Update: I was confusing original_file_path with some of the other paths we have, such as full_path and root_path. The former really is what I was thinking to be the "relative path." So this needs some further investigation!

But then I confused myself again by reading original_file_path in the manifest, instead of absolute_path in the code I had so neatly linked above.

@jtcohen6 jtcohen6 closed this as completed Dec 3, 2020
@jtcohen6
Copy link
Contributor Author

jtcohen6 commented Dec 4, 2020

Update: I was right all along, except for when I was wrong about being wrong.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working state Stateful selection (state:modified, defer)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant