Update: I was confusing original_file_path with some of the other paths we have, such as full_path and root_path. The former really is what I had in mind as the "relative path." So this needs some further investigation!
But then I confused myself again by reading original_file_path in the manifest, instead of absolute_path in the code I had so neatly linked above.
Describe the bug
It is possible for state:modified to produce different behavior on dbt CLI vs. dbt Cloud. Why?
When we compare big (>1 MB) seed files, which are too big to hash efficiently by contents, we instead store + compare a hash of the file path. (The operating principle: If it's a massive seed file, unless it's been renamed or moved around, we're just going to assume it's unchanged!) Today, that looks like:
https://github.com/fishtown-analytics/dbt/blob/34869fc2a2a354a18a232e21315c3901aafab0b6/core/dbt/contracts/files.py#L156-L161
Instead, we should use the relative_path, which handles the fact that, in deployment, files are regularly copied/cloned around and ultimately mounted from who-knows-where in S3.
This should be a one-line change, and it will require updating some tests.
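To make the proposal concrete, here is a minimal sketch of the idea, not dbt's actual implementation. The names seed_checksum and MAX_HASHED_SIZE are hypothetical, and the 1 MB threshold simply mirrors the ">1 MB" rule described above. Small files are hashed by contents; big files are hashed by their project-relative path, so the checksum should stay the same no matter where the project is mounted.

```python
import hashlib
import os

# Hypothetical threshold mirroring the ">1 MB" rule described in this issue.
MAX_HASHED_SIZE = 1 * 1024 * 1024


def seed_checksum(project_root: str, relative_path: str) -> str:
    """Return a checksum for a seed file (illustrative only, not dbt's API)."""
    full_path = os.path.join(project_root, relative_path)
    if os.path.getsize(full_path) > MAX_HASHED_SIZE:
        # Too big to hash contents: hash the project-relative path instead.
        # The absolute project_root is deliberately left out of the hash,
        # so copying or remounting the project does not change the result.
        normalized = relative_path.replace(os.sep, "/")
        return "path:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # Small enough: hash the file contents.
    with open(full_path, "rb") as handle:
        return "sha256:" + hashlib.sha256(handle.read()).hexdigest()
```

Under these assumptions, two checkouts of the same project in different directories would yield the same checksum for an unchanged big seed, whereas hashing the absolute path would not.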
Steps To Reproduce
Create a big seed file. Run dbt seed -s state:modified from dbt Cloud. You should see the seed run every time, despite being unchanged and unmoved.
Expected behavior
Frankly, we don't recommend folks use dbt seed to load anything larger than Very Small Data, but we should still do our best to produce consistent behavior when they do.
The output of dbt --version:
v0.18.0 or v0.18.1