[CT-2381] [Bug] `state:modified+` selection doesn't behave as expected for `dbt build` when tests are modified #7289

b-luu · 2023-04-06T16:49:12Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

dbt build --select state:modified+ should behave like the other commands.

Specifically the following scenario doesn't yield the expected selection:

modify a single/bunch of tests (but only tests)
PREVIOUS_TARGET=~/previous_target
cp -r $DBT_PROJECT_DIR/target/* $PREVIOUS_TARGET
run dbt list --select state:modified+ --state $PREVIOUS_TARGET => correctly lists only the changed tests (nothing downstream)
run dbt build --select state:modified+ --state $PREVIOUS_TARGET => runs the changed tests but also ALL downstream nodes!!
while dbt run --select state:modified+ --state $PREVIOUS_TARGET does as expected: "Nothing to run"
and dbt test --select state:modified+ --state $PREVIOUS_TARGET as well: only runs the changed tests.

Expected Behavior

dbt build --select state:modified+ should NOT run models downstream from modified tests (if only the tests have been modified, not the tests' model(s) )

Steps To Reproduce

modify a single/bunch of tests (but only tests)
PREVIOUS_TARGET=~/previous_target
cp -r $DBT_PROJECT_DIR/target/* $PREVIOUS_TARGET
run dbt list --select state:modified+ --state $PREVIOUS_TARGET => correctly lists only the changed tests (nothing downstream)
run dbt build --select state:modified+ --state $PREVIOUS_TARGET => runs the changed tests but also ALL downstream nodes!!
while dbt run --select state:modified+ --state $PREVIOUS_TARGET does as expected: "Nothing to run"
and dbt test --select state:modified+ --state $PREVIOUS_TARGET as well: only runs the changed tests.

Relevant log output

No response

Environment

- OS: PopOS 22.04
- Python: 3.7
- dbt: 1.2.5

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2023-04-06T20:53:04Z

Thanks for reporting this @b-luu !

Excellent write-up -- I was able to reproduce what you described:

If we change test X anything downstream of it's parent node gets executed during dbt build --select state:modified+ ...

Full details of the reprex I used.

models/my_model_1.sql

select 1 as id

models/my_model_2.sql

select * from {{ ref("my_model_1") }}

models/my_model_3.sql

select * from {{ ref("my_model_2") }}

tests/my_test.sql

select * from {{ ref("my_model_2") }}
where 0=1

Prime the pump:

dbt build

Alter the test and save the state:

echo "-- $(date +%T)" >> tests/singular/my_test.sql
PREVIOUS_TARGET=~/previous_target
mkdir -p $PREVIOUS_TARGET
cp -r target/* $PREVIOUS_TARGET

Try a few different commands:

dbt list --select state:modified+ --state $PREVIOUS_TARGET
dbt run --select state:modified+ --state $PREVIOUS_TARGET
dbt test --select state:modified+ --state $PREVIOUS_TARGET
dbt build --select state:modified+ --state $PREVIOUS_TARGET

b-luu · 2023-04-07T07:20:16Z

Intuitively, I'd look into the order of execution between BuildTask.get_node_selector and BuildTask.compile_manifest.

I've gotta admit I've stopped looking/lost myself in the CompilerProtocol because it lacks the add_test_edges: bool kwarg that's passed in BuildTask's .compile_manifest() method... 🤔 🤷

jtcohen6 · 2023-04-07T07:27:46Z

In its current implementation, the build task does actually add an edge between a test on upstream model A, and model B depending on model A. Meaning that, even though model B doesn't directly depend on that test, there is an edge between them in the DAG, and so it's selected for the build task only.

I think it will be tricky to resolve this issue, given that's our current approach to achieving the build behavior. In practice, we don't see this issue come up so frequently. I agree it's confusing when it does happen, though, and "leaks" the underlying implementation.

b-luu · 2023-04-17T16:37:09Z

Thanks @jtcohen6 for the confirmation that it is indeed the expected behavior.

True the solution would be tricky, but here it's preventing me from using dbt build and forcing me to fallback to executing run, test, snapshot, and seed which isn't optimal because each type of task only runs after all the previous type have succeeded.

Anyway, as a quicker solution, do you think we could add a note on this caveat in the documentation? Or would it add more confusion than it would solve?

Actually, looking more deeply into it, I think the definitive solution isn't too complicated:

because the edges added between parent models' tests and children models are only really useful for the resolution of execution order during the actual execution (i.e not at compile time)
we could just add an attribute to these edges in the add_test_edges method the Compiler
and then filter out on this attribute in dbt.graph.Graph.descendants()

I could give it a go if you want?

jtcohen6 · 2023-04-17T18:56:50Z

@b-luu You're more than welcome to give a go! Conceptually, that approach makes sense to me. It would be worth a quick check to see if the added attribute and filtering step results in a meaningful slowdown at scale

b-luu added bug Something isn't working triage labels Apr 6, 2023

github-actions bot changed the title ~~[Bug] state:modified+ selection doesn't behave as expected for dbt build~~ [CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build Apr 6, 2023

dbeatty10 self-assigned this Apr 6, 2023

dbeatty10 removed the triage label Apr 6, 2023

dbeatty10 removed their assignment Apr 6, 2023

jtcohen6 changed the title ~~[CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build~~ [CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build when tests are modified Apr 7, 2023

jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes Team:Execution labels Apr 19, 2023

b-luu mentioned this issue Apr 21, 2023

Ignore parent tests added edges for build selection #7431

Merged

6 tasks

iknox-fa added this to the v1.4.x milestone Apr 27, 2023

iknox-fa closed this as completed in #7431 Apr 28, 2023

jtcohen6 removed this from the v1.4.x milestone Apr 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CT-2381] [Bug] `state:modified+` selection doesn't behave as expected for `dbt build` when tests are modified #7289

[CT-2381] [Bug] `state:modified+` selection doesn't behave as expected for `dbt build` when tests are modified #7289

b-luu commented Apr 6, 2023

dbeatty10 commented Apr 6, 2023

b-luu commented Apr 7, 2023

jtcohen6 commented Apr 7, 2023

b-luu commented Apr 17, 2023

jtcohen6 commented Apr 17, 2023

[CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build when tests are modified #7289

[CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build when tests are modified #7289

Comments

b-luu commented Apr 6, 2023

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Apr 6, 2023

b-luu commented Apr 7, 2023

jtcohen6 commented Apr 7, 2023

b-luu commented Apr 17, 2023

jtcohen6 commented Apr 17, 2023

[CT-2381] [Bug] `state:modified+` selection doesn't behave as expected for `dbt build` when tests are modified #7289

[CT-2381] [Bug] `state:modified+` selection doesn't behave as expected for `dbt build` when tests are modified #7289