Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build when tests are modified #7289

Closed
2 tasks done
b-luu opened this issue Apr 6, 2023 · 5 comments · Fixed by #7431
Closed
2 tasks done
Labels
bug Something isn't working help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes

Comments

@b-luu
Copy link
Contributor

b-luu commented Apr 6, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

dbt build --select state:modified+ should behave like the other commands.

Specifically the following scenario doesn't yield the expected selection:

  • modify a single/bunch of tests (but only tests)
  • PREVIOUS_TARGET=~/previous_target
  • cp -r $DBT_PROJECT_DIR/target/* $PREVIOUS_TARGET
  • run dbt list --select state:modified+ --state $PREVIOUS_TARGET => correctly lists only the changed tests (nothing downstream)
  • run dbt build --select state:modified+ --state $PREVIOUS_TARGET => runs the changed tests but also ALL downstream nodes!!
  • while dbt run --select state:modified+ --state $PREVIOUS_TARGET does as expected: "Nothing to run"
  • and dbt test --select state:modified+ --state $PREVIOUS_TARGET as well: only runs the changed tests.

Expected Behavior

dbt build --select state:modified+ should NOT run models downstream from modified tests (if only the tests have been modified, not the tests' model(s) )

Steps To Reproduce

  • modify a single/bunch of tests (but only tests)
  • PREVIOUS_TARGET=~/previous_target
  • cp -r $DBT_PROJECT_DIR/target/* $PREVIOUS_TARGET
  • run dbt list --select state:modified+ --state $PREVIOUS_TARGET => correctly lists only the changed tests (nothing downstream)
  • run dbt build --select state:modified+ --state $PREVIOUS_TARGET => runs the changed tests but also ALL downstream nodes!!
  • while dbt run --select state:modified+ --state $PREVIOUS_TARGET does as expected: "Nothing to run"
  • and dbt test --select state:modified+ --state $PREVIOUS_TARGET as well: only runs the changed tests.

Relevant log output

No response

Environment

- OS: PopOS 22.04
- Python: 3.7
- dbt: 1.2.5

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@b-luu b-luu added bug Something isn't working triage labels Apr 6, 2023
@github-actions github-actions bot changed the title [Bug] state:modified+ selection doesn't behave as expected for dbt build [CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build Apr 6, 2023
@dbeatty10 dbeatty10 self-assigned this Apr 6, 2023
@dbeatty10
Copy link
Contributor

Thanks for reporting this @b-luu !

Excellent write-up -- I was able to reproduce what you described:

  • If we change test X anything downstream of it's parent node gets executed during dbt build --select state:modified+ ...
Full details of the reprex I used.

models/my_model_1.sql

select 1 as id

models/my_model_2.sql

select * from {{ ref("my_model_1") }}

models/my_model_3.sql

select * from {{ ref("my_model_2") }}

tests/my_test.sql

select * from {{ ref("my_model_2") }}
where 0=1

Prime the pump:

dbt build

Alter the test and save the state:

echo "-- $(date +%T)" >> tests/singular/my_test.sql
PREVIOUS_TARGET=~/previous_target
mkdir -p $PREVIOUS_TARGET
cp -r target/* $PREVIOUS_TARGET

Try a few different commands:

dbt list --select state:modified+ --state $PREVIOUS_TARGET
dbt run --select state:modified+ --state $PREVIOUS_TARGET
dbt test --select state:modified+ --state $PREVIOUS_TARGET
dbt build --select state:modified+ --state $PREVIOUS_TARGET

@dbeatty10 dbeatty10 removed the triage label Apr 6, 2023
@dbeatty10 dbeatty10 removed their assignment Apr 6, 2023
@b-luu
Copy link
Contributor Author

b-luu commented Apr 7, 2023

Intuitively, I'd look into the order of execution between BuildTask.get_node_selector and BuildTask.compile_manifest.

I've gotta admit I've stopped looking/lost myself in the CompilerProtocol because it lacks the add_test_edges: bool kwarg that's passed in BuildTask's .compile_manifest() method... 🤔 🤷

@jtcohen6
Copy link
Contributor

jtcohen6 commented Apr 7, 2023

In its current implementation, the build task does actually add an edge between a test on upstream model A, and model B depending on model A. Meaning that, even though model B doesn't directly depend on that test, there is an edge between them in the DAG, and so it's selected for the build task only.

I think it will be tricky to resolve this issue, given that's our current approach to achieving the build behavior. In practice, we don't see this issue come up so frequently. I agree it's confusing when it does happen, though, and "leaks" the underlying implementation.

@jtcohen6 jtcohen6 changed the title [CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build [CT-2381] [Bug] state:modified+ selection doesn't behave as expected for dbt build when tests are modified Apr 7, 2023
@b-luu
Copy link
Contributor Author

b-luu commented Apr 17, 2023

Thanks @jtcohen6 for the confirmation that it is indeed the expected behavior.

True the solution would be tricky, but here it's preventing me from using dbt build and forcing me to fallback to executing run, test, snapshot, and seed which isn't optimal because each type of task only runs after all the previous type have succeeded.

Anyway, as a quicker solution, do you think we could add a note on this caveat in the documentation? Or would it add more confusion than it would solve?

Actually, looking more deeply into it, I think the definitive solution isn't too complicated:

  • because the edges added between parent models' tests and children models are only really useful for the resolution of execution order during the actual execution (i.e not at compile time)
  • we could just add an attribute to these edges in the add_test_edges method the Compiler
  • and then filter out on this attribute in dbt.graph.Graph.descendants()

I could give it a go if you want?

@jtcohen6
Copy link
Contributor

@b-luu You're more than welcome to give a go! Conceptually, that approach makes sense to me. It would be worth a quick check to see if the added attribute and filtering step results in a meaningful slowdown at scale

@jtcohen6 jtcohen6 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes Team:Execution labels Apr 19, 2023
@iknox-fa iknox-fa added this to the v1.4.x milestone Apr 27, 2023
@jtcohen6 jtcohen6 removed this from the v1.4.x milestone Apr 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors node selection Functionality and syntax for selecting DAG nodes
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants