[CT-2317] invocation_id is cached between invocations in same Python process #7197

Closed
Tracked by #7162
jtcohen6 opened this issue Mar 20, 2023 · 2 comments · Fixed by #7317
Labels: bug, logging

Comments

@jtcohen6
Contributor

Originally reported by @jeremyyeo:

With dbt >= 1.4, we seem to be maintaining invocation_id across steps in a single run.

Starting in v1.4, invocation_id is now stored on EventManager:

self.invocation_id: str = str(uuid4())

I think we should clear or reset invocation_id as part of the steps in cleanup_event_logger:

def set_invocation_id() -> None:
    # This is primarily for setting the invocation_id for separate
    # commands in the dbt servers. It shouldn't be necessary for the CLI.
    EVENT_MANAGER.invocation_id = str(uuid.uuid4())

def cleanup_event_logger():
    # Reset to a no-op manager to release streams associated with logs. This is
    # especially important for tests, since pytest replaces the stdout stream
    # during test runs, and closes the stream after the test is over.
    EVENT_MANAGER.loggers.clear()
    EVENT_MANAGER.callbacks.clear()
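
To illustrate why the ID persists, here is a minimal sketch of the pattern described above. The EventManager class and run_command() helper are simplified, hypothetical stand-ins, not dbt-core's actual implementation; the only behavior taken from the issue is that invocation_id is assigned once on the manager and that cleanup_event_logger() clears loggers and callbacks without touching it.

```python
from uuid import uuid4


class EventManager:
    """Simplified stand-in for dbt's EventManager (not the real class)."""

    def __init__(self) -> None:
        self.loggers: list = []
        self.callbacks: list = []
        # Generated once, when the manager is constructed.
        self.invocation_id: str = str(uuid4())


# Module-level singleton: created at import time and shared by every call
# made within the same Python process.
EVENT_MANAGER = EventManager()


def cleanup_event_logger() -> None:
    # Mirrors the snippet above: loggers and callbacks are cleared,
    # but invocation_id is left untouched.
    EVENT_MANAGER.loggers.clear()
    EVENT_MANAGER.callbacks.clear()


def run_command(name: str) -> str:
    """Hypothetical stand-in for one dbt command run in-process."""
    EVENT_MANAGER.loggers.append(f"logger for {name}")
    invocation_id = EVENT_MANAGER.invocation_id
    cleanup_event_logger()
    return invocation_id


first = run_command("run")
second = run_command("test")
ids_are_shared = first == second  # True: the ID survives cleanup
```

Because the singleton outlives each command, both calls report the same ID unless something explicitly regenerates it.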

Reproduction case

  • Create an on-run-start hook with {{ log(invocation_id, info = true) }}
  • Create a dbt Cloud job with multiple run steps
  • Compare the invocation IDs logged by each step: each step should log its own ID, but with this bug they are identical
@jtcohen6 added the bug, Team:Language, and logging labels Mar 20, 2023
@jtcohen6 added this to the v1.4.x milestone Mar 20, 2023
@github-actions bot changed the title from "[v1.4] In certain cases, invocation_id is cached between invocations" to "[CT-2317] [v1.4] In certain cases, invocation_id is cached between invocations" Mar 20, 2023
@jtcohen6
Contributor Author

jtcohen6 commented Mar 24, 2023

We've confirmed that this is happening when multiple dbt-core commands are called from within the same Python process.

@peterallenwebb's recommendation:

I do think calling set_invocation_id() between in-process dbt calls will solve the issue.

Also, to fix this issue going forward, and guarantee that programmatic invocations (dbtRunner.invoke()) have distinct values of invocation_id, we should either:

  • Add a call to set_invocation_id() within setup_event_logger()
  • Call set_invocation_id() within the preflight decorator, right before/after calling setup_event_logger()
  • (Think about whether EventManager is the right place for invocation_id to exist long-term)
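
A rough sketch of the first option, regenerating the ID whenever the logger is set up. EVENT_MANAGER here is a SimpleNamespace stand-in, and the argument-free setup_event_logger() signature and its call to cleanup_event_logger() are assumptions made for illustration; the real function in dbt-core does more work.

```python
import uuid
from types import SimpleNamespace

# Simplified stand-in for dbt's module-level EVENT_MANAGER.
EVENT_MANAGER = SimpleNamespace(
    loggers=[], callbacks=[], invocation_id=str(uuid.uuid4())
)


def set_invocation_id() -> None:
    EVENT_MANAGER.invocation_id = str(uuid.uuid4())


def cleanup_event_logger() -> None:
    EVENT_MANAGER.loggers.clear()
    EVENT_MANAGER.callbacks.clear()


def setup_event_logger() -> None:
    # Assumed, simplified signature for this sketch.
    cleanup_event_logger()
    set_invocation_id()  # first option: fresh ID at every logger setup
    EVENT_MANAGER.loggers.append("stdout logger")  # placeholder logger


setup_event_logger()
first_id = EVENT_MANAGER.invocation_id
setup_event_logger()
second_id = EVENT_MANAGER.invocation_id
```

With this arrangement, any code path that sets up logging gets a new ID as a side effect, which is convenient but ties the ID's lifetime to logger setup.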

@jtcohen6 removed this from the v1.4.x milestone Mar 24, 2023
@jtcohen6 changed the title from "[CT-2317] [v1.4] In certain cases, invocation_id is cached between invocations" to "[CT-2317] invocation_id is cached between invocations in same Python process" Mar 26, 2023
@peterallenwebb
Contributor

The approach I took was to call set_invocation_id() within preflight. The unit tests show this approach will work for programmatic invocations of dbt through the official API.
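
The decorator-based approach might look roughly like the following. This is a heavily reduced, hypothetical preflight decorator written for illustration, not the code merged in the fix; only the idea of calling set_invocation_id() before each command body comes from the thread.

```python
import functools
import uuid
from types import SimpleNamespace

# Simplified stand-in for dbt's module-level EVENT_MANAGER.
EVENT_MANAGER = SimpleNamespace(invocation_id=str(uuid.uuid4()))


def set_invocation_id() -> None:
    EVENT_MANAGER.invocation_id = str(uuid.uuid4())


def preflight(func):
    """Hypothetical, reduced preflight decorator for this sketch."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Regenerate the ID before the command body runs, so every
        # in-process invocation starts with its own value.
        set_invocation_id()
        return func(*args, **kwargs)

    return wrapper


@preflight
def run_command(name: str) -> str:
    """Stand-in for a command; returns the ID it observed."""
    return EVENT_MANAGER.invocation_id


ids = [run_command("run"), run_command("test"), run_command("build")]
all_distinct = len(set(ids)) == len(ids)
```

Placing the reset in a decorator keeps it at the command entry point, so each decorated command gets a distinct ID without the caller having to remember to reset it.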
