[CT-2124] [Feature] `--no-connection` flag for `dbt docs generate` #6980

jaypeedevlin · 2023-02-14T23:07:06Z

Is this your first time submitting a feature request?

I have read the expectations for open source contributors
I have searched the existing issues, and I could not find an existing issue for this feature
I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Provide an optional flag to be used with doc generation that skips the metadata queries and generates catalog.json based only off what it can get from compilation (and/or the existing manifest.json if --no-compile is also passed).

The problem I am solving for is around enabling the docs to be used locally for exploratory purposes during local development in large projects where building the catalog would otherwise take a long time — in the case of our project building the catalog without compile takes >30 minutes (11k nodes, Snowflake, 8 threads, 2XL).

This problem and its solution are described exactly in #3947 (comment):

I work with a few teams in the company and, as a developer, I would love to see their DAGs and understand better how they are developing/deploying their environment. How do other teams generate certain metrics, how do they use those metrics, and what kinds of tests do they typically perform across their metrics? I know that some teams use pre-hooks and post-hooks to determine things like row counts before and after runs to ensure things look right. I'd love to be able to explore this in the dbt docs serve format.

I can manually parse through this info on my own but digging through dozens of sql files across multiple folders is... a bit intense. That said, I'm not as interested in the actual data and don't have access anyways.

I understand the huge benefit to having connectivity to the database when generating docs but it would be great if you could put a flag on the command that allows you to forego any benefits of connectivity. Something like dbt docs generate --no_connection so that I could create enough of the docs to view their processes but knowing that I wouldn't have access to some of the metadata that I might get if I didn't use the flag.

The intention would be that it's not recommended to use this flag for building production-quality docs, but it would unlock quick exploration in large projects.

Describe alternatives you've considered

Using dbt Cloud's in-editor DAG
Parsing the manifest manually to create an explorable DAG (maybe this could be bundled as part of a vscode extension 😉)
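For the second alternative, a minimal sketch of what "parsing the manifest manually" could look like. It assumes manifest.json has a top-level `nodes` mapping whose entries carry a `depends_on.nodes` list of parent unique IDs; check that layout against the manifest schema for your dbt version. The function names `build_dag` and `load_manifest` are hypothetical helpers, not part of dbt.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Read a manifest.json written by dbt (path is an assumption)."""
    return json.loads(Path(path).read_text())


def build_dag(manifest: dict) -> Dict[str, List[str]]:
    """Map each parent unique_id to the sorted list of nodes that depend on it."""
    children: Dict[str, List[str]] = defaultdict(list)
    for unique_id, node in manifest.get("nodes", {}).items():
        # Assumed layout: each node lists its parents under depends_on.nodes.
        for parent in node.get("depends_on", {}).get("nodes", []):
            children[parent].append(unique_id)
    return {parent: sorted(kids) for parent, kids in children.items()}
```

Calling `build_dag(load_manifest())` after a parse gives a plain adjacency mapping that can be fed to whatever graph viewer you prefer. No database connection is involved at this step.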

Who will this benefit?

Developers of large projects.

Are you interested in contributing this feature?

Provided it is within my capabilities, then yes!

Anything else?

I imagine this feature would be (ab)used by many folks who want to build their docs in CI without a DB connection. I'm not sure that this is behaviour we should actually be preventing though, as long as they understand the tradeoffs.


manugarri · 2023-02-16T13:41:46Z

+1 to this feature request... there was a similar discussion as far back as 2021.

jtcohen6 · 2023-02-25T10:16:48Z

@jaypeedevlin Thanks for opening! This feels totally reasonable to me. I agree with your take that:

We could be clear about the intended use case
Even if someone were to misuse this, I don't think it's something we should actively prevent, as much as discourage / make sure we clearly communicate the trade-offs

Implementations

I believe such a flag could conditionally wrap the logic here in the GenerateTask. For example:

        # these should go at the top of the file, if not already imported
        from typing import List
        import agate
        
        ...
        
        # initialize empty typed objects, based on return types of get_catalog
        catalog_table: agate.Table = agate.Table([])
        exceptions: List[Exception] = []

        if not self.args.no_connection: # or whatever we call this -- see below
            adapter = get_adapter(self.config)
            with adapter.connection_named("generate_catalog"):
                fire_event(BuildingCatalog())
                catalog_table, exceptions = adapter.get_catalog(self.manifest)
        
        # otherwise, use the empty objects, and proceed

What should the flag be called?

--no-connection
--no-catalog, --empty-catalog, --skip-catalog-generation
...?

Then:

$ dbt docs generate --no-compile --no-catalog

Would produce a manifest.json without any "compiled" SQL (since that could require introspective database queries), and without any catalog entries pulled from database metadata.

Going to mark this as a good first issue!

Incidentally, I think we could also solve for this via:

[CT-1303] Respect node selection in catalog queries run by docs generate #6014

$ dbt docs generate --exclude fqn:* source:*

Just sharing as an observation — I don't think that's a reason to not also offer this flag!

In the meantime, here's a fun hack, as a treat:

# write manifest.json -- starting in v1.5, no need to pass this flag, 'parse' always writes
$ dbt parse --write-manifest

# write an empty catalog.json
$ echo '{"metadata": {}, "nodes": {}, "sources": {}, "errors": null}' > target/catalog.json

$ dbt docs serve

~~Okay, I realize the last step required an Internet connection, but otherwise it totally didn't!~~ You could even pull down whatever docs site you wanted to use, instead of the one distributed by default with dbt-core, if you were someone who wanted to build/use your own :)
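If you would rather script the empty-catalog step than use `echo`, here is a rough Python equivalent. It writes the same JSON payload as the shell one-liner above. The helper name `write_empty_catalog` and the default `target` directory are assumptions for illustration.

```python
import json
from pathlib import Path

# Same payload as the echo command above (JSON null becomes Python None).
EMPTY_CATALOG = {"metadata": {}, "nodes": {}, "sources": {}, "errors": None}


def write_empty_catalog(target_dir: str = "target") -> Path:
    """Write an empty catalog.json into target_dir and return its path."""
    out = Path(target_dir) / "catalog.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(EMPTY_CATALOG))
    return out
```

Run it after `dbt parse` has written manifest.json, then start `dbt docs serve` as before.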

manugarri · 2023-03-07T10:34:57Z

FYI, the ability to bypass the connection would also be very helpful for linting with sqlfluff. Right now, linting and testing the SQL/macros before actually running them requires an active connection (which is an antipattern, in my opinion): sqlfluff/sqlfluff#4397

jaypeedevlin · 2023-03-07T22:05:20Z

@manugarri this issue is about generating artifacts for docs, not for compilation.

At the risk of derailing this issue, the reason dbt needs a connection to compile is because jinja-sql supports functions like run_query() which call to the database at compile-time. Even if there were a mechanism for compilation without a connection, sqlfluff would not be able to support projects with this (reasonably frequent) pattern.

Disclosure, I don't work for dbt Labs and this is just my personal opinion

manugarri · 2023-03-08T11:58:51Z

@jaypeedevlin thanks for the clarification. I would argue that, just because we want to support run_query (which might or might not be used in individual dbt projects), we are forcing everyone to set up a connection in their CI/CD. IMHO it should be the other way around: the connection is optional, and the subset of users who want something like run_query can choose to use one.

jaypeedevlin · 2023-03-08T21:52:18Z

I would encourage you to open an issue (or maybe a discussion is more appropriate) around this and/or find an existing one to continue the conversation!

AndyBys · 2023-03-21T16:09:36Z

Hey there!

Unsure if this issue is still in high demand, but I've made a linked draft with the changes discussed here. Let me know if it makes sense and I'll carry on.

culpgrant · 2023-03-21T16:38:55Z

I would be interested in this

jtcohen6 · 2023-03-21T17:41:09Z

@manugarri @jaypeedevlin Regarding the other conversation, about compiling models without an active connection, this will be newly supported in v1.5:

$ dbt compile --no-introspect

Note that this will fail if you have run_query in your code:

$ dbt compile --no-introspect
17:38:54  Running with dbt=1.5.0-b4
...
dbt.exceptions.DbtRuntimeError: Runtime Error
  Runtime Error in model my_model (models/my_model.sql)
    connection already closed

Not very graceful, but it's a first cut. What really happened is, the connection was never opened!

jaypeedevlin · 2023-03-21T22:26:29Z

@AndyBys I haven't started work on this yet so consider it yours!

manugarri · 2023-03-23T00:24:46Z

@jtcohen6 great! does that mean we will be able to generate docs without an active connection in the future?

timle2 · 2023-03-29T13:54:09Z

Big plus one to this.
Here's some more info on how it would be helpful to us. Particularly as maintainers of a huuuge dbt project.
We have 400+ materialized models in our project. At times, our warehouse slows down (it's a shared resource across the company, not just our team), meaning docs generation can take 10-20 minutes on really slow days (yes really). We do docs generation as part of our CI/CD process (and ship out the docs to a server), which means catalog generation can become a massive bottleneck.
For some idea on how I'd use this: try classic catalog generation, time out after a few minutes (warehouse congested), and fall back to this option. Right now we time out after 10 minutes and just accept that the docs will be stale. In fact, we may even do all doc generation as --no-connection just to reduce DB traffic on builds for larger projects and be ok with the loss of fidelity as a trade off for not spamming our db with 400+ connections several times a day.
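The "try, time out, fall back" flow described above could be sketched roughly like this in a CI script. This is an illustration under assumptions: it uses the `--empty-catalog` flag name from the linked PR (#7202) rather than `--no-connection`, the helper name `generate_docs` and the 10-minute default are made up for the example, and the `run` parameter exists only so the logic can be exercised without invoking dbt.

```python
import subprocess
from typing import Callable, List


def generate_docs(
    timeout_s: float = 600,
    run: Callable[..., object] = subprocess.run,
) -> List[str]:
    """Try a full docs build; on timeout, retry with an empty catalog.

    Returns the command that ultimately succeeded.
    """
    full = ["dbt", "docs", "generate"]
    fallback = full + ["--empty-catalog"]
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        run(full, check=True, timeout=timeout_s)
        return full
    except subprocess.TimeoutExpired:
        # Warehouse is congested: skip the catalog queries entirely.
        run(fallback, check=True)
        return fallback
```

Other failures (a non-zero exit from dbt, for instance) are deliberately not caught, so a genuinely broken project still fails the build instead of silently shipping docs without a catalog.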

dwolfeu · 2023-04-05T05:35:59Z

If this goes ahead, then dbt docs generate and dbt docs serve should no longer require a profile if --no-connection is specified, right? This is now the case for dbt deps (see this PR).

jtcohen6 · 2023-04-05T09:55:47Z

@dwolfeu A valid profile is still required for project parsing today, because users may access values of {{ target }} within their configuration logic, and within macros such as generate_schema_name that are resolved at parse time. We had some initial conversation about what would be required to support parsing without a profile/adapter loaded:

[CT-1769] [Spike] Parsing sans adapter #6549

dwolfeu · 2023-04-05T11:04:22Z

@jtcohen6 Thank you for your quick reply! From the perspective of an end user (who knows nothing of the inner workings of dbt-core): We have an image that fetches the manifest and project files from external storage and then serves the dbt docs. Having to provide a profile/connect to the database seems conceptually unnecessary and can be problematic, for instance if one wants to avoid network traffic.* Perhaps it would be possible to allow dbt docs to run without a profile and throw an error if something like {{ target }} is referenced?

*We are not the only ones: see this workaround, for instance.

jtcohen6 · 2023-04-05T11:23:27Z

@dwolfeu To clarify my message above: dbt docs generate will still require a connection profile, in order to access the static configuration supplied therein ({{ target }} values) — but if you were to pass both --no-compile and --empty-catalog (as proposed in #7202), dbt would not actually connect to the database and run any queries. You wouldn't even need an Internet connection.

dwolfeu · 2023-04-05T11:41:08Z

@jtcohen6 We are using this dummy profiles.yml as a workaround:

profile-name:
  target: docs
  outputs:
    docs:
      type: postgres
      host: foo
      database: foo
      schema: foo
      port: 00
      threads: 1
      user: foo
      password: foo

It works, but it's hacky. It would be nicer not to have to include a profile at all.

Some context (so you don't think I'm crazy!): We of course have a "real" profiles.yml in our main dbt repo, but we don't want to include it in our other repo whose sole purpose is to generate the docs coming from the main repo (DRY and all that).

ntn-rjdn · 2023-07-11T00:56:16Z

Is this --no-connection flag available in the latest version of dbt-core? I couldn't get it to work. I'm using dbt-databricks.

jaypeedevlin added enhancement New feature or request triage labels Feb 14, 2023

github-actions bot changed the title ~~[Feature] --no-connection flag for dbt docs generate~~ [CT-2124] [Feature] --no-connection flag for dbt docs generate Feb 14, 2023

jtcohen6 self-assigned this Feb 15, 2023

jtcohen6 added good_first_issue Straightforward + self-contained changes, good for new contributors! Team:Execution and removed triage labels Feb 25, 2023

jtcohen6 removed their assignment Feb 25, 2023

AndyBys mentioned this issue Mar 21, 2023

Empty catalog dbt docs generate #7202

Merged

6 tasks

iknox-fa closed this as completed in #7202 Apr 20, 2023

jtcohen6 mentioned this issue Jul 13, 2023

dbt docs generate --empty-catalog to skip catalog generation dbt-labs/docs.getdbt.com#3723

Closed

1 task
