docs generate no longer works with dummy profile #3947

Closed

franciscojavierarceo opened this issue Sep 24, 2021 · 10 comments
Labels: bug, stale

Comments

@franciscojavierarceo

Describe the bug

Previously I was able to generate dbt docs using a dummy profile; now I am receiving an error from Snowflake indicating that the password is empty.

Steps To Reproduce

my profiles.yml file:

name: 'snowflake'
profile: 'snowflake'
fast-snowflake-db:
  target: myschema
  outputs:
    dev:
      type: SNOWFLAKE
      account: FAKEACCOUNTID
      user: FAKE_USER
      role: FAKE_ROLE 
      database: FAKE_DATABASE 
      warehouse: FAKE_WAREHOUSE
      schema: FAKE_SCHEMA
      threads: 1
      client_session_keep_alive: False
      query_tag: FAKE-TAG

Expected behavior

Successfully generate docs using:

dbt docs generate --profiles-dir ./profiles.yml

Running the command above used to generate docs successfully. Now I receive the error below:

Running with dbt=0.19.1
Found X models, X tests, X snapshot, X analyses, X macros, X operations, X seed files, X sources, X exposures

ERROR: Database Error
  251006: Password is empty

System information

Which database are you using dbt with?

  • postgres
  • redshift
  • bigquery
  • snowflake (selected)
  • other (specify: ____________)

The output of dbt --version:

installed version: 0.19.1
   latest version: 0.20.2

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 0.19.1
  - snowflake: 0.19.1
  - redshift: 0.19.1
  - postgres: 0.19.1

The operating system you're using:
Mac

The output of python --version:

Python 2.7.16
franciscojavierarceo added the bug and triage labels on Sep 24, 2021
jtcohen6 removed the triage label on Sep 24, 2021
@jtcohen6 (Contributor)

@franciscojavierarceo In what version of dbt was this previously working? The docs generate command does run metadata queries against the database, in order to generate catalog.json, so it isn't surprising to hear that it raises an authentication error when you try to connect with phony credentials.

I'd also be curious to hear more about your use case for generating docs without a real database connection. If it's just the ability to visualize your project's DAG lineage, I think there may be workarounds, but as a primary feature of docs generate, I'm not sure this is something we need to support.

@franciscojavierarceo (Author)

@jtcohen6 Currently I'm setting up a github action to generate the lineage and docs. For security reasons, we don't want to require a live connection.

@jtcohen6 (Contributor)

@franciscojavierarceo Would it be possible to include docs generate as a step in your production deployments, and upload the artifacts from that deployment into external storage? Then, the job of your GitHub action can simply be to grab manifest.json, catalog.json, and index.html from that storage, and serve them at a stable URL.

That's the recommended workflow: separate steps for generating documentation and deploying documentation. The former should happen in your dbt deployment, where you're comfortable connecting to the database. The latter should be able to access artifacts from the former, in lieu of its own live database connection.
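
As a rough illustration of that split, here's a minimal sketch of the "deploy documentation" half as a GitHub Actions workflow, assuming the production dbt run has already uploaded the three artifacts to an S3 bucket (the bucket name, paths, and the GitHub Pages setup are placeholders, not anything dbt prescribes):

name: publish-dbt-docs
on:
  workflow_dispatch:

permissions:
  pages: write      # required by actions/deploy-pages
  id-token: write   # required by actions/deploy-pages

jobs:
  publish:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      # Pull the artifacts written by the production "dbt docs generate" run.
      # s3://my-dbt-artifacts/latest/ is a hypothetical location.
      - name: Fetch dbt artifacts
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        run: |
          mkdir -p site
          aws s3 cp s3://my-dbt-artifacts/latest/manifest.json site/
          aws s3 cp s3://my-dbt-artifacts/latest/catalog.json site/
          aws s3 cp s3://my-dbt-artifacts/latest/index.html site/

      # Serve the static files at a stable URL via GitHub Pages.
      - name: Upload static site
        uses: actions/upload-pages-artifact@v3
        with:
          path: site
      - name: Deploy to GitHub Pages
        uses: actions/deploy-pages@v4

The same idea works with any static host: the docs-serving job only reads artifacts written by a deployment that had database access, and never needs a connection of its own.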

@chlirre commented Mar 2, 2022

I'm in the middle of setting up a "github action to generate the lineage and docs" myself.
I'm surprised to find that it requires a profile (connection).

I am currently using dbt Cloud to run (deploy) my configuration, and that doesn't allow much control over what happens with the documentation.

I'm basically looking for a place to point people in my org where they can find the descriptions of my models without having to request (read-only) access to dbt Cloud.

@github-actions (bot)

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions bot added the stale label on Aug 30, 2022
@github-actions (bot) commented Sep 6, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

github-actions bot closed this as completed on Sep 6, 2022
@blackbass64

I also don't want to connect to the actual database, for security reasons. Currently, the simplest connection I use for generating docs is dbt-duckdb.

Here's a sample target in the profiles.yml file.

sample_project:
  outputs:
    docs:
      path: /tmp/dbt.duckdb
      type: duckdb
  target: docs
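
With a target like that in place (and the adapter installed, e.g. pip install dbt-duckdb), the docs can be built without touching the warehouse by running something like the following, assuming the project compiles cleanly against DuckDB:

dbt docs generate --target docs

The catalog will only reflect whatever exists in the local DuckDB file, but the model descriptions and lineage come from the project itself.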

@a087861 commented Dec 13, 2022

Hopefully this can be reopened or revisited.

I work with a few teams in the company and, as a developer, I would love to see their DAGs and understand better how they are developing/deploying their environment. How do other teams generate certain metrics, how do they use those metrics, and what kinds of tests do they typically perform across their metrics? I know that some teams use pre-hooks and post-hooks to determine things like row counts before and after runs to ensure things look right. I'd love to be able to explore this in the dbt docs serve format.

I can manually parse through this info on my own, but digging through dozens of SQL files across multiple folders is... a bit intense. That said, I'm not as interested in the actual data and don't have access anyway.

I understand the huge benefit of having connectivity to the database when generating docs, but it would be great if there were a flag on the command that lets you forgo the benefits of connectivity. Something like dbt docs generate --no_connection, so that I could generate enough of the docs to view their processes, knowing that I wouldn't have access to some of the metadata I would get without the flag.

@manugarri

To me it seems like a very big ask to require a valid connection just to generate docs. In fact, I filed a related GitHub issue in sqlfluff, a linting project that includes a dbt engine which forces you to have a valid connection just to lint SQL files!

It seems like dbt is first and foremost a local-machine tool (I, for one, am an analyst who runs dbt exclusively from my machine). This is fine up to a certain point; production pipelines don't usually rely on an analyst's computer being turned on :D

@jannekeskitalo

We're also struggling to generate docs, as we use temporary VMs to run SQL-based processing jobs and there's no database in the correct state available when creating docs. It would really help to be able to create bare-bones lineage documentation without a DB connection. Our models have complex dependencies, and without the lineage it's pretty painful to trace the dependency chain.
