
Refactor: run DBT from devcontainer #3515

Merged: 5 commits, Nov 20, 2024

@thekaveman (Member) commented Oct 25, 2024

Description

After working on #3468 and trying to get things running in a new local environment, I thought it would be nice to encode the dependencies and tools into the devcontainer configuration. This makes it much easier to get dbt running from a fresh environment, and the setup is system/platform agnostic.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

How has this been tested?

See the README updates for steps to test/run this locally.

Post-merge follow-ups

Document any actions that must be taken post-merge to deploy or otherwise implement the changes in this PR (for example, running a full refresh of some incremental model in dbt). If these actions will take more than a few hours after the merge or if they will be completed by someone other than the PR author, please create a dedicated follow-up issue and link it here to track resolution.

  • No action required
  • Actions required (specified below)

@thekaveman force-pushed the refactor/devcontainer branch from 404f963 to 5fb0819 on October 28, 2024 04:03
@thekaveman self-assigned this Oct 28, 2024
@thekaveman marked this pull request as ready for review October 28, 2024 17:21
@angela-tran (Member)

Starting a review of this. Gonna try opening up the devcontainer!

@angela-tran (Member) commented Oct 31, 2024

The devcontainer was created and opened for me 🎉 I think I'm now at the point of setting up my dbt profile (probably the command in postAttach.sh at line 13), and it seems to be hanging at the step where a browser is supposed to open for me to log in:

Snippets of my output:
18:28:24  Running with dbt=1.5.1
18:28:24  Setting up your profile.
schema (usually your name; will be added as a prefix to schemas e.g. <schema>_mart_gtfs): angela
maximum_bytes_billed (the maximum number of bytes allowed per BigQuery query; default is 2 TB) [2000000000000]:
18:34:10  Profile calitp_warehouse written to /home/calitp/.dbt/profiles.yml using project's profile_template.yml and your supplied values. Run 'dbt debug' to validate the connection.
18:34:15  Running with dbt=1.5.1
18:34:15  dbt version: 1.5.1
18:34:15  python version: 3.9.20

...

18:34:18  1 check failed:
18:34:18  dbt was unable to connect to the specified database.
The database returned the following error:

  >Database Error
  Runtime Error

    dbt encountered an error while trying to read your profiles.yml file.

    Your default credentials were not found. To set up Application Default Credentials, see https://cloud.google.com/docs/authentication/external/set-up-adc for more information.


Check your database credentials and try again. For more information, visit:
https://docs.getdbt.com/docs/configure-your-profile


Welcome! This command will take you through the configuration of gcloud.

Your current configuration has been set to: [default]

You can skip diagnostics next time by using the following flag:
  gcloud init --skip-diagnostics

Network diagnostic detects and fixes local network connection issues.
Checking network connection...done.
Reachability Check passed.
Network diagnostic passed (1/1 checks passed).

You must sign in to continue. Would you like to sign in (Y/n)?  Y

Your browser has been opened to visit:
...

Nothing happens here. 🤔

@angela-tran (Member) commented Oct 31, 2024

I tried copying the URL from the output and opening that in a browser manually, and that did show me a Google consent screen for Google Cloud SDK, but then when I finished going through the consent screens, it redirected me to a localhost:8085 URL. I guess this is not too surprising when you look at the parameters of the URL I pasted into my browser.

(Screenshot of the localhost redirect)

My devcontainer terminal is still waiting for me to log in. 🤔

(Screenshot of devcontainer terminal)
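The localhost:8085 redirect follows from the URL parameters Angela mentions: an OAuth sign-in URL of this kind carries a redirect_uri query parameter, and after consent the browser is sent there. When gcloud runs inside the container, the redirect presumably only succeeds if the browser can reach whatever is listening on that port in the container. The small Python sketch below just pulls the redirect target out of such a URL. The URL is a hypothetical, heavily shortened stand-in on an example.com host (the real one is elided in the logs above), and redirect_target is an illustrative helper name.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def redirect_target(auth_url: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (host, port) of the redirect_uri query parameter in an OAuth sign-in URL."""
    query = parse_qs(urlparse(auth_url).query)  # percent-decodes values; each key maps to a list
    redirect = urlparse(query["redirect_uri"][0])
    return redirect.hostname, redirect.port


# Hypothetical, shortened sign-in URL; a real one carries many more parameters.
example_url = (
    "https://accounts.example.com/o/oauth2/auth"
    "?response_type=code&client_id=EXAMPLE"
    "&redirect_uri=http%3A%2F%2Flocalhost%3A8085%2F"
)

host, port = redirect_target(example_url)
print(host, port)  # localhost 8085
```

Seeing localhost as the host is the hint: "localhost" is resolved by the browser on the host machine, not by the container that started the sign-in flow.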

@thekaveman (Member, Author)

That second screenshot with the links in the console, can you click from there? That's what I had to do, and it worked after I logged in to my Google account in the browser.

@angela-tran (Member)

> That second screenshot with the links in the console, can you click from there? That's what I had to do, and it worked after I logged in to my Google account in the browser.

It doesn't let me click the full link, unfortunately. For some reason, it's getting cut off, maybe by a line break or something. If it'd be helpful for me to get on a screenshare with you, let me know.

@lalver1 (Member) commented Nov 1, 2024

The devcontainer was created and it opened for me too 🙂. I did notice an error when postAttach.sh ran, but it didn't prevent the rest of the script from running 🤔:

[1722 ms] Start: Run in container: /bin/bash .devcontainer/postAttach.sh
pre-commit installed at .git/hooks/pre-commit
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pycqa/flake8.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pycqa/isort.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pycqa/bandit.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
An unexpected error has occurred: CalledProcessError: command: ('/home/calitp/.cache/pre-commit/repoz_qq3lxw/py_env-python3/bin/python', '-mpip', 'install', '.')
return code: 1
stdout:
    Processing /home/calitp/.cache/pre-commit/repoz_qq3lxw
      Preparing metadata (setup.py): started
      Preparing metadata (setup.py): finished with status 'done'
    Collecting GitPython>=1.0.1 (from bandit==0.0.0)
      Downloading GitPython-3.1.43-py3-none-any.whl.metadata (13 kB)
    Collecting PyYAML>=5.3.1 (from bandit==0.0.0)
      Using cached PyYAML-6.0.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.metadata (2.1 kB)
    Collecting stevedore>=1.20.0 (from bandit==0.0.0)
      Downloading stevedore-5.3.0-py3-none-any.whl.metadata (2.3 kB)
    Collecting rich (from bandit==0.0.0)
      Using cached rich-13.9.3-py3-none-any.whl.metadata (18 kB)
    Collecting gitdb<5,>=4.0.1 (from GitPython>=1.0.1->bandit==0.0.0)
      Downloading gitdb-4.0.11-py3-none-any.whl.metadata (1.2 kB)
    Collecting pbr>=2.0.0 (from stevedore>=1.20.0->bandit==0.0.0)
      Using cached pbr-6.1.0-py2.py3-none-any.whl.metadata (3.4 kB)
    Collecting markdown-it-py>=2.2.0 (from rich->bandit==0.0.0)
      Using cached markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
    Collecting pygments<3.0.0,>=2.13.0 (from rich->bandit==0.0.0)
      Using cached pygments-2.18.0-py3-none-any.whl.metadata (2.5 kB)
    Collecting typing-extensions<5.0,>=4.0.0 (from rich->bandit==0.0.0)
      Using cached typing_extensions-4.12.2-py3-none-any.whl.metadata (3.0 kB)
    Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->GitPython>=1.0.1->bandit==0.0.0)
      Downloading smmap-5.0.1-py3-none-any.whl.metadata (4.3 kB)
    Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich->bandit==0.0.0)
      Using cached mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
    Downloading GitPython-3.1.43-py3-none-any.whl (207 kB)
    Using cached PyYAML-6.0.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (720 kB)
    Downloading stevedore-5.3.0-py3-none-any.whl (49 kB)
    Using cached rich-13.9.3-py3-none-any.whl (242 kB)
    Downloading gitdb-4.0.11-py3-none-any.whl (62 kB)
    Using cached markdown_it_py-3.0.0-py3-none-any.whl (87 kB)
    Using cached pbr-6.1.0-py2.py3-none-any.whl (108 kB)
    Using cached pygments-2.18.0-py3-none-any.whl (1.2 MB)
    Using cached typing_extensions-4.12.2-py3-none-any.whl (37 kB)
    Using cached mdurl-0.1.2-py3-none-any.whl (10.0 kB)
    Downloading smmap-5.0.1-py3-none-any.whl (24 kB)
    Building wheels for collected packages: bandit
      Building wheel for bandit (setup.py): started
      Building wheel for bandit (setup.py): finished with status 'done'
      Created wheel for bandit: filename=bandit-0.0.0-py3-none-any.whl size=119913 sha256=4507954499c0c4f40fbbd8c35cbfa4a920ad4494375a71ba501d3f0649f2fe1a
      Stored in directory: /tmp/pip-ephem-wheel-cache-8gp8q082/wheels/6a/4d/55/ae8cdabeab19e9f4ffc7f4dab60e605c1b4201fce4f3a9d5b4
    Successfully built bandit
    Installing collected packages: typing-extensions, smmap, PyYAML, pygments, pbr, mdurl, stevedore, markdown-it-py, gitdb, rich, GitPython, bandit
stderr:
    ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device: '/home/calitp/.cache/pre-commit/repoz_qq3lxw/py_env-python3/lib/python3.9/site-packages/pbr'
Check the log at /home/calitp/.cache/pre-commit/pre-commit.log
14:36:10  Running with dbt=1.5.1
14:36:10  Setting up your profile.

Then I was able to create my profile using my name as the schema, keep the default maximum_bytes_billed, and sign in with my @compiler.la email after going through the Google Cloud SDK consent screens. I was then able to choose a project at the "Pick cloud project to use" prompt.

@thekaveman force-pushed the refactor/devcontainer branch from 5fb0819 to da3a916 on November 4, 2024 18:46
@thekaveman (Member, Author)

@lalver1 I see in your output:

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

This seems strange. Did your Docker VM run out of virtual disk space?

@lalver1 (Member) commented Nov 4, 2024

Looks like the problem was with my Docker Engine, @thekaveman. I cleaned up several dangling images, rebuilt the container, and postAttach.sh then ran with no errors 👍
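The Errno 28 failure in the log above came from the Docker VM's disk filling up while pre-commit installed its hook environments. As a hedged illustration only (nothing like this is part of the PR), a setup step could check free space before doing that work. The 5 GiB threshold and the has_free_space name are arbitrary assumptions; shutil.disk_usage is the standard-library call that reports total/used/free bytes.

```python
import os
import shutil

# Hypothetical headroom threshold (not from this PR); tune to what the hook environments need.
MIN_FREE_BYTES = 5 * 1024**3  # 5 GiB


def has_free_space(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has at least `min_free_bytes` free."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), all in bytes
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    # pre-commit keeps its environments under ~/.cache/pre-commit (see the log above);
    # fall back to the home directory if the cache folder doesn't exist yet.
    cache_dir = os.path.expanduser("~/.cache")
    target = cache_dir if os.path.isdir(cache_dir) else os.path.expanduser("~")
    if has_free_space(target):
        print(f"OK: enough free space under {target}")
    else:
        print(f"Low disk space under {target}; try removing dangling images (docker image prune) on the host")
```

Inside a container, the reported free space reflects the Docker VM's virtual disk, which is why pruning images on the host side made room again.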

@angela-tran (Member) left a comment

I moved from Windows 10 to Linux (for reasons unrelated to this PR). I no longer experience the issue with authentication and am able to get through the dbt profile setup now 🎉

Ran docker compose run dbt debug and got successful output: (screenshot)

- remove node install
- add google-cloud-cli install
- use non-root user
- extra mounted volumes
- env vars

modernize devcontainer.json syntax

use postAttach to ensure dbt and gcloud are configured
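The last commit message says postAttach is used to ensure dbt and gcloud are configured. The real hook is the bash script .devcontainer/postAttach.sh, whose contents aren't shown in this thread; the Python sketch below only illustrates the idea of checking for the two files the logs in this thread mention: the dbt profile at ~/.dbt/profiles.yml and the gcloud Application Default Credentials at ~/.config/gcloud/application_default_credentials.json. The missing_config helper is hypothetical.

```python
from pathlib import Path
from typing import List

# Paths taken from the log output in this thread, relative to the container user's home.
EXPECTED_FILES = [
    ".dbt/profiles.yml",  # written by dbt's interactive profile setup
    ".config/gcloud/application_default_credentials.json",  # written by the gcloud ADC login
]


def missing_config(home: Path) -> List[str]:
    """Return the expected config files (relative to `home`) that do not exist yet."""
    return [rel for rel in EXPECTED_FILES if not (home / rel).is_file()]


if __name__ == "__main__":
    missing = missing_config(Path.home())
    if missing:
        print("Still needs setup:", ", ".join(missing))
    else:
        print("dbt profile and gcloud ADC credentials found.")
```

Checking for existing files is what lets a hook skip the interactive prompts on later attaches, which matches the behavior described further down the thread (a second launch going straight to a terminal).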
@thekaveman force-pushed the refactor/devcontainer branch from da3a916 to 6263e2b on November 13, 2024 00:26
@evansiroky (Member)

Interesting. I'll defer to @vevetron and/or @tiffanychu90 for review.

@vevetron (Contributor) left a comment

Ran it all on my Mac and it worked like a dream.

@vevetron (Contributor)

There was one moment towards the end of the install where it seemed like it froze:

Credentials saved to file: [/home/calitp/.config/gcloud/application_default_credentials.json]

These credentials will be used by any library that requests Application Default Credentials (ADC).

Quota project "cal-itp-data-infra-staging" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.

I'm still learning to use it in VS Code, but commands like docker compose run dbt run -s +fct_service_alerts_messages_unnested+ work great when run from the .devcontainer folder.
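In dbt's node selection syntax (-s is short for --select), a leading + selects all ancestors of a model and a trailing + selects all descendants, so +fct_service_alerts_messages_unnested+ runs that model plus everything upstream and downstream of it. The toy Python sketch below mimics that selection on a made-up DAG; every model name except fct_service_alerts_messages_unnested is hypothetical, and this is only an illustration of the semantics, not how dbt implements selection.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency graph: each model maps to the models it depends on (its parents).
PARENTS: Dict[str, List[str]] = {
    "stg_service_alerts": [],
    "int_service_alerts_enriched": ["stg_service_alerts"],
    "fct_service_alerts_messages_unnested": ["int_service_alerts_enriched"],
    "mart_alerts_summary": ["fct_service_alerts_messages_unnested"],
    "unrelated_model": [],
}


def _reachable(start: str, edges: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first search: every node reachable from `start` by following `edges`."""
    seen: Set[str] = set()
    queue = deque(edges.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges.get(node, []))
    return seen


def select_plus_both(model: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Mimic the '+model+' selector: the model, all its ancestors, and all its descendants."""
    children: Dict[str, List[str]] = {name: [] for name in parents}
    for child, deps in parents.items():
        for parent in deps:
            children.setdefault(parent, []).append(child)
    return {model} | _reachable(model, parents) | _reachable(model, children)


print(sorted(select_plus_both("fct_service_alerts_messages_unnested", PARENTS)))
# ['fct_service_alerts_messages_unnested', 'int_service_alerts_enriched', 'mart_alerts_summary', 'stg_service_alerts']
```

Dropping the leading or trailing + would keep only one direction, which is a cheaper way to iterate when only upstream or downstream models changed.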

@thekaveman (Member, Author)

@vevetron

> There was one moment towards the end of the install where it seemed like it froze

Did it ever come back? Or was it just hung here?

Was this on your very first time launching the devcontainer in VS Code? What about on subsequent launches?

@vevetron (Contributor)

It didn't come back after I restarted. When I launched it again, it jumped to the post-hook and then dropped me into a terminal inside the container.

@thekaveman (Member, Author)

Hmmm, do you see the "success" output from the postAttach hook? Something like: (screenshot)

@vevetron (Contributor)

> Hmmm, do you see the "success" output from the postAttach hook? Something like: (screenshot)

Yes, I do see the "success" output.

@thekaveman (Member, Author)

Awesome! Then I'll merge this if that's OK with you?

@vevetron (Contributor)

Yes please merge it in!

@thekaveman merged commit 12e3b01 into main Nov 20, 2024
4 checks passed
@thekaveman deleted the refactor/devcontainer branch November 20, 2024 00:41
Labels: None yet
Projects: Status: Done
5 participants