[Errno 104] Connection reset by peer – Github Actions with dbt commands fails randomly #352

Closed
oynek opened this issue May 25, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@oynek

oynek commented May 25, 2023

Describe the bug

Has anyone else come across this error when running dbt commands in a GitHub Actions workflow? For the past two to three days, the execution of dbt commands has been extremely unreliable. When a command fails, it is always with the following error message:

Error during request to server: [Errno 104] Connection reset by peer.

This may not be a problem with the adapter per se, but other users (see the dbt Slack) seem to have noticed it too, and I wanted to give affected users a place to discuss it here as well.

Local execution of a dbt run, as well as execution inside a Databricks workflow, works without problems; only runs on the GitHub runner fail.

Which commands fail (run, test, or docs generate) and on which models appears to be completely random. In the monitoring of the Databricks SQL warehouses I can see that the SQL queries were in fact executed successfully, but dbt on the GitHub runner never registers this.

I have made the environments identical in terms of adapter version, dbt version, Python version, dbt artifacts, and executing identity. The error only occurs on GitHub, so I now suspect network issues between GitHub and Databricks.

Expected behavior

The execution of dbt commands should run successfully on the Github runner.

Screenshots and log output

08:02:40  Running with dbt=1.5.0
08:02:42  Found 358 models, 83 tests, 0 snapshots, 0 analyses, 553 macros, 0 operations, 1 seed file, 142 sources, 0 exposures, 0 metrics, 0 groups
08:02:42  
08:02:48  Concurrency: 8 threads (target='ci')
08:02:48  
08:02:48  1 of 3 START sql view model pr_122.*** ................. [RUN]
08:02:48  2 of 3 START sql view model pr_122.* ............. [RUN]
08:02:48  3 of 3 START sql view model pr_122.* ......... [RUN]
08:02:52  3 of 3 OK created sql view model pr_122.* .... [OK in 3.34s]
08:02:53  2 of 3 OK created sql view model pr_122.* ........ [OK in 4.97s]
08:07:34  1 of 3 ERROR creating sql view model pr_122.* ........ [ERROR in 285.48s]
08:07:35  
08:07:35  Finished running 3 view models in 0 hours 4 minutes and 53.18 seconds.
08:07:35  
08:07:35  Completed with 1 error and 0 warnings:
08:07:35  
08:07:35  Runtime Error in model *** (models/sources/***/***.sql)
08:07:35    Error during request to server: [Errno 104] Connection reset by peer
08:07:35  
08:07:35  Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3
Run dbt docs generate
07:00:37  Running with dbt=1.5.0
07:00:38  Found 355 models, 83 tests, 0 snapshots, 0 analyses, 553 macros, 0 operations, 1 seed file, 139 sources, 0 exposures, 0 metrics, 0 groups
07:00:38  
07:00:40  Concurrency: 4 threads (target='ci')
07:00:40  
07:01:01  Building catalog
07:06:02  Encountered an error while generating catalog: Runtime Error
  Runtime Error
    Error during request to server: [Errno 104] Connection reset by peer
07:06:05  Encountered an error while generating catalog: Runtime Error
  Runtime Error
    Error during request to server: [Errno 104] Connection reset by peer
07:06:08  Encountered an error while generating catalog: Runtime Error
  Runtime Error
    Error during request to server: [Errno 104] Connection reset by peer
07:06:09  dbt encountered 3 failures while writing the catalog
07:06:09  Catalog written to /home/runner/work/dandi_dbt/dandi_dbt/target/catalog.json
Error: Process completed with exit code 1.
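
One thing that can be read off the two logs above is how long each failing request was outstanding before the error surfaced. The sketch below does the timestamp arithmetic. The helper name `seconds_between` is made up for illustration, and the choice of start lines (the failing model's START line and the "Building catalog" line) is an assumption about when each request began.

```python
from datetime import datetime


def seconds_between(start, end):
    """Return whole seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# dbt run: the failing model started at 08:02:48, the error was logged at 08:07:34.
run_gap = seconds_between("08:02:48", "08:07:34")

# dbt docs generate: "Building catalog" at 07:01:01, first error at 07:06:02.
docs_gap = seconds_between("07:01:01", "07:06:02")

print(run_gap, docs_gap)  # 286 301
```

Both gaps are close to five minutes, which matches the 285.48s reported for the failed model. What that interval means is not established anywhere in this thread.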

System information

The output of dbt --version:

dbt --version
Core:
  - installed: 1.5.0
  - latest:    1.5.0 - Up to date!

Plugins:
  - databricks: 1.5.2 - Up to date!
  - spark:      1.5.0 - Up to date!

The operating system you're using:
Github Runner ubuntu-latest, see Link.

The output of python --version:
Python 3.9.10

@susodapop

Thanks for the thorough report. Other customers have reported this through our support channels. The proximate cause appears to be in the GitHub Actions behaviour; it's not clear at this point whether a change to dbt-databricks or dbt-core could effectively work around it.

The last customer who encountered this issue worked around it successfully using a self-hosted actions runner.

@oynek
Author

oynek commented May 31, 2023

Hi @susodapop,

I can confirm that it works with a self-hosted GitHub runner. As an interim solution, we have now implemented this using the action machulav/ec2-github-runner (docs).

Can you tell me whether GitHub and Databricks are actively in contact about this issue?

@rlh1994

rlh1994 commented Jun 14, 2023

We're also running into this issue and are looking for a fix.

@andrefurlan-db
Collaborator

We will work on improvements to HTTP connection handling and retries in databricks-sql-python, which may help with this issue. There is very little we can do from the dbt adapter itself.
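
For readers wondering what retry handling of this kind can look like in general, here is a minimal sketch of retrying a call on a connection reset with exponential backoff. It is not the databricks-sql-python implementation: `call_with_retries` and its parameters are hypothetical names, and it assumes the reset reaches the caller as Python's built-in `ConnectionResetError` (the exception Python raises for errno 104 on Linux), whereas a real connector may wrap it in its own error type.

```python
import random
import time


def call_with_retries(operation, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation() and retry on connection resets with exponential backoff.

    Hypothetical helper for illustration only; not part of any Databricks library.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except ConnectionResetError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the error
            # Wait 1s, 2s, 4s, ... plus a small random jitter before retrying.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1)
            sleep(delay)
```

Because the report above notes that the queries did complete on the warehouse even though the client saw a reset, a blind retry like this is only safe for statements that can be executed twice without harm.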

@ritchxu

ritchxu commented Aug 2, 2023

The Actions team recently rolled out an update to the runner image (actions/runner-images#7860). Could you retry the problematic runs to see whether the issue is mitigated?

@oynek
Author

oynek commented Aug 3, 2023

This problem seems to have resolved itself, so the issue can be closed.

oynek closed this as completed Aug 3, 2023