Retry deps requests made to the hub site #1451

drewbanin · 2019-05-09T20:15:05Z

Feature

Feature description

About once a month, many dbt deps invocations fail at the same time because of an intermittent error on the hub site host. While it may be worth taking action to understand and improve the uptime of the hub site, it's also a good idea to add retries to these requests.

In the registry._get method, dbt should retry any request that fails 1) without producing a response code or 2) with a 5xx response code.
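A minimal sketch of that retry condition, for illustration only. The helper name is_retryable is hypothetical and not part of dbt; it assumes the caller passes None when the request failed before any response arrived (for example, a connection error or a timeout).

```python
from typing import Optional


def is_retryable(status_code: Optional[int]) -> bool:
    """Decide whether a failed hub request is worth retrying.

    Hypothetical helper: status_code is None when no response was
    received at all, otherwise it is the HTTP status code.
    """
    if status_code is None:
        # Case 1: the request failed without producing a response code.
        return True
    # Case 2: the server answered with a 5xx error.
    return 500 <= status_code <= 599
```

Under this rule a 404 or 401 is raised immediately, since retrying a client error would not be expected to change the outcome.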

Since many dbt jobs run at specific wall clock times (like midnight UTC), we should randomize the wait between retries to avoid a thundering herd scenario.

After the first failure, dbt should wait 5-10 seconds before retrying.
If the request fails again, dbt should wait another 5-10 seconds before retrying a second time.
If that request also fails, dbt should raise the resulting exception.
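The schedule above could be sketched roughly as follows. This is not the change that was merged; call_with_retries and its parameters are hypothetical names, and for brevity the sketch retries on any exception rather than only on failures with no response code or a 5xx code.

```python
import random
import time


def call_with_retries(fn, max_attempts=3, min_wait=5.0, max_wait=10.0,
                      sleep=time.sleep):
    """Call fn(), retrying after a randomized wait when it raises.

    Hypothetical sketch: three attempts in total, with a random
    5-10 second pause after each of the first two failures.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                # Final attempt failed: surface the resulting exception.
                raise
            # Randomize the pause so jobs that all started at the same
            # wall clock time do not all retry at the same instant.
            sleep(random.uniform(min_wait, max_wait))
```

The sleep argument exists only so the pause can be swapped out in tests. Drawing the wait uniformly from a range is the simplest form of jitter; it spreads retries over a five second window rather than having every client retry together.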

@beckjake @cmcarthur you guys have more experience with this class of problem than I do -- is this a reasonable solution? Would you recommend a different approach for the timeouts?

beckjake · 2019-05-09T20:22:56Z

That all sounds reasonable enough to me. I assume the fail counter is per-attempt?

drewbanin · 2019-05-09T20:26:02Z

Yeah, I think that's right. dbt makes a couple of types of requests to the hub site:

get the index
get the versions for a specific package
get the contents for a specific version of a package

We should assume that any of these types of queries can fail, and the fail counter is indeed per-attempt/request.
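To illustrate what a per-request fail counter means in practice, here is a small sketch in which the counter is a local variable, so each of the three request types starts from zero. FlakyHub, fetch_with_retries, and the paths are all made up for the example and do not reflect the real hub API or dbt's code.

```python
def fetch_with_retries(fetch, path, max_attempts=3, wait=lambda: None):
    """Hypothetical helper: retry one request up to max_attempts times."""
    failures = 0  # local to this call, so it is never shared across requests
    while True:
        try:
            return fetch(path)
        except ConnectionError:
            failures += 1
            if failures >= max_attempts:
                raise
            wait()  # a real implementation would sleep for a random 5-10s here


class FlakyHub:
    """Stand-in for the hub site: each path fails a fixed number of times."""

    def __init__(self, failures_per_path):
        self.failures_per_path = failures_per_path
        self.remaining = {}

    def get(self, path):
        left = self.remaining.setdefault(path, self.failures_per_path)
        if left > 0:
            self.remaining[path] = left - 1
            raise ConnectionError(path)
        return {"path": path}


hub = FlakyHub(failures_per_path=2)
# Placeholder paths for the index, a package's versions, and one version.
results = [fetch_with_retries(hub.get, p)
           for p in ("index", "some_package", "some_package/0.1.0")]
```

Six failures occur in total here, yet every request succeeds, because no single request fails more than twice. With one counter shared across the whole dbt deps run, the third failure would have aborted it.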

drewbanin added the dependencies label (Changes to the version of dbt dependencies) May 9, 2019

drewbanin added the enhancement label (New feature or request) and removed the dependencies label (Changes to the version of dbt dependencies) May 9, 2019

drewbanin added this to the Wilt Chamberlain milestone May 29, 2019

beckjake mentioned this issue May 29, 2019

add a retry + sleep loop to registry calls (#1451) #1491

Merged

beckjake closed this as completed in #1491 May 30, 2019

beckjake added a commit that referenced this issue May 30, 2019

Merge pull request #1491 from fishtown-analytics/feature/hub-site-ret…

f14225f

…ries add a retry + sleep loop to registry calls (#1451)
