Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry deployments #6801

Merged
merged 3 commits into from
Oct 1, 2024
Merged

Retry deployments #6801

merged 3 commits into from
Oct 1, 2024

Conversation

RamIdeas
Copy link
Contributor

@RamIdeas RamIdeas commented Sep 23, 2024

What this PR solves / how to test

This PR adds retries within wrangler deploy and wrangler versions upload to workaround spotty network connections and service flakes

We need to be careful about retrying the whole multi-step process because of some non-atomic steps.

Users have reported flakes at the upload step so we can safely add retries for that step to address those specific failures.

Fixes #0000

Author has addressed the following

  • Tests
    • TODO (before merge)
    • Tests included
    • Tests not necessary because:
  • E2E Tests CI Job required? (Use "e2e" label or ask maintainer to run separately)
    • I don't know
    • Required
    • Not required because: covered by other tests
  • Changeset (Changeset guidelines)
    • TODO (before merge)
    • Changeset included
    • Changeset not necessary because:
  • Public documentation
    • TODO (before merge)
    • Cloudflare docs PR(s):
    • Documentation not necessary because: transparent change

@RamIdeas RamIdeas requested a review from a team as a code owner September 23, 2024 12:22
Copy link

changeset-bot bot commented Sep 23, 2024

🦋 Changeset detected

Latest commit: d050578

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages
Name Type
wrangler Minor
@cloudflare/vitest-pool-workers Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Contributor

github-actions bot commented Sep 23, 2024

A wrangler prerelease is available for testing. You can install this latest build in your project with:

npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-wrangler-6801

You can reference the automatically updated head of this PR with:

npm install --save-dev https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/prs/6801/npm-package-wrangler-6801

Or you can use npx with this latest build directly:

npx https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-wrangler-6801 dev path/to/script.js
Additional artifacts:
npx https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-create-cloudflare-6801 --no-auto-update
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-cloudflare-kv-asset-handler-6801
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-miniflare-6801
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-cloudflare-pages-shared-6801
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-cloudflare-vitest-pool-workers-6801
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-cloudflare-workers-editor-shared-6801
npm install https://prerelease-registry.devprod.cloudflare.dev/workers-sdk/runs/11123200983/npm-package-cloudflare-workers-shared-6801

Note that these links will no longer work once the GitHub Actions artifact expires.


[email protected] includes the following runtime dependencies:

Package Constraint Resolved
miniflare workspace:* 3.20240925.0
workerd 1.20240925.0 1.20240925.0
workerd --version 1.20240925.0 2024-09-25

Please ensure constraints are pinned, and miniflare/workerd minor versions match.

@jahands
Copy link
Contributor

jahands commented Sep 27, 2024

Thank you for this!! The flakiness in deploying made my deploy CI job so unreliable (to the point that I added a retry wrapper script around wrangler deploy.) This will help a lot <3

Copy link
Contributor

@andyjessop andyjessop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice 👍

One question I have is do we know this is going to fix the problem, Or will it just make failed runs take longer?

@jahands
Copy link
Contributor

jahands commented Oct 1, 2024

Very nice 👍

One question I have is do we know this is going to fix the problem, Or will it just make failed runs take longer?

I added up to 3 retries to my deployments months ago and have never had a flake since. Before that, it seemed like it had a 30% chance of failure due to 5XX errors.

@RamIdeas
Copy link
Contributor Author

RamIdeas commented Oct 1, 2024

Very nice 👍

One question I have is do we know this is going to fix the problem, Or will it just make failed runs take longer?

Failures due to flakey internet or backend will be fixed by this. This will improve the happy path for successful deployments. If there is a legit reason for failure, we can iterate on this solution to bail out of the retries – but I think for now we should optimise for the happy path here and take the hit of a few seconds on the sad path.

@RamIdeas RamIdeas merged commit 6009bb4 into main Oct 1, 2024
22 of 25 checks passed
@RamIdeas RamIdeas deleted the retry-deployment branch October 1, 2024 10:40
@workers-devprod workers-devprod mentioned this pull request Oct 1, 2024

export async function retryOnError<T>(
action: () => T | Promise<T>,
backoff = 2_000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this signature means you have to pass a backoff if you want to define attempts, which isn't ideal. Can we put backoff and attempts into an options object instead, with both fields being optional?

throw err;
}

await setTimeout(backoff);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should the default be 0? what would a 2 second wait (by default) accomplish?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants