Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race on removing setup file for self-update in GitHub Actions windows-latest worker #2415

Closed
rhysd opened this issue Jul 13, 2020 · 17 comments
Closed
Labels

Comments

@rhysd
Copy link
Contributor

rhysd commented Jul 13, 2020

Problem

Recently, I noticed my project's CI often fails when preparing Rust toolchain when running GitHub Actions workflow on windows-latest worker.

https://github.com/rhysd/wain/runs/865787365?check_suite_focus=true

Here is an extracted log:

Run rustup set profile minimal
  rustup set profile minimal
  rustup update stable
  rustup default stable
  rustup show
  cargo --version
  shell: C:\Program Files\PowerShell\7\pwsh.EXE -command ". '{0}'"
info: profile set to 'minimal'
info: syncing channel updates for 'stable-x86_64-pc-windows-msvc'
info: checking for self-updates
info: downloading self-update
error: could not remove 'setup' file: 'C:\Users\runneradmin\.cargo\bin/rustup-init.exe'
error: caused by: Access is denied. (os error 5)
error: could not remove 'setup' file: 'C:\Users\runneradmin\.cargo\bin/rustup-init.exe'
error: caused by: Access is denied. (os error 5)
error: could not remove 'setup' file: 'C:\Users\runneradmin\.cargo\bin/rustup-init.exe'
error: caused by: Access is denied. (os error 5)

  stable-x86_64-pc-windows-msvc unchanged - rustc 1.44.1 (c7087fe00 2020-06-17)

##[error]Process completed with exit code 1.

Steps

  1. Make a repository on GitHub
  2. Put below workflow file as .github/workflows/ci.yml and make commit
  3. Push the commit and check GitHub Actions page of your repository

ci.yml:

name: CI
on: push

jobs:
  test:
    runs-on: windows-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install Rust toolchain
        run: |
          rustup set profile minimal
          rustup update stable
          rustup default stable
          rustup show

Possible Solution(s)

I'm not sure the reason of the error. But it seems a race on removing setup file. I know that retry logic is already implemented to handle this, but retry seems not a sufficient solution. I don't have solid approach to fix this.

Notes

Output of rustup --version before the self-update:

rustup 1.21.1 (7832b2ebe 2019-12-20)

Output of rustup --version after the self-update (when success):

rustup 1.22.1 (b01adbbc3 2020-07-08)

Output of rustup show:

Default host: x86_64-pc-windows-msvc
rustup home:  C:\Users\runneradmin\.rustup

stable-x86_64-pc-windows-msvc (default)
rustc 1.44.1 (c7087fe00 2020-06-17)

Frequency of the error is not 100%. Today I ran the workflow 9 times and failed 5 times due to this error.

@rhysd rhysd added the bug label Jul 13, 2020
@rbtcollins
Copy link
Contributor

This is a file locking error; it could be an anti virus error of some sort I suppose, though I don't know what github actions runs for that. I wonder if pwsh holds a file handle on the binary. Can you try with cmd or some other shell around the binary? Perhaps have a look at the action configuration we use - we don't see that issue in our configuration.

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

OK, let me check if this issue occurs or not when using cmd.exe in my project.

@tesuji
Copy link
Contributor

tesuji commented Jul 15, 2020

Add --no-self-update when updating toolchain.

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

Add --no-self-update when updating toolchain.

Yes, I've added the option and now this issue no longer occurs on my project. rhysd/wain@d6856e5

But I think this is just a workaround and it means I need to use old rustup on CI.

@tesuji
Copy link
Contributor

tesuji commented Jul 15, 2020

GitHub Actions updates its images (including rustup) quite often. (6 weeks I think).

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

But as I described in this PR, rustup version is rustup 1.21.1 (7832b2ebe 2019-12-20), which is older than 6 weeks. I guess rustup version is not always updated on worker image update.

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

I tried cmd.exe but this issue still seems to occur.

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

Anyway, I'll continue to use --no-self-update option and check rustup is kept updated. Thanks @lzutao for the suggestion.

@tesuji
Copy link
Contributor

tesuji commented Jul 15, 2020

I mean (not sure about the duration) 6 weeks cycle. The new rustup was released not longer than 6 weeks.

@rhysd
Copy link
Contributor Author

rhysd commented Jul 15, 2020

Oh, I see. Now I understood rustup did not have new release since 2019-12-20 until 9 days ago.

スクリーンショット 2020-07-15 22 10 07

Then I should check rustup is updated on Windows worker next a few months.

@Darleev
Copy link

Darleev commented Jul 27, 2020

Hello @lzutao,
In order to reproduce the issue and figure out a root cause, we need a way to downgrade rustup on windows machines. Could you please assist?
I found a very similar request with the same question: #2366
We are looking forward to your reply.

@kinnison
Copy link
Contributor

@Darleev Rustup versions are archived at https://static.rust-lang.org/rustup/archive/VERSION/PLATFORM/rustup-init[.exe]. You could try where VERSION is something like 1.21.1 and PLATFORM is x86_64-pc-windows-msvc and you include the .exe suffix. i.e. https://static.rust-lang.org/rustup/archive/1.21.1/x86_64-pc-windows-msvc/rustup-init.exe

@Darleev
Copy link

Darleev commented Jul 27, 2020

Hello,
We were able to reproduce the issue on local Windows machine and confirm that this issue is not GA specific.
Repro steps

  1. Download installer from https://static.rust-lang.org/rustup/archive/1.21.1/x86_64-pc-windows-msvc/rustup-init.exe ;
  2. Start the installer and press "Enter" button two times;
  3. Open CMD and try to run:
    rustup self update && rustup --version
    If you run rustup twice without delay like command above, you will catch the same issue.
    The following issue appears:
error: could not remove 'setup' file: 'C:\Users\runneradmin\.cargo\bin/rustup-init.exe'
error: caused by: Access is denied. (os error 5)

The issue not only about agent machines, but it is also actual for common windows machines. So, I believe it could be an issue from Rustup side. Could you please take a look?
A bit more context:
Looks like cleanup_self_updater is invoked right after Rustup start and will fail "rustup --version" because rustup-init is still blocked by self-updater

@kinnison
Copy link
Contributor

I wonder if this is a reason to switch to rename-then-eventually-clean-up which we initially thought might be overkill @rbtcollins ?

@rbtcollins
Copy link
Contributor

Hmm, these instructions seem to suggestion running rustup while rustup-init is still running, which we would expect to fail with the current code.

But lets assume they are not suggesting that.

The underlying problem is probably self_update::self_replace() which exits the parent immediately then does file manipulation that will fail if other processes are executing rustup.

The thing that we need here is to able to deal with rustup being in use when we try to operate on it, which is the more complicated thing I've sketched out before: we need a way to tell those processes to exit, we need a retry loop to deal with new process execution attempts starting up while we're busy shutting things down.

@kinnison
Copy link
Contributor

Mmm, I think I tried to prototype an approach which renamed things out of the way and then had a background "try and clean things up" which was repeated each time a rustup proxy or rustup was run, swallowing errors.

i.e. instead of trying to remove/replace rustup.exe it would rename it to to-delete-rustup-XXYSUDS.tmp and then the cleanup would look at CARGO_BIN and attempt to remove all .tmp files.

It switched out the "might fail to replace" with a "short period of time where there isn't a $foo.exe present" though, so we discarded it at the time.

@rbtcollins
Copy link
Contributor

I've written up a holistic approach that details all the bits and should be implementable by anyone with some time and interest and a modicum of windows knowledge. #2441 - this subsumes this bug and other self update issues on windows.

github-merge-queue bot pushed a commit to firezone/firezone that referenced this issue May 6, 2024
Fixes these type of errors:
https://github.com/firezone/firezone/actions/runs/8973627864/job/24644251114

Using `--no-self-update` as recommended here:
rust-lang/rustup#2415 (comment)

Probably was a regression introduced by a version bump in the Github
runner's Rustup or something
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants