Rustup gets confused when it can't create a temp directory #2546

Open
jyn514 opened this issue Nov 1, 2020 · 13 comments
@jyn514
Member

jyn514 commented Nov 1, 2020

Problem

$ rustup component remove rust-docs
info: removing component 'rust-docs'
info: rolling back changes
error: could not create temp directory: /acct/jynelson/.local/lib/rustup/tmp/ibqoveyhjujffe52_dir
# I ran cargo clean so I had enough disk space
$ rustup component remove rust-docs
info: removing component 'rust-docs'
warning: during uninstall component rust-docs was not found
$ ls ~/.local/lib/rustup/toolchains/stable-x86_64-unknown-linux-gnu/share/
doc  man  zsh
$ rustup toolchain install stable
info: syncing channel updates for 'stable-x86_64-unknown-linux-gnu'

  stable-x86_64-unknown-linux-gnu unchanged - rustc 1.47.0 (18bf6b4f0 2020-10-07)

error: rustup is not installed at '/acct/jynelson/.local/lib/cargo'

Notes

Output of rustup --version: rustup 1.20.2 (13979c9 2019-10-16)
Output of rustup show:

Default host: x86_64-unknown-linux-gnu
rustup home:  /acct/jynelson/.local/lib/rustup

I'd test with a newer version, but apparently rustup self update is also confused:

$ rustup self update
error: rustup is not installed at '/acct/jynelson/.local/lib/cargo'
@jyn514 jyn514 added the bug label Nov 1, 2020
@rbtcollins
Contributor

rustup shouldn't be installed at .local/lib/cargo anyway. Have you perhaps set some environment variables? What does 'rustup show' output?

@rbtcollins
Contributor

This might be a case of #2417, but I doubt it; I think something different is going on here.

@kinnison
Contributor

kinnison commented Nov 2, 2020

I have no idea what we do when we get ENOSPC trying to make a tempdir, nor how we ended up with a broken set of metadata. Working out how to reproduce this will be "interesting".

@jyn514
Member Author

jyn514 commented Nov 2, 2020

@rbtcollins here's a smaller reproduction, with CARGO_HOME set but RUSTUP_HOME unset.

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
$ cat /dev/urandom > oops-too-big
cat: write error: Input/output error
cat: write error: Disk quota exceeded
$ rustup component remove rust-docs
info: removing component 'rust-docs'
info: rolling back changes
error: could not create temp directory: /acct/jynelson/.rustup/tmp/mr5scujdx35d5xjb_dir
$ rustup component remove rust-docs
info: removing component 'rust-docs'
warning: during uninstall component rust-docs was not found

Strangely, rustup self update is no longer broken, and I can now uninstall the whole toolchain:

$ rustup self update
info: checking for self-updates
  rustup unchanged - 1.22.1
$ rustup toolchain uninstall stable
info: uninstalling toolchain 'stable-x86_64-unknown-linux-gnu'
info: toolchain 'stable-x86_64-unknown-linux-gnu' uninstalled

So I consider this 'works for me'. But it would still be nice to fix the inconsistent toolchain state.

@jyn514
Member Author

jyn514 commented Nov 2, 2020

@kinnison here's that strace:

write(2, "\33[1m", 4)                   = 4
write(2, "info: ", 6info: )                   = 6
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33(B\33[m", 6)               = 6
write(2, "removing component 'rust-docs'", 30removing component 'rust-docs') = 30
write(2, "\n", 1
)                       = 1
statx(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
read(3, "cargo-x86_64-unknown-linux-gnu\nc"..., 213) = 212
read(3, "", 1)                          = 0
close(3)                                = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
getrandom(NULL, 0, GRND_NONBLOCK)       = 0
getrandom("\x7f\x61\xbf\x93\x73\x6d\xe7\x31\x49\x53\xb0\xbb\xee\xfc\x51\x0c\x6d\x02\xa8\x25\x9e\xc6\x48\xba\xb3\xff\x50\xce\x7c\x35\x7b\x76", 32, 0) = 32
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7ffe97eb8710) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
close(3)                                = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 4
read(3, "cargo-x86_64-unknown-linux-gnu\nc"..., 8192) = 212
read(3, "", 8192)                       = 0
write(4, "cargo-x86_64-unknown-linux-gnu\nc"..., 177) = 177
close(4)                                = -1 EDQUOT (Disk quota exceeded)
close(3)                                = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7ffe97eb8110) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
close(3)                                = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", AT_STATX_SYNC_AS_STAT|AT_SYMLINK_NOFOLLOW, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100644) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
fchmod(4, 0100644)                      = 0
copy_file_range(3, NULL, 4, NULL, 212, 0) = 212
close(4)                                = -1 EDQUOT (Disk quota exceeded)
close(3)                                = 0
rename("/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components") = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/manifest-rust-docs-x86_64-unknown-linux-gnu", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=24, ...}) = 0
read(3, "dir:share/doc/rust/html\n", 50) = 24
read(3, "", 26)                         = 0
close(3)                                = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp", AT_STATX_SYNC_AS_STAT, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFDIR|0755, stx_size=4096, ...}) = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp/tx3h0o7z7p7_09_y_dir", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7ffe97eb8160) = -1 ENOENT (No such file or directory)
mkdir("/acct/jynelson/.rustup/tmp/tx3h0o7z7p7_09_y_dir", 0777) = -1 EDQUOT (Disk quota exceeded)
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7ffe97eb87f0) = -1 ENOENT (No such file or directory)
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[1m", 4)                   = 4
write(2, "info: ", 6info: )                   = 6
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33(B\33[m", 6)               = 6
write(2, "rolling back changes", 20rolling back changes)    = 20
write(2, "\n", 1
)                       = 1
rename("/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components") = 0
statx(AT_FDCWD, "/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", AT_STATX_SYNC_AS_STAT, STATX_ALL, 0x7ffe97eb8320) = -1 ENOENT (No such file or directory)
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[31m", 5)                  = 5
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33[1m", 4)                   = 4
write(2, "error: ", 7error: )                  = 7
ioctl(2, TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(2, "\33(B\33[m", 6)               = 6
write(2, "could not create temp directory:"..., 33could not create temp directory: ) = 33
write(2, "/acct/jynelson/.rustup/tmp/tx3h0"..., 47/acct/jynelson/.rustup/tmp/tx3h0o7z7p7_09_y_dir) = 47
write(2, "\n", 1
)                       = 1
sigaltstack({ss_sp=NULL, ss_flags=SS_DISABLE, ss_size=8192}, NULL) = 0
munmap(0x7fa25274e000, 12288)           = 0
exit_group(1)                           = ?
+++ exited with 1 +++

@kinnison
Contributor

kinnison commented Nov 2, 2020

So the syscall sequence which fills me with the heebie jeebies in particular is:

openat(AT_FDCWD, "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components", O_RDONLY|O_CLOEXEC) = 3
statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=212, ...}) = 0
openat(AT_FDCWD, "/acct/jynelson/.rustup/tmp/lq90se25u5caz4hl_file", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0100644) = 4
statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_BASIC_STATS, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=0, ...}) = 0
fchmod(4, 0100644)                      = 0
copy_file_range(3, NULL, 4, NULL, 212, 0) = 212
close(4)                                = -1 EDQUOT (Disk quota exceeded)
close(3)                                = 0
rename("/acct/jynelson/.rustup/tmp/3buply84_zc_1f8f_file", "/acct/jynelson/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/components") = 0

@rbtcollins
Contributor

rbtcollins commented Nov 2, 2020

I smell transactional rollback leaping blindly into the unknown... which, if correct, does make it a case of #2417, at least for the corrupt components list.

@rbtcollins
Contributor

Hmm, no, the rollback starts later; the corruption here starts when a failure is signalled via close() but not detected in our code.

I suspect we need some flush() calls on our files to fix this. Note that in Rust, flush() doesn't imply fsync:
https://github.com/rust-lang/rust/blob/4c0c5e099a3b1f1c6ad53115189c2710495588b3/library/std/src/sys/unix/fs.rs#L861

https://github.com/rust-lang/rust/blob/4c0c5e099a3b1f1c6ad53115189c2710495588b3/library/std/src/sys/windows/fs.rs#L438

https://github.com/rust-lang/rust/blob/4c0c5e099a3b1f1c6ad53115189c2710495588b3/library/std/src/io/buffered/bufwriter.rs#L350

So from a performance perspective this is a no-op: all we are doing is ensuring that the intermediate buffers accumulating the content are flushed out to the OS before the drop-is-close semantics kick in. For most, but not all, file systems that will be enough to surface this error earlier. The exception is NFS, where we can't flush the OS buffer short of an fsync. We may have to do that for some select files, but since it won't actually fix the out-of-space issue, we'll need to think more about this anyway: the goal here should be to not corrupt things, rather than to 'succeed'.
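
A minimal sketch of the flush-before-drop idea in the comment above, not rustup's actual code. The helper name write_checked is hypothetical. It only covers userspace buffering: BufWriter's Drop flushes but discards any error, so the explicit flush() is what lets the caller see a failed write. As the linked std sources show, flush() on a bare File does nothing on Unix, so this cannot by itself surface an error that the kernel only reports at close().

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper (not part of rustup): write `contents` to `path`
/// and report buffered-write errors instead of losing them in `Drop`.
fn write_checked(path: &Path, contents: &[u8]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    writer.write_all(contents)?;
    // Push the userspace buffer out to the OS now. If we relied on Drop,
    // a failed write here would be silently ignored.
    writer.flush()?;
    Ok(())
}
```

A caller would treat an Err from write_checked as a reason to abort before touching the real components file.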

@rbtcollins
Contributor

Though the fact that copy_file_range succeeded and close(4) failed leaves me in some doubt about this. Possibly we are just flat out of luck and really do need close() as an actually fallible syscall here.

@rbtcollins
Contributor

We can access close() ourselves by getting the fd via https://doc.rust-lang.org/std/os/unix/io/trait.AsRawFd.html + https://docs.rs/libc/0.2.65/libc/fn.close.html
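
A rough, Unix-only sketch of what a checked close might look like, under a few assumptions. The helper name close_checked is hypothetical. To stay dependency-free, close is declared directly rather than pulled from the libc crate (it is the same POSIX function). The sketch also uses IntoRawFd instead of AsRawFd, so that File's Drop does not close the descriptor a second time. The `unsafe extern` form needs a reasonably recent compiler.

```rust
use std::fs::File;
use std::io;
use std::os::raw::c_int;
use std::os::unix::io::IntoRawFd;

// Assumption: declared by hand to avoid an external crate; this is the
// same symbol that `libc::close` binds.
unsafe extern "C" {
    fn close(fd: c_int) -> c_int;
}

/// Hypothetical helper: close `file` and return whatever close(2) reports.
fn close_checked(file: File) -> io::Result<()> {
    // into_raw_fd transfers ownership of the descriptor to us, so the
    // File destructor will not run and close it again.
    let fd = file.into_raw_fd();
    // SAFETY: `fd` is an open descriptor that we own exclusively.
    if unsafe { close(fd) } == 0 {
        Ok(())
    } else {
        Err(io::Error::last_os_error())
    }
}
```

With something like this, the EDQUOT seen on close(4) in the strace would come back as an Err that the caller could act on before the rename. The descriptor should not be closed again after a failure.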

@jyn514
Member Author

jyn514 commented Nov 3, 2020

@rbtcollins if you want to push a change using flush() or sync_all(), I can build rustup from source and see if that helps (as opposed to a full close() call).

@rbtcollins
Contributor

There's some chance that sync_all would force the error to be exposed, but it would also have a crippling performance impact, so other than exploring the space to understand things, I really don't want to do that. And flush() is a no-op on the file, so for our case here I think it will have no effect.
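
For the exploratory experiment only, a sketch of a write-then-rename that calls sync_all() before the rename. The names replace_file and write_and_sync are hypothetical. Whether fsync actually reports the quota error on this filesystem is not confirmed anywhere in this thread, and the performance cost noted above still applies.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write `contents` to `tmp` and fsync it.
fn write_and_sync(tmp: &Path, contents: &[u8]) -> io::Result<()> {
    let mut file = File::create(tmp)?;
    file.write_all(contents)?;
    // sync_all issues an fsync; it is slow, but gives the filesystem a
    // chance to report a deferred write error before we commit.
    file.sync_all()
}

/// Hypothetical helper: replace `dest` only if the temp file was written
/// and synced successfully; otherwise leave `dest` untouched.
fn replace_file(tmp: &Path, dest: &Path, contents: &[u8]) -> io::Result<()> {
    if let Err(e) = write_and_sync(tmp, contents) {
        // Best-effort cleanup; the original error is the one that matters.
        let _ = fs::remove_file(tmp);
        return Err(e);
    }
    fs::rename(tmp, dest)
}
```

The point of the design is ordering: the rename that replaces the real file happens only after every earlier step has returned Ok.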

@kinnison
Contributor

kinnison commented Nov 3, 2020

To keep all the info in one place: there is https://crates.io/crates/close-file, but it's not quite right in the head. Alex also wrote https://gist.github.com/alexcrichton/0489d44efb7b3a6aa96fae044dd1be23, which is a bit better, though still not fully correct or ideal.
