Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hydra-eval-jobs masks error messages from Nix (again) #822

Open
samueldr opened this issue Oct 8, 2020 · 4 comments
Open

hydra-eval-jobs masks error messages from Nix (again) #822

samueldr opened this issue Oct 8, 2020 · 4 comments

Comments

@samueldr
Copy link
Member

samueldr commented Oct 8, 2020

Looking at the following:

I had an inkling of a doubt that it was a similar issue to what was found previously. Where the error message the evaluator gives us is not related to the error that matters. (#728)

The final output of the evaluator, on my end, reduced:

[...]
warning: dumping very large path (> 256 MiB); this may run out of memory
warning: dumping very large path (> 256 MiB); this may run out of memory
warning: dumping very large path (> 256 MiB); this may run out of memory
warning: dumping very large path (> 256 MiB); this may run out of memory
warning: dumping very large path (> 256 MiB); this may run out of memory
warning: dumping very large path (> 256 MiB); this may run out of memory
error: unexpected EOF reading a line

That doesn't sound helpful. I tried using strace like the last time it happened. A big 20GiB file, but a quick tail -n 9999 trace.strace > reduced.strace and we can look at it.

tail -n 130 reduced.strace
31427 read(10, "32768) = 0\n31427 close(10)                         = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", {st_mode=S_IFREG|0644, st_size=2306, ...}) = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", {st_mode=S_IFREG|0644, st_size=2306, ...}) = 0\n31427 openat(AT_FDCWD, \"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", O_RDONLY|O_CLOEXEC) = 10\n31427 read(10, \"# Notes on maintaining/"..., 65536) = 65536
31427 write(9, "32768) = 0\n31427 close(10)                         = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", {st_mode=S_IFREG|0644, st_size=2306, ...}) = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", {st_mode=S_IFREG|0644, st_size=2306, ...}) = 0\n31427 openat(AT_FDCWD, \"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/typesetting/tex/texlive/UPGRADING.md\", O_RDONLY|O_CLOEXEC) = 10\n31427 read(10, \"# Notes on maintaining/"..., 65536) = 65536
31427 read(10, "ner = \\\"ddennedy\\\";\\n    repo = \\\"dvgrab\\\";\\n    rev = \\\"e46042e0c7b3523b6854ee547b0534e8310b7460\\\";\\n    sha256 = \\\"17qy76fjpzrbxm4pj0ljx5lbimxryv24fvr13jwkh24j85dxailn\\\";\\n  };\\n\\n  buildInputs = [\\n    libunwind libraw1394 libjpeg libiec61883 libdv libavc1394\\n  \"..., 1196) = 1196\n31427 close(10)                         = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/video/flvtool2\", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/video"..., 65536) = 65536
31427 write(9, "ner = \\\"ddennedy\\\";\\n    repo = \\\"dvgrab\\\";\\n    rev = \\\"e46042e0c7b3523b6854ee547b0534e8310b7460\\\";\\n    sha256 = \\\"17qy76fjpzrbxm4pj0ljx5lbimxryv24fvr13jwkh24j85dxailn\\\";\\n  };\\n\\n  buildInputs = [\\n    libunwind libraw1394 libjpeg libiec61883 libdv libavc1394\\n  \"..., 1196) = 1196\n31427 close(10)                         = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/video/flvtool2\", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0\n31427 lstat(\"/Users/samuel/tmp/nixpkgs/master/pkgs/tools/video"..., 65536) = 65536
31427 read(10, "\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0type\\0\\0\\0\\0\\t\\0\\0\\0\\0\\0\\0\\0directory\\0\\0\\0\\0\\0\\0\\0\\5\\0\\0\\0\\0\\0\\0\\0entry\\0\\0\\0\\1\\0\\0\\0\\0\\0\\0\\0(\\0\\0\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0name\\0\\0\\0\\0\\v\\0\\0\\0\\0\\0\\0\\0default.nix\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0node\\0\\0\\0\\0\\1\\0\\0\\0\\0\\0\\0\\0(\\0\\0\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0type\\0\\0\\0\\0\\7\\0\\0\\0\\0\\0\\0\\0regular\\0\\10\\0\\0\\0\\0\\0\\0\\0contents\\37\\5\\0\\0\\0\\0\\0\\0{ stdenv, fetchurl, unzip, ruby, openssl, makeWrapper }:\\n\\nstdenv.mkDerivation rec {\\n  pname = \\\"ec2-ami-tools\\\";\\n\\n  version = \\\"1.5.7\\\";\\n\\n  b\"..., 19343)"..., 65536) = 65536
31427 write(9, "\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0type\\0\\0\\0\\0\\t\\0\\0\\0\\0\\0\\0\\0directory\\0\\0\\0\\0\\0\\0\\0\\5\\0\\0\\0\\0\\0\\0\\0entry\\0\\0\\0\\1\\0\\0\\0\\0\\0\\0\\0(\\0\\0\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0name\\0\\0\\0\\0\\v\\0\\0\\0\\0\\0\\0\\0default.nix\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0node\\0\\0\\0\\0\\1\\0\\0\\0\\0\\0\\0\\0(\\0\\0\\0\\0\\0\\0\\0\\4\\0\\0\\0\\0\\0\\0\\0type\\0\\0\\0\\0\\7\\0\\0\\0\\0\\0\\0\\0regular\\0\\10\\0\\0\\0\\0\\0\\0\\0contents\\37\\5\\0\\0\\0\\0\\0\\0{ stdenv, fetchurl, unzip, ruby, openssl, makeWrapper }:\\n\\nstdenv.mkDerivation rec {\\n  pname = \\\"ec2-ami-tools\\\";\\n\\n  version = \\\"1.5.7\\\";\\n\\n  b\"..., 19343)"..., 65536) = 65536
31427 read(10, "lyst-Runtime-5.90126.tar.gz\\\";\\n      sha256 = \\\"66f08334bf8b70049e77c0dcafd741c568e6f1341b2ffbb531a93833638d3986\\\";\\n    };\\n    buildInputs = [ TestFatal Typ\"..., 65536) = 65536\n31427 write(9, \"MooseXTypesLoadableClass TieIxHash ];\\n    meta = {\\n      description = \\\"DBIx::Class::Schema Model Class\\\";\\n      license = with stdenv.lib.licenses; [ artistic1 gpl1Plus ];\\n    };\\n  };\\n\\n  CatalystRuntime = buildPerlPackage {\\n    pname = \\\"Catalyst-Runtime\\\";\\n    version = \\\"5.90126\\\";\\n    src = fetchurl "..., 38273) = 38273
31427 write(9, "lyst-Runtime-5.90126.tar.gz\\\";\\n      sha256 = \\\"66f08334bf8b70049e77c0dcafd741c568e6f1341b2ffbb531a93833638d3986\\\";\\n    };\\n    buildInputs = [ TestFatal Typ\"..., 65536) = 65536\n31427 write(9, \"MooseXTypesLoadableClass TieIxHash ];\\n    meta = {\\n      description = \\\"DBIx::Class::Schema Model Class\\\";\\n      license = with stdenv.lib.licenses; [ artistic1 gpl1Plus ];\\n    };\\n  };\\n\\n  CatalystRuntime = buildPerlPackage {\\n    pname = \\\"Catalyst-Runtime\\\";\\n    version = \\\"5.90126\\\";\\n    src = fetchurl "..., 38273) = 38273
31427 close(10)                         = 0
31427 write(9, "\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0)\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0)\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0)\0\0\0\0\0\0\0", 55) = 55
31427 read(9, "ptxc\0\0\0\0@\0\0\0\0\0\0\0preallocating file of 20528731521 bytes: No space left on device\1\0\0\0\0\0\0\0", 32768) = 88
31427 close(9)                          = 0
31427 write(8, "{\"error\":\"preallocating file of 20528731521 bytes: No space left on device\"}\n", 77) = 77
24419 <... read resumed>"{", 1)         = 1
31427 exit_group(1 <unfinished ...>
24419 read(5,  <unfinished ...>
31427 <... exit_group resumed>)         = ?
24419 <... read resumed>"\"", 1)        = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "r", 1)                   = 1
24419 read(5, "r", 1)                   = 1
24419 read(5, "o", 1)                   = 1
24419 read(5, "r", 1)                   = 1
24419 read(5, "\"", 1)                  = 1
24419 read(5, ":", 1)                   = 1
24419 read(5, "\"", 1)                  = 1
24419 read(5, "p", 1)                   = 1
24419 read(5, "r", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "a", 1)                   = 1
24419 read(5, "l", 1)                   = 1
24419 read(5, "l", 1)                   = 1
24419 read(5, "o", 1)                   = 1
24419 read(5, "c", 1)                   = 1
24419 read(5, "a", 1)                   = 1
24419 read(5, "t", 1)                   = 1
24419 read(5, "i", 1)                   = 1
24419 read(5, "n", 1)                   = 1
24419 read(5, "g", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "f", 1)                   = 1
24419 read(5, "i", 1)                   = 1
24419 read(5, "l", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "o", 1)                   = 1
24419 read(5, "f", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "2", 1)                   = 1
24419 read(5, "0", 1)                   = 1
24419 read(5, "5", 1)                   = 1
24419 read(5, "2", 1)                   = 1
24419 read(5, "8", 1)                   = 1
24419 read(5, "7", 1)                   = 1
24419 read(5, "3", 1)                   = 1
24419 read(5, "1", 1)                   = 1
24419 read(5, "5", 1)                   = 1
24419 read(5, "2", 1)                   = 1
24419 read(5, "1", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "b", 1)                   = 1
24419 read(5, "y", 1)                   = 1
24419 read(5, "t", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "s", 1)                   = 1
24419 read(5, ":", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "N", 1)                   = 1
24419 read(5, "o", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "s", 1)                   = 1
24419 read(5, "p", 1)                   = 1
24419 read(5, "a", 1)                   = 1
24419 read(5, "c", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "l", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "f", 1)                   = 1
24419 read(5, "t", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "o", 1)                   = 1
24419 read(5, "n", 1)                   = 1
24419 read(5, " ", 1)                   = 1
24419 read(5, "d", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "v", 1)                   = 1
24419 read(5, "i", 1)                   = 1
24419 read(5, "c", 1)                   = 1
24419 read(5, "e", 1)                   = 1
24419 read(5, "\"", 1)                  = 1
24419 read(5, "}", 1)                   = 1
24419 read(5, "\n", 1)                  = 1
24419 read(5,  <unfinished ...>
31427 +++ exited with 1 +++
24419 <... read resumed>"", 1)          = 0
24419 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=31427, si_uid=1000, si_status=1, si_utime=2158, si_stime=4177} ---
24419 close(4)                          = 0
24419 close(5)                          = 0
24419 madvise(0x7f140a8f3000, 8368128, MADV_DONTNEED) = 0
24419 exit(0)                           = ?
24406 <... futex resumed>)              = 0
24419 +++ exited with 0 +++
24406 write(2, "\33[31;1merror:\33[0m unexpected EOF reading a line\n", 48) = 48
24406 exit_group(1)                     = ?
24418 <... futex resumed>)              = ?
24417 <... futex resumed>)              = ?
24416 <... futex resumed>)              = ?
24415 <... futex resumed>)              = ?
24414 <... futex resumed>)              = ?
24413 <... futex resumed>)              = ?
24412 <... futex resumed>)              = ?
24411 <... futex resumed>)              = ?
24410 <... futex resumed>)              = ?
24409 <... futex resumed>)              = ?
24408 <... futex resumed>)              = ?
24407 <... rt_sigtimedwait resumed> <unfinished ...>) = ?
24418 +++ exited with 1 +++
24417 +++ exited with 1 +++
24416 +++ exited with 1 +++
24415 +++ exited with 1 +++
24414 +++ exited with 1 +++
24413 +++ exited with 1 +++
24412 +++ exited with 1 +++
24411 +++ exited with 1 +++
24410 +++ exited with 1 +++
24409 +++ exited with 1 +++
24408 +++ exited with 1 +++
24407 +++ exited with 1 +++
24406 +++ exited with 1 +++

More lines in this gist (6.7MiB!).

The error is preallocating file of 20528731521 bytes: No space left on device. (This is 20.53GB.)

So it's failing around here, in Nix, but Hydra totally masks that from the evaluator output.

oops: While this is probably not the issue that the nixos.org hydra evaluator faced, I assume the same root cause hid its message. My rootfs ended up filled :/. I'm re-running the eval to see what was really up.

The issue to fix here, is not the issue that Nixpkgs was having. The issue to fix is to ensure Hydra properly transmits all the information it gets to the end-users who are debugging issues with eval.

Sorry, I don't have much of a reproducer, except to run the evaluator with the problematic commit:

 $ curl -L https://github.com/NixOS/nixpkgs/commit/fb6d63f3fdd95a5468d43a0693c8ca7c1894363f.patch | git am
 $ env -i HYDRA_CONFIG=/Users/samuel/tmp/nixpkgs/hydra.conf strace -s 512 -fo trace.strace /Users/samuel/Projects/nixos/hydra/result/bin/hydra-eval-jobs -I /nix/store -I /Users/samuel/tmp/nixpkgs/master ./nixos/release-combined.nix

Reducing the test case (sorry) is left as an exercise to the reader (for now).


This was run using 405c52b. The currently deployed hydra.

cc: NixOS/nixpkgs#93457

@samueldr
Copy link
Member Author

samueldr commented Oct 8, 2020

So yeah, my eval was wrong, but hiding the message is still problematic. And hopefully is the same root cause anyway for hiding the message in the problem at hand.

@FRidh
Copy link
Member

FRidh commented Jan 3, 2021

We're facing an issue with masking of an error again NixOS/nixpkgs#108304.

@samueldr
Copy link
Member Author

samueldr commented Jan 3, 2021

@edolstra can we have a review of the C++ code from someone that knows about the implementation, so it can then be fixed and merged?

@lukegb
Copy link
Contributor

lukegb commented Sep 22, 2021

#1025 to fix a different class of failure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants