
"No .cabal file found in directory" (git-location package race condition?) #2389

Closed
DaveCTurner opened this issue Jul 19, 2016 · 11 comments

@DaveCTurner commented Jul 19, 2016

Hi,

We are seeing about 1-in-10 builds fail in the following fashion:

+ stack --local-bin-path ../deploy/ build XXXXXXX:test --verbose
Version 1.1.2, Git revision cebe10e845fed4420b6224d97dcabf20477bbd4b (3646 commits) x86_64 hpack-0.14.0
Getting project config file from STACK_YAML environment
2016-07-19 05:05:45.435407: [info] Getting project config file from STACK_YAML environment @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Config src/Stack/Config.hs:802:13)
2016-07-19 05:05:45.435690: [debug] Loading project config file /var/lib/jenkins/slave_root/workspace/XXXXXXX/builds/XXXXXXX.yaml @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Config src/Stack/Config.hs:829:13)
2016-07-19 05:05:45.437264: [debug] Checking whether stack was built with libgmp4 @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Config src/Stack/Config.hs:326:5)
2016-07-19 05:05:45.437383: [debug] Run process: ldd /usr/bin/stack @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.443443: [debug] Stack was not built with libgmp4 @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Config src/Stack/Config.hs:330:14)
2016-07-19 05:05:45.443543: [debug] Trying to decode /home/jenkins/.stack/build-plan-cache/x86_64-linux/lts-5.2.cache @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:55:5)
2016-07-19 05:05:45.454655: [debug] Success decoding /home/jenkins/.stack/build-plan-cache/x86_64-linux/lts-5.2.cache @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Data.Binary.VersionTagged src/Data/Binary/VersionTagged.hs:64:13)
2016-07-19 05:05:45.454748: [debug] Getting system compiler version @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Setup src/Stack/Setup.hs:341:17)
2016-07-19 05:05:45.454845: [debug] Run process: ghc --info @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.483406: [debug] Asking GHC for its version @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Setup.Installed src/Stack/Setup/Installed.hs:94:13)
2016-07-19 05:05:45.483505: [debug] Run process: ghc --numeric-version @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.510598: [debug] Getting Cabal package version @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.GhcPkg src/Stack/GhcPkg.hs:165:5)
2016-07-19 05:05:45.510718: [debug] Run process: ghc-pkg --no-user-package-db field --simple-output Cabal version @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.526247: [debug] Resolving package entries @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.Setup src/Stack/Setup.hs:221:5)
2016-07-19 05:05:45.527107: [debug] Run process: git reset --hard 02cffad5bc84d3017a46ff42163cf5ad8548c637 -- @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.529177: [debug] Run process: git reset --hard 4a68a4fa448e4185b1bd4b96db281568f53e2e42 -- @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.530809: [debug] Run process: git reset --hard 9d093c176b327ca4a9bf00deb277029e4fadad2d -- @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.532662: [debug] Run process: hg update -C e556da82cae8f092c8348b528bab007ef7b359af -- @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
2016-07-19 05:05:45.579141: [debug] Getting global package database location @(stack_K1e6VSKnzs1HNYmTJGAgTQ:Stack.GhcPkg src/Stack/GhcPkg.hs:48:5)
2016-07-19 05:05:45.579243: [debug] Run process: ghc-pkg --no-user-package-db list --global @(stack_K1e6VSKnzs1HNYmTJGAgTQ:System.Process.Read src/System/Process/Read.hs:283:3)
No .cabal file found in directory /var/lib/jenkins/slave_root/workspace/XXXXXXX/builds/.stack-work/downloaded/RpUseaHyHxB2/

There is now a .cabal file in that directory, but it seems it was created a few seconds after the failure:

$ stat ekg-statsd.cabal
  File: ‘ekg-statsd.cabal’
  Size: 949             Blocks: 8          IO Block: 4096   regular file
Device: fc03h/64515d    Inode: 2106726     Links: 1
Access: (0664/-rw-rw-r--)  Uid: (  106/ jenkins)   Gid: (  113/ jenkins)
Access: 2016-07-19 05:05:53.535758342 +0000
Modify: 2016-07-19 05:05:53.535758342 +0000
Change: 2016-07-19 05:05:53.535758342 +0000
 Birth: -

Taking a wild guess (I haven't looked at the code yet), it might be that stack isn't waiting for the git reset commands to run to completion? I unfortunately doubt I'll have time to look at this in detail before September, but thought I'd better note it here anyway.
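For what it's worth, the suspected race can be sketched in shell (hypothetical paths and file names; the sleeping background subshell stands in for the `git reset` step):

```shell
# Hypothetical sketch of the suspected race: a checkout step running in the
# background while the consumer reads the directory too early. The sleeping
# subshell stands in for "git reset --hard <sha> --".
dir=$(mktemp -d)
cd "$dir"
( sleep 1; touch pkg.cabal ) &    # the "checkout" finishes a second later
if ls ./*.cabal >/dev/null 2>&1; then
  echo "found a .cabal file"
else
  echo "No .cabal file found in directory $dir/"
fi
wait                              # waiting for the child closes the race
ls ./*.cabal >/dev/null 2>&1 && echo "found a .cabal file after wait"
```

If stack checked the directory before the reset's child process finished, it would see exactly this "missing file that exists a few seconds later" symptom.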

@mgsloan mgsloan added this to the P2: Should milestone Jul 19, 2016

mgsloan commented Jul 19, 2016

Thanks for the report! I believe stack does wait for the reset to finish, so I'm not sure what could cause this.

@DaveCTurner commented:

Hmm, ok, not the obvious race then. Makes sense on reflection: I doubt we'd be the only people seeing this issue if it were that.

Frustrating that it's so intermittent, but one thing that is definitely consistent is that it's always the same directory with the missing file.


Blaisorblade commented Jul 21, 2016

  • IIUC this fails in resetting to haskell-github-trust/ekg-statsd@02cffad, right?
  • Timestamps are from when the reset starts, not when it finishes. Hence it might be normal that the fetch finishes a few seconds later, not sure without further log info.
  • Does reset take 20 milliseconds or is stack fetching the packages in parallel*? A problem with that concurrency would be the next obvious guess, and could be much harder to trigger.
    EDIT: seems fetching might not happen in parallel. I also wondered what happens if git reset forks, but I'd guess the git reset process will still wait for completion of all its children. I'll stop speculating.

*I guess so based on https://neilmitchell.blogspot.de/2016/07/why-did-stack-stop-using-shake.html, and because it makes sense, but I don't really know.
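To illustrate the second bullet (hypothetical commands; `sleep 1` stands in for a reset): two processes launched back-to-back produce "Run process" log lines only milliseconds apart even when each takes a second, because the timestamp marks the start of the process, not its end:

```shell
# Each "Run process" line is logged when the command is *launched*, so two
# slow commands started back-to-back log ~20 ms apart even though each one
# runs for a second. "sleep 1" stands in for a git reset.
log() { echo "$(date +%H:%M:%S.%3N) [debug] $*"; }
log "Run process: fake git reset A"; sleep 1 &
pid_a=$!
log "Run process: fake git reset B"; sleep 1 &
pid_b=$!
wait "$pid_a" "$pid_b"
log "both resets finished"
```

So the 20 ms gap in the log is compatible with either parallel fetching or with start-of-process timestamps; the log alone can't distinguish them.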

@DaveCTurner commented:

You're right about what it's resetting to. I think this build starts from a clean state and isn't doing anything incremental but I'm not sure and can't check right now.

A few seconds seems about right for a reset. 20ms is remarkably short, and indicates to me that they happen in parallel.

@Blaisorblade commented:

If this started from a clean state, there should be a clone action, and from a quick look at the source it should theoretically be logged.
Could we add debug logging right now for when the processes end?
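A hedged sketch of what that logging could look like, at the shell level (a hypothetical wrapper, not stack's actual code):

```shell
# Hypothetical wrapper: log when a process starts *and* when it ends, so the
# log would show whether the reset completed before the directory was read.
run_logged() {
  echo "[debug] Run process: $*"
  "$@"
  status=$?
  echo "[debug] Process finished with exit code $status: $*"
  return "$status"
}
run_logged true
run_logged sh -c 'exit 3' || echo "[warn] command failed"
```

With paired start/end lines, a build that reads a package directory before the corresponding end line would make the race visible in the log.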


mgsloan commented Jul 26, 2016

@Blaisorblade sure, I'd be in favor of generally logging when processes end.

@DaveCTurner commented:

Hmm, OK, I actually managed to catch this in the act. Same message:

No .cabal file found in directory /var/lib/jenkins/slave_root/workspace/XXXXXXX/builds/.stack-work/downloaded/RpUseaHyHxB2/

but that directory doesn't exist at all this time:

$ ls '/var/lib/jenkins/slave_root/workspace/XXXXXXX/builds/.stack-work/downloaded/'
520a286ae02e6322ef44d5e5fa52be99a1f025391008645b25abb9286164db8a  gop_KYYqWQsi  roPexwhCyhg_  zu5Bqd0L4qn6

(so there's certainly no .cabal file there!)

@Blaisorblade commented:

Congratulations on catching this! @mgsloan, this suggests stack is somehow not waiting correctly for git clone; is there code involved where this could happen? Are packages indeed fetched in parallel or not?

Regarding the extra logging we discussed, #2327 is related.


mgsloan commented Mar 18, 2017

Some of the git code has changed since this issue, so hoping this has been resolved as a side effect. Please re-open if it's still an issue.

mgsloan closed this as completed Mar 18, 2017
@ChristopherKing42 commented:
Still an issue here:

$ stack upgrade --git
Information about the commit this version of stack was built from is not available due to how it was built. Will continue by assuming an upgrade is needed because we have no information to the contrary.
Cloning stack
Cloning into 'stack'...
remote: Counting objects: 677, done.
remote: Compressing objects: 100% (505/505), done.
remote: Total 677 (delta 60), reused 424 (delta 30), pack-reused 0
Receiving objects: 100% (677/677), 698.55 KiB | 847.00 KiB/s, done.
Resolving deltas: 100% (60/60), done.
Checking connectivity... done.
No .cabal file found in directory /tmp/stack-upgrade26258/stack/


mgsloan commented Dec 10, 2017

@ChristopherKing42 That is likely a different problem, my guess is that you are using a version of stack before hpack support. I suggest doing a manual binary upgrade.
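A minimal simulation of that explanation (hypothetical directory and file names): a pre-hpack stack only looks for a `.cabal` file, so a checkout that ships only `package.yaml` appears empty to it:

```shell
# Simulate a checkout that, like stack's own repo, ships package.yaml (hpack)
# but no committed .cabal file; a pre-hpack stack sees no build description.
dir=$(mktemp -d)
cd "$dir"
touch package.yaml
if ls ./*.cabal >/dev/null 2>&1; then
  echo "found a .cabal file"
else
  echo "No .cabal file found in directory $dir/"
fi
```

That is why the same error message appears here even though no race is involved: the file genuinely never exists in the clone.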
