Daemon 0.4.3-rc2 crash #3045
@NeoTeo Does this happen every time you restart the daemon? Did you build from source, or use a downloaded binary from dists?
Hey @whyrusleeping :)
@NeoTeo What version of OS X are you running? I can't reproduce this issue... Did you start with a new repo, or is this an older repo that still has a bunch of stuff in it?
It first happened on El Capitan 10.11.6 (15G31), so I tried it on a different machine running the Sierra 10.12 beta (16A270f). On Sierra I also deleted .ipfs and ran ipfs init. I just tried the El Cap machine (which has an existing v4 repo) again, and now it doesn't crash but alternates between hanging and returning
@NeoTeo Alright, I'll look into it more. Do you have a repeatable way of reproducing the issue, or is it more or less random?
On Sierra the crash is repeatable; on El Cap the alternating behaviour is consistent (and it never actually cats the hash). I have just tried doing a
Let me know if you need me to try something.
@NeoTeo This is really bizarre; it feels like a bug in Go to me.
@NeoTeo If it's not too much trouble, could you install the latest release candidate of Go (1.7) on the system where the crash is reproducible and try building ipfs from source? I'm thinking this might be caused by changes in OS X syscalls.
OK, gonna try that.
Good hunch! Not crashing on Sierra now. Gonna try on El Cap to see if it solves the hanging issue there as well.
If that fixes it, then I'll just have to make sure the release binaries for 0.4.3 get built with go1.7 (which I was hoping to do anyway).
Alas, El Cap still gives the error: merkledag: not found.
The Sierra binary wasn't rebuilt. Upgrading Go was enough.
Aaand, it crashed after successfully cat'ing the file. I'll try a build from source.
Trouble with "make install". I overrode check_go_version to accept the "1.7rc5 " version number, but gx doesn't seem to like it either, as its strconv.ParseInt doesn't like the rc part...
@NeoTeo just do:
I need to fix the Go version parsing code there; it's been biting me too.
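The problem is that "1.7rc5" is not a plain integer once the string is split on dots, so a strict strconv.ParseInt on each component rejects it. Below is a sketch of a more tolerant parser, assuming it is acceptable to ignore pre-release suffixes such as rc5 or beta1 for a minimum-version check. parseGoVersion is a hypothetical name; this is not the actual code in gx or check_go_version.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// parseGoVersion (hypothetical helper) turns strings such as "go1.6.2",
// "1.7rc5 " or "go1.7beta1" into numeric major/minor/patch parts,
// discarding any pre-release suffix. Missing parts default to 0.
func parseGoVersion(v string) (major, minor, patch int, err error) {
	v = strings.TrimPrefix(strings.TrimSpace(v), "go")
	// Cut at the first rune that is neither a digit nor a dot,
	// so "1.7rc5" becomes "1.7".
	if i := strings.IndexFunc(v, func(r rune) bool {
		return r != '.' && !unicode.IsDigit(r)
	}); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	nums := make([]int, 3)
	for i := 0; i < len(parts) && i < 3; i++ {
		n, convErr := strconv.Atoi(parts[i])
		if convErr != nil {
			return 0, 0, 0, fmt.Errorf("bad version component %q: %w", parts[i], convErr)
		}
		nums[i] = n
	}
	return nums[0], nums[1], nums[2], nil
}

func main() {
	for _, s := range []string{"go1.6.2", "1.7rc5 ", "go1.7rc6"} {
		major, minor, patch, err := parseGoVersion(s)
		fmt.Printf("%q -> %d.%d.%d (err=%v)\n", s, major, minor, patch, err)
	}
}
```

Dropping the suffix means 1.7rc5 compares equal to 1.7, which is usually what a "need at least Go 1.7" check wants.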
Hm... odd. The first time it gave me this:
Ran it again straight after, and it worked just fine. The node has been rock solid since that first run, and any subsequent cat'ing works as expected \o/ Is it worth trying this on the El Cap machine too?
I just built 0.4.3-rc2 from source on El Cap with go1.7rc5, but unfortunately it still hangs on cat. Different problem? At least the Sierra problem seems fixed :)
Hm, spoke a bit too soon. An https://gist.github.com/NeoTeo/fbaf0357ef5cf7ae1576f928d438d3e0
Any time you see 'unexpected fault address', it's a Go compiler bug 99% of the time. You've rebuilt go-ipfs with go1.7-rc5 (there's an rc6 now with a bunch of OS X Sierra changes) and restarted the daemon?
Yep. I'm updating Sierra to b5 right now, so I'll try go1.7-rc6 just after.
Looking good so far. Using rc6, the daemon has survived both an add and a cat.
The same setup on El Capitan still gives
@NeoTeo On El Capitan, can you try with a freshly inited repo? It's very odd that it's still failing.
I have run ipfs repo verify (there's a lot in there I'd be sad to lose). Can I safely squirrel away the .ipfs directory, init, and then hope to somehow restore the repo afterwards?
@NeoTeo Yeah, you can just move the directory somewhere safe and move it back once we figure out the problem.
:/ Seems it was the repo somehow. Weird that it'd pass the repo verify, though.
@NeoTeo That's no good! Is it just catting the one object that fails, or do other things break? Can you add a new file and cat it?
@whyrusleeping With the fresh init, it adds and cats as it should.
@NeoTeo On the old repo, when it hangs, can you give me the output of pressing
^\SIGQUIT: quit
goroutine 0 [idle]:
goroutine 1 [select]:
goroutine 17 [syscall, locked to thread]:
goroutine 18 [chan receive]:
goroutine 19 [syscall]:
goroutine 20 [select]:
goroutine 10 [select, locked to thread]:
goroutine 23 [chan receive]:
goroutine 27 [IO wait]:
goroutine 28 [select]:
rax 0xe
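By default a Go program that receives SIGQUIT (Ctrl+\ in a terminal, the ^\ above) prints its goroutine stacks and exits. A comparable dump can be captured without killing the process via runtime.Stack with all set to true. The sketch below is a hypothetical helper, not something go-ipfs is claimed to expose; it grows the buffer until the whole dump fits, because runtime.Stack silently truncates to the buffer size.

```go
package main

import (
	"fmt"
	"runtime"
)

// allGoroutines returns the stack traces of all goroutines without
// terminating the process. runtime.Stack reports how many bytes it
// wrote; if that fills the buffer, the dump may be truncated, so the
// buffer is doubled and the call retried.
func allGoroutines() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true)
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

func main() {
	done := make(chan struct{})
	// A goroutine that blocks on a channel receive, similar to the
	// "[chan receive]" entries in the dump above.
	go func() { <-done }()
	fmt.Print(allGoroutines())
	close(done)
}
```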
@NeoTeo Could I get the output from the daemon while that command is hanging? (press
@whyrusleeping That's what that was. I did a
No, wait. Maybe I misunderstood. On the daemon. Coming up...
@whyrusleeping Wow, that was a big one: https://gist.github.com/NeoTeo/752acc221e42889c3c3a8a9fc58b1585
@NeoTeo that is really strange... can you try deleting the following from line 45 of
For some reason, your stack is telling me it's hanging while running a garbage collection... but not actually getting to the GC part, just trying to take the lock?
Yep, that works. But presumably you put that code there for a reason... some other part locking the GC?
...and, since a freshly inited repo doesn't hang, it's somehow related to dealing with a repo that was migrated from the pre-0.4 days and yet passes the
@NeoTeo How large is this repo (the one you're having trouble with)? I think this might be related to another issue where checking whether we need a GC is too expensive.
@whyrusleeping ~12 GB
@NeoTeo In your .ipfs config, can you change Datastore.StorageMax to something like 1000GB, and then try catting something with the normal binary (not the one with the GC code removed)?
Yep, that's it! So a warning or a friendly failure message telling you you've exceeded your max would do it. Nice investigative work though, @whyrusleeping 👍 :)
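The suggestion above is to fail loudly instead of hanging when the repo is over its limit. A sketch of that idea under stated assumptions: StorageMax values like "1000GB" are parsed with decimal (SI) units, an empty value means no limit (in line with making the size limit opt-in), and checkStorage returns a descriptive error. parseStorageMax and checkStorage are hypothetical names; go-ipfs's own parsing rules may differ.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// Decimal multipliers; treating "GB" as 10^9 bytes is an assumption of
// this sketch. "B" must come last so "GB" is matched before it.
var units = []struct {
	suffix string
	mult   int64
}{
	{"TB", 1e12}, {"GB", 1e9}, {"MB", 1e6}, {"KB", 1e3}, {"B", 1},
}

// parseStorageMax converts a value such as "1000GB" to bytes.
// An empty string means no limit and yields 0.
func parseStorageMax(s string) (int64, error) {
	s = strings.ToUpper(strings.TrimSpace(s))
	if s == "" {
		return 0, nil
	}
	for _, u := range units {
		if !strings.HasSuffix(s, u.suffix) {
			continue
		}
		num := strings.TrimSpace(strings.TrimSuffix(s, u.suffix))
		n, err := strconv.ParseInt(num, 10, 64)
		if err != nil {
			return 0, fmt.Errorf("invalid StorageMax %q: %w", s, err)
		}
		if n < 0 || n > math.MaxInt64/u.mult {
			return 0, fmt.Errorf("StorageMax %q out of range", s)
		}
		return n * u.mult, nil
	}
	return 0, fmt.Errorf("invalid StorageMax %q: missing unit such as GB", s)
}

// checkStorage returns an explanatory error when repoSize exceeds the
// configured limit, instead of blocking. A limit of 0 disables the check.
func checkStorage(repoSize int64, storageMax string) error {
	limit, err := parseStorageMax(storageMax)
	if err != nil {
		return err
	}
	if limit > 0 && repoSize > limit {
		return fmt.Errorf("repo size %d bytes exceeds Datastore.StorageMax %q (%d bytes); raise the limit or run 'ipfs repo gc'",
			repoSize, storageMax, limit)
	}
	return nil
}

func main() {
	const repoSize = 12e9 // roughly the ~12 GB repo from this thread
	fmt.Println(checkStorage(repoSize, "10GB"))
	fmt.Println(checkStorage(repoSize, "1000GB"))
}
```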
@whyrusleeping It might be the same issue. I had the default GC setting in my config. We might want to be more explicit when a GC runs because the size limit was hit. Also, the softGC shouldn't lock if a real GC is running.
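The stack trace showed the request blocked just trying to take the GC lock, and the point above is that a soft GC should not wait when a real GC is already running. A minimal sketch of that idea, using a one-slot channel as a lock that supports a non-blocking attempt (sync.Mutex.TryLock offers the same since Go 1.18). The gcGuard type and its methods are hypothetical, not go-ipfs's API.

```go
package main

import "fmt"

// gcGuard serialises garbage collections. A channel with capacity 1
// acts as a lock that can also be tried without blocking.
type gcGuard struct{ sem chan struct{} }

func newGCGuard() *gcGuard { return &gcGuard{sem: make(chan struct{}, 1)} }

// RunGC waits until it holds the lock, then runs gc (an explicit GC).
func (g *gcGuard) RunGC(gc func()) {
	g.sem <- struct{}{}
	defer func() { <-g.sem }()
	gc()
}

// MaybeSoftGC tries to take the lock without waiting. If a GC is
// already running it returns false at once, so the caller's request
// (e.g. a cat) can proceed instead of hanging.
func (g *gcGuard) MaybeSoftGC(gc func()) bool {
	select {
	case g.sem <- struct{}{}:
		defer func() { <-g.sem }()
		gc()
		return true
	default:
		return false
	}
}

func main() {
	g := newGCGuard()
	started := make(chan struct{})
	release := make(chan struct{})
	// Simulate a long-running real GC that holds the lock.
	go g.RunGC(func() { close(started); <-release })
	<-started
	fmt.Println("soft GC ran:", g.MaybeSoftGC(func() {})) // prints: soft GC ran: false
	close(release)
}
```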
We should not have that 'feature' enabled by default. If you want to set a size limit, you should have to do it manually.
@NeoTeo Gonna go ahead and close this, as I think we solved all the problems :) Please open a new issue if anything else comes up.
Version/Platform/Processor information (from ipfs version --all):
go-ipfs version: 0.4.3-rc2-
Repo version: 4
System version: amd64/darwin
Golang version: go1.6.2
Type (bug, feature, meta, test failure, question):
bug
Area (api, commands, daemon, fuse, etc):
daemon
Priority (from P0: functioning, to P4: operations on fire):
P3: frequent crashing
Description:
I can reproduce the following crash by starting the daemon and then calling:
ipfs cat QmPZ9gcCEpqKTo6aq61g2nXGUhM4iCL3ewB6LDXZCtioEB
The daemon crashes out with the output included below.
Subsequent runs of the daemon crash out immediately.
Crash log: