Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CIDER thread hangs when reading from dead/silent TCP socket #1438

Closed
alexander-yakushev opened this issue Nov 28, 2015 · 8 comments
Closed
Labels

Comments

@alexander-yakushev
Copy link
Member

Reproducible example:

(slurp (clojure.java.io/reader (.getInputStream (java.net.Socket. "google.com" 80))))

Triggering interruption (C-c C-b) shows Evaluation interrupted. but in fact it isn't. The thread unfreezes on its own after a short while (~1 minute). I believe this is java.net.Sockets own timeout.

@bbatsov
Copy link
Member

bbatsov commented Nov 28, 2015

This issue is a great mystery. When I evaluate the code and try to interrupt it I see the following messages:

(---> 
  ns  "buildstuff.core"
  op  "eval"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  code  "(slurp (clojure.java.io/reader (.getInputStream (java.net.Socket. \"google.com\" 80))))"
  file  "/Users/bozhidar/projects/buildstuff/src/buildstuff/core.clj"
  line  33
  column  0
  id  "195"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "195"
  id  "196"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "110"
  id  "197"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "107"
  id  "198"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "105"
  id  "199"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "83"
  id  "200"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "75"
  id  "201"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "70"
  id  "202"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "62"
  id  "203"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "54"
  id  "204"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "32"
  id  "205"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "25"
  id  "206"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "12"
  id  "207"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "10"
  id  "208"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "8"
  id  "209"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "7"
  id  "210"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "18"
  id  "211"
)
(---> 
  op  "interrupt"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  interrupt-id  "5"
  id  "212"
)
(<- 
  id  "195"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("interrupted")
)
(<- 
  id  "196"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done")
)
(<- 
  id  "198"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "200"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "201"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "203"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "205"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "207"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "197"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "209"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "211"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "212"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "204"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "202"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "210"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "208"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "199"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "206"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done" "interrupt-id-mismatch" "error")
)
(<- 
  id  "195"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("done")
)
(<- 
  changed-namespaces  (dict)
  id  "195"
  repl-type  "clj"
  session  "55fdf230-d706-4136-b5e3-ee0c950440ba"
  status  ("state")
)

Seems the eval is interrupted successfully according to nREPL, but for some reason that doesn't really happen. And haven't yet explored why we send a ton of extra interrupt requests - this seems like some bug.

@alexander-yakushev
Copy link
Member Author

Interesting. I must be extremely lucky to bump into this as the very first thing after I launched Clojure repl in a few months :D

@bbatsov bbatsov added the bug label Nov 29, 2015
@alexander-yakushev
Copy link
Member Author

Is there any progress with this? At least any ideas where so many interrupts are coming from?

@bbatsov
Copy link
Member

bbatsov commented Dec 8, 2015

No progress yet. Was super busy between work and ClojureX and I put this on the back-burner. I hope to find some time for this by the end of the week, but if someone decides to investigate the problem before me - I won't mind that at all.

@alexander-yakushev
Copy link
Member Author

Thanks. This has been a terrible productive work killer for me lately. Might as well take a look myself. Any clues where should I look first?

@Malabarba
Copy link
Member

The "so many interrupts" thing is probably just cider being overzealous. If some request was not marked as "done", cider-interrupt will send an interrupt request for it (if the request was done and we didn't know, this is harmless).

This indicates some of our requests are not being properly marked as done. My first guess would be sync-requests, as they don't use regular callbacks.
However, this is unlikely to be the cause of your hang.

To debug it more cleanly, you can try manually sending an interrupt request just for that ID, and then see what happens on the Clojure side.

@bbatsov bbatsov modified the milestone: v0.11 Dec 11, 2015
@Malabarba Malabarba modified the milestones: v0.11, v0.13.0 Apr 15, 2016
@bbatsov bbatsov removed this from the v0.13.0 milestone Jun 18, 2018
@bbatsov
Copy link
Member

bbatsov commented May 12, 2019

I can no longer reproduce this. Probably the changes to the evaluation model in nREPL fixed this.

@bbatsov bbatsov closed this as completed May 12, 2019
@alexander-yakushev
Copy link
Member Author

Haven't experienced this later as well. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants