Dream server process consumes 100% CPU after websocket client is closed #230
Comments
We tried out using websockets on ocaml-ci with 1.0.0~alpha5 and are running into the same issue.
I will take a look shortly, thank you!
This appears to be an issue upstream in Websocket/af and/or Gluten. I used GDB to get a backtrace and added two prints before calls to the deepest functions that were in Dream, Websocket/af, or Gluten. In Dream:
diff --git a/src/http/shared/websocket.ml b/src/http/shared/websocket.ml
index 358d577..eec9b6e 100644
--- a/src/http/shared/websocket.ml
+++ b/src/http/shared/websocket.ml
@@ -196,6 +196,7 @@ let websocket_handler stream socket =
if !closed then
close !close_code
else begin
+ prerr_endline "Dream websocket.ml: calling Websocketaf.Wsd.schedule";
Websocketaf.Wsd.schedule socket ~kind buffer ~off:offset ~len:length;
bytes_since_flush := !bytes_since_flush + length;
if !bytes_since_flush >= 4096 then
This print is triggered a finite number of times before entering the infinite loop. In Gluten:
diff --git a/lwt/gluten_lwt.ml b/lwt/gluten_lwt.ml
index 661b50a..6037f9a 100644
--- a/lwt/gluten_lwt.ml
+++ b/lwt/gluten_lwt.ml
@@ -39,6 +39,8 @@ open Lwt.Infix
module Buffer = Gluten.Buffer
include Gluten_lwt_intf
+let counter = ref 0
+
module IO_loop = struct
let start :
type t fd.
@@ -85,6 +87,8 @@ module IO_loop = struct
let rec write_loop_step () =
match Runtime.next_write_operation t with
| `Write io_vectors ->
+ Printf.eprintf "Gluten_lwt: writev_io_vectors: %i\n%!" !counter;
+ incr counter;
writev io_vectors >>= fun result ->
Runtime.report_write_result t result;
write_loop_step ()
This second print in Gluten is triggered indefinitely, running up the counter. The output looks as below. The two requests are me opening a browser tab to
@anmonteiro, would you be able to comment on this?
FYI this issue and related issues in websocket/af are also apparently the cause of the playground repeatedly stopping (it hangs with 100% CPU, backtraces and
I ran into the same problem. If I close the websocket from the server, it seems to work fine. But if I open a websocket and then navigate away from the page, so that the browser closes it from the client side, the server process starts consuming 100% CPU after about 20 seconds. I don't have to use (send on) the socket for this to happen. Output from testing is here:
Since the logs show that read() in websocket.ml gets a close notification when the client closes the connection, I tried starting a receive 'monitor' when the client connects. When it gets a 'None', it closes the socket. This seems to prevent the CPU issue. I think I can use it as a workaround for now, but I hope a proper fix for this bug will be found.
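For readers who want to try the receive 'monitor' workaround, here is a minimal sketch of the idea. It is not the code from the comment above, which was not posted. It assumes the lwt library, and the names monitor, receive and close are hypothetical. The two callbacks are parameters so the sketch does not depend on Dream itself.

```ocaml
(* Sketch of a receive "monitor": keep reading from the socket and close
   our side as soon as the peer's close shows up as [None].
   [receive] and [close] are hypothetical callbacks supplied by the caller. *)
let rec monitor ~receive ~close =
  Lwt.bind (receive ()) (function
    | None -> close ()                           (* client went away *)
    | Some _message -> monitor ~receive ~close)  (* ignore data, keep watching *)

(* Assumed wiring inside a Dream websocket handler (untested here):

   Dream.websocket (fun websocket ->
     Lwt.async (fun () ->
       monitor
         ~receive:(fun () -> Dream.receive websocket)
         ~close:(fun () -> Dream.close_websocket websocket));
     send_loop websocket)   (* send_loop is a hypothetical sender *)
*)
```

As noted later in the thread, this only helps when the OS actually signals the lost connection. It does nothing if the network between client and server silently goes down.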
Thank you @hansole!
The read trick only works when the lost connection gets signaled by the OS. If the server is running on a remote node and the network between them goes down, it still runs into the 100% CPU loop.
I have been investigating some more, and it seems to be stuck in a loop where it tries to flush some write buffers while the underlying connection is closed, so it cannot make any progress. https://github.com/anmonteiro/gluten/blob/166e1e917710e1e43b04d33a368b6701a9f8b1f5/lwt/gluten_lwt.ml#L94-L99
I don't have a complete understanding of the interaction between the different layers involved, so I might be wrong. But to me, it seems some state is missing here. If close gets called from the server, it sets the state to "closed", then flushes the buffers, exits the write loop, and closes the underlying connection. If the socket gets closed by the OS, the write reports that it is closed. It still tries to flush the buffers, which has no effect, and it ends up in a loop.
To me it seems like the call to close from the server application should not set the state to "closed", but to "closing". Once the buffers are flushed, it can move the state to "closed" and terminate the loop. If it gets closed by the OS, the state should move directly to "closed", and it should not make any attempt to empty the buffers. This is my understanding of the problem, and I might be wrong about how this actually works.
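The "closing" versus "closed" proposal can be modelled in a few lines of plain OCaml. This is a toy sketch under stated assumptions, not Gluten's or Websocket/af's actual code. Every type and function name below is hypothetical, and pending output is reduced to a byte count.

```ocaml
(* Hypothetical model of the proposed states; none of these names come
   from Gluten or Websocket/af. *)
type state =
  | Open     (* connection usable *)
  | Closing  (* application asked to close; pending output may still be flushed *)
  | Closed   (* socket is gone; nothing more can be written *)

type action =
  | Flush of int  (* try to write this many pending bytes *)
  | Wait          (* nothing to write; wait for more output *)
  | Stop          (* leave the write loop *)

type conn = { mutable state : state; mutable pending : int }

(* Close requested by the server application: drain buffers first. *)
let close_from_application c =
  if c.state = Open then c.state <- Closing

(* Close reported by the OS: go straight to Closed and drop the buffers. *)
let close_from_os c =
  c.state <- Closed;
  c.pending <- 0

let next_action c =
  match c.state with
  | Closed -> Stop
  | Closing when c.pending = 0 -> c.state <- Closed; Stop
  | Open when c.pending = 0 -> Wait
  | Open | Closing -> Flush c.pending

(* Bounded write loop. [write n] is a caller-supplied stand-in for the
   real socket write: it returns `Ok written or `Closed. *)
let run_write_loop ?(max_steps = 1000) c ~write =
  let rec go steps =
    if steps >= max_steps then `Spinning
    else
      match next_action c with
      | Stop -> `Stopped steps
      | Wait -> `Waiting
      | Flush n ->
        (match write n with
         | `Ok written -> c.pending <- c.pending - written
         | `Closed -> close_from_os c);
        go (steps + 1)
  in
  go 0
```

In this model, a write that reports a closed socket ends the loop on the next step, because the state moves directly to Closed and the pending bytes are discarded. A design that keeps asking to flush after such a write would reach the max_steps bound instead, which corresponds to the busy loop described above.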
The following change to gluten prevents it from going into the loop.
It still seems to be a problem if the network between server and client is cut before calling close. Even when calling it from the application, calling
@aantron, is the ping-pong protocol supported from websocket? It seems to be possible to call it on stream, but not on websocket.
Dream is using "vendor" for some libraries. I was not able to figure out how these "vendor" patches are specified and fetched. I have tried to find documentation for this, but no luck. Do you know where to find documentation?
@hansole Thanks for continuing to look into this! Ping and pong are supported by the Dream API and by the latest forked websocket/af. The versions used by Dream are git submodules listed here. For websocket/af, Dream is using @anmonteiro's fork, but I patched it in a minimal way to rename the modules from
The way I typically debug Dream together with these upstream libraries is that after
The part that fetches these patches is the
And it's briefly documented in Contributing in the README, though not with this much detail.
This should have been fixed in anmonteiro/httpun-ws#73. There are other recent fixes in httpun-ws, too, which may be of interest to Dream, including #214
I still experience this issue (e.g. using the repro example provided by #230 (comment), with websocat or wscat as a client) using dream 1.0.0~alpha8 and httpun-ws 0.2.0.
I'm experimenting with websockets in Dream.
With this code, I've observed problematic behavior on version 1.0.0~alpha4 (and OCaml 4.14.0, Ubuntu 20.04), installed via opam. Initially, the program behaves as expected, with integers printed to the browser's JS console. I expect that once the browser tab is closed, Dream.send would raise an exception. Instead, the server process starts consuming 100% CPU.