win: cause uv_read_stop to immediately stop reading the stream #1377
@vtjnash thanks for this. I'll review it as soon as I can. Others are also welcome to do so :-) |
@vtjnash is there any way to add a regression test for this? It'd simplify understanding of the problem... |
Here's the start of a simple test:
uv_loop_t* loop = uv_default_loop();
uv_pipe_t read, write;
uv_pipe_init(loop, &read, UV_PIPE_READABLE | UV_PIPE_SPAWN_SAFE); /* synchronous readable pipe end */
uv_pipe_init(loop, &write, UV_PIPE_WRITABLE | UV_PIPE_SPAWN_SAFE); /* synchronous writable pipe end */
uv_pipe_link(&read, &write); /* libuv equivalent of posix function `pipe` */
uv_read_start(&read, alloc_fn, read_fn);
Sleep(1000); /* wait for uv_read_start to start its ReadFile thread */
//optional: uv_read_stop(&read); /* necessary if some other process will be using this pipe */
uv_pipe_getsockname(&read); /* Windows kernel hangs here without patch */
Note: I'm using libuv fixes introduced in pending pull-request #451, so that I don't need to reimplement them when creating the read & write ends of the pipe. (it would be nice if someone could comment on that one too -- it's been over two years since we proposed that fix) |
I don't think we can have #451 merged before v0.12, alas. I know it has been quite some time, somehow nobody got around reviewing it, sorry :-/ Is there a way to test this with the current APIs? Note that since this is a Windows only test we could play with internals a bit if necessary ;-) |
Sure, you just need to call
No, I wouldn't have expected that anyways. I would just like to see it making progress (non-stylistic comments) towards getting it merged. |
this implements locking around the blocking call to ReadFile to get around a Windows kernel bug where a blocking ReadFile operation on a stream can deadlock the thread. this allows uv_read_stop to immediately cancel a pending IO operation, and allows uv_pipe_getsockname to "pause" any pending read (from libuv) while it retrieves the sockname information
if unsupported by the OS (pre-Vista), this reverts to the old (e.g. deadlock-prone) behavior
ref. issue #1313
Is this expected to merge for v0.12? I added a test. The expectation is that this test will hang the process without this patch or Windows == XP (because of the call to |
@vtjnash Thanks for adding the test! I'll take a look today or during the weekend, but we should be able to land it for 0.12, I think. |
Quick comment: the test is a Windows only test, needs some ifdefs. |
I didn't get much time over the weekend, sorry. Will review tonight. |
@@ -443,7 +443,8 @@ RB_HEAD(uv_timer_tree_s, uv_timer_s); | |||
int queue_len; \ | |||
} pending_ipc_info; \ | |||
uv_write_t* non_overlapped_writes_tail; \ | |||
void* reserved; |
Please leave this one as the last field in the structure, IIRC we reserved it for fixing sending handles over threads during 0.12.
it would have been good to have a comment to that effect when it was named "reserved"
I'd like to see some changes, in order to have this better integrated and avoid a lot of checking for the mutex:
Thanks! And sorry it took me so long to review this. |
@@ -181,6 +182,8 @@ int uv_pipe_write(uv_loop_t* loop, uv_write_t* req, uv_pipe_t* handle, | |||
int uv_pipe_write2(uv_loop_t* loop, uv_write_t* req, uv_pipe_t* handle, | |||
const uv_buf_t bufs[], unsigned int nbufs, uv_stream_t* send_handle, | |||
uv_write_cb cb); | |||
void uv__pipe_unlock_read(const uv_pipe_t* handle); |
Since these are internal, you can get rid of the const to avoid casting in several places down below.
@@ -1846,7 +1934,8 @@ int uv_pipe_getsockname(const uv_pipe_t* handle, char* buf, size_t* len) { | |||
name_info = malloc(name_size); | |||
if (!name_info) { | |||
*len = 0; | |||
return UV_ENOMEM; | |||
err = UV_ENOMEM; | |||
goto error1; |
No need, free(NULL) is guaranteed to be fine.
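To illustrate the point about `free(NULL)`: the sketch below funnels every exit through one cleanup label, and the label frees the buffer unconditionally. It is not the code from the patch. `toy_get_name` and its `fail_alloc` switch are hypothetical names invented for the illustration, and plain negative errno values stand in for libuv's `UV_E*` codes.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a name query; not a libuv function.
 * `fail_alloc` forces the out-of-memory path so it can be exercised. */
int toy_get_name(const char* source, char* buf, size_t buf_size,
                 int fail_alloc) {
  char* name_info = NULL;               /* stays NULL until allocated */
  size_t name_size = strlen(source) + 1;
  int err = 0;

  if (name_size > buf_size) {
    err = -ENOBUFS;
    goto cleanup;                       /* name_info is still NULL */
  }

  name_info = fail_alloc ? NULL : malloc(name_size);
  if (name_info == NULL) {
    err = -ENOMEM;
    goto cleanup;                       /* name_info is NULL here too */
  }

  memcpy(name_info, source, name_size); /* scratch copy, as in a real query */
  memcpy(buf, name_info, name_size);

cleanup:
  free(name_info);                      /* free(NULL) is defined to do nothing */
  return err;
}
```

Because the pointer starts out as NULL, a second label that skips the `free` is unnecessary; the early exits and the success path can share the same cleanup code.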
Had another round at it. This is just a couple of things away from getting merged: take out the canceling logic from |
It isn't canceling the read request; it is acquiring the kernel's file lock by temporarily interrupting the ReadFile call. When unlock is called, the ReadFile call is resumed (unless it detects that the READING flag is no longer set, in which case it just returns quietly). |
I see. Maybe we can find a better name: |
that's your call. I've considered also adding the following function, just so the effect of these three consecutive operations is more clear:
void uv__pipe_stop_read(uv_pipe_t* handle) {
  handle->flags &= ~UV_HANDLE_READING;
  uv__pipe_lock_read((uv_pipe_t*)handle);
  uv__pipe_unlock_read((uv_pipe_t*)handle);
}
|
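The difference between "pause" and "stop" described above comes down to whether the READING bit is still set when the interrupted reader gets to run again. The following is a single-threaded toy model of that decision only. It does not model the mutex or the Windows cancellation call, and every name and flag value in it (`toy_reader_t`, `TOY_HANDLE_READING`, and so on) is hypothetical rather than taken from libuv.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; libuv's real
 * UV_HANDLE_* constants are not reproduced here. */
enum { TOY_HANDLE_READING = 0x01, TOY_HANDLE_READABLE = 0x02 };

typedef struct {
  unsigned int flags;
  int reads_issued;  /* how many times the blocking read was (re)started */
} toy_reader_t;

/* First step of the stop sequence: drop only the READING bit and
 * leave every other flag untouched. */
void toy_clear_reading(toy_reader_t* r) {
  r->flags &= ~(unsigned int) TOY_HANDLE_READING;
}

/* What the reader does after its blocking call was interrupted and the
 * interrupter has released the lock: restart only if still wanted. */
bool toy_reader_resume(toy_reader_t* r) {
  if ((r->flags & TOY_HANDLE_READING) == 0)
    return false;    /* a stop was requested: return quietly */
  r->reads_issued++; /* only a pause: issue the read again */
  return true;
}
```

In this model a lock/unlock pair on its own behaves as a pause, because the reader sees the bit still set and restarts. Clearing the bit first turns the same pair into a stop.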
Looks good 👍 Let's call the other one |
would |
My initial idea (based on your code) was to have 3 functions: #1377 (comment) |
I object to this patch. The CancelSynchronousIo operation is not thread safe. |
Ouch. Windows internals are not my thing, do you have any suggestion on how this can be addressed? |
the function containing ReadFile is not permitted to return until it has reacquired the mutex and cleared the |
@piscisaureus to elaborate, the situation you describe is expected to be handled by the mutex object, such that it is not possible for unrelated IO operations to be canceled. that is why this pull request is somewhat more complicated than simply calling there are 4 checkpoints in the code, to ensure that we never call
|
@piscisaureus ping |
@vtjnash I see now, sorry for not reading your code thoroughly enough. I will not veto this patch or anything - it's really up to @saghul and @indutny. But I will make the case that this patch is not the way to go forward.
|
Thanks for your feedback, @piscisaureus. I think @vtjnash proposed that in the beginning and I rejected it, because sometimes I'm dumb like that :-S I'm really sorry to have wasted your time @vtjnash. In the end, caching the pipe name seems to be a good solution. We could store a pointer which we malloc when the pipe is bound/created, only if we are emulating IOCP. Then in |
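As a rough sketch of the name-caching idea floated above: the name is copied to the heap once, when the pipe is bound or opened, and later lookups read that copy without touching the OS handle. This is an assumption about how such a cache could look, not code from libuv. `toy_pipe_t` and the `toy_pipe_*` functions are hypothetical, plain negative errno values stand in for `UV_E*` codes, and returning `-EINVAL` for a pipe with no cached name is an arbitrary choice made for the sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  char* cached_name;  /* heap copy made at bind/open time, or NULL */
} toy_pipe_t;

/* Remember the name once, replacing any earlier copy. */
int toy_pipe_bind(toy_pipe_t* p, const char* name) {
  size_t n = strlen(name) + 1;
  char* copy = malloc(n);
  if (copy == NULL)
    return -ENOMEM;
  memcpy(copy, name, n);
  free(p->cached_name);
  p->cached_name = copy;
  return 0;
}

/* Answer from the cache; on -ENOBUFS, *len reports the size needed. */
int toy_pipe_getsockname(const toy_pipe_t* p, char* buf, size_t* len) {
  size_t n;
  if (p->cached_name == NULL) {
    *len = 0;
    return -EINVAL;
  }
  n = strlen(p->cached_name);
  if (n + 1 > *len) {
    *len = n + 1;
    return -ENOBUFS;
  }
  memcpy(buf, p->cached_name, n + 1);
  *len = n;
  return 0;
}

/* Release the cached copy when the handle goes away. */
void toy_pipe_close(toy_pipe_t* p) {
  free(p->cached_name);
  p->cached_name = NULL;
}
```

A cache like this only helps when libuv itself learns the name at bind or open time; it has nothing to return for a handle that was handed over by another process.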
When code that has been written ends up being thrown away, it usually hurts a little. However I would not say the time was wasted; we learned that a solution that uses CancelSynchronousIo can be implemented but it requires some serious hoop-jumping. Most of libuv code has been (re)written many times; the git commit log speaks of it, and in fact if you look in my private node fork you'll find even more history. This is normal and healthy. Throwing away questionable code, making things simpler, re-evaluating approaches all help make things better in the future. You learn more and more about the problem space while avoiding carrying your mistakes forward forever. I'm glad to see this happen. Forward! |
Thanks, @vtjnash, for the time you invested in making libuv better. |
You didn't reject it, I did, after testing that approach and finding that it only addresses one cause of the deadlock. If a pipe is only used internal to libuv, this patch would not be needed, since we could ensure that once we have started a read on the pipe we aren't interested in reading any of the properties of the pipe anymore (via any call to However, without this patch, spawning child processes with a stdin handle inherited from cygwin hangs if it tries to inspect the stdin handle (which was when I realized that having uv_read_stop to actually stop the read operation was the only viable solution).
This bug affects everyone, even projects that don't care about retrieving the pipe name, since
I really wish that Windows didn't force this complexity on us, but I can't find a way around it (e.g. a select function or a way to call ReadFile that doesn't hold the kernel lock). It's not even an entirely complete solution, since it depends on the user calling But without an alternative solution proposed, I can't accept that doing nothing is better than having a (complex) solution. |
But this patch doesn't fix that, as it can only cancel ReadFile operations that were started by libuv - not those that are started by other processes. If we would retrieve the information we need when opening the pipe, that would solve the same problem. |
true, but we can't retrieve that information as soon as the pipe is opened (because we may not have opened the pipe, it might just have been handed to us), and that's only part of the concern. while we can't control what other processes do, we can make libuv behave better. like you said, we can't control the actions of other processes – instead it requires diligence from every process (including libuv) to try to avoid this situation. |
@vtjnash I happened to meet Bert a couple of days ago and we discussed this issue. While it ultimately doesn't solve all problems, it does fix some, and even if the code looks quite complex, it doesn't penalize the usual execution path, since those functions are basically noops in that case. So, I'm landing it. Thanks a lot for your patches! |
Landed in ✨ 837c62c ✨ Thanks @vtjnash and @piscisaureus! |
Thank you |