Porting Socket to win32 #10696

Closed
1 of 3 tasks
straight-shoota opened this issue May 10, 2021 · 5 comments · Fixed by #10784
Labels
platform:windows Windows support based on the MSVC toolchain / Win32 API status:discussion topic:stdlib:networking

Comments

@straight-shoota
Member

straight-shoota commented May 10, 2021

The next big step in improving win32 support is getting the event loop running.

There is already a promising draft of a basic implementation at #9957. The most basic first step is #10605 which adds sleep support and swapping between concurrent fibers.

The full event loop implementation can expand on that by waiting on completion status messages from IOCP instead of just putting the thread to sleep.
But we need completion-port based IO for developing and testing event loop integration. The most obvious candidate for this is sockets. Sockets work well with overlapped IO and porting them is relatively simple (see #10610, #10650 for example). There is already a very extensive PR for porting Socket at #9544. However, it only uses non-overlapped (i.e. blocking) operations. So it's a step in the right direction, but does not go far enough.

Without a working IOCP-based event loop, the overlapped IO is useless and vice versa.
To break this chicken-and-egg problem, I think the best first step is to port Socket with overlapped IO, but explicitly wait on completion for every operation. This means the socket operations are essentially blocking, but internally they use the non-blocking API. Later we can transparently swap the wait routine for a delegation to the event loop.

The handling of overlapped IO is a bit more complex because it is different to the evented approach we have established in the unix implementation.

I think it is necessary to completely extract the system-specific implementation of Socket methods, similar to other system API implementations, like File or Dir. The product would be a Crystal::System::Socket module with system-specific implementations. The unix version includes IO::Evented and defines integrations with the BSD sockets API. The win32 version defines integrations with the Winsock API, using overlapped operations. The overlapped boilerplate should probably be extracted to a module IO::Overlapped, which is similar to IO::Evented in purpose and only available on win32.
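As a rough illustration of that split, here is a minimal sketch of selecting a backend module at compile time. The names `Sketch`, `EventedBackend`, `OverlappedBackend`, `SketchSocket` and `system_backend` are all hypothetical stand-ins, not the actual stdlib layout; only the `flag?(:win32)` compile-time check is an existing Crystal feature.

```crystal
# Hypothetical sketch: every name in the Sketch namespace is made up for
# illustration and does not exist in the stdlib.
module Sketch
  # Stand-in for a unix implementation built on evented IO.
  module EventedBackend
    def system_backend : String
      "evented (BSD sockets)"
    end
  end

  # Stand-in for a win32 implementation built on overlapped IO.
  module OverlappedBackend
    def system_backend : String
      "overlapped (Winsock)"
    end
  end

  class SketchSocket
    # The platform is chosen at compile time, so only one backend
    # ends up in the compiled program.
    {% if flag?(:win32) %}
      include OverlappedBackend
    {% else %}
      include EventedBackend
    {% end %}
  end
end

puts Sketch::SketchSocket.new.system_backend
```

The public class stays the same on every platform; only the included module differs.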

TODO List

  • Extract system-specifics from Socket to Crystal::System::Socket for unix (Extract system-specifics from Socket #10706)
  • Implement Crystal::System::Socket and IO::Overlapped for win32, using overlapped operations and mocked event loop integration
  • Implement event loop for win32
@straight-shoota
Member Author

Btw. I have basic implementations of network operations in "overlapped + directly waiting"-mode already working in isolated examples. The big part is about figuring out how to put it together into the stdlib.

@straight-shoota straight-shoota added the platform:windows Windows support based on the MSVC toolchain / Win32 API label May 10, 2021
@straight-shoota
Member Author

straight-shoota commented May 20, 2021

I discovered a minor annoyance: On POSIX platforms we can create a server socket, set it to listen, then connect with a client socket and accept on the server, all sequentially in the same fiber.

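require "socket"

# Server side: bind and listen on a fixed local port.
server = Socket.tcp(Socket::Family::INET)
server.bind "127.0.0.1", 4242
server.listen

# Client side: connect from the same fiber.
client = Socket.tcp(Socket::Family::INET)
client.connect "127.0.0.1", 4242

# Accept the pending connection, then exchange one line.
socket = server.accept

client.puts "hello"
puts socket.gets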

This doesn't seem to work on Windows, though. The client's connect is successful, but the accept call gets stalled, at least with AcceptEx, and we need that for overlapped IO. WSAAccept works, but it's not overlapped, and then the client send stalls. So it's no good either.

For practical applications, this probably doesn't really matter, so it's not a big problem.

But this sequence is used a lot in specs and it's really nice because it keeps the control flow very simple. We'll just have to refactor the specs to use separate fibers and can't really run them on win32 until the event loop is working (technically, it could be possible to jerry-rig some interleaving control flow for testing purposes, but I don't think that's worth it).
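A minimal sketch of what such a refactored spec could look like, with the accept side moved into its own fiber and the result handed back over a channel. This assumes a working event loop on the target platform; the variable names and the use of port 0 (letting the OS pick a free port) are choices made for this example, not taken from the existing specs.

```crystal
require "socket"

# Bind to port 0 so the OS assigns a free port for the test.
server = TCPServer.new("127.0.0.1", 0)
port = server.local_address.port

# The accepting fiber reports the received line through this channel.
received = Channel(String?).new

spawn do
  if socket = server.accept?
    received.send socket.gets
    socket.close
  else
    received.send nil
  end
end

# The client runs in the main fiber.
client = TCPSocket.new("127.0.0.1", port)
client.puts "hello"

# Blocks the main fiber until the accepting fiber has read the line.
message = received.receive
puts message

client.close
server.close
```

The control flow is less linear than the sequential version, but neither side depends on the other making progress within the same fiber.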

@straight-shoota
Member Author

I got the basic IOCP-based event loop working. Unfortunately, IO operations between sockets in the same process continue to show issues, even when running in separate fibers (and therefore being scheduled independently). A concurrent TCP server works very well when the connections come from different processes.

I'm not sure if this is an inherent problem with IOCP, or if I am doing something wrong. Connecting sockets managed by the same process is probably not a very common use case (except for testing purposes). And it's hard to search for information on this.
But I expect this should really be possible, somehow.

I'll be able to share the implementation soon. It's still very much WIP and waiting for a couple of preliminary PRs to be merged (#10726, #10605).

@kubo
Contributor

kubo commented May 30, 2021

> This doesn't seem to work on windows, though. The client's connect is successful, but the accept call gets stalled. At least with AcceptEx.

I made a small C program roughly corresponding to the Crystal code in this comment. AcceptEx doesn't get stalled for me (Windows 10 version 21H1). I guess something in my code is different from yours.

The C program test_accept.c is here.

C:\test>cl /nologo test_accept.c
test_accept.c

C:\test>test_accept 0
Use normal accept()
recv: hello

C:\test>test_accept 1
Use AcceptEx()
recv: hello

When the fourth argument of AcceptEx (dwReceiveDataLength) isn't zero, it gets stalled: in that case AcceptEx does not complete until the client has sent data.

C:\test> test_accept 2
Use AcceptEx() with a data buffer
GetQueuedCompletionStatus() failed: WAIT_TIMEOUT

@straight-shoota
Member Author

Oh damn, your last comment was very helpful to point me to my mistake. I calculate the buffer size automatically, and at some point, while fiddling with stuff, I must have increased the initialized buffer size by a fixed amount. That resulted in a non-zero value for the data length. 🤷 Somehow I didn't even notice that in the isolated examples, probably because the clients always sent some data after connecting 😆
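To make the arithmetic behind that mistake concrete, here is a small sketch. It assumes the receive data length is derived as the buffer size minus the space reserved for the two addresses. The helper `receive_data_length` and the constants are hypothetical and not the code from my implementation. The known facts are that `sizeof(sockaddr_in)` is 16 bytes and that AcceptEx requires each address slot to be at least 16 bytes larger than the maximum address length.

```crystal
# Assumed for IPv4: sizeof(sockaddr_in) is 16 bytes.
SOCKADDR_IN_SIZE = 16
# AcceptEx requires at least 16 extra bytes per address slot.
ADDRESS_PADDING = 16
# Space reserved for one address (local or remote).
ADDRESS_LENGTH = SOCKADDR_IN_SIZE + ADDRESS_PADDING

# Hypothetical helper: whatever is left in the buffer after both address
# slots would be passed to AcceptEx as dwReceiveDataLength.
def receive_data_length(buffer_size : Int32) : Int32
  buffer_size - 2 * ADDRESS_LENGTH
end

exact_size = 2 * ADDRESS_LENGTH

puts receive_data_length(exact_size)     # => 0
puts receive_data_length(exact_size + 8) # => 8
```

With the exact size the data length is zero and AcceptEx completes as soon as a connection arrives. Any fixed amount added to the buffer turns into a non-zero data length, so the accept only completes once the client sends something.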
