Port concurrency features to Windows #6957
Comments
I was asking in the parent issue if |
|
@straight-shoota I'm not sure how this fits in with @ysbaddaden's work, and it shouldn't be necessary to port fibers. Fibers on Windows should be working with channels and merged into master before any of the nonblocking IO stuff is even thought about. They are separate concerns. |
To be clear: I'm proposing But this very much depends on the scheduler design from @ysbaddaden which I haven't seen. I'm pretty in the dark on the design for multicore which @ysbaddaden is proposing and how that fits in with my and @bcardiff's work. This is why I propose not porting evented IO at all on windows yet. It may end up being counterproductive and chasing a moving target. |
If we can port fibers without event, that's totally fine by me. Then we just need to further refactor |
@straight-shoota you can stub it out for now, the only connection between the scheduler and the event loop is I advise you to do these tweaks via |
Some notes:
I have a few more changes pending that I'd like to push:
|
I got it working so far. At least in theory. I still need to get the stack swap on win32... |
That doesn't work on Windows, because you're not waiting for an FD to become readable or writable; you're waiting for a specific read or write IO to finish. You then need to resume the specific fiber that sent that IO.
Where is the blocking sleep when there's nothing to do performed then?
agreed, probably exit with a warning since with just fibers and channels it should never happen? I haven't proved it to myself but it seems logical.
They look like good changes. |
Oh, then the event loop is target specific. That's wonderful. I still wish we could try libevent (at least for timers and sockets), until we dig for arch specifics (IOCP, kqueue and epoll)
Threads will spin trying to steal fibers / run the event loop, then give up and park themselves, unless it's the last thread, which should run the event loop as blocking; along with a mechanism to wake parked threads when fibers are enqueued (i.e. mutex + condition variable). |
I do too, but unfortunately pipes are pretty essential to everything from basic process spawning to signal handling. (Yes, Windows has signals too.)
ah, so blocking/nonblocking is optional, makes sense |
FWIW libevent does mention "IOCP" though with scant documentation, seemingly: https://stackoverflow.com/questions/8042796/libev-on-windows ... hmm might not be enough... |
@RX14 and @ysbaddaden Regarding the Windows event loop I think this is very interesting: https://github.com/piscisaureus/wepoll and just wanted to leave it here as a reference before I forget it. In Rust, mio is the de facto standard event loop library and they very recently switched over to a solution inspired by this work. There are several advantages besides a familiar API, one of them being performance, since the only other way I know of needs extra allocations on Windows due to IOCP requiring a read/write buffer for each event. It's at least worth considering if the alternative is to implement our own IOCP implementation, since that will be a big task anyway. |
Wonder if we could just start with select+libevent and then move to IOCP...to save time. But then again maybe too painful to do everything twice...or those others might be interesting as well. |
wepoll is limited to sockets, which makes it just an optimization once there's an event loop architecture which can handle the IOCP model. I'd rather make something that works for the most general case of readable/writable handles and then optimize it for sockets later, instead of making something which works for sockets and then leaving IO to block on every other kind of file until someone gets round to fixing it (which would mean refactoring the event loop, which probably means nobody will get around to it, which means hell) |
You're right, it's unfortunately only useful for sockets it seems. On investigating this closer I also realized that the use of |
@rdp I think designing the event loop architecture with IOCP in mind from the start is the right thing to do. I would actually consider designing it with IOCP in mind first, getting the readiness based models like kqueue and epoll to work with that is probably easier than the other way around. However, everything is possible. |
@cfsamson yeah, I thought getting epoll to work like IOCP is easier than the other way around too. After all, the "you need to allocate fewer buffers when using epoll" argument is moot when using Crystal's IO model: you need to allocate them anyway since it emulates blocking IO with green threads. Windows' IO model is essentially submitting a buffer, and the OS tells you when it's done filling it with data and how much. This is easily mapped to Crystal, and epoll is easily mapped to that (we already do it, just at a higher layer). |
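To make the "submit a buffer, get told when it is filled" shape concrete, here is a minimal sketch in plain Crystal. It is only an illustration, not the stdlib's design: `Completion` and `submit_read` are hypothetical names, and the "OS" is emulated by a fiber doing an ordinary `IO#read` on a pipe.

```crystal
# Hypothetical completion record: the buffer that was submitted and how many
# bytes ended up in it.
record Completion, buffer : Bytes, bytes_read : Int32

# Completion-style read: hand over a buffer and get notified once it has data.
# The "OS" is emulated by a fiber performing a normal Crystal read, which
# suspends that fiber until data is available.
def submit_read(io : IO, buffer : Bytes, completions : Channel(Completion)) : Nil
  spawn do
    n = io.read(buffer)
    completions.send(Completion.new(buffer, n))
  end
end

reader, writer = IO.pipe
completions = Channel(Completion).new

buffer = Bytes.new(16)
submit_read(reader, buffer, completions)

writer.write("hello".to_slice)
writer.flush

# Blocks the current fiber until the submitted read has completed.
done = completions.receive
puts String.new(done.buffer[0, done.bytes_read])
```

The caller never asks whether the pipe is readable; it only learns that its own buffer was filled and with how many bytes, which is the completion model described above layered over a readiness-style read.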
@RX14 I'm going out on a limb here, partly since it might contribute to the discussion, and partly out of curiosity. I made an extremely simplified model just to sketch how something like this could work (if the plan is to abstract at a higher level like sockets/files/pipes to hide the implementation details). If I understand you correctly, the green thread model greatly simplifies the event loop implementation, since you can easily prevent the buffer sent to IOCP from being touched while waiting for the event to complete, and you will not have any "extra" allocations since this will be abstracted over in either case: I apologize in advance for simplifying this so much that the code is not valid anything really, and for skipping a ton of complexity. |
@cfsamson note that all reads go through the IO primitive |
libevent supports threaded IOCP (practically the only examples I could find here: https://github.com/libevent/libevent/blob/master/event_iocp.c https://github.com/libevent/libevent/blob/master/sample/http-server.c https://github.com/libevent/libevent/blob/master/test/regress_iocp.c#L304 (the Also interesting is that https://github.com/libevent/libevent/blob/master/event_iocp.c (the entire "libevent IOCP implementation") isn't that long, maybe a good pattern. Go's seems reasonably small too: https://golang.org/src/runtime/netpoll_windows.go?h=iocp and https://golang.org/src/net/fd_windows.go#205 line 205 FWIW. Maybe that's all :) libuv also supports IOCP but...I could hardly see any examples anywhere...also it seems libuv requires one "event loop" per thread, wasn't sure how that lined up with Crystal's current use of libevent... |
I've been thinking about this quite a lot (since I'm investigating something related). Creating our own event queue is doable. It's a handful of syscalls to use on Linux/BSD/Windows, but this is only part of the problem and we should consider the next steps as well. Here are some of the questions I think need some discussion, along with my initial thoughts:
How do we run the event queue?
We can implement a simple
How to register events?
The next part is how we register events. My initial thoughts here are that we implement a
DNS lookup and File I/O
Since these are most often cached by the OS (and have poor cross-platform APIs AFAIK), they are most often sent to a thread pool. Anyway, I think we need a cross-platform thread pool up and running as well to be able to actually use this in, e.g., a web server.
How to wake a green thread
This is a bit tricky, I think, due to synchronization issues and performance. If we want to avoid actively polling a queue, we need a way to interact with the scheduler from the reactor thread. I don't know how well the current
I'm just putting these thoughts here for now to see if they can contribute to a constructive discussion. |
The interface would be to submit a file descriptor for a read/write to the scheduler, and the scheduler would resume your fiber when it's done. The rest is a platform-specific black box. On existing platforms it'd use the same (refactored) libevent code it always has, just moved out of IIRC with IOCP you can register a void* of data with your read/write, this would simply be the That's why I proposed the custom event loop for windows - the only hard part is handling the sleep events. |
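As a rough illustration of "submit an operation, and the loop resumes exactly the fiber that submitted it", the sketch below routes completions by key. Everything here is hypothetical (`Packet`, `ToyCompletionLoop`, `submit`, `complete`); a channel stands in for the per-operation data that would be attached to a real IOCP request, and no win32 API is called.

```crystal
# Hypothetical stand-in for an OS completion packet: the key registered with
# the operation plus the number of bytes transferred.
record Packet, key : UInt64, bytes : Int32

class ToyCompletionLoop
  def initialize
    @waiting = {} of UInt64 => Channel(Int32)
    @next_key = 0_u64
  end

  # Called by a fiber: registers an operation and returns its key together
  # with the channel the fiber should block on.
  def submit : {UInt64, Channel(Int32)}
    key = (@next_key += 1)
    channel = Channel(Int32).new(1)
    @waiting[key] = channel
    {key, channel}
  end

  # Called when the "OS" reports a completion: wakes only the owning fiber.
  def complete(packet : Packet) : Nil
    if channel = @waiting.delete(packet.key)
      channel.send(packet.bytes)
    end
  end
end

event_loop = ToyCompletionLoop.new
results = Channel({String, Int32}).new(2)

key_a, chan_a = event_loop.submit
key_b, chan_b = event_loop.submit
spawn { results.send({"a", chan_a.receive}) }
spawn { results.send({"b", chan_b.receive}) }

# Completions may arrive in any order; each one reaches its own submitter.
event_loop.complete(Packet.new(key_b, 7))
event_loop.complete(Packet.new(key_a, 3))

received = [] of {String, Int32}
2.times { received << results.receive }
by_name = received.to_h
puts by_name
```

A real implementation would presumably attach the waiting fiber itself rather than a channel, but the routing idea is the same: the completion carries enough data to find its submitter without scanning file descriptors.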
Oh, I see. Yes, you associate a token (or a pointer) when registering a resource with the completion port in You still need to actually do the blocking wait for events in a separate thread so that would be the Sleep events are tricky. I've tried something like that before and kept an ordered queue of timers (I used a Every blocking call to I don't know if there is a better way to do this since it's not a pretty solution. There probably is. |
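One common way to handle sleep events with an ordered timer queue is to use the earliest deadline as the timeout of the blocking wait. The sketch below shows only that bookkeeping, under stated assumptions: `Timer` and `TimerQueue` are hypothetical names, times are plain `Time::Span` values standing in for a monotonic clock, and the actual blocking wait is left out.

```crystal
# Hypothetical timer entry: when to fire, and a label standing in for the
# fiber that should be resumed.
record Timer, wake_at : Time::Span, name : String

class TimerQueue
  def initialize
    @timers = [] of Timer
  end

  # Inserts while keeping the array sorted by wake-up time (earliest first).
  def add(timer : Timer) : Nil
    index = @timers.bsearch_index { |t| t.wake_at > timer.wake_at } || @timers.size
    @timers.insert(index, timer)
  end

  # Timeout to hand to the blocking wait; nil means "wait indefinitely".
  def next_timeout(now : Time::Span) : Time::Span?
    first = @timers.first?
    return nil unless first
    {first.wake_at - now, Time::Span.zero}.max
  end

  # Removes and returns every timer whose deadline has passed.
  def pop_expired(now : Time::Span) : Array(Timer)
    expired = [] of Timer
    while (first = @timers.first?) && first.wake_at <= now
      expired << @timers.shift
    end
    expired
  end
end

queue = TimerQueue.new
queue.add(Timer.new(300.milliseconds, "b"))
queue.add(Timer.new(100.milliseconds, "a"))

timeout = queue.next_timeout(40.milliseconds)
expired = queue.pop_expired(150.milliseconds).map(&.name)
remaining = queue.next_timeout(150.milliseconds)
```

After each wake-up (whether caused by an IO completion or by the timeout elapsing), the loop would call `pop_expired` with the current time, resume those sleepers, and recompute the timeout for the next wait.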
Ah yes, I'd forgotten the details. This is exactly the same as we currently have in libevent's implementation then, where we register one
The interface is
This is all abstracted a bit above the event loop in Crystal, at the scheduler level. Event loops are per-thread, not per-program, and one thread can only have timers registered on its event loop from that thread, meaning you never have to deal with the case of a timer being registered while you're sleeping. Fibers can then be passed between threads by pipes (which generate a read event). I'm glad this is solvable if that situation changes though. Might want to look through |
That actually simplifies things even more. I'll have to get to know the I would start by adding bindings for the relevant syscalls and provide some wrappers around them to make them easier to use. I suggest that I'll see if I have some time after my current project is done and see if I can help progress this. Is there a Edit: The above suggestion of using the Judging by the BOOST ASIO implementation they seem to not use Instead they wrap the |
I think this is fine, given that there's only one I can't see where windows lets you see whether an event was a read or a write completing though...
Actually, since we're compiling for x86-64, everything on Windows uses the Microsoft x64 calling convention, which is not fastcall. This applies to WinAPI and to Crystal functions themselves. So on 64-bit Windows there is only one calling convention, which makes this all simple. If we ever port to 32-bit Windows we might have to sort this out. |
I think this is exactly why they wrap |
If it works, it's more flexible, and it's what everyone else does, this is just fine to me! |
I just checked mio (a Rust implementation of epoll/kqueue/iocp event queue) and it does (well, did since they switched to wepoll recently) the same as I explained with regards to wrapping the It seems to be a pretty normal technique. |
yeah I prefer this too now I know about it. |
I've done the plumbing work to get the IOCP functions wrapped into Crystal. I am somewhat familiar with win32 APIs but I've never worked on an event loop. @cfsamson Would you want to work together on this? I've been reading some of your stuff here to get up to speed. |
@incognitorobito Great! I've been wanting to take this on but have had (and still have) a limited bandwidth. If you can take lead on this I'll try to help push this forward. Great that you found that book. The event loop here should be pretty simple. If I remember correctly We'll need to wrap |
I put together a basic implementation in #9957. Does it line up with what was discussed here? |
#9957 is great. I tried it several months ago and it worked for small scripts. However tests failed with access violation randomly. I abandoned my code. By the way, IOCP requires handles with
So event loop for Windows should support I/O without the overlapped flag. Just an idea to resolve 1 and 2:
As for 3, two ideas.
The second:
|
@kubo Right now I would let file I/O be blocking. Most implementations I've seen use a thread pool for file I/O (e.g. libuv), but with the advent of io_uring this might change. Since most OSes cache frequently accessed files, the performance impact of leaving it blocking might not be that big, depending on the concrete use case (for some uses it might be faster, since you don't involve a lot of machinery to serve a cached file). However, I see that this might be insufficient as a long-term solution, but IMHO we should focus on getting every other piece working first. An interesting article about the subject can be found here. |
@cfsamson |
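I implemented experimental event loop support of I checked the behavior with the following code.
puts("Hit enter to exit.")
# Background fiber: prints a dot every second while the main fiber waits on stdin.
spawn do
  loop do
    sleep(1.second)
    print('.')
  end
end
# Blocks the main fiber until a line is read from stdin.
gets
It prints "Hit enter to exit" only with #9957. |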
#11647 is merged, but leaving this open since the comments about asynchronous file I/O here may prove valuable |
This is a sub-task of porting the stdlib to Windows #5430
Fiber (Extract platform-specifics of Fiber, Thread and EventLoop #6955)
Crystal::System::Fiber
Crystal::Event, Crystal::EventLoop (Windows: Event loop based on IOCP #12149)
Thread, Thread::Mutex, GC (Implement multithreading primitives on Windows #11647)
I suppose we can delay porting threads by implementing a mock API for Thread and Thread::Mutex for win32 which essentially doesn't do anything. That should work perfectly fine for a single-threaded process.
On Windows, we should use the win32 API directly instead of libevent (quoting @RX14):
Since the API models are quite different, this will likely require some refactoring of Crystal::Event and Crystal::EventLoop.
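For illustration only, a mock mutex that "doesn't do anything" could look like the sketch below. `NoopMutex` is a hypothetical name chosen so that it does not clash with the real `Thread::Mutex`; it assumes a single-threaded process, where there is nothing to exclude.

```crystal
# Hypothetical no-op stand-in for a thread mutex. With a single thread,
# locking has nothing to protect against, so every operation is empty.
class NoopMutex
  def lock : Nil
  end

  def unlock : Nil
  end

  # Same shape as a real mutex's block form, so callers need no changes.
  def synchronize(&)
    lock
    begin
      yield
    ensure
      unlock
    end
  end
end

mutex = NoopMutex.new
counter = 0
3.times { mutex.synchronize { counter += 1 } }
puts counter
```

Keeping the same method names as the real primitive means code written against it could later be switched to a genuine implementation without touching call sites.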