Simplify lock(DOWNLOAD_LOCK) #136
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #136      +/-   ##
==========================================
- Coverage   92.24%   92.20%   -0.05%
==========================================
  Files           5        5
  Lines         529      526       -3
==========================================
- Hits          488      485       -3
  Misses         41       41
Continue to review full report at Codecov.
The message for the commit that added this code explains it:
Perhaps this should be added to the code as a comment. Or, another way to put it, the simplified version with the yield on the outside should be commented accordingly.

The while loop elimination is probably fine. I was trying to make sure that the downloader object that we use is actually the one that's saved to the `DOWNLOADER[]` global.
Thanks, I think I now understand the intention. I should've done a proper git archaeology. But I still don't understand if the `yield` has to be called while holding the lock. Anyway, I moved the `yield` back inside the lock.
Perhaps, but there isn't really a fixed maximum number of outstanding requests — it's the server that gets overloaded if you open too many concurrent requests, causing it to just drop some of them, and the limit depends on the server. You also don't want to block until downloads are complete; you just want any events for in-progress downloads to be handled before starting a new download.

The issue is that starting downloads happens in the main task and doesn't require yielding, but making progress at handling a download requires yielding. So if some code starts a lot of downloads without yielding, then no progress is ever made on any of the downloads. There's an (unknown) limit to how many connections a server will leave in an outstanding state before dropping them. By yielding at the start of each download, you're making sure that before you start a new download you at least handle the events from the downloads you've already started, even if they're not done. That typically entails processing some data and sending another packet to the server to get more data or end the connection. As long as you keep replying to the server in a timely fashion, you can have a lot of concurrent connections going, but if you stop replying then they get dropped. So that's why we yield.

Yielding outside the lock would address that, but if there are a number of tasks starting downloads, then it allows other tasks to start downloads before this one, effectively reversing the order the downloads get started, while still not allowing download events to get handled, so that makes the problem worse. Yielding inside the lock prevents later downloads from starting, since they don't have the lock, but still allows events from in-progress downloads to be handled, letting those downloads make progress before new ones start. In practice, this throttles the number of concurrent connections to match what the server can handle.
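A toy illustration of the scheduling point above (not Downloads.jl code, just Base task scheduling): tasks started with `@async` don't run until the task that started them reaches a yield point, so work they are supposed to do can pile up if the starting task never yields.

```julia
# Toy sketch, assuming nothing beyond Base: @async tasks are only
# scheduled, not run, until the current task hits a yield point.
counter = Ref(0)
for _ in 1:100
    @async counter[] += 1   # queued, but not run yet
end
println(counter[])   # prints 0: none of the queued tasks has run so far
yield()              # first yield point: the queued tasks now get to run
println(counter[])   # prints 100 (all queued tasks ran while we were yielded)
```

The yield inside the lock plays the same role for downloads: it is the point where events for already-started downloads finally get handled before a new one begins.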
If you don't wait for downloads until completion, I'm still puzzled how a single `yield` prevents connections from getting dropped.
Yes, this seems like the right idea. The default is 0 (unlimited). The HTTP RFC previously had a prescribed maximum, but now just says "A client ought to limit the number of simultaneous open connections that it maintains to a given server." (https://datatracker.ietf.org/doc/html/rfc7230#section-6.4)

It looks like browsers have settled on the order of ~10 concurrent connections per host.

Side note: there's an interesting interaction between multiple connections and curl's HTTP/2 multiplexing. To quote their docs:
By forcing new downloads to wait to start until events for all other downloads have been processed. Otherwise events related to ongoing downloads get queued by libcurl and may not get processed by Julia if there are no yield points. If we don't yield, Julia can just start more downloads, which causes more events to pile up and eventually you get dropped connections.
Maybe this is what's unclear: processing a request doesn't necessarily involve any Julia-side yield points, because events are put directly into the libuv event loop by libcurl. The problem this addresses is those events piling up without getting a chance to be processed.

I'm open to other ways to do this, but I determined this approach experimentally and we know it works in practice. I also tried this without the yield and it causes a lot of timed-out connections. I also tried it with the yield before the lock and that also produced timed-out connections. I also tried it with a max connection limit and that did help, but not as reliably as the current approach — with some servers any particular fixed connection limit can still cause too many events to pile up, whereas the current approach seems to adjust well to different servers.

So, we can change this, but before we do, someone needs to do some serious testing with lots of concurrent requests and prove that it works as reliably as the current pattern does.
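For concreteness, the kind of test being asked for might look roughly like this; the URL list and concurrency level are made-up placeholders (`example.com` is not a real test server), so this is a sketch of the approach rather than an actual test from the package.

```julia
using Downloads

# Hypothetical stress test: many tasks each start a download at once,
# then we wait for all of them and count how many connections failed
# (e.g. timed out or got dropped).
urls = ["https://example.com/file/$i" for i in 1:200]  # placeholder URLs

failures = String[]
@sync for url in urls
    @async try
        Downloads.download(url, devnull)   # discard the response body
    catch err
        push!(failures, string(err))
    end
end
println(length(failures), " of ", length(urls), " downloads failed")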
I'm not lightly suggesting throwing away a solution that is known to work in practice. That's why I've already moved the `yield` back inside the lock.
Thanks, this clarifies a lot. But then it sounds like calling
I wanted to use official APIs and do something that works across all versions of Julia, but looking through the history,

It might independently make sense to have a limit on the number of concurrent connections to each server, but in practice it has been fine to make this unlimited and just let the speed with which the client-server route can establish connections implicitly limit things.

The reason for the dropped connections without letting libcurl events get processed is a little subtle, and I may still not fully understand it correctly, but my understanding is as follows. It's not about overwhelming the server — nginx (for example) is quite capable of handling as many concurrent client connections as we can throw at it. The server gets our initial requests just fine no matter what, and it replies. The libcurl callbacks get those responses and dispatch libuv events for each of them, which then need to be processed by some Julia task. If we don't allow that to happen — and here's where I'm a little fuzzy on what exactly goes wrong — it causes connections to get dropped. I don't know if that's because libuv takes longer than libcurl allows to handle events, causing libcurl to decide the connection is dead, or if it's because we're too slow to respond to the server and the server decides to reset the delayed connections. But if we don't process libuv events before starting new connections, connections get dropped if we make enough concurrent connections.

Limiting the number of concurrent connections that libcurl will start just papers over this, which is why, while I'm fine with it, I'm not fine with it as the only way to address this problem. But the
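If a client-side cap were wanted anyway, one simple way to impose it (independent of anything libcurl offers) is a counting semaphore around each request. This is only a sketch of the idea, not something Downloads.jl does; the limit of 10 and the function name are arbitrary.

```julia
using Downloads

# Sketch: allow at most 10 downloads in flight at a time from this code path.
const DOWNLOAD_SLOTS = Base.Semaphore(10)

function limited_download(url, output)
    Base.acquire(DOWNLOAD_SLOTS)
    try
        return Downloads.download(url, output)
    finally
        Base.release(DOWNLOAD_SLOTS)
    end
end
```

As noted above, though, any fixed limit only papers over the underlying event-processing issue.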
Are you planning on continuing this, @tkf? Or should I make the change myself?
I don't know how to reproduce the original issue, so I'm confident enough only with the current change of the PR (= just removing the `while` loop). (I was postponing the reply since I wanted to dig into how the Julia runtime usually calls
I asked @JeffBezanson about this yesterday and he said that
Now that we aren't interacting with the libuv event loop in the same way (#157), is this worth revisiting?
I've modified this to not yield at all, which doesn't hang in tests and may be good now. Let's merge it, bump Downloads on Julia and see what happens.
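For context, "not yield at all" presumably leaves something of this shape. It is a sketch using the global names mentioned in the thread; the function name `get_downloader` is made up and this is not the actual diff.

```julia
using Downloads: Downloader

# Hypothetical globals standing in for the package's shared downloader state.
const DOWNLOAD_LOCK = ReentrantLock()
const DOWNLOADER = Ref{Union{Downloader, Nothing}}(nothing)

# Simplified shape: keep the lock around the shared downloader, with no
# yield and no while loop; just lazily create and cache one Downloader.
function get_downloader()
    lock(DOWNLOAD_LOCK) do
        downloader = DOWNLOADER[]
        downloader isa Downloader && return downloader
        DOWNLOADER[] = Downloader()
    end
end
```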
awesome! if it works, I will copy it for AWS.jl (since we use our own global downloader there too) 😄 |
This snippet was brought up in https://julialang.zulipchat.com/#narrow/stream/236830-concurrency/topic/Concurrency.20question.20in.20Downloads.2Ejl.
I only looked at the code mostly locally so I may be missing something, but the `while` loop seems to be unnecessary since, in particular, no other part of the code sets `DOWNLOADER[]` (aside from the tests). Also, I couldn't find the reason for calling `yield` while holding a lock (it could just increase the number of tasks in the waiter list of the lock). So, I suggest calling it outside.

(Aside: this code style as-is decreases inferrability of the `downloader` variable. But I chose this style to match other parts of the code, for clarity.)
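For reference, a reconstruction of the kind of code this description refers to (a lock around the shared `DOWNLOADER[]`, a `yield` inside it, and the `while` loop that re-reads `DOWNLOADER[]`), sketched from the discussion rather than copied from the source; the function name is made up.

```julia
using Downloads: Downloader

const DOWNLOAD_LOCK = ReentrantLock()
const DOWNLOADER = Ref{Union{Downloader, Nothing}}(nothing)

# Reconstructed shape: yield while holding the lock so events from
# in-progress downloads are handled before a new download starts, then
# loop until DOWNLOADER[] holds the Downloader we end up using.
function get_downloader()
    lock(DOWNLOAD_LOCK) do
        yield()  # let in-progress downloads handle their pending events
        while true
            downloader = DOWNLOADER[]
            downloader isa Downloader && return downloader
            DOWNLOADER[] = Downloader()
        end
    end
end
```

The simplification discussed in this PR drops the `while` loop, and originally also moved the `yield` outside the lock, which is what most of the conversation above is about.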