Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Disable pipelining to alleviate race conditions #732

Closed
wants to merge 1 commit into from

Conversation

c42f
Copy link
Contributor

@c42f c42f commented Jul 20, 2021

Some problems

As noted in #517 we've encountered hangs using the HTTP.jl client in production. In #517 (comment) I've found several potential concurrency problems with the current ConnectionPool:

  • It seems both closeread(::Transaction) and closewrite(::Transaction) call release(::Connection) which returns the connection to the pool. So the connection will be returned to the pool multiple times per request-response. In addition, closewrite ends up being called twice as part of request(::Type{StreamLayer{<:}}) - both in the body of that function and also in writebody(). So a given connection will be returned to the pool's channel up to three times. This seems weird to me and potentially broken.

  • reuse_limit is not respected because it checks readcount() rather than the sequence number. But readcount isn't incremented immediately on assigning a transaction to a connection, but only later when closeread is called.

  • It seems that isvalid() is happy to close connection streams based on various conditions (timeout and reuse counts) when the stream isn't currently reading/writing. But depending on the details of locking, a connection could already have had pending transactions assigned to it which have not yet started reading or writing. It seems easy for race conditions to creep in here.

What to do?

Removing release() from closeread() disables HTTP/1.1 pipelining and seems to fix the issues at #517. This PR includes that trivial change as a strawman (and also a couple of other related concurrency fixes).

Rather than trying to fix pipelining properly, perhaps we should consider consider removing it. It seems that major HTTP implementations have deemed it not worthwhile in practice due to:

  • Broken proxies
  • The difficulty of correct implementation, especially for request cancellation / timeout
  • Questionable performance improvements due to head-of-line blocking
  • Effort better spent in "doing it right" using HTTP/2 multiplexing

For example,

@codecov-commenter
Copy link

codecov-commenter commented Jul 20, 2021

Codecov Report

Merging #732 (9529e30) into master (ed987aa) will decrease coverage by 0.21%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #732      +/-   ##
==========================================
- Coverage   76.88%   76.66%   -0.22%     
==========================================
  Files          37       37              
  Lines        2444     2443       -1     
==========================================
- Hits         1879     1873       -6     
- Misses        565      570       +5     
Impacted Files Coverage Δ
src/IODebug.jl 0.00% <ø> (ø)
src/StreamRequest.jl 95.45% <ø> (ø)
src/ConnectionPool.jl 79.43% <100.00%> (-1.31%) ⬇️
src/Servers.jl 85.71% <0.00%> (-0.63%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ed987aa...9529e30. Read the comment docs.

@StefanKarpinski
Copy link
Contributor

Seems ok to me to remove pipelining support given that others aren't doing it.

@quinnj
Copy link
Member

quinnj commented Jul 21, 2021

Yeah, pipeline support has always seemed a bit superfluous to me, especially given the work to switch those kinds of workflows over to http2, which is better in several respects. I'm in favor of removing support for it in HTTP.jl.

@SimonDanisch
Copy link
Contributor

It would be amazing to fix this race condition! Anything we can do to help merge this?

@c42f
Copy link
Contributor Author

c42f commented Aug 10, 2021

It would be amazing to fix this race condition! Anything we can do to help merge this?

The main work to do here is to convince ourselves that this is correct. Unfortunately this really requires a deep dive into both the HTTP standard and the HTTP.jl code so I've been working on this in the last few days. Once I've got a more complete fix it would be amazing to have people help test or review it.

The current PR is more of a hacky workaround than anything — looking into the surrounding code, I haven't been able to convince myself that the connection pool is correct for concurrent/parallel use without larger changes. I'm pursuing two separate lines of research/prototyping:

  • Looking at a larger rewrite of the connection pool with particular concern for parallelism and correctness
  • Trying to figure out whether we can get libcurl in here for clientside HTTP handling (presumably via Downloads.jl)

@quinnj
Copy link
Member

quinnj commented Aug 10, 2021

  • Looking at a larger rewrite of the connection pool with particular concern for parallelism and correctness

On this note, it may be helpful to see the code I split out a while ago and worked on packaging up stand-alone: https://github.com/JuliaServices/ConnectionPools.jl/blob/master/src/ConnectionPools.jl.

@c42f
Copy link
Contributor Author

c42f commented Aug 10, 2021

On this note, it may be helpful to see the code I split out a while ago and worked on packaging up stand-alone

Cool I'd just decided to do something similar. I think the minimal abstraction to extract here is the "Pod". What we currently call a "connection pool" is just a lock around a Dict of Pods; that part is super simple.

The hard part is the pod which I'd call ConcurrentResourcePool or some such (for HTTP.jl, the "resource" being the individual open socket which are pooled for a given pod). When you want a new resource you can get it from the pool (if available) or create it asynchronously. There's a limit on total resources which means resource recycling needs to be really watertight or you'll hang indefinitely at some point.

@c42f
Copy link
Contributor Author

c42f commented Sep 8, 2021

An update on this: I was persuaded to try using libcurl via Downloads.jl as a backend for HTTP.jl rather than fixing HTTP's connection and stream layers in pure Julia. This was only partly about addressing the immediate problem here; the other reason to go in this direction is to have access to a very mature HTTP client library.

The new backend is currently implemented in
https://github.com/JuliaComputing/HTTPDownloads.jl/

Currently I've implemented this by adding a new default layer, effectively replacing the connection and stream layers in the stack. This isn't very satisfying though — it feels to me that we should clean up and generalize our HTTP stack to make this supported in a more obvious way.

@quinnj
Copy link
Member

quinnj commented Sep 9, 2021

Could you expand on what you mean by "clean up and generalize our HTTP stack"? It's not clear to me what you think would make this cleaner/more obvious than just replacing those lower layers.

@c42f
Copy link
Contributor Author

c42f commented Sep 9, 2021

Good question. A few thoughts:

  • The current stack is purely type-based, you can't store any state in it. For example, it would be natural to allow a Downloads.Downloader() as part of the stack, or the HTTP ConnectionPool as part of the stack to make use of global state optional. The type-based stack also forces everything into keywords and settings like reuse_limit, idle_timeout — which are logically shared between connections — need to be re-passed to request() each time.
  • It's not possible to actually replace parts of the default stack, it's only possible to insert new layers with HTTP.Layers.insert_default!(). So currently HTTPDownloads uses a stack with ConnectionLayer and StreamLayer still in there, but they're bypassed. Kind of messy, I guess.
  • Because insert_default! is a bit limited, custom default layers feel a bit second class. Eg, HTTP.stack() keywords can't be used for those.

@fredrikekre fredrikekre added this to the 1.0 milestone Sep 29, 2021
quinnj added a commit that referenced this pull request Dec 3, 2021
Follows up on the discussion and work started in
#732.

The gist is this: supporting "pipelined requests" is not well-supported
across the web these days and severely complicates the threadsafe client
implementation in HTTP.jl. And as has been pointed out in various
discussions, the attempt to support pipelining is affecting our ability
to avoid thread-safety issues in general (see
#517).

This commit has a few key pieces:
  * Splits "connection pooling" logic into a new connectionpools.jl file
  * Removes pipeline-related fields/concepts from the ConnectionPool.jl
  file and specifically the `Connection`/`Transaction` data structures
  * Attempts to simplify a lot of the work/logic around managing a
  `Transaction`'s lifecycle

Things I haven't done yet:
  * Actually deprecated the `pipeline_limit` keyword argument
  * Got all tests passing; I think the websockets/server code isn't
  quite ironed out yet, but I'll dig into it some more tomorrow

Big thanks to @nickrobinson251 for help reviewing the connectionpools.jl
logic.

Pinging people to help review: @c42f, @vtjnash, @s2maki, @fredrikekre
quinnj added a commit that referenced this pull request Dec 14, 2021
* Remove client support for pipelined requests

Follows up on the discussion and work started in
#732.

The gist is this: supporting "pipelined requests" is not well-supported
across the web these days and severely complicates the threadsafe client
implementation in HTTP.jl. And as has been pointed out in various
discussions, the attempt to support pipelining is affecting our ability
to avoid thread-safety issues in general (see
#517).

This commit has a few key pieces:
  * Splits "connection pooling" logic into a new connectionpools.jl file
  * Removes pipeline-related fields/concepts from the ConnectionPool.jl
  file and specifically the `Connection`/`Transaction` data structures
  * Attempts to simplify a lot of the work/logic around managing a
  `Transaction`'s lifecycle

Things I haven't done yet:
  * Actually deprecated the `pipeline_limit` keyword argument
  * Got all tests passing; I think the websockets/server code isn't
  quite ironed out yet, but I'll dig into it some more tomorrow

Big thanks to @nickrobinson251 for help reviewing the connectionpools.jl
logic.

Pinging people to help review: @c42f, @vtjnash, @s2maki, @fredrikekre

* get more tests passing; remove Transaction

* Get all tests passing

* update Julai compat to 1.6

* fix 32-bit and docs

* fix 32-bit

* fix 32-bit

* Address review comments

* Added `GC.@preserve` annotations
* Removed calls to `hash` in `hashconn` and renamed to `connectionkey`
* Removed enforcement of `reuse_limit`
* Renamed some fields/variables for clarity
* Cleaned up comments for clarification in connectionpools.jl
* Added an additional `acquire` method that accepts an already created
connection object for `Pod` insertion/tracking
@fredrikekre
Copy link
Member

Superseded by #783

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants