Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client timeout refactoring #2768

Closed
asvetlov opened this issue Feb 26, 2018 · 31 comments
Closed

Client timeout refactoring #2768

asvetlov opened this issue Feb 26, 2018 · 31 comments
Labels
Milestone

Comments

@asvetlov
Copy link
Member

There are many different timeout strategies for waiting a response from client:

  • uber timeout for total elapsed seconds
  • timeout for waiting for available connection from a pool
  • connection timeout for waiting a response from a new connection to server
  • timeout for getting HTTP headers from response
  • timeout for every HTTP body chunk read
  • maybe you can add your case

As the solution I propose collapsing all timeouts into the single timeout class.
E.g. Timeout.after(30) is an uber timeout, Timeout.await_pool(5) is a timeout for waiting for connection pool. Timeout.after(30).await_pool(5) composes both.

Do you like the idea? Objections?

@samuelcolvin
Copy link
Member

samuelcolvin commented Feb 27, 2018 via email

@asvetlov
Copy link
Member Author

Are you talking about keyword arguments in Timeout constructor? I don't know, let's see how things will go.
What I definitely don't want is adding all these settings as keyword arguments to client API. Should be the only timeout which contains all required information internally.

@hubo1016
Copy link
Contributor

When a number is given, It should be converted to uber timeout.

Keyword-arguments for constructor may be better. Though chaining is cool, for the real usage, many applications may use configuration dict to store the settings, so keyword arguments would be more convient.

Timeout object should work like dict, so a request timeout object can be merged into the default settings.

@asvetlov
Copy link
Member Author

No. Disabling a waiting for available connection is preferable in many cases, aiobotocore wants it.
@thehesiod please confirm if I understand your needs properly.
It means that uber timeout is not always applicable. Different timeout types are not always composable, sometimes they are mutually exclusive.
Dict-like idea just doesn't work.

@hubo1016
Copy link
Contributor

Maybe use False or a specified constant for "explicit disable"? Allow inheriting from default is always useful I think.

@asvetlov
Copy link
Member Author

I'm totally missed what default are you talking about and what inheriting means in this context?

@hubo1016
Copy link
Contributor

I think there should be a default timeout settings on the session? And for each request, a different timeout can be specified.

@asvetlov
Copy link
Member Author

Yes. Like now a client session has default timeouts which could be overridden by request.
But I want to shrink these values into opaque timeout object that could be passed both into ClientSession constructor and request methods.

@thehesiod
Copy link
Contributor

another possible timeout:

  • DNS resolution timeout (typically tied into "connect" timeout)

I'm not sure if the "send" of the headers blocks, if so I guess that could be a timeout, but more of a "write" timeout.

aiobotocore want's "normal" timeouts, that is timeouts should encapsulate actual network lag. So:

  • connect timeout is time to connect to endpoint (includes DNS resolve)
  • read timeout is waiting for any data from an endpoint

@pfreixes
Copy link
Contributor

Definitely, this is something that has appeared many times and needs to be tackled sooner than later. My 2cents on that:

  • Use kw arguments in the constructor is gonna pollute the ClientSession signature and/or the requests methods, taking into account the current timeouts and future timeouts that might be implemented. I will go for a specific class object.

  • How are we gonna deal with backward compatibility?

Will be appreciated a snippet showing the use case.

Regarding the

timeout for waiting for available connection from a pool

We could give a new option that would allow the developer to get notified via an Exception if and only if all connections are busy because the limit of the pool was reached.

@thehesiod
Copy link
Contributor

@pfreixes ya I think there are two categories, those that should be specifiable, and those that are traceable. For now we're only interested in tracing timeout for waiting for available connection from a pool.

@hubo1016
Copy link
Contributor

@asvetlov When we choose a timeout object for the session which contains uber timeout and connect timeout, and then start a request with a timeout object that cancels the uber timeout, usually we still want to keep the connect timeout. So timeout object should be able to merge.

I suggest that the timeout object has following interfaces:

  1. timeout object is always immutable. We may consider implement it from NamedTuple.
  2. Timeout(uber_timeout = None, *, connect_timeout = None, ...) constructor. Only the uber timeout can be specified as positional arguments, all other arguments are keyword-only. The default values are None. Accepted values are None / False / . None means "use default", False means "force disable", other numbers are timeout values.
  3. copy(**kwargs) Copy an existing timeout object and update the timeout values. Equivalent to convert the timeout to a dict, merge the dict with kwargs and use the dict to call Timeout(**args).
  4. merge(timeout) Copy an existing timeout object and merge the settings from another timeout object. Only False/ are merged, None values are ignored. Equivalent to .copy(**timeout.todict())
  5. todict() Convert the timeout object to a dict containing settings that are not None. Can be used to implement repr and for logging.

NOTICE: copy(**kwargs) is different from merge(Timeout(**kwargs)). Timeout(connect_timeout=1).copy(connect_timeout=None) returns a timeout object with connect_timeout=None, but Timeout(connect_timeout=1).merge(Timeout(connect_timeout=None)) returns a timeout object with connect_timeout=1.

@asvetlov
Copy link
Member Author

I dislike timeout merging and dict exposing, it brings unnecessary complexity.

@asvetlov
Copy link
Member Author

@pfreixes I don't see backward compatibility problems. For transition period a timeout is constructed from old-styled parameters if not provided explicitly.

@asvetlov
Copy link
Member Author

@thehesiod tracing is very reach word with many contexts.
What do you mean be tracing timeout for waiting for available connection from a pool?

@thehesiod
Copy link
Contributor

@asvetlov sorry I didn't quite understand your message, but to clarify, in regards to the subject of the time to acquire a connector from the pool, I'm only interested in that time for tracing (ddtrace, aws-xray, etc), I don't need to specify a timeout for that, nor do I want it included in other fine-grained timeouts. Hope that helps

@asvetlov
Copy link
Member Author

In my mind tracing is not related to timeouts problem discussed here.
You should use client tracing API http://aiohttp.readthedocs.io/en/stable/tracing_reference.html to measure different request lifecycle times.

@pfreixes
Copy link
Contributor

pfreixes commented Feb 28, 2018 via email

@N-Coder
Copy link

N-Coder commented Mar 14, 2018

I'd really love to see more fine granular timeouts implemented, especially for not enforcing a timeout while waiting for a pooled connection, but only for actually establishing the TCP connection. This issue also seems to supersede a few other open issues, including #2648 and #2538.

I do see a strong relation to the aiohttp tracing functionality as both tracing and timeouts hook into certain points of the request lifecycle. I think it might help to use the new block diagrams in the trace docs to discuss and later document which (groups of) states / transitions are counted towards which timeouts (see attached example for the timeout for establishing a new connection).
timeouts

@asvetlov
Copy link
Member Author

To be clear: I have no estimation when I can find a time for the issue personally.
If somebody want to pick it up and make a PR -- you are welcome.

@thehesiod
Copy link
Contributor

@N-Coder ya that's why I brought it up. If I get any chance I'll try to come up with something.

@thehesiod
Copy link
Contributor

btw @N-Coder want to update your diagram for "read" timeouts as well? I'm not sure if that accounts for the header read, or just the body read(s). There's also redirects if you want to capture that as well from "request" to response"

@N-Coder
Copy link

N-Coder commented Mar 14, 2018

That was just one example on how one possible timeout could be visualised. ;)

Just for reference here's how requests handles basic and advanced timeouts.
In a nutshell, requests only knows two types of timeouts:

The connect timeout is the number of seconds Requests will wait for your client to establish a connection to a remote machine (corresponding to the connect()) call on the socket. It's a good practice to set connect timeouts to slightly larger than a multiple of 3, which is the default TCP packet retransmission window.

Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it's the number of seconds that the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte).

So to summarize all options on timeouts I found up to now in this thread (excluding anything on how the timeout might actually be specified in code):

# timeout description affected states
1. uber timeout for total elapsed seconds, current (read) timeout from request start to end
2. pool queue timeout for waiting for an available connection from the pool from queued_start to queued_end
3. connection create timeout for waiting for connection establishment from create_start to create_end
4. DNS resolution timeout probably only relevant on cache_miss, as a hit should take no time from resolve_start to resolve_end
5. socket connect timeout for plain TCP connection establishment, requests' connect timeout for sock_connect
6. connection acquiring timeout current connection timeout, for whole connection acquiring process from connection acquiring begin to end
7. new connection timeout for waiting for first response from a new connection from sock_connect to first byte received (i.e. headers_received)
8. HTTP header timeout for getting HTTP headers of response after request was sent from before headers_sent (or after last chunk_sent?) to headers_received
9. read timeout requests' read timeout for detecting slow server / connection after headers_sent and between each byte received from the server
10. response body timeout for limiting the time it takes to receive the response body from after headers_received, around all chunk_received, until end

Here are my comments:

  • Do we really need timeout 6? It's just the sum of timeouts 2 and potentially 3.
  • Timeout 2 may be 0 seconds to never wait if the pool is empty and directly raise a specific exception.
  • Timeout 10 might also be specified in relation to the number of bytes received, to abort downloads taking ages over a (suddenly) slow connection, but to still allow huge downloads over a normal connection.
  • Some timeouts include others. We should think about the defaults in each individual case once the exact list of timeouts is settled.
  • Special care needs to be taken if the time required for sending data also should be taken into account.

I tried putting all the timeouts and states into a single diagram. Unfortunately, blockdiag doesn't allow a group / rectangle spanning multiple other groups, so I couldn't include all of them (which was the reason why I merged all diagrams in the first place). If you want to fiddle with it, you'll find the source code here.
aiohttp

@thehesiod
Copy link
Contributor

@N-Coder wow, thank you so much for that information! u rock!

for 6 I'm guessing it's useful if you just want to specify an encompassing timeout and don't want to break up the timeout between the sub-parts, so basically either you set 6, or the sub-parts, but not both?

btw for redirect, that could be to a different server so I'm assuming there are two branches?

@asvetlov
Copy link
Member Author

@N-Coder you are god for blockdiag.
I've split the picture into three blocks because was unable to merge everything into single picture.

Some timeouts include others. We should think about the defaults in each individual case once the exact list of timeouts is settled.

This is exactly my main trouble: how to respect all possible timeouts without falling into combinatoric mess?

@N-Coder
Copy link

N-Coder commented Mar 14, 2018

@N-Coder you are god for blockdiag.

Thanks! 😅
Some of the edges are collapsed together, which makes the diagram hard to read at some points (and for which reason I made some edges thick). Also, there seem to be weird encoding problems with the titles containing brackets and some of the labels are cut off. I couldn't find fixes for any of those problems, so we might need a little more advanced solution for the diagram if we want to make something official out of my version. Or at least I should clean up the code, it's quite messy....

for 6 I'm guessing it's useful if you just want to specify an encompassing timeout and don't want to break up the timeout between the sub-parts, so basically either you set 6, or the sub-parts, but not both?

You're right, it might be worth to just keep it for convenience's sake. There might be some weird cases (although I can't come up with one right now), where somebody wants to set any 2 (or all 3) of timeouts 6, 2 and 3. Additionally, I can't come up with anything that speaks against keeping 6.

This is exactly my main trouble: how to respect all possible timeouts without falling into combinatoric mess?

I'd say keep it simple and don't try to guess what the developer actually wants when setting any weird timeouts. Better make it really easy to understand which timeout affects what and don't involve any magic at all. I'd recommend only setting the two timeouts requests is also using (possibly to the same values) and leave the more advanced values to the more arcane use-cases, but without any complicated combinatoric interference.

@N-Coder
Copy link

N-Coder commented Mar 16, 2018

I just found #2310, #2537 and potentially also #2007 which are related to timeouts, but for the case of websockets. I'm not sure how far we should take websockets into account here, as at least the establishment of a connection takes at least one normal HTTP request and some of the timeouts for the websocket itself (e.g. a receive/read timeout for the TCP connection) might be similar.

As I need working timeouts for another project soon and already got kind of an idea how they could be implemented properly, I'll try implementing and example we can discuss.

@asvetlov
Copy link
Member Author

Let's put websocket timeouts out of the issue scope for simplicity.
We can work on websocket timeouts in a separate issue.

@N-Coder
Copy link

N-Coder commented Mar 16, 2018

I condensed my thoughts to code in PR #2842. Please don't see this as a complete PR ready for merging, it's just a draft intended as material for discussion. I had to make a PR to allow inline comments within the code to allow for easier reviewing. As soon as everything is settled, I'll make a proper, cleaned-up PR that makes the CI tools happy.

I'm looking forward to your comments, idea, opinions, objections,...

@asvetlov
Copy link
Member Author

Done by #2972

kornicameister referenced this issue in kornicameister/korni-stats-collector Jun 7, 2018
This PR updates [aiohttp](https://pypi.org/project/aiohttp) from **3.2.1** to **3.3.1**.



<details>
  <summary>Changelog</summary>
  
  
   ### 3.3.0
   ```
   ==================

Features
--------

- Raise ``ConnectionResetError`` instead of ``CancelledError`` on trying to
  write to a closed stream. (`2499 &lt;https://github.com/aio-libs/aiohttp/pull/2499&gt;`_)
- Implement ``ClientTimeout`` class and support socket read timeout. (`2768 &lt;https://github.com/aio-libs/aiohttp/pull/2768&gt;`_)
- Enable logging when ``aiohttp.web`` is used as a program (`2956 &lt;https://github.com/aio-libs/aiohttp/pull/2956&gt;`_)
- Add canonical property to resources (`2968 &lt;https://github.com/aio-libs/aiohttp/pull/2968&gt;`_)
- Forbid reading response BODY after release (`2983 &lt;https://github.com/aio-libs/aiohttp/pull/2983&gt;`_)
- Implement base protocol class to avoid a dependency from internal
  ``asyncio.streams.FlowControlMixin`` (`2986 &lt;https://github.com/aio-libs/aiohttp/pull/2986&gt;`_)
- Cythonize ``helpers.reify``, 5% boost on macro benchmark (`2995 &lt;https://github.com/aio-libs/aiohttp/pull/2995&gt;`_)
- Optimize HTTP parser (`3015 &lt;https://github.com/aio-libs/aiohttp/pull/3015&gt;`_)
- Implement ``runner.addresses`` property. (`3036 &lt;https://github.com/aio-libs/aiohttp/pull/3036&gt;`_)
- Use ``bytearray`` instead of a list of ``bytes`` in websocket reader. It
  improves websocket message reading a little. (`3039 &lt;https://github.com/aio-libs/aiohttp/pull/3039&gt;`_)
- Remove heartbeat on closing connection on keepalive timeout. The used hack
  violates HTTP protocol. (`3041 &lt;https://github.com/aio-libs/aiohttp/pull/3041&gt;`_)
- Limit websocket message size on reading to 4 MB by default. (`3045 &lt;https://github.com/aio-libs/aiohttp/pull/3045&gt;`_)


Bugfixes
--------

- Don&#39;t reuse a connection with the same URL but different proxy/TLS settings
  (`2981 &lt;https://github.com/aio-libs/aiohttp/pull/2981&gt;`_)
- When parsing the Forwarded header, the optional port number is now preserved.
  (`3009 &lt;https://github.com/aio-libs/aiohttp/pull/3009&gt;`_)


Improved Documentation
----------------------

- Make Change Log more visible in docs (`3029 &lt;https://github.com/aio-libs/aiohttp/pull/3029&gt;`_)
- Make style and grammar improvements on the FAQ page. (`3030 &lt;https://github.com/aio-libs/aiohttp/pull/3030&gt;`_)
- Document that signal handlers should be async functions since aiohttp 3.0
  (`3032 &lt;https://github.com/aio-libs/aiohttp/pull/3032&gt;`_)


Deprecations and Removals
-------------------------

- Deprecate custom application&#39;s router. (`3021 &lt;https://github.com/aio-libs/aiohttp/pull/3021&gt;`_)


Misc
----

- 3008, 3011
   ```
   
  
</details>


 

<details>
  <summary>Links</summary>
  
  - PyPI: https://pypi.org/project/aiohttp
  - Changelog: https://pyup.io/changelogs/aiohttp/
  - Repo: https://github.com/aio-libs/aiohttp
</details>
@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

6 participants