Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backwards Compatibility issue: remove chunked encoding support from WebOb (unless explicitly flagged by WSGI server) #279

Closed
digitalresistor opened this issue Sep 22, 2016 · 10 comments
Milestone

Comments

@digitalresistor
Copy link
Member

The problem with loosely defined specs

Unfortunately the WSGI spec does not have a good way to allow Chunked Encoding if you read the spec loosely. This is an issue because to support it you need to read all of the "should"'s as "must".

That would have been the correct thing to do anyway, but alas we are stuck with what we've got for now.

For example, to showcase the issue:

from wsgiref.simple_server import make_server

def hello_world_app(environ, start_response):
    print('before read...')
    environ['wsgi.input'].read()
    print('after read...')

    status = '200 OK'  # HTTP Status
    headers = [('Content-type', 'text/plain; charset=utf-8')]  # HTTP Headers
    start_response(status, headers)

    # The returned object is going to be printed
    return [b"Hello World"]

httpd = make_server('', 8000, hello_world_app)
print("Serving on port 8000...")

# Serve until process is killed
httpd.serve_forever()

With:

curl -X POST http://localhost:8000/

Will cause the server to hang forever as long as the client keeps the connection open.

  1. the WSGI PEP's specify that env['wsgi.input'] should be file like (i.e. have read/readline) (See PEP-3333)
  2. A server should allow read() to be called without an argument, and return the remainder of the client's input stream.
  3. A server should return empty bytestrings from any attempt to read from an empty or exhausted input stream.

See: https://www.python.org/dev/peps/pep-3333/#input-and-error-streams

However, wsgiref will pass the underlying file descriptor for the socket the client is connected to through to wsgi.input in the environ.

This means when you call .read() without a CONTENT_LENGTH, which may happen if the client is using Chunked Encoding (and WebOb tries its best to support that), then you and up hanging forever.

Good servers like waitress don't do this, and we can always call read() on wsgi.input without causing any issues, however it's not guaranteed.

Proposal

Choice number One (arguably the most correct choice)

Currently WebOb tries to be too smart for it's own good and that gets it into trouble...

I'd like to turn back the clock, remove any and all support for Chunked Encoding unless there is a flag in the environment that states it is supported. #278 is one such proposal. mod_wsgi has another way, potentially add some helper functions that find/validate the environment is right and then automatically enable Chunked Encoding. However if the server doesn't support it, then Chunked Encoding is completely dropped and the only way that WebOb will read from wsgi.input will be if there is a CONTENT_LENGTH set in the environment.

There have been previous reported issues with wsgiref's WSGI implementation:
#233, #116

Choice number Two

We completely ignore wsgiref, remove the limitations that were added in https://bitbucket.org/ianb/webob/issues/6 (ee7f027) and simply call .read() unconditionally if there is no CONTENT_LENGTH to limit us to.

I don't know of a good way of verifying if we are running under wsgiref or not to warn users of WebOb though.

Issues

Chunked encoding is apparently being used more and more by mobile clients, versus your run off the mill browser, but I imagine it has other uses too. While I am loathe to remove support for something, I do believe it should be better implemented. I'd like to allow a request body on all request methods (spec says it's allowed, just doesn't have any defined semantics) but I don't want to break WebOb on servers that don't implement PEP-3333 fully.

While wsgiref is the easiest to point out as being faulty, I am not sure choice number two is a good idea simple because there may be other WSGI servers that are not spec compliant...

I'd love feedback on this, and ideas on how to move this forward.


Related: #274, #233, #116, #278

@digitalresistor
Copy link
Member Author

/cc: @mmerickel @jd @mcdonc

@digitalresistor
Copy link
Member Author

There is already an issue open regarding wsgiref: https://bugs.python.org/issue21878

@digitalresistor
Copy link
Member Author

What I would like to do, to give control back to the user ultimately, is ship a piece of middleware that can easily be loaded into the users WSGI pipeline that tells WebOb it is safe to consume wsgi.input without worrying about Content-Length by placing a flag into the environ.

This way even if there is no way to check that a server supports read() on wsgi.input unconditionally users can enable it with the caveat that they are required to check their server for compatibility.

@mmerickel
Copy link
Member

ISTM that https://gist.github.com/mitsuhiko/5721547 is already implemented in werkzeug and the only issue is that no WSGI servers (I checked werkzeug, waitress, wsgiref, gevent, gunicorn, uwsgi) actually define wsgi.input_terminated right now. Is there a goal to get them to set this, or are users just expected to set this themselves if they know how their servers behave? That's probably fine for now and better than the current situation. We can then petition servers to support it. Obviously wsgi.* namespace is reserved by the PEP but............

So if I understand this correctly, if you update webob to depend on wsgi.input_terminated then any applications using a WSGI server that isn't supporting that option, and the request does not have CONTENT_LENGTH, then the body will appear empty to a user?

@digitalresistor
Copy link
Member Author

So if I understand this correctly, if you update webob to depend on wsgi.input_terminated then any applications using a WSGI server that isn't supporting that option, and the request does not have CONTENT_LENGTH, then the body will appear empty to a user?

Yes. Unfortunately due to PEP3333 not properly defining requirements the Content-Length is the only way to know that a request has a body. The only time a client doesn't send a Content-Length is if the request is chunked, in the case of a HTTP -> WSGI the HTTP server is expected to buffer the full chunked request and then pass the WSGI application the full buffered request and set the Content-Length into the environ.

Graham Dumpleton on Twitter suggested creating a wsgi.org proposal instead of just an ad-hoc wsgi.input_terminated, I think that is a good idea and one I will be pursuing. Then the key wsgiorg.input_terminated can be used.

My other proposal is to add a middleware to WebOb that sets wsgi.input_terminated so that users of (waitress, gunicorn, uWSGI, and others) that already do the right thing can explicitly tell WebOb's Request that it's okay to not be PEP3333 compliant in the way it checks for a body.

@mmerickel
Copy link
Member

To me, it seems like adding support for wsgi.input_terminated to webob is acceptable and adding some convenience middleware to enable it is good as well. I'm less concerned about the wsgiorg.input_terminated proposal because I've seen the insane bikesheds that occur on the web-sig about the simplest ideas so I wouldn't predicate any of your work on getting that accepted. I'd just prepare to support it as well if accepted.

@digitalresistor
Copy link
Member Author

Yeah, Graham said something similar about web-sig. I was going to just submit it, let others fight about it. Shipping code wins.

As a reminder to myself, need to make sure I support what Graham has in mod_wsgi as well, there is a flag that is set if the user turned on Chunked support.

@redixin
Copy link

redixin commented Oct 7, 2016

curl -X POST http://localhost:8000/

This request will send empty body (Content-Length: 0)

And according to this note in pep 3333

The server is not required to read past the client's specified Content-Length , and should simulate an end-of-file condition if the application attempts to read past that point. The application should not attempt to read more data than is specified by the CONTENT_LENGTH variable.

this line should return immediately:
environ['wsgi.input'].read()

It it hangs, then it is just a little but in wsgiref

@redixin
Copy link

redixin commented Oct 7, 2016

This means when you call .read() without a CONTENT_LENGTH, which may happen if the client is using Chunked Encoding (and WebOb tries its best to support that), then you and up hanging forever.

If client is using Chunked Encoding, there should be header "Transfer-Encoding: chunked" in request, and .read() will block until client send a chunk. What exactly wrong with it?

@digitalresistor
Copy link
Member Author

@redixin environ['wsgi.input'].read() returning immediately is NOT required by PEP3333 it is a "SHOULD" not a "MUST". wsgiref is thus perfectly within spec, unfortunately. This has been confirmed with Graham Dumpleton who is an authority figure on WSGI and implementations.

The way wsgiref is implemented it's wsgi.input is tied directly to the incoming TCP/IP socket.

Also, the WSGI spec does not allow for Hop-by-Hop headers, which is what Transfer-Encoding is...

Sorry, this feature has been removed, and I am not currently interested in re-visiting it, especially with all of the information that is available above!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants