GET and POST behavior w.r.t. utf-8 decoding errors #161
Comments
I'd be happy with a PR against master for solution number 1. There are some changes that were made to fix #149 so I wonder how that changes things with regards to your possible solution. |
In comments on #198, I wrote
To which, @mmerickel responded
Moving discussion here, since there is a bit more context here. |
@mmerickel Okay, I was proceeding based on the comment from @bertjwregeer, above. I'll abort for now, until there is a clear consensus. |
I think that #115 is still important, but I am not sure that accessing the request.GET should raise an error. I would love to have @mmerickel's input on this, to see what he thinks. |
As a bit of an aside, looking at the charset decoding of RFC2388 (which describes Preliminary testing with google chrome, however, seems to indicate the chrome simply encodes non-ascii control names to bytes using the encoding specified by the Also of note, |
My linux test server has been getting a lot of these kinds of requests lately...
I'm debating whether to use @Gijutsu's sanitization approach. |
Are these decoding errors always indicative of a badly formatted request? I get regular unicode decode tracebacks from |
It is indicative of the remote sending content in a non-UTF-8 format. Browsers send the data in the encoding of the page by default (UTF-8 if you set the charset for the page to UTF-8). Otherwise it's up to the browser and the user's settings IIRC. |
Looking over this issue again, I do think a property should be able to raise. Simply returning NoVars when clearly there are vars, just not ones we happen to like is a bad idea. It doesn't give the programmer a chance to let the user know they did something wrong. for |
This affects OpenStack: https://bugs.launchpad.net/neutron/+bug/1613901 |
We are having the same issue with some public Kinto instances running with Python3 (Refs #164) I am a bit puzzled to see that the Python2 and Python3 code are so different. |
They are different due to differences between what the WSGI environment provides and requires on Python 3 vs Python 2. https://www.python.org/dev/peps/pep-3333/#a-note-on-string-types This is why the code is so different between Python 2 and 3. If you can figure out a good way to bring the two back together and have the code be similar, I am all game. |
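To make the PEP 3333 point concrete: on Python 3 a WSGI server hands the application native `str` values whose code points are really the request bytes decoded as ISO-8859-1. The sketch below is a minimal illustration using only built-in string methods. `wsgi_str_to_text` is a hypothetical helper name chosen for this example, not something WebOb provides.

```python
def wsgi_str_to_text(value, encoding="utf-8", errors="strict"):
    """Recover real text from a PEP 3333 'bytes-as-latin-1' environ string.

    Hypothetical helper, for illustration only.
    """
    # latin-1 maps code points 0-255 onto the same byte values, so this
    # round-trips back to the bytes the client actually sent.
    raw = value.encode("latin-1")
    return raw.decode(encoding, errors)


# What a server would place in environ for the UTF-8 bytes of "café".
environ_value = "café".encode("utf-8").decode("latin-1")
text = wsgi_str_to_text(environ_value)

# A lone 0xFF byte is never valid UTF-8, so strict decoding fails.
try:
    wsgi_str_to_text(b"\xff".decode("latin-1"))
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

The same environ string can therefore either decode cleanly or raise, depending on the bytes the client sent, which is where the decoding errors discussed in this thread come from.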
Thank you for the reference in the WSGI PEP. I think the pep is the root of our problem here when they talk about |
Not an active dev here, but as a consumer facing this issue, option 1 would be ideal. Option 2 could result in unintended side effects. As for option 3, it is not hard to add some sort of middleware to your application that tests whether a request can be decoded as UTF-8 before proceeding and returns a 400 otherwise, which is why I assume this issue hasn't been addressed yet. |
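A middleware along those lines might look roughly like the following. This is a sketch under stated assumptions rather than a recommended implementation: `reject_undecodable` is a made-up name, and it only checks the URL path and query string, not headers or the request body.

```python
from urllib.parse import unquote_to_bytes


def reject_undecodable(app, encoding="utf-8"):
    """Hypothetical WSGI middleware: answer 400 if path or query cannot be decoded."""

    def middleware(environ, start_response):
        try:
            # PATH_INFO is already percent-decoded; per PEP 3333 it is a
            # str holding the raw bytes as latin-1 code points.
            environ.get("PATH_INFO", "").encode("latin-1").decode(encoding)
            # QUERY_STRING is still percent-encoded, so unquote it first.
            query = environ.get("QUERY_STRING", "").encode("latin-1")
            unquote_to_bytes(query).decode(encoding)
        except UnicodeError:
            body = b"400 Bad Request: request could not be decoded\n"
            start_response(
                "400 Bad Request",
                [
                    ("Content-Type", "text/plain; charset=utf-8"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Everything decoded cleanly, so hand the request to the real app.
        return app(environ, start_response)

    return middleware
```

Wrapping an application is then a single call, for example `application = reject_undecodable(application)`.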
What are the next steps here? Can I help? |
We are hitting this issue (mostly from pen-testing as well), and the problem it is causing us is we can't ignore Could I suggest catching and raising a child of |
Could we just define a RequestDecodeError? Or do we want a type hierarchy or multiple types? I think the issues are in headers, url path, query string, and body and we could potentially identify them all separately or we could just call it a request decode error as they all indicate a client-side issue and we pretty much want to just return a 400.
Once we decide on this API, someone just needs to pepper it around the code and add docs/tests. |
One exception class + a parameter holding the source of the problem (a string that’s one of `"headers"`, `"path"`, `"params"`, `"query"`, `"body"`) seems nice and clean to me. |
If someone wants to do that work, I'd accept it. @mmerickel's suggestion is the one I was working towards in my head as well. I'd prefer it to have unique exceptions with RequestDecodeError being the top-level. Not a fan of an exception which holds a parameter as a mechanism because you may want different handling if it's a header that failed vs url params for example, and I don't like the idea of people writing code like it's However please do that work against the webob-ng (which is py3 only) branch I started (#390) as I would prefer not to port it later. I am not likely to accept this change against Python 2 at this time. I do plan on trying to get some work done on that PR over the next coming days to get it merged to master, so that will help everyone involved. |
If there is an exception hierarchy, it would still be nice to have one attribute that exposes that information. |
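The two proposals are not mutually exclusive. The sketch below combines a hierarchy rooted at `RequestDecodeError` with a single `source` attribute. Only `RequestDecodeError` is named in the discussion. The subclass names, the `source` attribute, and the `decode_part` helper are assumptions made for illustration and do not describe an existing WebOb API.

```python
class RequestDecodeError(Exception):
    """Top-level error for any part of a request that cannot be decoded."""

    source = None  # overridden by each subclass


# Hypothetical subclasses, one per place the thread says decoding can fail.
class HeaderDecodeError(RequestDecodeError):
    source = "headers"


class PathDecodeError(RequestDecodeError):
    source = "path"


class QueryDecodeError(RequestDecodeError):
    source = "query"


class BodyDecodeError(RequestDecodeError):
    source = "body"


def decode_part(raw, error_cls, encoding="utf-8"):
    """Decode bytes, translating UnicodeDecodeError into the given subclass."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        # Chain the original error so the byte offset is not lost.
        raise error_cls(str(exc)) from exc
```

Callers that only want to return a 400 can catch `RequestDecodeError`, callers that care about where the failure happened can catch a subclass, and logging code can read `exc.source` without an `isinstance` chain.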
See Pylons/webob#161 Recognized with Swift, when not proper encoded object names causing a HTTP 500 error
See Pylons/webob#161 Recognized with Swift, when not proper encoded object names causing a HTTP 500 error Co-authored-by: Arno Uhlig <[email protected]>
The way things stand

Current behavior on badly encoded GET and POST params is:

`Request.GET` raises `UnicodeDecodeError`.

`Request.POST` essentially does the utf-8 decoding with `errors='replace'`.

Behavior with `Content-Type: multipart/form-data` is similar. [Edit: actually if the bad bytes are in the body of one of the subparts then `UnicodeDecodeError` is raised. If the bad bytes are in the headers of the subparts, they are decoded with `errors='replace'`.]

Gripes
1. Having `request.GET` raise `UnicodeDecodeError` is inconvenient. If one doesn't want way too many entries in one's exception log when a pentester is set loose on one's site, one must check for errors from every `request.GET` (or `request.params`). Also it seems, IMO, bad form (or unexpected, at least) for a property to raise a `UnicodeDecodeError`.
2. `Request.GET` and `request.POST` should (IMO) behave similarly w.r.t. how they handle improperly encoded characters.

Possible Solutions
1. Change `request.GET` and `request.POST` so that they return a `NoVars` instance on parameter decode errors. The `reason` attribute of the return value would describe the decoding error. (This is similar to how it is suggested to handle non-`multipart/form-data` bodies in #149.)
2. Change `Request.GET` to use `errors='replace'` semantics when decoding (so that it no longer raises `UnicodeDecodeError`s and matches the behavior of `Request.POST`).

At the moment, I vote for option #1.
(I'm not quite sure, however, how easy it will be to implement for `POST` under Python 2. Py3k's `cgi.FieldStorage` has an explicit `errors` parameter to control how character decoding errors are handled. Python 2's `FieldStorage` appears to lack this control.)

If there is consensus on what needs doing, I’d be happy to (attempt to) come up with a PR.