Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: content_type/charset handling #261

Merged
merged 40 commits into from
Jul 30, 2016
Merged

Conversation

digitalresistor
Copy link
Member

@digitalresistor digitalresistor commented Jul 9, 2016

  • Response.content_type removes charset unless the new content_type is a text like type that has a charset parameter
  • Reverted change that was made in Improve charset defaults in Response #253 whereby Response.charset would not allow the user to set a charset if the content type was JSON. I am not a big fan of trying to stop the user from shooting themselves in the foot, and would rather let them add/remove as needed.
  • json.dumps/loads are now always UTF-8. Python returns a string to us, and we can encode it as we see fit.
  • Response.__init__ has been cleaned up to remove a lot of extraneous branch conditions that were only complicating the logic.

@digitalresistor
Copy link
Member Author

Closes #130 and #237

@digitalresistor
Copy link
Member Author

Closes #236

@digitalresistor
Copy link
Member Author

This change breaks Pyramid because it treats JSON as text, and attempts to get/set .text on the Response object.

@cdent
Copy link

cdent commented Jul 11, 2016

I reckon a common way in which this change may surprise people is if they have tests which were written to expect (the incorrect) application/json; charset=UTF-8. With these changes they'll now get (the correct) application/json. Since such tests are wrong™ I reckon that's okay.

@digitalresistor
Copy link
Member Author

Any changes made in this area were going to have this issue, I'd argue that I'd rather have WebOb do the right thing rather than continue sending an invalid content-type.

@digitalresistor
Copy link
Member Author

@cdent Wanted to add a little more of a note: in WebOb 1.6 there's already been some work to remove charset=UTF-8 from certain responses that accidentally contained it, so I don't think it will be that big of a surprise to people.


# Will raise
try:
print(res.text)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo.

@mmerickel
Copy link
Member

Is there any planned deprecation warnings for this feature? Or just ripping the bandaid off?

@digitalresistor
Copy link
Member Author

@mmerickel What part of it? The content-type/charset stuff is fixing the remainder of the items that should have been fixed in 1.6 but were inadequately completed.

The default_encoding is something I am not happy with, and is a change I am no longer planning to make.

rylz and others added 15 commits July 16, 2016 20:23
Refactored the logic in Response.__init__ to handle default charset more
consistently.

Added logic in the charset setter that ignores attempts to set it on
JSON content types.

Removed explicit charset specification from exceptions since this is
handled correctly for the text types within Response.

Fixed some affected tests, and added assertions for content types in
exceptions.

Addresses #237
This way the next time I come across this I don't have to spend 20
minutes trying to figure out why status_code wasn't being used.
It is entirely possible that if there is no content_type passed by the
user, no headerlist that contains a content type and there is no default
content type that we don't have a content_type.

In that case we want to make sure we don't set anything in the header
dictionary since None is not a valid header to return in a Response.
Overhaul a lot of the logic in Response.__init__ to be smaller, and
remove a lot of extra logic checking that didn't belong in __init__ but
instead in the properties themselves.
We no longer want to specialise application/json, instead attempt to do
the right thing as best as possible all the time.
Don't try to stop the user from shooting themselves in the foot, if the
user explicitly asks for a charset, even on a content type that may not
support it as a paramater who are we tell them they can't do that.
Makes pytest output more sane
Just encode the json.dumps as UTF-8, then raise an error if the user
passes a text type. This basically undoes a bunch of changes where we
were attempting to stop the user from shooting their own foot off. Let
them shoot.
The docs fairy came to town!

if 'charset' in params:
if not _content_type_has_charset(value):
warn_deprecation(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should maybe be turned into a RuntimeWarning instead.

@digitalresistor
Copy link
Member Author

The only backwards incompatible change is https://github.com/Pylons/webob/pull/261/files#diff-84ba6a58e29e169ddf6578bceca65dc9R768. Whereby if you currently have a content-type with a charset, and you then you replace it with one that doesn't, it is explicitly removed, but the rest of the parameters stick around.

I don't see a way to gracefully deprecate the removal of charset, since that is the crux of the issue. For the other parameters it can continue as is for now, and full removal of all of that can be done at a later point in time.

digitalresistor and others added 2 commits July 16, 2016 20:45
- make sub-classing notes an HTML list
- fix grammar and punctuation
- rewrap as needed
@digitalresistor
Copy link
Member Author

Thanks @stevepiercy. I need to make a couple minor changes to this, and then this will be ready to go.

I forgot to document app_iter for the constructor so I need to do that.

When the new Content-Type value did not contain any semicolons we
attempted to save any existing parameters from the existing Content-Type
headers. If that header was empty, it would do nothing, and no
Content-Type would be set at all.

The new branch explicitly sets the Content-Type to the value if we don't
find any paramters to save.
We already state that if the headerlist contains a Content-Type, then we
won't accept the passed in value, nor will we set it to a default
content type. There is no reason to pull the value out of the header
because if it exists we won't set the content_type anyway.

We have also have all of these fantastic properties, why are we changing
the underlying headers list directly when we can just use the property
and have it do it's appropriate magic.
If the caller provides the headerlist, we shouldn't try to be too smart
and add in a default Content-Type. This is used by
Request.get_response() for example to create a Response object that
matches the Response received from a WSGI application.

We do still add a default charset even if there is none provided on
Content-Types that are known to allow for a charset (basically texty
responses). This way Response.text will function without additional
work.
The charset provided to the constructor should not be used if a charset
is already set on the Content-Type.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants