Skip to content
This repository has been archived by the owner on Nov 5, 2019. It is now read-only.

utf-8 decode bytes inputs for python3 compat + python3 test fix #157

Closed
wants to merge 3 commits into from

Conversation

saaros
Copy link

@saaros saaros commented Mar 31, 2015

  • json loading: always utf-8 decode bytes inputs
  • tests: use six.moves.urllib for py2/3 compatibility

saaros added 2 commits March 31, 2015 17:07
Use `isinstance(s, bytes)` to check if the input should be utf-8 decoded
before passing it to json.loads.  Some places in the code always tried to
utf-8 decode the inputs without checking their type (which can cause
decoding failures on Python 2 unicode inputs and AttributeErrors on Python 3
strings) while others failed to decode them causing the code paths to fail
on Python 3 where HTTP responses are returned as `bytes`.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.15%) to 64.95% when pulling ed36586 on saaros:py3-compat into 0a6241c on google:master.

@@ -272,7 +272,7 @@ def new_from_json(cls, s):
An instance of the subclass of Credentials that was serialized with
to_json().
"""
if six.PY3 and isinstance(s, bytes):

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

This comment was marked as spam.

@saaros
Copy link
Author

saaros commented Apr 1, 2015

This PR fixes #150 which also had a proposed but slightly broken fix in #152.

`bytes` was introduced in Python 2.6 as an alias for `str`.
`six.binary_type` is equivalent to it, but it makes sense to use the native
type instead of a compatibilty libray type when there's no difference in
them.

`bytes` was already being used in most places, this commit replaces the
three remaining uses.
@saaros
Copy link
Author

saaros commented Apr 7, 2015

Added a commit to convert the last remaining uses of six.binary_type to bytes which was already being used in most places.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.13%) to 64.93% when pulling 3584b07 on saaros:py3-compat into 0a6241c on google:master.

@sullivanmatt
Copy link

+1

Are there any other considerations around this PR that are keeping it from being merged? I'm completely unable to use Python 3 in my infrastructure solely due to this issue.

@dhermes
Copy link
Contributor

dhermes commented May 19, 2015

@sullivanmatt Have you updated httplib2? Is it still an issue with the latest httplib2 (which now does support Python 3 correctly, and fixed many issues here)?

@sullivanmatt
Copy link

I'm using httplib2 0.9.1 (latest).

Here's the error I believe this would resolve (haven't checked out the branch to test, I admit) -

>>> access_token = get_access_token_from_cache()
>>> credentials = AccessTokenCredentials(access_token, user_agent="httpclient/1.0")
>>> http = httplib2.Http()
>>> http = credentials.authorize(http)
Traceback (most recent call last):
  File "/opt/project-service/project_service/gproject_service/__init__.py", line 160, in _get_history
    labelId="INBOX").execute(http=http)
  File "/home/msulliv/mb-project-sdk-py3/lib/python3.4/site-packages/oauth2client/util.py", line 137, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/msulliv/mb-project-sdk-py3/lib/python3.4/site-packages/googleapiclient/http.py", line 722, in execute
    body=self.body, headers=self.headers)
  File "/home/msulliv/mb-project-sdk-py3/lib/python3.4/site-packages/oauth2client/util.py", line 137, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/msulliv/mb-project-sdk-py3/lib/python3.4/site-packages/oauth2client/client.py", line 549, in new_request
    self.apply(headers)
  File "/home/msulliv/mb-project-sdk-py3/lib/python3.4/site-packages/oauth2client/client.py", line 615, in apply
    headers['Authorization'] = 'Bearer ' + self.access_token
TypeError: Can't convert 'bytes' object to str implicitly

@dhermes
Copy link
Contributor

dhermes commented May 20, 2015

@sullivanmatt Your stacktrace doesn't seem to match the code snippet.

After running pip3 install -U oauth2client (which also updates the dependencies) and running a related snippet with both bytes and unicode I can reproduce the failure

$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11) 
[GCC 4.8.2] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from oauth2client.client import AccessTokenCredentials
>>> 
>>> 
>>> creds_unicode = AccessTokenCredentials(u'foo', user_agent='UA')
>>> headers = {}
>>> creds_unicode.apply(headers)
>>> print(headers)
{'Authorization': 'Bearer foo'}
>>> 
>>> creds_bytes = AccessTokenCredentials(b'bar', user_agent='UA')
>>> headers = {}
>>> creds_bytes.apply(headers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/oauth2client/client.py", line 615, in apply
    headers['Authorization'] = 'Bearer ' + self.access_token
TypeError: Can't convert 'bytes' object to str implicitly

@sullivanmatt
Copy link

For whatever reason I didn't realize this until I saw your reproduction above, but simply doing access_token.decode('utf-8') was sufficient to get around the issue for the time being.

@dhermes
Copy link
Contributor

dhermes commented May 26, 2015

@sullivanmatt Do you mind if we close out. A lot of the changes are just bytes "grammar" changes.

But there are some places where you add explicit decodes:

    resp, content = http_request(token_revoke_uri)
    if isinstance(content, bytes):
      content = content.decode('utf-8')
...
  def from_json(cls, s):
    if isinstance(s, bytes):
      s = s.decode('utf-8')
...
  if isinstance(content, bytes):
    content = content.decode('utf-8')
  if resp.status == 200:
    # WAS certs = json.loads(content.decode('utf-8'))
    certs = json.loads(content)

How did you go about determining these were needed?

@sullivanmatt
Copy link

I am not the author of this PR (@saaros is), so I can't comment on his methodology.

@dhermes
Copy link
Contributor

dhermes commented May 26, 2015

D'oh! The passage of time has made me weak and dumb :)

@sullivanmatt
Copy link

We've all been there man. Thanks!

@saaros
Copy link
Author

saaros commented May 26, 2015

It's to make sure the input to json.loads() is always an unicode string; it's not always clear where the input comes from and whether it's bytes or unicode. Having if isinstance(s,bytes): s = s.decode("utf-8") in every place that loads json isn't pretty, though. A function in util.py to do it would probably look cleaner.

@dhermes
Copy link
Contributor

dhermes commented May 26, 2015

I agree with you there. Do you mind breaking this into two PRs: one with the bytes grammar fix and the other with the actual code path fixes. (Or I can do it and CC you on the PRs.)

Also, how did you locate all the code paths you edited?

@saaros
Copy link
Author

saaros commented May 26, 2015

I just ran git grep to find them after I ran into various problems using this library on Python 3. Do you want the commit 3584b07 with the six.binary_type -> bytes change in a PR of its own leaving the other two ("fix urllib tests on py3" and "utf8-decode before json.loads") in this PR?

@dhermes
Copy link
Contributor

dhermes commented May 26, 2015

  • The bytes grammar stuff I'm referring to is removing six.PY3 and just using bytes instead of six.binary_type.
  • The "fix urllib tests on py3" is somewhat moot since App Engine only supports Py27
  • utf8-decode before json.loads is fine in this PR but seems worth just closing and opening a new one given your suggestion for moving the functionality to utils.

@dhermes
Copy link
Contributor

dhermes commented Aug 14, 2015

@nathanielmanistaatgoogle I am going through this to see if there is anything we missed in #250

@dhermes
Copy link
Contributor

dhermes commented Aug 19, 2015

Closing since #272 is in

@dhermes dhermes closed this Aug 19, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants