UnicodeEncodeError #6895

Closed
agusmakmun opened this issue Aug 28, 2019 · 5 comments

Comments

@agusmakmun

Error when encoding the character '\ud83d'

# rest_framework/renderers.py at line 119

'utf-8' codec can't encode character '\ud83d' in position 787: surrogates not allowed
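
For anyone trying to reproduce this outside of DRF, the same error can be triggered with plain Python. This is only a minimal sketch: the `text` value is made up for illustration and is not the data from the original report.

```python
# A string that ends with a lone high surrogate (illustrative value only).
text = 'hello \ud83d'

try:
    text.encode('utf-8')
except UnicodeEncodeError as exc:
    # exc.reason is the message fragment shown in the traceback,
    # exc.start is the index of the offending character.
    reason = exc.reason
    position = exc.start
    print(f"cannot encode at index {position}: {reason}")
```

The `position` value is handy for locating the bad character in the stored data, in the same way the "position 787" in the report points at the offending index.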


@rpkilby
Member

rpkilby commented Sep 3, 2019

Hi @agusmakmun. This should be fixed by #6633. Upgrade to v3.10 and let us know if the issue persists.

rpkilby closed this as completed Sep 3, 2019
@agusmakmun
Author

@rpkilby the character \ud83d isn't handled in these lines, so does that fix work here?

ret = ret.replace('\u2028', '\\u2028').replace('\u2029', '\\u2029')
return ret.encode()

@rpkilby
Member

rpkilby commented Sep 4, 2019

My mistake - I looked too briefly with the git blame and misunderstood what the PR was doing.

I'm not that familiar with Unicode's surrogate pairs, but as best I can tell, the presence of a surrogate in a Python string indicates that something has gone wrong upstream. Your string should contain the Unicode character itself, not its surrogates. You should probably figure out why you're getting surrogates to begin with and fix the issue there. However, you can also use the surrogatepass error handler to repair the string by round-tripping it through UTF-16:

>>> '\ud83d\ude4f'.encode('utf-16', 'surrogatepass').decode('utf-16')
'🙏'

Note that surrogates should exist as pairs, and it looks like your string contains a lone high surrogate, so this workaround may not work for you.

>>> '\ud83d'.encode('utf-16', 'surrogatepass').decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 2-3: unexpected end of data
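
If the goal is only to get a string that can be encoded again, one possible variation on the round trip above is to decode with the `replace` error handler, so that properly paired surrogates are combined and lone ones become U+FFFD. This is a sketch under that assumption, not something DRF does for you; `repair_surrogates` is a hypothetical helper name.

```python
def repair_surrogates(text: str) -> str:
    """Combine valid surrogate pairs and replace lone surrogates with U+FFFD.

    Hypothetical helper for illustration; not part of DRF.
    """
    # 'surrogatepass' lets the surrogates through on encode; 'replace'
    # substitutes U+FFFD for any code unit that cannot be decoded.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16', 'replace')


paired = repair_surrogates('\ud83d\ude4f')   # a valid pair becomes one character
lone = repair_surrogates('\ud83d')           # a lone surrogate is replaced
middle = repair_surrogates('a\ud83db')       # surrounding text is preserved

# The repaired strings can now be encoded without raising.
encoded = lone.encode('utf-8')
```

Note that this discards the lone surrogate rather than recovering the intended character, so it hides the symptom; finding where the surrogates come from is still the real fix.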


@moseb

moseb commented Oct 31, 2019

@agusmakmun did you manage to reproduce how the client was able to produce JSON containing character 0xD83D? Did you see #7026?

@rpkilby
Member

rpkilby commented Oct 31, 2019

@moseb. As best I can tell, this is a related but different issue. In this case, they've stored the surrogates and are trying to render them in a response. In your case, the client is sending you surrogates in the request, and parsing is failing.

Ach, I reread your issue and saw that parsing isn't failing.
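
To illustrate how a lone surrogate can get through parsing and only fail at render time, here is a minimal sketch using the standard library `json` module. It is not DRF's parser or renderer, and the `payload` value is made up; whether DRF follows the same path depends on its settings.

```python
import json

# A JSON document whose string value is a lone high surrogate escape.
payload = '{"name": "\\ud83d"}'

# Parsing succeeds: the standard library decoder accepts the escape as-is.
data = json.loads(payload)
parsed_ok = data["name"] == "\ud83d"

# With ensure_ascii=False the surrogate stays in the output string, so the
# failure only shows up when that string is encoded to UTF-8.
rendered = json.dumps(data, ensure_ascii=False)
try:
    rendered.encode("utf-8")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# With ensure_ascii=True (the json module default) the surrogate is escaped
# again, so the output is plain ASCII and encodes without error.
escaped = json.dumps(data)
escaped_bytes = escaped.encode("utf-8")
```

So a client can send the escape, it can be stored as a lone surrogate, and the error appears only later when a response containing it is encoded.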
