UnicodeEncodeError #6895

agusmakmun · 2019-08-28T09:33:32Z

Error when encoding the character '\ud83d'

# rest_framework/renderers.py at line 119

'utf-8' codec can't encode character '\ud83d' in position 787: surrogates not allowed


rpkilby · 2019-09-03T17:41:59Z

Hi @agusmakmun. This should be fixed by #6633. Upgrade to v3.10 and let us know if the issue persists.

agusmakmun · 2019-09-04T01:29:37Z

@rpkilby the character \ud83d isn't handled anywhere in these lines. Does the fix actually cover it?

ret = ret.replace('\u2028', '\\u2028').replace('\u2029', '\\u2029')
return ret.encode()
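For reference, the failure can be reproduced in isolation. This is only a minimal sketch, assuming the two lines quoted above are the relevant part of the renderer; `render_bytes` is a hypothetical stand-in name, not a function in rest_framework.

```python
def render_bytes(ret: str) -> bytes:
    # Mirrors the two quoted lines only; the real renderer does more.
    ret = ret.replace('\u2028', '\\u2028').replace('\u2029', '\\u2029')
    return ret.encode()  # defaults to UTF-8 with strict error handling


error = None
try:
    # A lone high surrogate, similar to the one in the reported traceback.
    render_bytes('prefix \ud83d suffix')
except UnicodeEncodeError as exc:
    error = exc

print(error.reason)  # surrogates not allowed
print(error.start)   # 7
```

The escaping of U+2028 and U+2029 never touches the surrogate, so the strict UTF-8 encode is what raises.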

rpkilby · 2019-09-04T02:31:03Z

My mistake - I looked too briefly with the git blame and misunderstood what the PR was doing.

I'm not that familiar with Unicode surrogate pairs, but as best I can tell, the presence of a surrogate in a string indicates that something has gone wrong. Your string should contain the Unicode character itself, not its surrogates. You should probably figure out why you're getting surrogates to begin with and fix the issue there. However, you can also use the surrogatepass error handler to fix the string.

>>> '\ud83d\ude4f'.encode('utf-16', 'surrogatepass').decode('utf-16')
'🙏'

Note that surrogates should exist as pairs, and it looks like your string contains a lone high surrogate, so this workaround may not work for you.

>>> '\ud83d'.encode('utf-16', 'surrogatepass').decode('utf-16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 2-3: unexpected end of data
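If the source of the surrogates can't be fixed right away, one possible fallback is to combine `surrogatepass` on encode with the `replace` handler on decode, so that valid pairs are recombined and lone surrogates become U+FFFD. This is a sketch under that assumption, not a recommended fix; `repair_surrogates` is a hypothetical helper name, and the replacement is lossy.

```python
def repair_surrogates(text: str) -> str:
    """Recombine surrogate pairs; replace lone surrogates with U+FFFD."""
    # surrogatepass lets the surrogate code points through to UTF-16 bytes.
    data = text.encode('utf-16-le', 'surrogatepass')
    # Decoding joins valid pairs; 'replace' swaps invalid units for U+FFFD.
    return data.decode('utf-16-le', 'replace')


paired = repair_surrogates('\ud83d\ude4f')  # valid pair
lone = repair_surrogates('a\ud83db')        # lone high surrogate

# Both results can now be encoded as UTF-8 without raising.
paired.encode('utf-8')
lone.encode('utf-8')
```

Whether silently replacing the bad code point is acceptable depends on the data; logging the affected records first may be the safer choice.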

More info:

moseb · 2019-10-31T11:23:05Z

@agusmakmun did you manage to reproduce how the client was able to produce JSON containing character 0xD83D? Did you see #7026?

rpkilby · 2019-10-31T18:53:18Z

@moseb. As best I can tell, this is a related but different issue. In this case, they've stored the surrogates and are trying to render them in a response. In your case, the client is sending you surrogates in the request, and parsing is failing.

Ach, I reread your issue and saw that parsing isn't failing.
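To illustrate how surrogates can arrive through a request without parsing failing: Python's `json` module accepts an escaped lone surrogate and returns it as-is, while a valid escaped pair is combined into one character. The check below is a rough sketch of rejecting such input early; `contains_surrogate` is a hypothetical helper, not part of rest_framework.

```python
import json
import re

# Any code point in the surrogate range that survives in a str is unpaired.
_SURROGATE = re.compile(r'[\ud800-\udfff]')


def contains_surrogate(text: str) -> bool:
    return _SURROGATE.search(text) is not None


good = json.loads('"\\ud83d\\ude4f"')  # escaped pair, combined on parse
bad = json.loads('"\\ud83d"')          # escaped lone surrogate, kept as-is

print(contains_surrogate(good))  # False
print(contains_surrogate(bad))   # True
```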

rpkilby closed this as completed Sep 3, 2019

moseb mentioned this issue Oct 30, 2019

JSONParser (and CharField) let malformed strings (isolated surrogate code points) pass through to the application… to then cause late 500 errors #7026

Closed

benjackwhite mentioned this issue Dec 20, 2022

Unable to play session recordings PostHog/posthog#13272

Closed
