non-ascii XML output does not work with Python2 #106
Here is a simple MRS that illustrates the problem:
Once #164 is fixed, you'll see an error like the one in the original post. The first trace in PyDelphin code, though, is this:
So it seems the |
In an email, @mcmillanmajora says this:
I think we have non-unicode strings in the elements in Python 2, so the The code you've shown avoids the problem by converting to unicode before constructing the XML structure (although it might be better to use the |
You're right. It only solves the delphin command. I tried applying the conversion in loads() instead, and it works for the command line, but the encoding issue persists when using the API.
Here's a unit test (I called it
# -*- coding: utf-8 -*-
from delphin.mrs.util import etree_tostring

def test_etree_tostring():
    import xml.etree.ElementTree as etree
    e = etree.Element('a')
    # plain ASCII text
    e.text = 'a'
    assert etree_tostring(e, encoding='unicode') == u'<a>a</a>'
    # non-ASCII text given as a unicode literal
    e.text = u'あ'
    assert etree_tostring(e, encoding='unicode') == u'<a>あ</a>'
    # non-ASCII text given as a byte string under Python 2
    e.text = 'あ'
    assert etree_tostring(e, encoding='unicode') == u'<a>あ</a>'
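
For illustration, here is a minimal sketch of the "convert to unicode before serializing" idea discussed above. It is not PyDelphin's implementation: `tostring_text` and `_to_text` are hypothetical names, the UTF-8 default is an assumption, and the sketch targets Python 3, where `xml.etree.ElementTree.tostring` accepts `encoding='unicode'`. On Python 2 the final serialization call would need to differ.

```python
import copy
import xml.etree.ElementTree as etree


def _to_text(value, encoding='utf-8'):
    """Return value as a text string, decoding byte strings if needed.

    Hypothetical helper; UTF-8 is an assumed default encoding.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value  # already text, or None


def tostring_text(elem, encoding='utf-8'):
    """Serialize elem to a text string after decoding byte-string content.

    Hypothetical stand-in for the behavior the unit test expects.
    """
    elem = copy.deepcopy(elem)  # leave the caller's tree untouched
    for node in elem.iter():
        node.text = _to_text(node.text, encoding)
        node.tail = _to_text(node.tail, encoding)
        for key, val in list(node.attrib.items()):
            node.attrib[key] = _to_text(val, encoding)
    # Python 3 only: 'unicode' asks ElementTree for a str result
    return etree.tostring(elem, encoding='unicode')
```

Decoding at serialization time, rather than only in the command-line entry point, means callers who build elements through the API get the same treatment.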
Converting to an XML-based format (e.g. mrx or dmrx) with Python 2 generates a UnicodeDecodeError when there are non-ASCII characters in the stream: