Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added compatibility for Unicode variable names #179

Merged
merged 15 commits into from
Jun 13, 2018

Conversation

julienmalard
Copy link
Contributor

This seems to be working with Unicode variable names for Vensim .mdl format models!

@coveralls
Copy link

coveralls commented Apr 29, 2018

Coverage Status

Coverage decreased (-1.4%) to 89.238% when pulling 51fd099 on julienmalard:master into 1c22343 on JamesPHoughton:master.

@julienmalard
Copy link
Contributor Author

All Unicode tests are now passing! :)

@@ -2,8 +2,10 @@
<module type="PYTHON_MODULE" version="4">
<component name="NewModuleRootManager">
<content url="file://$MODULE_DIR$" />
<orderEntry type="jdk" jdkName="Python 3.6.0 (~/anaconda/bin/python)" jdkType="Python SDK" />
<orderEntry type="jdk" jdkName="Python 3.6.0 (C:\Users\jmalar1\AppData\Local\Programs\Python\Python36-32\python.exe)" jdkType="Python SDK" />
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to revert changes in local work space files, like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried excluding the .idea folder from my push entirely (consider removing from version control?) Unfortunately, I seem to have failed. If you can help please do!

@@ -1,7 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6.0 (~/anaconda/bin/python)" project-jdk-type="Python SDK" />
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6.0 (C:\Users\jmalar1\AppData\Local\Programs\Python\Python36-32\python.exe)" project-jdk-type="Python SDK" />
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be good to revert changes in local work space files, like this.

@@ -88,7 +88,7 @@ def build(elements, subscript_dict, namespace, outfile_name):
if outfile_name == 'return':
return text

with open(outfile_name, 'w') as out:
with open(outfile_name, 'w', encoding='UTF-8') as out:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This changes is not affect files with default asci encoding?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't, as this is internal PySD file writing and reading anyways (PySD should be the only one to read this, apart from a Python interpreter, and the latter can read UTF 8 in any case). If a model has only ascii characters, these will save with no problem as UTF 8 (and Vensim model files are already saved as UTF 8 by default).

@@ -262,14 +262,14 @@ def make_python_identifier(string, namespace=None, reserved_words=None,

if convert == 'hex':
# Convert invalid characters to hex
s = ''.join([c.encode("hex") if re.findall('[^0-9a-zA-Z_]', c) else c for c in s])
s = ''.join([c.encode("hex") if re.findall('[^\p{l}\p{m}\p{n}_]', c) else c for c in s])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this regex is a correct... By regex101 service it match any sequence...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine; it uses unicode character blocks (which regex101.com does not understand, nor does the python re module, which is why I had to switch to the regex module). It should not match = ( ) and other non-python identifier sequences.Which regex service are you using?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the revisions!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm use regex in python mode to verify that.. But I'm completelly understand this syntax of regex... It will be good to add some comments and add additional tests to check that nothing is broken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added comments to describe what this regex is doing. As for additional tests, I think that the Unicode (and non-unicode) tests already present should verify that this works?

@alexprey
Copy link
Collaborator

Some tests failed again...
image

@alexprey alexprey mentioned this pull request May 31, 2018
# This takes care of models with Unicode variable names
basic_id = id_start (id_continue / ~r"[\s]")*
id_start = ~r"[A-Z]" / ~r"[a-z]" / "\u00AA" / "\u00B5" / "\u00BA" / ~r"[\u00C0-\u00D6]" / ~r"[\u00D8-\u00F6]" / ~r"[\u00F8-\u01BA]" / "\u01BB" / ~r"[\u01BC-\u01BF]" / ~r"[\u01C0-\u01C3]" / ~r"[\u01C4-\u0241]" / ~r"[\u0250-\u02AF]" / ~r"[\u02B0-\u02C1]" / ~r"[\u02C6-\u02D1]" / ~r"[\u02E0-\u02E4]" / "\u02EE" / "\u037A" / "\u0386" / ~r"[\u0388-\u038A]" / "\u038C" / ~r"[\u038E-\u03A1]" / ~r"[\u03A3-\u03CE]" / ~r"[\u03D0-\u03F5]" / ~r"[\u03F7-\u0481]" / ~r"[\u048A-\u04CE]" / ~r"[\u04D0-\u04F9]" / ~r"[\u0500-\u050F]" / ~r"[\u0531-\u0556]" / "\u0559" / ~r"[\u0561-\u0587]" / ~r"[\u05D0-\u05EA]" / ~r"[\u05F0-\u05F2]" / ~r"[\u0621-\u063A]" / "\u0640" / ~r"[\u0641-\u064A]" / ~r"[\u066E-\u066F]" / ~r"[\u0671-\u06D3]" / "\u06D5" / ~r"[\u06E5-\u06E6]" / ~r"[\u06EE-\u06EF]" / ~r"[\u06FA-\u06FC]" / "\u06FF" / "\u0710" / ~r"[\u0712-\u072F]" / ~r"[\u074D-\u076D]" / ~r"[\u0780-\u07A5]" / "\u07B1" / ~r"[\u0904-\u0939]" / "\u093D" / "\u0950" / ~r"[\u0958-\u0961]" / "\u097D" / ~r"[\u0985-\u098C]" / ~r"[\u098F-\u0990]" / ~r"[\u0993-\u09A8]" / ~r"[\u09AA-\u09B0]" / "\u09B2" / ~r"[\u09B6-\u09B9]" / "\u09BD" / "\u09CE" / ~r"[\u09DC-\u09DD]" / ~r"[\u09DF-\u09E1]" / ~r"[\u09F0-\u09F1]" / ~r"[\u0A05-\u0A0A]" / ~r"[\u0A0F-\u0A10]" / ~r"[\u0A13-\u0A28]" / ~r"[\u0A2A-\u0A30]" / ~r"[\u0A32-\u0A33]" / ~r"[\u0A35-\u0A36]" / ~r"[\u0A38-\u0A39]" / ~r"[\u0A59-\u0A5C]" / "\u0A5E" / ~r"[\u0A72-\u0A74]" / ~r"[\u0A85-\u0A8D]" / ~r"[\u0A8F-\u0A91]" / ~r"[\u0A93-\u0AA8]" / ~r"[\u0AAA-\u0AB0]" / ~r"[\u0AB2-\u0AB3]" / ~r"[\u0AB5-\u0AB9]" / "\u0ABD" / "\u0AD0" / ~r"[\u0AE0-\u0AE1]" / ~r"[\u0B05-\u0B0C]" / ~r"[\u0B0F-\u0B10]" / ~r"[\u0B13-\u0B28]" / ~r"[\u0B2A-\u0B30]" / ~r"[\u0B32-\u0B33]" / ~r"[\u0B35-\u0B39]" / "\u0B3D" / ~r"[\u0B5C-\u0B5D]" / ~r"[\u0B5F-\u0B61]" / "\u0B71" / "\u0B83" / ~r"[\u0B85-\u0B8A]" / ~r"[\u0B8E-\u0B90]" / ~r"[\u0B92-\u0B95]" / ~r"[\u0B99-\u0B9A]" / "\u0B9C" / ~r"[\u0B9E-\u0B9F]" / ~r"[\u0BA3-\u0BA4]" / ~r"[\u0BA8-\u0BAA]" / ~r"[\u0BAE-\u0BB9]" / ~r"[\u0C05-\u0C0C]" / ~r"[\u0C0E-\u0C10]" / ~r"[\u0C12-\u0C28]" / ~r"[\u0C2A-\u0C33]" / ~r"[\u0C35-\u0C39]" / ~r"[\u0C60-\u0C61]" / ~r"[\u0C85-\u0C8C]" / ~r"[\u0C8E-\u0C90]" / ~r"[\u0C92-\u0CA8]" / ~r"[\u0CAA-\u0CB3]" / ~r"[\u0CB5-\u0CB9]" / "\u0CBD" / "\u0CDE" / ~r"[\u0CE0-\u0CE1]" / ~r"[\u0D05-\u0D0C]" / ~r"[\u0D0E-\u0D10]" / ~r"[\u0D12-\u0D28]" / ~r"[\u0D2A-\u0D39]" / ~r"[\u0D60-\u0D61]" / ~r"[\u0D85-\u0D96]" / ~r"[\u0D9A-\u0DB1]" / ~r"[\u0DB3-\u0DBB]" / "\u0DBD" / ~r"[\u0DC0-\u0DC6]" / ~r"[\u0E01-\u0E30]" / ~r"[\u0E32-\u0E33]" / ~r"[\u0E40-\u0E45]" / "\u0E46" / ~r"[\u0E81-\u0E82]" / "\u0E84" / ~r"[\u0E87-\u0E88]" / "\u0E8A" / "\u0E8D" / ~r"[\u0E94-\u0E97]" / ~r"[\u0E99-\u0E9F]" / ~r"[\u0EA1-\u0EA3]" / "\u0EA5" / "\u0EA7" / ~r"[\u0EAA-\u0EAB]" / ~r"[\u0EAD-\u0EB0]" / ~r"[\u0EB2-\u0EB3]" / "\u0EBD" / ~r"[\u0EC0-\u0EC4]" / "\u0EC6" / ~r"[\u0EDC-\u0EDD]" / "\u0F00" / ~r"[\u0F40-\u0F47]" / ~r"[\u0F49-\u0F6A]" / ~r"[\u0F88-\u0F8B]" / ~r"[\u1000-\u1021]" / ~r"[\u1023-\u1027]" / ~r"[\u1029-\u102A]" / ~r"[\u1050-\u1055]" / ~r"[\u10A0-\u10C5]" / ~r"[\u10D0-\u10FA]" / "\u10FC" / ~r"[\u1100-\u1159]" / ~r"[\u115F-\u11A2]" / ~r"[\u11A8-\u11F9]" / ~r"[\u1200-\u1248]" / ~r"[\u124A-\u124D]" / ~r"[\u1250-\u1256]" / "\u1258" / ~r"[\u125A-\u125D]" / ~r"[\u1260-\u1288]" / ~r"[\u128A-\u128D]" / ~r"[\u1290-\u12B0]" / ~r"[\u12B2-\u12B5]" / ~r"[\u12B8-\u12BE]" / "\u12C0" / ~r"[\u12C2-\u12C5]" / ~r"[\u12C8-\u12D6]" / ~r"[\u12D8-\u1310]" / ~r"[\u1312-\u1315]" / ~r"[\u1318-\u135A]" / ~r"[\u1380-\u138F]" / ~r"[\u13A0-\u13F4]" / ~r"[\u1401-\u166C]" / ~r"[\u166F-\u1676]" / ~r"[\u1681-\u169A]" / ~r"[\u16A0-\u16EA]" / ~r"[\u16EE-\u16F0]" / ~r"[\u1700-\u170C]" / ~r"[\u170E-\u1711]" / ~r"[\u1720-\u1731]" / ~r"[\u1740-\u1751]" / ~r"[\u1760-\u176C]" / ~r"[\u176E-\u1770]" / ~r"[\u1780-\u17B3]" / "\u17D7" / "\u17DC" / ~r"[\u1820-\u1842]" / "\u1843" / ~r"[\u1844-\u1877]" / ~r"[\u1880-\u18A8]" / ~r"[\u1900-\u191C]" / ~r"[\u1950-\u196D]" / ~r"[\u1970-\u1974]" / ~r"[\u1980-\u19A9]" / ~r"[\u19C1-\u19C7]" / ~r"[\u1A00-\u1A16]" / ~r"[\u1D00-\u1D2B]" / ~r"[\u1D2C-\u1D61]" / ~r"[\u1D62-\u1D77]" / "\u1D78" / ~r"[\u1D79-\u1D9A]" / ~r"[\u1D9B-\u1DBF]" / ~r"[\u1E00-\u1E9B]" / ~r"[\u1EA0-\u1EF9]" / ~r"[\u1F00-\u1F15]" / ~r"[\u1F18-\u1F1D]" / ~r"[\u1F20-\u1F45]" / ~r"[\u1F48-\u1F4D]" / ~r"[\u1F50-\u1F57]" / "\u1F59" / "\u1F5B" / "\u1F5D" / ~r"[\u1F5F-\u1F7D]" / ~r"[\u1F80-\u1FB4]" / ~r"[\u1FB6-\u1FBC]" / "\u1FBE" / ~r"[\u1FC2-\u1FC4]" / ~r"[\u1FC6-\u1FCC]" / ~r"[\u1FD0-\u1FD3]" / ~r"[\u1FD6-\u1FDB]" / ~r"[\u1FE0-\u1FEC]" / ~r"[\u1FF2-\u1FF4]" / ~r"[\u1FF6-\u1FFC]" / "\u2071" / "\u207F" / ~r"[\u2090-\u2094]" / "\u2102" / "\u2107" / ~r"[\u210A-\u2113]" / "\u2115" / "\u2118" / ~r"[\u2119-\u211D]" / "\u2124" / "\u2126" / "\u2128" / ~r"[\u212A-\u212D]" / "\u212E" / ~r"[\u212F-\u2131]" / ~r"[\u2133-\u2134]" / ~r"[\u2135-\u2138]" / "\u2139" / ~r"[\u213C-\u213F]" / ~r"[\u2145-\u2149]" / ~r"[\u2160-\u2183]" / ~r"[\u2C00-\u2C2E]" / ~r"[\u2C30-\u2C5E]" / ~r"[\u2C80-\u2CE4]" / ~r"[\u2D00-\u2D25]" / ~r"[\u2D30-\u2D65]" / "\u2D6F" / ~r"[\u2D80-\u2D96]" / ~r"[\u2DA0-\u2DA6]" / ~r"[\u2DA8-\u2DAE]" / ~r"[\u2DB0-\u2DB6]" / ~r"[\u2DB8-\u2DBE]" / ~r"[\u2DC0-\u2DC6]" / ~r"[\u2DC8-\u2DCE]" / ~r"[\u2DD0-\u2DD6]" / ~r"[\u2DD8-\u2DDE]" / "\u3005" / "\u3006" / "\u3007" / ~r"[\u3021-\u3029]" / ~r"[\u3031-\u3035]" / ~r"[\u3038-\u303A]" / "\u303B" / "\u303C" / ~r"[\u3041-\u3096]" / ~r"[\u309B-\u309C]" / ~r"[\u309D-\u309E]" / "\u309F" / ~r"[\u30A1-\u30FA]" / ~r"[\u30FC-\u30FE]" / "\u30FF" / ~r"[\u3105-\u312C]" / ~r"[\u3131-\u318E]" / ~r"[\u31A0-\u31B7]" / ~r"[\u31F0-\u31FF]" / ~r"[\u3400-\u4DB5]" / ~r"[\u4E00-\u9FBB]" / ~r"[\uA000-\uA014]" / "\uA015" / ~r"[\uA016-\uA48C]" / ~r"[\uA800-\uA801]" / ~r"[\uA803-\uA805]" / ~r"[\uA807-\uA80A]" / ~r"[\uA80C-\uA822]" / ~r"[\uAC00-\uD7A3]" / ~r"[\uF900-\uFA2D]" / ~r"[\uFA30-\uFA6A]" / ~r"[\uFA70-\uFAD9]" / ~r"[\uFB00-\uFB06]" / ~r"[\uFB13-\uFB17]" / "\uFB1D" / ~r"[\uFB1F-\uFB28]" / ~r"[\uFB2A-\uFB36]" / ~r"[\uFB38-\uFB3C]" / "\uFB3E" / ~r"[\uFB40-\uFB41]" / ~r"[\uFB43-\uFB44]" / ~r"[\uFB46-\uFBB1]" / ~r"[\uFBD3-\uFD3D]" / ~r"[\uFD50-\uFD8F]" / ~r"[\uFD92-\uFDC7]" / ~r"[\uFDF0-\uFDFB]" / ~r"[\uFE70-\uFE74]" / ~r"[\uFE76-\uFEFC]" / ~r"[\uFF21-\uFF3A]" / ~r"[\uFF41-\uFF5A]" / ~r"[\uFF66-\uFF6F]" / "\uFF70" / ~r"[\uFF71-\uFF9D]" / ~r"[\uFF9E-\uFF9F]" / ~r"[\uFFA0-\uFFBE]" / ~r"[\uFFC2-\uFFC7]" / ~r"[\uFFCA-\uFFCF]" / ~r"[\uFFD2-\uFFD7]" / ~r"[\uFFDA-\uFFDC]"
id_continue = id_start / ~r"[0-9]" / ~r"[\u0300-\u036F]" / ~r"[\u0483-\u0486]" / ~r"[\u0591-\u05B9]" / ~r"[\u05BB-\u05BD]" / "\u05BF" / ~r"[\u05C1-\u05C2]" / ~r"[\u05C4-\u05C5]" / "\u05C7" / ~r"[\u0610-\u0615]" / ~r"[\u064B-\u065E]" / ~r"[\u0660-\u0669]" / "\u0670" / ~r"[\u06D6-\u06DC]" / ~r"[\u06DF-\u06E4]" / ~r"[\u06E7-\u06E8]" / ~r"[\u06EA-\u06ED]" / ~r"[\u06F0-\u06F9]" / "\u0711" / ~r"[\u0730-\u074A]" / ~r"[\u07A6-\u07B0]" / ~r"[\u0901-\u0902]" / "\u0903" / "\u093C" / ~r"[\u093E-\u0940]" / ~r"[\u0941-\u0948]" / ~r"[\u0949-\u094C]" / "\u094D" / ~r"[\u0951-\u0954]" / ~r"[\u0962-\u0963]" / ~r"[\u0966-\u096F]" / "\u0981" / ~r"[\u0982-\u0983]" / "\u09BC" / ~r"[\u09BE-\u09C0]" / ~r"[\u09C1-\u09C4]" / ~r"[\u09C7-\u09C8]" / ~r"[\u09CB-\u09CC]" / "\u09CD" / "\u09D7" / ~r"[\u09E2-\u09E3]" / ~r"[\u09E6-\u09EF]" / ~r"[\u0A01-\u0A02]" / "\u0A03" / "\u0A3C" / ~r"[\u0A3E-\u0A40]" / ~r"[\u0A41-\u0A42]" / ~r"[\u0A47-\u0A48]" / ~r"[\u0A4B-\u0A4D]" / ~r"[\u0A66-\u0A6F]" / ~r"[\u0A70-\u0A71]" / ~r"[\u0A81-\u0A82]" / "\u0A83" / "\u0ABC" / ~r"[\u0ABE-\u0AC0]" / ~r"[\u0AC1-\u0AC5]" / ~r"[\u0AC7-\u0AC8]" / "\u0AC9" / ~r"[\u0ACB-\u0ACC]" / "\u0ACD" / ~r"[\u0AE2-\u0AE3]" / ~r"[\u0AE6-\u0AEF]" / "\u0B01" / ~r"[\u0B02-\u0B03]" / "\u0B3C" / "\u0B3E" / "\u0B3F" / "\u0B40" / ~r"[\u0B41-\u0B43]" / ~r"[\u0B47-\u0B48]" / ~r"[\u0B4B-\u0B4C]" / "\u0B4D" / "\u0B56" / "\u0B57" / ~r"[\u0B66-\u0B6F]" / "\u0B82" / ~r"[\u0BBE-\u0BBF]" / "\u0BC0" / ~r"[\u0BC1-\u0BC2]" / ~r"[\u0BC6-\u0BC8]" / ~r"[\u0BCA-\u0BCC]" / "\u0BCD" / "\u0BD7" / ~r"[\u0BE6-\u0BEF]" / ~r"[\u0C01-\u0C03]" / ~r"[\u0C3E-\u0C40]" / ~r"[\u0C41-\u0C44]" / ~r"[\u0C46-\u0C48]" / ~r"[\u0C4A-\u0C4D]" / ~r"[\u0C55-\u0C56]" / ~r"[\u0C66-\u0C6F]" / ~r"[\u0C82-\u0C83]" / "\u0CBC" / "\u0CBE" / "\u0CBF" / ~r"[\u0CC0-\u0CC4]" / "\u0CC6" / ~r"[\u0CC7-\u0CC8]" / ~r"[\u0CCA-\u0CCB]" / ~r"[\u0CCC-\u0CCD]" / ~r"[\u0CD5-\u0CD6]" / ~r"[\u0CE6-\u0CEF]" / ~r"[\u0D02-\u0D03]" / ~r"[\u0D3E-\u0D40]" / ~r"[\u0D41-\u0D43]" / ~r"[\u0D46-\u0D48]" / ~r"[\u0D4A-\u0D4C]" / "\u0D4D" / "\u0D57" / ~r"[\u0D66-\u0D6F]" / ~r"[\u0D82-\u0D83]" / "\u0DCA" / ~r"[\u0DCF-\u0DD1]" / ~r"[\u0DD2-\u0DD4]" / "\u0DD6" / ~r"[\u0DD8-\u0DDF]" / ~r"[\u0DF2-\u0DF3]" / "\u0E31" / ~r"[\u0E34-\u0E3A]" / ~r"[\u0E47-\u0E4E]" / ~r"[\u0E50-\u0E59]" / "\u0EB1" / ~r"[\u0EB4-\u0EB9]" / ~r"[\u0EBB-\u0EBC]" / ~r"[\u0EC8-\u0ECD]" / ~r"[\u0ED0-\u0ED9]" / ~r"[\u0F18-\u0F19]" / ~r"[\u0F20-\u0F29]" / "\u0F35" / "\u0F37" / "\u0F39" / ~r"[\u0F3E-\u0F3F]" / ~r"[\u0F71-\u0F7E]" / "\u0F7F" / ~r"[\u0F80-\u0F84]" / ~r"[\u0F86-\u0F87]" / ~r"[\u0F90-\u0F97]" / ~r"[\u0F99-\u0FBC]" / "\u0FC6" / "\u102C" / ~r"[\u102D-\u1030]" / "\u1031" / "\u1032" / ~r"[\u1036-\u1037]" / "\u1038" / "\u1039" / ~r"[\u1040-\u1049]" / ~r"[\u1056-\u1057]" / ~r"[\u1058-\u1059]" / "\u135F" / ~r"[\u1369-\u1371]" / ~r"[\u1712-\u1714]" / ~r"[\u1732-\u1734]" / ~r"[\u1752-\u1753]" / ~r"[\u1772-\u1773]" / "\u17B6" / ~r"[\u17B7-\u17BD]" / ~r"[\u17BE-\u17C5]" / "\u17C6" / ~r"[\u17C7-\u17C8]" / ~r"[\u17C9-\u17D3]" / "\u17DD" / ~r"[\u17E0-\u17E9]" / ~r"[\u180B-\u180D]" / ~r"[\u1810-\u1819]" / "\u18A9" / ~r"[\u1920-\u1922]" / ~r"[\u1923-\u1926]" / ~r"[\u1927-\u1928]" / ~r"[\u1929-\u192B]" / ~r"[\u1930-\u1931]" / "\u1932" / ~r"[\u1933-\u1938]" / ~r"[\u1939-\u193B]" / ~r"[\u1946-\u194F]" / ~r"[\u19B0-\u19C0]" / ~r"[\u19C8-\u19C9]" / ~r"[\u19D0-\u19D9]" / ~r"[\u1A17-\u1A18]" / ~r"[\u1A19-\u1A1B]" / ~r"[\u1DC0-\u1DC3]" / ~r"[\u203F-\u2040]" / "\u2054" / ~r"[\u20D0-\u20DC]" / "\u20E1" / ~r"[\u20E5-\u20EB]" / ~r"[\u302A-\u302F]" / ~r"[\u3099-\u309A]" / "\uA802" / "\uA806" / "\uA80B" / ~r"[\uA823-\uA824]" / ~r"[\uA825-\uA826]" / "\uA827" / "\uFB1E" / ~r"[\uFE00-\uFE0F]" / ~r"[\uFE20-\uFE23]" / ~r"[\uFE33-\uFE34]" / ~r"[\uFE4D-\uFE4F]" / ~r"[\uFF10-\uFF19]" / "\uFF3F"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I doubt it (at least, if so, I am not sure how to do it!) The simplified version works with the regex module only (and does not work with the re module), and unfortunately I do not think that it works in a Parsimonious grammar either (as here). If I am wrong please let me know!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, you are right...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should put in a request to the maintainer of parsimonious - I wonder how big of a fix it would be in his package? We could at least open an issue. @julienmalard, you're probably best qualified to do that. Do you have an interest in doing so? Then if it happens, we can simplify here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just added a small function based on your recommendation to use chardet to guess the encoding. It seems to work!

@julienmalard
Copy link
Contributor Author

I think everything is working now! The Unicode Decode error still in the TravisCI output is due to the fact that the test output file seems to not be encoded in Unicode, even for the Unicode test. The solution would be to add a workaround to try to read what type of encoding the file is in, but as I have heard that this is notoriously hard I hope that the current state will be sufficient for this pull request! In any case, unicode models run without any problem.

@alexprey
Copy link
Collaborator

@julienmalard mostly done with this PR. Still have issues with test coverage. Build time increased in 2 times! And test coveraged reduced by 1.5%. Please look closer at test results.

image

And also you can check child repo with tests to fix following:
image

The Unicode Decode error still in the TravisCI output is due to the fact that the test output file seems to not be encoded in Unicode, even for the Unicode test

You can check this repo with test models and maybe fix it https://github.com/SDXorg/test-models

And then I think we can complete with this PR and close few issues about Unicode 😃
Thanks for you job and quick responses!


Regards,
Alexey Mulyukin:
Lead Core developer of sdCloud.io development team.

@@ -831,7 +863,7 @@ def translate_vensim(mdl_file):
#>>> translate_vensim('../../tests/test-models/tests/limits/test_limits.mdl')

"""
with open(mdl_file, 'rU') as in_file:
with open(mdl_file, 'rU', encoding='UTF-8') as in_file:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this break if users import models they've translated with older versions of pysd? I know some people have made modifications between translation steps...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand of encodings, it will only break if a user had modified a model file to include non-ascii characters and subsequently saved it in a non-unicode encoding. Should be a rare case?

@JamesPHoughton
Copy link
Collaborator

Sorry for the radio silence. Peter is now sleeping through the night, so hopefully I'll have a brain soon. =)

What was the upshot with the unicode test on Travis? It sounds like it's working when you run locally, just not when Travis runs it? The error seems to come on the pandas call to read the CSV. Is there a platform-specific use of pandas we should be aware of and adapt to, or do they have an issue?

We might be able to do something like was done here in our test runner script. @julienmalard, if it's alright with you I'll let you make that change, and then you can have a clean PR with all your tests passing on Travis too.

@julienmalard
Copy link
Contributor Author

julienmalard commented Jun 12, 2018 via email

@JamesPHoughton
Copy link
Collaborator

Sweet. Thank you! I'll bring that in.

Looks like somewhere along the way we broke the 2.7 backend: https://travis-ci.org/JamesPHoughton/pysd/jobs/391916376#L610-L611
But that's my code, so I'll merge yours in and see if I can fix it...

James

@JamesPHoughton JamesPHoughton merged commit c478e83 into SDXorg:master Jun 13, 2018
@julienmalard
Copy link
Contributor Author

Thank you both @alexprey and @JamesPHoughton for all your help and kind support on my first ever approved pull request!

@JamesPHoughton
Copy link
Collaborator

Thanks @julienmalard! Looking forward to the next one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants