-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added compatibility for Unicode variable names #179
Conversation
All Unicode tests are now passing! :) |
@@ -2,8 +2,10 @@ | |||
<module type="PYTHON_MODULE" version="4"> | |||
<component name="NewModuleRootManager"> | |||
<content url="file://$MODULE_DIR$" /> | |||
<orderEntry type="jdk" jdkName="Python 3.6.0 (~/anaconda/bin/python)" jdkType="Python SDK" /> | |||
<orderEntry type="jdk" jdkName="Python 3.6.0 (C:\Users\jmalar1\AppData\Local\Programs\Python\Python36-32\python.exe)" jdkType="Python SDK" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to revert changes in local work space files, like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried excluding the .idea folder from my push entirely (consider removing from version control?) Unfortunately, I seem to have failed. If you can help please do!
@@ -1,7 +1,7 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> | |||
<project version="4"> | |||
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6.0 (~/anaconda/bin/python)" project-jdk-type="Python SDK" /> | |||
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.6.0 (C:\Users\jmalar1\AppData\Local\Programs\Python\Python36-32\python.exe)" project-jdk-type="Python SDK" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will be good to revert changes in local work space files, like this.
@@ -88,7 +88,7 @@ def build(elements, subscript_dict, namespace, outfile_name): | |||
if outfile_name == 'return': | |||
return text | |||
|
|||
with open(outfile_name, 'w') as out: | |||
with open(outfile_name, 'w', encoding='UTF-8') as out: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This changes is not affect files with default asci encoding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't, as this is internal PySD file writing and reading anyways (PySD should be the only one to read this, apart from a Python interpreter, and the latter can read UTF 8 in any case). If a model has only ascii characters, these will save with no problem as UTF 8 (and Vensim model files are already saved as UTF 8 by default).
pysd/py_backend/utils.py
Outdated
@@ -262,14 +262,14 @@ def make_python_identifier(string, namespace=None, reserved_words=None, | |||
|
|||
if convert == 'hex': | |||
# Convert invalid characters to hex | |||
s = ''.join([c.encode("hex") if re.findall('[^0-9a-zA-Z_]', c) else c for c in s]) | |||
s = ''.join([c.encode("hex") if re.findall('[^\p{l}\p{m}\p{n}_]', c) else c for c in s]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure that this regex is a correct... By regex101 service it match any sequence...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine; it uses unicode character blocks (which regex101.com does not understand, nor does the python re module, which is why I had to switch to the regex module). It should not match = ( ) and other non-python identifier sequences.Which regex service are you using?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the revisions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm use regex in python mode to verify that.. But I'm completelly understand this syntax of regex... It will be good to add some comments and add additional tests to check that nothing is broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added comments to describe what this regex is doing. As for additional tests, I think that the Unicode (and non-unicode) tests already present should verify that this works?
# This takes care of models with Unicode variable names | ||
basic_id = id_start (id_continue / ~r"[\s]")* | ||
id_start = ~r"[A-Z]" / ~r"[a-z]" / "\u00AA" / "\u00B5" / "\u00BA" / ~r"[\u00C0-\u00D6]" / ~r"[\u00D8-\u00F6]" / ~r"[\u00F8-\u01BA]" / "\u01BB" / ~r"[\u01BC-\u01BF]" / ~r"[\u01C0-\u01C3]" / ~r"[\u01C4-\u0241]" / ~r"[\u0250-\u02AF]" / ~r"[\u02B0-\u02C1]" / ~r"[\u02C6-\u02D1]" / ~r"[\u02E0-\u02E4]" / "\u02EE" / "\u037A" / "\u0386" / ~r"[\u0388-\u038A]" / "\u038C" / ~r"[\u038E-\u03A1]" / ~r"[\u03A3-\u03CE]" / ~r"[\u03D0-\u03F5]" / ~r"[\u03F7-\u0481]" / ~r"[\u048A-\u04CE]" / ~r"[\u04D0-\u04F9]" / ~r"[\u0500-\u050F]" / ~r"[\u0531-\u0556]" / "\u0559" / ~r"[\u0561-\u0587]" / ~r"[\u05D0-\u05EA]" / ~r"[\u05F0-\u05F2]" / ~r"[\u0621-\u063A]" / "\u0640" / ~r"[\u0641-\u064A]" / ~r"[\u066E-\u066F]" / ~r"[\u0671-\u06D3]" / "\u06D5" / ~r"[\u06E5-\u06E6]" / ~r"[\u06EE-\u06EF]" / ~r"[\u06FA-\u06FC]" / "\u06FF" / "\u0710" / ~r"[\u0712-\u072F]" / ~r"[\u074D-\u076D]" / ~r"[\u0780-\u07A5]" / "\u07B1" / ~r"[\u0904-\u0939]" / "\u093D" / "\u0950" / ~r"[\u0958-\u0961]" / "\u097D" / ~r"[\u0985-\u098C]" / ~r"[\u098F-\u0990]" / ~r"[\u0993-\u09A8]" / ~r"[\u09AA-\u09B0]" / "\u09B2" / ~r"[\u09B6-\u09B9]" / "\u09BD" / "\u09CE" / ~r"[\u09DC-\u09DD]" / ~r"[\u09DF-\u09E1]" / ~r"[\u09F0-\u09F1]" / ~r"[\u0A05-\u0A0A]" / ~r"[\u0A0F-\u0A10]" / ~r"[\u0A13-\u0A28]" / ~r"[\u0A2A-\u0A30]" / ~r"[\u0A32-\u0A33]" / ~r"[\u0A35-\u0A36]" / ~r"[\u0A38-\u0A39]" / ~r"[\u0A59-\u0A5C]" / "\u0A5E" / ~r"[\u0A72-\u0A74]" / ~r"[\u0A85-\u0A8D]" / ~r"[\u0A8F-\u0A91]" / ~r"[\u0A93-\u0AA8]" / ~r"[\u0AAA-\u0AB0]" / ~r"[\u0AB2-\u0AB3]" / ~r"[\u0AB5-\u0AB9]" / "\u0ABD" / "\u0AD0" / ~r"[\u0AE0-\u0AE1]" / ~r"[\u0B05-\u0B0C]" / ~r"[\u0B0F-\u0B10]" / ~r"[\u0B13-\u0B28]" / ~r"[\u0B2A-\u0B30]" / ~r"[\u0B32-\u0B33]" / ~r"[\u0B35-\u0B39]" / "\u0B3D" / ~r"[\u0B5C-\u0B5D]" / ~r"[\u0B5F-\u0B61]" / "\u0B71" / "\u0B83" / ~r"[\u0B85-\u0B8A]" / ~r"[\u0B8E-\u0B90]" / ~r"[\u0B92-\u0B95]" / ~r"[\u0B99-\u0B9A]" / "\u0B9C" / ~r"[\u0B9E-\u0B9F]" / ~r"[\u0BA3-\u0BA4]" / ~r"[\u0BA8-\u0BAA]" / ~r"[\u0BAE-\u0BB9]" / ~r"[\u0C05-\u0C0C]" / ~r"[\u0C0E-\u0C10]" / ~r"[\u0C12-\u0C28]" / ~r"[\u0C2A-\u0C33]" / ~r"[\u0C35-\u0C39]" / ~r"[\u0C60-\u0C61]" / ~r"[\u0C85-\u0C8C]" / ~r"[\u0C8E-\u0C90]" / ~r"[\u0C92-\u0CA8]" / ~r"[\u0CAA-\u0CB3]" / ~r"[\u0CB5-\u0CB9]" / "\u0CBD" / "\u0CDE" / ~r"[\u0CE0-\u0CE1]" / ~r"[\u0D05-\u0D0C]" / ~r"[\u0D0E-\u0D10]" / ~r"[\u0D12-\u0D28]" / ~r"[\u0D2A-\u0D39]" / ~r"[\u0D60-\u0D61]" / ~r"[\u0D85-\u0D96]" / ~r"[\u0D9A-\u0DB1]" / ~r"[\u0DB3-\u0DBB]" / "\u0DBD" / ~r"[\u0DC0-\u0DC6]" / ~r"[\u0E01-\u0E30]" / ~r"[\u0E32-\u0E33]" / ~r"[\u0E40-\u0E45]" / "\u0E46" / ~r"[\u0E81-\u0E82]" / "\u0E84" / ~r"[\u0E87-\u0E88]" / "\u0E8A" / "\u0E8D" / ~r"[\u0E94-\u0E97]" / ~r"[\u0E99-\u0E9F]" / ~r"[\u0EA1-\u0EA3]" / "\u0EA5" / "\u0EA7" / ~r"[\u0EAA-\u0EAB]" / ~r"[\u0EAD-\u0EB0]" / ~r"[\u0EB2-\u0EB3]" / "\u0EBD" / ~r"[\u0EC0-\u0EC4]" / "\u0EC6" / ~r"[\u0EDC-\u0EDD]" / "\u0F00" / ~r"[\u0F40-\u0F47]" / ~r"[\u0F49-\u0F6A]" / ~r"[\u0F88-\u0F8B]" / ~r"[\u1000-\u1021]" / ~r"[\u1023-\u1027]" / ~r"[\u1029-\u102A]" / ~r"[\u1050-\u1055]" / ~r"[\u10A0-\u10C5]" / ~r"[\u10D0-\u10FA]" / "\u10FC" / ~r"[\u1100-\u1159]" / ~r"[\u115F-\u11A2]" / ~r"[\u11A8-\u11F9]" / ~r"[\u1200-\u1248]" / ~r"[\u124A-\u124D]" / ~r"[\u1250-\u1256]" / "\u1258" / ~r"[\u125A-\u125D]" / ~r"[\u1260-\u1288]" / ~r"[\u128A-\u128D]" / ~r"[\u1290-\u12B0]" / ~r"[\u12B2-\u12B5]" / ~r"[\u12B8-\u12BE]" / "\u12C0" / ~r"[\u12C2-\u12C5]" / ~r"[\u12C8-\u12D6]" / ~r"[\u12D8-\u1310]" / ~r"[\u1312-\u1315]" / ~r"[\u1318-\u135A]" / ~r"[\u1380-\u138F]" / ~r"[\u13A0-\u13F4]" / ~r"[\u1401-\u166C]" / ~r"[\u166F-\u1676]" / ~r"[\u1681-\u169A]" / ~r"[\u16A0-\u16EA]" / ~r"[\u16EE-\u16F0]" / ~r"[\u1700-\u170C]" / ~r"[\u170E-\u1711]" / ~r"[\u1720-\u1731]" / ~r"[\u1740-\u1751]" / ~r"[\u1760-\u176C]" / ~r"[\u176E-\u1770]" / ~r"[\u1780-\u17B3]" / "\u17D7" / "\u17DC" / ~r"[\u1820-\u1842]" / "\u1843" / ~r"[\u1844-\u1877]" / ~r"[\u1880-\u18A8]" / ~r"[\u1900-\u191C]" / ~r"[\u1950-\u196D]" / ~r"[\u1970-\u1974]" / ~r"[\u1980-\u19A9]" / ~r"[\u19C1-\u19C7]" / ~r"[\u1A00-\u1A16]" / ~r"[\u1D00-\u1D2B]" / ~r"[\u1D2C-\u1D61]" / ~r"[\u1D62-\u1D77]" / "\u1D78" / ~r"[\u1D79-\u1D9A]" / ~r"[\u1D9B-\u1DBF]" / ~r"[\u1E00-\u1E9B]" / ~r"[\u1EA0-\u1EF9]" / ~r"[\u1F00-\u1F15]" / ~r"[\u1F18-\u1F1D]" / ~r"[\u1F20-\u1F45]" / ~r"[\u1F48-\u1F4D]" / ~r"[\u1F50-\u1F57]" / "\u1F59" / "\u1F5B" / "\u1F5D" / ~r"[\u1F5F-\u1F7D]" / ~r"[\u1F80-\u1FB4]" / ~r"[\u1FB6-\u1FBC]" / "\u1FBE" / ~r"[\u1FC2-\u1FC4]" / ~r"[\u1FC6-\u1FCC]" / ~r"[\u1FD0-\u1FD3]" / ~r"[\u1FD6-\u1FDB]" / ~r"[\u1FE0-\u1FEC]" / ~r"[\u1FF2-\u1FF4]" / ~r"[\u1FF6-\u1FFC]" / "\u2071" / "\u207F" / ~r"[\u2090-\u2094]" / "\u2102" / "\u2107" / ~r"[\u210A-\u2113]" / "\u2115" / "\u2118" / ~r"[\u2119-\u211D]" / "\u2124" / "\u2126" / "\u2128" / ~r"[\u212A-\u212D]" / "\u212E" / ~r"[\u212F-\u2131]" / ~r"[\u2133-\u2134]" / ~r"[\u2135-\u2138]" / "\u2139" / ~r"[\u213C-\u213F]" / ~r"[\u2145-\u2149]" / ~r"[\u2160-\u2183]" / ~r"[\u2C00-\u2C2E]" / ~r"[\u2C30-\u2C5E]" / ~r"[\u2C80-\u2CE4]" / ~r"[\u2D00-\u2D25]" / ~r"[\u2D30-\u2D65]" / "\u2D6F" / ~r"[\u2D80-\u2D96]" / ~r"[\u2DA0-\u2DA6]" / ~r"[\u2DA8-\u2DAE]" / ~r"[\u2DB0-\u2DB6]" / ~r"[\u2DB8-\u2DBE]" / ~r"[\u2DC0-\u2DC6]" / ~r"[\u2DC8-\u2DCE]" / ~r"[\u2DD0-\u2DD6]" / ~r"[\u2DD8-\u2DDE]" / "\u3005" / "\u3006" / "\u3007" / ~r"[\u3021-\u3029]" / ~r"[\u3031-\u3035]" / ~r"[\u3038-\u303A]" / "\u303B" / "\u303C" / ~r"[\u3041-\u3096]" / ~r"[\u309B-\u309C]" / ~r"[\u309D-\u309E]" / "\u309F" / ~r"[\u30A1-\u30FA]" / ~r"[\u30FC-\u30FE]" / "\u30FF" / ~r"[\u3105-\u312C]" / ~r"[\u3131-\u318E]" / ~r"[\u31A0-\u31B7]" / ~r"[\u31F0-\u31FF]" / ~r"[\u3400-\u4DB5]" / ~r"[\u4E00-\u9FBB]" / ~r"[\uA000-\uA014]" / "\uA015" / ~r"[\uA016-\uA48C]" / ~r"[\uA800-\uA801]" / ~r"[\uA803-\uA805]" / ~r"[\uA807-\uA80A]" / ~r"[\uA80C-\uA822]" / ~r"[\uAC00-\uD7A3]" / ~r"[\uF900-\uFA2D]" / ~r"[\uFA30-\uFA6A]" / ~r"[\uFA70-\uFAD9]" / ~r"[\uFB00-\uFB06]" / ~r"[\uFB13-\uFB17]" / "\uFB1D" / ~r"[\uFB1F-\uFB28]" / ~r"[\uFB2A-\uFB36]" / ~r"[\uFB38-\uFB3C]" / "\uFB3E" / ~r"[\uFB40-\uFB41]" / ~r"[\uFB43-\uFB44]" / ~r"[\uFB46-\uFBB1]" / ~r"[\uFBD3-\uFD3D]" / ~r"[\uFD50-\uFD8F]" / ~r"[\uFD92-\uFDC7]" / ~r"[\uFDF0-\uFDFB]" / ~r"[\uFE70-\uFE74]" / ~r"[\uFE76-\uFEFC]" / ~r"[\uFF21-\uFF3A]" / ~r"[\uFF41-\uFF5A]" / ~r"[\uFF66-\uFF6F]" / "\uFF70" / ~r"[\uFF71-\uFF9D]" / ~r"[\uFF9E-\uFF9F]" / ~r"[\uFFA0-\uFFBE]" / ~r"[\uFFC2-\uFFC7]" / ~r"[\uFFCA-\uFFCF]" / ~r"[\uFFD2-\uFFD7]" / ~r"[\uFFDA-\uFFDC]" | ||
id_continue = id_start / ~r"[0-9]" / ~r"[\u0300-\u036F]" / ~r"[\u0483-\u0486]" / ~r"[\u0591-\u05B9]" / ~r"[\u05BB-\u05BD]" / "\u05BF" / ~r"[\u05C1-\u05C2]" / ~r"[\u05C4-\u05C5]" / "\u05C7" / ~r"[\u0610-\u0615]" / ~r"[\u064B-\u065E]" / ~r"[\u0660-\u0669]" / "\u0670" / ~r"[\u06D6-\u06DC]" / ~r"[\u06DF-\u06E4]" / ~r"[\u06E7-\u06E8]" / ~r"[\u06EA-\u06ED]" / ~r"[\u06F0-\u06F9]" / "\u0711" / ~r"[\u0730-\u074A]" / ~r"[\u07A6-\u07B0]" / ~r"[\u0901-\u0902]" / "\u0903" / "\u093C" / ~r"[\u093E-\u0940]" / ~r"[\u0941-\u0948]" / ~r"[\u0949-\u094C]" / "\u094D" / ~r"[\u0951-\u0954]" / ~r"[\u0962-\u0963]" / ~r"[\u0966-\u096F]" / "\u0981" / ~r"[\u0982-\u0983]" / "\u09BC" / ~r"[\u09BE-\u09C0]" / ~r"[\u09C1-\u09C4]" / ~r"[\u09C7-\u09C8]" / ~r"[\u09CB-\u09CC]" / "\u09CD" / "\u09D7" / ~r"[\u09E2-\u09E3]" / ~r"[\u09E6-\u09EF]" / ~r"[\u0A01-\u0A02]" / "\u0A03" / "\u0A3C" / ~r"[\u0A3E-\u0A40]" / ~r"[\u0A41-\u0A42]" / ~r"[\u0A47-\u0A48]" / ~r"[\u0A4B-\u0A4D]" / ~r"[\u0A66-\u0A6F]" / ~r"[\u0A70-\u0A71]" / ~r"[\u0A81-\u0A82]" / "\u0A83" / "\u0ABC" / ~r"[\u0ABE-\u0AC0]" / ~r"[\u0AC1-\u0AC5]" / ~r"[\u0AC7-\u0AC8]" / "\u0AC9" / ~r"[\u0ACB-\u0ACC]" / "\u0ACD" / ~r"[\u0AE2-\u0AE3]" / ~r"[\u0AE6-\u0AEF]" / "\u0B01" / ~r"[\u0B02-\u0B03]" / "\u0B3C" / "\u0B3E" / "\u0B3F" / "\u0B40" / ~r"[\u0B41-\u0B43]" / ~r"[\u0B47-\u0B48]" / ~r"[\u0B4B-\u0B4C]" / "\u0B4D" / "\u0B56" / "\u0B57" / ~r"[\u0B66-\u0B6F]" / "\u0B82" / ~r"[\u0BBE-\u0BBF]" / "\u0BC0" / ~r"[\u0BC1-\u0BC2]" / ~r"[\u0BC6-\u0BC8]" / ~r"[\u0BCA-\u0BCC]" / "\u0BCD" / "\u0BD7" / ~r"[\u0BE6-\u0BEF]" / ~r"[\u0C01-\u0C03]" / ~r"[\u0C3E-\u0C40]" / ~r"[\u0C41-\u0C44]" / ~r"[\u0C46-\u0C48]" / ~r"[\u0C4A-\u0C4D]" / ~r"[\u0C55-\u0C56]" / ~r"[\u0C66-\u0C6F]" / ~r"[\u0C82-\u0C83]" / "\u0CBC" / "\u0CBE" / "\u0CBF" / ~r"[\u0CC0-\u0CC4]" / "\u0CC6" / ~r"[\u0CC7-\u0CC8]" / ~r"[\u0CCA-\u0CCB]" / ~r"[\u0CCC-\u0CCD]" / ~r"[\u0CD5-\u0CD6]" / ~r"[\u0CE6-\u0CEF]" / ~r"[\u0D02-\u0D03]" / ~r"[\u0D3E-\u0D40]" / ~r"[\u0D41-\u0D43]" / ~r"[\u0D46-\u0D48]" / ~r"[\u0D4A-\u0D4C]" / "\u0D4D" / "\u0D57" / ~r"[\u0D66-\u0D6F]" / ~r"[\u0D82-\u0D83]" / "\u0DCA" / ~r"[\u0DCF-\u0DD1]" / ~r"[\u0DD2-\u0DD4]" / "\u0DD6" / ~r"[\u0DD8-\u0DDF]" / ~r"[\u0DF2-\u0DF3]" / "\u0E31" / ~r"[\u0E34-\u0E3A]" / ~r"[\u0E47-\u0E4E]" / ~r"[\u0E50-\u0E59]" / "\u0EB1" / ~r"[\u0EB4-\u0EB9]" / ~r"[\u0EBB-\u0EBC]" / ~r"[\u0EC8-\u0ECD]" / ~r"[\u0ED0-\u0ED9]" / ~r"[\u0F18-\u0F19]" / ~r"[\u0F20-\u0F29]" / "\u0F35" / "\u0F37" / "\u0F39" / ~r"[\u0F3E-\u0F3F]" / ~r"[\u0F71-\u0F7E]" / "\u0F7F" / ~r"[\u0F80-\u0F84]" / ~r"[\u0F86-\u0F87]" / ~r"[\u0F90-\u0F97]" / ~r"[\u0F99-\u0FBC]" / "\u0FC6" / "\u102C" / ~r"[\u102D-\u1030]" / "\u1031" / "\u1032" / ~r"[\u1036-\u1037]" / "\u1038" / "\u1039" / ~r"[\u1040-\u1049]" / ~r"[\u1056-\u1057]" / ~r"[\u1058-\u1059]" / "\u135F" / ~r"[\u1369-\u1371]" / ~r"[\u1712-\u1714]" / ~r"[\u1732-\u1734]" / ~r"[\u1752-\u1753]" / ~r"[\u1772-\u1773]" / "\u17B6" / ~r"[\u17B7-\u17BD]" / ~r"[\u17BE-\u17C5]" / "\u17C6" / ~r"[\u17C7-\u17C8]" / ~r"[\u17C9-\u17D3]" / "\u17DD" / ~r"[\u17E0-\u17E9]" / ~r"[\u180B-\u180D]" / ~r"[\u1810-\u1819]" / "\u18A9" / ~r"[\u1920-\u1922]" / ~r"[\u1923-\u1926]" / ~r"[\u1927-\u1928]" / ~r"[\u1929-\u192B]" / ~r"[\u1930-\u1931]" / "\u1932" / ~r"[\u1933-\u1938]" / ~r"[\u1939-\u193B]" / ~r"[\u1946-\u194F]" / ~r"[\u19B0-\u19C0]" / ~r"[\u19C8-\u19C9]" / ~r"[\u19D0-\u19D9]" / ~r"[\u1A17-\u1A18]" / ~r"[\u1A19-\u1A1B]" / ~r"[\u1DC0-\u1DC3]" / ~r"[\u203F-\u2040]" / "\u2054" / ~r"[\u20D0-\u20DC]" / "\u20E1" / ~r"[\u20E5-\u20EB]" / ~r"[\u302A-\u302F]" / ~r"[\u3099-\u309A]" / "\uA802" / "\uA806" / "\uA80B" / ~r"[\uA823-\uA824]" / ~r"[\uA825-\uA826]" / "\uA827" / "\uFB1E" / ~r"[\uFE00-\uFE0F]" / ~r"[\uFE20-\uFE23]" / ~r"[\uFE33-\uFE34]" / ~r"[\uFE4D-\uFE4F]" / ~r"[\uFF10-\uFF19]" / "\uFF3F" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be replaced by simplified version, like here: https://github.com/JamesPHoughton/pysd/pull/179/files#diff-9ad48c059f7f9a044e6df23dc3316322R267 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt it (at least, if so, I am not sure how to do it!) The simplified version works with the regex module only (and does not work with the re module), and unfortunately I do not think that it works in a Parsimonious grammar either (as here). If I am wrong please let me know!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, you are right...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should put in a request to the maintainer of parsimonious - I wonder how big of a fix it would be in his package? We could at least open an issue. @julienmalard, you're probably best qualified to do that. Do you have an interest in doing so? Then if it happens, we can simplify here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just added a small function based on your recommendation to use chardet to guess the encoding. It seems to work!
I think everything is working now! The Unicode Decode error still in the TravisCI output is due to the fact that the test output file seems to not be encoded in Unicode, even for the Unicode test. The solution would be to add a workaround to try to read what type of encoding the file is in, but as I have heard that this is notoriously hard I hope that the current state will be sufficient for this pull request! In any case, unicode models run without any problem. |
@julienmalard mostly done with this PR. Still have issues with test coverage. Build time increased in 2 times! And test coveraged reduced by 1.5%. Please look closer at test results. And also you can check child repo with tests to fix following:
You can check this repo with test models and maybe fix it https://github.com/SDXorg/test-models And then I think we can complete with this PR and close few issues about Unicode 😃 Regards, |
@@ -831,7 +863,7 @@ def translate_vensim(mdl_file): | |||
#>>> translate_vensim('../../tests/test-models/tests/limits/test_limits.mdl') | |||
|
|||
""" | |||
with open(mdl_file, 'rU') as in_file: | |||
with open(mdl_file, 'rU', encoding='UTF-8') as in_file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this break if users import models they've translated with older versions of pysd? I know some people have made modifications between translation steps...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand of encodings, it will only break if a user had modified a model file to include non-ascii characters and subsequently saved it in a non-unicode encoding. Should be a rare case?
Sorry for the radio silence. Peter is now sleeping through the night, so hopefully I'll have a brain soon. =) What was the upshot with the unicode test on Travis? It sounds like it's working when you run locally, just not when Travis runs it? The error seems to come on the pandas call to read the CSV. Is there a platform-specific use of pandas we should be aware of and adapt to, or do they have an issue? We might be able to do something like was done here in our test runner script. @julienmalard, if it's alright with you I'll let you make that change, and then you can have a clean PR with all your tests passing on Travis too. |
Hello,
Nice to hear from you and congratulations! :) Thanks for the tips and comments; I think they will fix the remaining issues. I have a funding proposal due on Friday and will get back to this soon after!
…-Julien
|
Sweet. Thank you! I'll bring that in. Looks like somewhere along the way we broke the 2.7 backend: https://travis-ci.org/JamesPHoughton/pysd/jobs/391916376#L610-L611 James |
Thank you both @alexprey and @JamesPHoughton for all your help and kind support on my first ever approved pull request! |
Thanks @julienmalard! Looking forward to the next one. |
This seems to be working with Unicode variable names for Vensim .mdl format models!