Rethink mangling #1173

Closed
gilch opened this issue Dec 12, 2016 · 9 comments · Fixed by #1517

@gilch
Member

gilch commented Dec 12, 2016

Python since version 3.0 allows much of Unicode in its identifiers. Hy's punycode mangling therefore serves no purpose but to make Unicode identifiers harder to read and Hy-Python interop more difficult. Maybe we should remove this "feature" altogether.

Some Hy features are already restricted to Python 3 or later. Unicode identifiers could be another such feature. It's probably a minority of users that need Python 2 support at this point anyway (and this will only become more true over time). They'll be able to make do with ASCII.

On the other hand, mangling of ASCII characters can be improved. Hy already converts - to _, to allow more Lispy names. Hy code would look very different without this, but it does cause some problems.

The other rules are even worse. The earmuff conversion to all caps is of dubious value. Earmuffs usually indicate a dynamic variable in Lisp, but Hy doesn't have those (maybe we could add them; see hylang/hyrule#51). I'd like to remove the conversion.

Hy also converts a trailing ! to a trailing _bang and a trailing ? to a leading is_. But if these characters appear anywhere else, they don't get converted. The AST mostly doesn't care when the results are not valid Python identifiers, but we have no guarantee this will continue. It's already been an issue for getargspec (#1172). There are other ASCII characters that are allowed in Hy symbols (like +) but never get converted at all.
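
The rules described so far can be restated as a rough Python sketch. This is only an illustration of the behavior as described in this comment, not Hy's actual implementation: `mangle_sketch` is a hypothetical name, and the order in which the rules are applied is an assumption.

```python
def mangle_sketch(name: str) -> str:
    """Hypothetical restatement of the ASCII mangling rules described above."""
    # Earmuffs: *foo* becomes FOO.
    if len(name) > 2 and name.startswith("*") and name.endswith("*"):
        name = name[1:-1].upper()
    # Dashes become underscores.
    name = name.replace("-", "_")
    # Only a trailing ? or ! is converted; anywhere else they are left alone.
    if len(name) > 1 and name.endswith("?"):
        name = "is_" + name[:-1]
    elif len(name) > 1 and name.endswith("!"):
        name = name[:-1] + "_bang"
    return name


for symbol in ["foo-bar", "*test*", "empty?", "save!", "a?b", "+"]:
    mangled = mangle_sketch(symbol)
    # str.isidentifier() reports whether Python would accept the result as a name.
    print(f"{symbol!r} -> {mangled!r} (valid identifier: {mangled.isidentifier()})")
```

Under these assumptions, symbols such as `a?b` and `+` pass through unchanged and are not valid Python identifiers, which is the gap being pointed out.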

Clojure also has to use Java identifiers, but it has a more consistent approach that we might consider emulating. Java's identifier rules are almost as strict as Python 2's.

@Kodiologist
Member

> Python since version 3.0 allows much of Unicode in its identifiers.

By Lisp standards, it's pretty idiosyncratic. For example, λ is legal but ⚘ isn't.

Punycode mangling seems to be broken or inactive at the moment, since '⚘ returns ⚘, not hy_w7h as documented.

I never use earmuffs, so I would support that removal. Lisps are traditionally case-insensitive, but Python and hence Hy is case-sensitive, so names in all caps are just fine.

The conversions of trailing ? and ! seem fine to me except for the annoyance you mentioned in #1115, and to be honest, I feel as if we ought to send a bug to the Python people pointing out the inconsistency. It seems pretty obvious that Python should either have is_integer and is_lower, or isinteger and islower, but not one of each.
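
The inconsistency can be checked directly in Python. A small sketch, using only built-in types: `float` spells its predicate with an underscore, `str` does not, so a symbol that mangles to a leading `is_` lines up with one but not the other.

```python
# float uses the underscore spelling for its predicate method.
float_has_is_integer = hasattr(float, "is_integer")
# str uses the run-together spelling, and has no underscore variant.
str_has_islower = hasattr(str, "islower")
str_has_is_lower = hasattr(str, "is_lower")

print(float_has_is_integer, str_has_islower, str_has_is_lower)
print((1.0).is_integer(), "abc".islower())
```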

@gilch
Member Author

gilch commented Dec 14, 2016

> By Lisp standards, it's pretty idiosyncratic. For example, λ is legal but ⚘ isn't.

Um, which Lisp are we talking about? I don't code in Unicode much. Do we want emojis and such in Hy identifiers? Or just the written word for other languages? Does Python allow any mathematics symbols? We might want those too.

Are the uncommon extra symbols worth making all the mangled symbols impossible for a human to read for non-latin alphabets?

If we really want both human readable mangled symbols and emojis, then punycode is out, since it all has to be ASCII alphabetic. We'd have to come up with some other encoding scheme.
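
To make the readability cost concrete, here is a minimal sketch using Python's built-in `punycode` codec. The function name `punycode_mangle_sketch` is hypothetical, and the `hy_` prefix is assumed from the documented `hy_w7h` example mentioned in this thread; this is not Hy's actual code.

```python
def punycode_mangle_sketch(name: str) -> str:
    """Hypothetical: ASCII names pass through, others become 'hy_' + punycode."""
    if name.isascii():
        return name
    return "hy_" + name.encode("punycode").decode("ascii")


flower = punycode_mangle_sketch("⚘")
word = punycode_mangle_sketch("λόγος")  # a non-Latin word, chosen for illustration

print(flower)  # hy_w7h
print(word)    # pure ASCII, but no longer recognizable as the original word
```

The encoded form is reversible, but a reader has to run the decoder to recover the original spelling.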

> I feel as if we ought to send a bug to the Python people pointing out the inconsistency. It seems pretty obvious that Python should either have is_integer and is_lower, or isinteger and islower, but not one of each.

Feel free to send that bug, but it's not going to get us anywhere. Backwards compatibility would be more important to them at this point.

@Kodiologist
Member

My understanding is that in most Lisps, any character other than the handful that have special meaning (like parentheses and whitespace) is a legal character in a symbol. By contrast, Python 3 permits only characters with certain Unicode character properties. See https://docs.python.org/3/reference/lexical_analysis.html#identifiers
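
A quick way to see the difference from Python itself, as a sketch using only the standard library: `unicodedata.category` shows the general category of each character, and `str.isidentifier` shows whether Python accepts it as a name. The helper name `describe` is made up for this example.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (general category, usable as a Python identifier?) for one character."""
    return unicodedata.category(ch), ch.isidentifier()


for ch in ["λ", "⚘", "x", "-"]:
    category, ok = describe(ch)
    print(f"{ch!r}: category={category}, identifier={ok}")
```

λ is a lowercase letter, so it qualifies; ⚘ falls in a symbol category, so it does not.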

@Kodiologist
Member

I created http://bugs.python.org/issue29088.

@Kodiologist
Member

It was closed in record time. I would've thought they could create temporary aliases to the old names if backwards compatibility was a concern, but hysterical raisins strike again.

@zackmdavis
Contributor

It's very sad; the right time to change the standard library to uniformly use is_ would have been in Python 3.0, but now our one and only one chance to backwards-incompatibly break the world has been spent (as the Python community has learned the hard way that the world doesn't necessarily re-form afterwards).

@ghost

ghost commented Feb 9, 2017

I like earmuffs.
There is, however, a problem: symbols with earmuffs are transformed into their upper-cased versions, whereas upper-cased symbols from Python code stay the same when imported into Hy.

(setv *test* 42) gives TEST = 42
And if I import a Python module containing TEST = 42, the translated Hy symbol is still TEST.

I think this is inconsistent.

@Kodiologist
Member

That's how mangling is supposed to work. In Hy, you can write TEST as *test* whether you defined it originally in Hy or in Python. Would you have a name TEST imported from a Python module renamed to *test*, so that (import [foo [TEST]]) would be compiled to something like from foo import TEST; globals()["*test*"] = TEST; del TEST; ? Then it would be very hard to access the imported name in Hy, because the symbol *test* wherever it appears in Hy code is translated to TEST: Hy's mangling gets in the way of accessing a variable actually named *test*.
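
The hypothetical renaming can be mimicked in plain Python to show why it would be awkward. In this sketch, the dict `renamed` stands in for a module's globals (an assumption made for illustration): once the value lives under the key `"*test*"`, it can only be reached through dictionary access, never by writing a name.

```python
renamed = {"TEST": 42}                    # stands in for a module's globals
renamed["*test*"] = renamed.pop("TEST")   # the hypothetical renaming on import

# "*test*" is not a valid Python identifier, so no name expression can refer to it.
earmuffs_is_identifier = "*test*".isidentifier()

# The original upper-case name is gone, so looking it up by name fails.
reachable_by_name = True
try:
    eval("TEST", renamed)
except NameError:
    reachable_by_name = False

print(earmuffs_is_identifier, reachable_by_name, renamed["*test*"])
```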

@ghost

ghost commented Feb 10, 2017

Oh, my bad.
I didn’t realise that we could in fact access Python’s SYMBOL as either SYMBOL or *symbol*, since, well, it isn’t really documented yet ;)
And the same goes for underscores and dashes. Oh. Well, that’s nice, and I am going to write it down.

Thanks!
