Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mangling makeover #1517

Merged
merged 14 commits into from
Mar 13, 2018
Merged

Mangling makeover #1517

merged 14 commits into from
Mar 13, 2018

Conversation

Kodiologist
Copy link
Member

@Kodiologist Kodiologist commented Feb 26, 2018

Reopened for the new merge policy.

This pull request overhauls mangling such that HySymbols remember their original names (it is no longer the case that (= 'a-b 'a_b)) and all Python variables have names that are legal Python identifiers. The latter accomplishes most of what we need to ensure that hy2py produces real Python code.

The major changes are:

  • Mangling happens at compile-time, not read-time.
  • Mangling rules are different.
    • *foo*FOO and foo!foo_bang are gone.
      • The first one seems misleading, and the second rarely useful.
    • The Punycode transformation (which was only enabled for Python 2 anyway) is gone.
    • foo-barfoo_bar and foo?is_foo remain.
    • Names that still aren't legal Python identifiers (either because they contain illegal characters or because they coincide with a Python reserved word) are prepended with hyx_, and each illegal character is replaced with ΔfooΔ (or for Python 2, XfooX), where foo is the Unicode character name in lowercase with spaces replaced by underscores and hyphens replaced by "H"es. If the character doesn't have a name, we use "U" followed by its code point in lowercase hexadecimal.
    • Any added hyx_ or is_ is added after any leading underscores (because leading underscores can have magical effects in Python).
  • The names of keyword arguments are fully mangled.

The below things remain to be done, but I figured I'd get feeback before continuing.

  • Watch out if the REPL variable _ is confused with the shadow subtraction operator -
  • In the unmangle function, undo hyx_ when necessary; otherwise, the distinct variables "hyx_X61X" and "a" will both unmangle to "a" Not worth it.
  • Perhaps make the mangle function stringify the input, so the caller doesn't have to
  • Rename the mangling functions mangle and unmangle and add them to core (fixes add mangle function to public api #1336)
    • Testing
    • Documentation
  • Document mangling in general
  • Use unmangle instead of hyify in core
  • NEWS

@Kodiologist Kodiologist requested a review from refi64 February 26, 2018 19:34
hy/_compat.py Outdated
if x.rstrip() != x:
return False
import tokenize as T
from StringIO import StringIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should prefer io.StringIO/io.BytesIO over the StringIO package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@Kodiologist Kodiologist force-pushed the mangling-makeover branch 4 times, most recently from 2c543a8 to 027dc47 Compare March 5, 2018 00:47
This means that a HySymbol remembers its original name. That is, `a-b` and `a_b` are different symbols although `(setv a-b 1)` and `(setv a_b 1)` set the same variable (namely, `a_b`).

Most of the edits in this commit are to switch underscores to hyphens in places where mangling hasn't happened yet.

I removed some lexer tests since the lexer no longer does any mangling.
`_`, as a variable, is now the shadow subtraction operator.
@Kodiologist Kodiologist merged commit b023ebd into hylang:master Mar 13, 2018
@gilch
Copy link
Member

gilch commented Mar 13, 2018

This really should have had a link to #1458 where much of the original discussion happened.

One-approval merges should say so here in the conversation, for the record.
As a courtesy I'd like to ask that you give a one day warning before such merges. I'd have requested that in the original rule change, but never got the chance.

Now that we've got the mangling makeover format settled, it's time to rethink the gensym format as discussed in #1458. I'd have mentioned this in time for the merge had I gotten warning.

@Kodiologist
Copy link
Member Author

This really should have had a link to #1458 where much of the original discussion happened.

Fair enough.

One-approval merges should say so here in the conversation, for the record.

It isn't hard to figure out if something was validly merged (and, if so, under what rules it was merged), is it?

As a courtesy I'd like to ask that you give a one day warning before such merges.

But the creation of a PR is itself a 14-day warning. In theory, if you wanted a 1-day warning, you could set an alarm for yourself for 13 days hence (we could even write a bot that does this automatically; in Hy, no less), but why wait to do whatever it is you were going to do?

Now that we've got the mangling makeover format settled, it's time to rethink the gensym format as discussed in #1458. I'd have mentioned this in time for the merge had I gotten warning.

No worries, that can be done in a new issue or PR. This PR is already big, anyway.

@Kodiologist
Copy link
Member Author

we could even write a bot that does this automatically

Come to think of it, this could be useful as a reminder to PR authors to check that the PR is in a mergeworthy state.

@gilch gilch mentioned this pull request Mar 13, 2018
@vodik
Copy link
Contributor

vodik commented Mar 13, 2018

Heads up, we'll have to be careful with this at release time. I found I had to flush all my compiled pyc files.

@Kodiologist
Copy link
Member Author

"Be careful" in the sense of warning users to remove stale bytecode? NEWS could be a good place to put such a warning.

@vodik
Copy link
Contributor

vodik commented Mar 14, 2018

Yes.

@gilch
Copy link
Member

gilch commented Mar 14, 2018

Shouldn't the installer replace all the stale .pyc's automatically? It's fine to warn users about the issue #1324 since it will come up in their own development. But we shouldn't ask the users to do an extra installation step that we could have automated.

@vodik
Copy link
Contributor

vodik commented Mar 14, 2018

Well, its not only about Hy, right?

Lets say I package Hy for Arch Linux (which there exists a package in AUR), and some other useful Hy library/tool (honestly don't know of any, but for examples sake, suppose there is).

Because the best practice for packaging Python like this is to also package the .pyc files, the eventual update to Hy to 0.15 would also mean all other Hy packages have to be rebuilt.

Not the worst thing in the world to require, but its a kind of change that really needs to be announced.

I'll put something up for adding it to the NEWS file tonight.

(setv out (eval `(do
(setv ~sym 10)
[foo? is_foo])))
(assert (= out [10 10])))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kodiologist what's the rational for late mangling?

Just asking because its easier to make #1504 nice if I mangle symbols before comparing:

(defmacro foo [symbol]
  (if (= symbol 'foobar?)
      ...
	  ...))

Means this macro work on either 'foobar? and is_foobar. Not sure if there's something I'm missing, so asking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that you have access to the literal symbol that was entered, in case you want it. If you want to compare things post-mangling, you can always write code like (= (mangle symbol) (mangle 'foobar?)). By contrast, before this PR, when symbols were mangled at read-time, there was no way to tell whether the user entered foobar? or is_foobar, if you wanted to do that. After all, not every symbol ends up as a Python name, so not every symbol eventually needs to be mangled.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The specific feature request was #360.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@ekaschalk
Copy link
Contributor

The new mangle has different behavior on the empty string, namely an assertion error whereas before it returned back the empty string.

Is there a reasoning for this? Note that unmangle on an empty string doesn't fail, returning back an empty string.

If not, the blank string assertion error would be helpful to have dropped.

@Kodiologist
Copy link
Member Author

I'm not sure what unmangle should do with an empty string, but mangle errors on an empty string because it's supposed to always return a valid Python identifier, and we have no mangling rule to produce a Python identifier for the empty symbol.

@Kodiologist Kodiologist deleted the mangling-makeover branch January 20, 2019 14:42
@Kodiologist Kodiologist restored the mangling-makeover branch January 20, 2019 14:42
@Kodiologist Kodiologist deleted the mangling-makeover branch January 20, 2019 14:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants