Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework handling of . tokenization and dotted operator calls #90

Closed
c42f opened this issue Sep 2, 2022 · 4 comments · Fixed by #151
Closed

Rework handling of . tokenization and dotted operator calls #90

c42f opened this issue Sep 2, 2022 · 4 comments · Fixed by #151
Labels
enhancement New feature or request trees

Comments

@c42f
Copy link
Member

c42f commented Sep 2, 2022

The handling of . in the tokenizer / parser is pretty wonky / inconsistent because the tokenization of . is context-dependent.

bump_split(), in particular is quite ugly and shouldn't exist:

  • ... is tokenized into K"..." which is usually correct. But incorrect for import/using statements as in import ...A ==> (import (. . . . A)), necessitating the splitting of tokens with bump_split - this is ugly!
  • Dotted operators are tokenized with the dot as part of the operator. But sometimes we have to split them, as in standalone dotted operators .+ ==> (. +)

In addition, Expr is quite inconsistent about dotted infix calls vs dotted prefix calls. We should really fix this? Then we can remove the . from the operator names and treat it as separate syntax as it should be! (See also #88)

julia> dump(Meta.parse("a .+ b"))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol .+
    2: Symbol a
    3: Symbol b

julia> dump(Meta.parse("f.(a, b)"))
Expr
  head: Symbol .
  args: Array{Any}((2,))
    1: Symbol f
    2: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Symbol a
        2: Symbol b
@c42f c42f mentioned this issue Sep 2, 2022
33 tasks
@c42f c42f changed the title Rework handling of . tokenization and operator calls Rework handling of . tokenization and dotted operator calls Sep 2, 2022
@c42f
Copy link
Member Author

c42f commented Sep 7, 2022

I chatted to @JeffBezanson about this. We concluded that the sensible parsing for these is to have a separate dotcall head. So something like

  • a .+ b ==> (dotcall-i a + b) (ie, with the infix flag set)
  • f.(a,b) ==> (dotcall f a b)

That is, just do the same thing as call but with dotcall.

Alternatively, we could use the existing call head, and add a DOT_FLAG. Probably that's a bad idea though - there's a limited number of flags and this flag couldn't apply to most expressions.

@c42f c42f added the enhancement New feature or request label Sep 7, 2022
@c42f c42f added the trees label Sep 21, 2022
@c42f
Copy link
Member Author

c42f commented Oct 13, 2022

A few points after looking into this further

Also, syntactic operators like .&& and .= can't be dealt with using a dotcall kind, as they have their own kinds. Best option may be to leave these as-is with the DOTOP_FLAG flag, or to introduce additional dotted kinds for the small number of kinds involved here (this is the option used for Expr). The latter option might be preferable in some ways: forgetting to distinguish between K"=" and K".=" has been a source of several bugs in the parser already.

@c42f
Copy link
Member Author

c42f commented Oct 17, 2022

More difficulties which occur while looking into this:

Comparison chains

Comparison chains are a bit annoying to deal with. Currently we have a .< b < c parsing into (comparison a .< b < c) - here the dottedness is dealt with by lowering. Perhaps the best option here would be to do (comparison a (. <) b < c)? (This is pretty annoying in implementation, because this output is quite different from a .< b which will parse as (dotcall-i a < b) but they have the same prefix so we can't use plain bump/emit... Can be dealt with using K"TOMBSTONE" but ugh.)

Prefix calls

The best parsing of things like .*(a,b) is confusing. Should we emit (dotcall * a b)? But this form would seem to imply the syntax *.(a,b) (currently invalid syntax). Alternatively, the form (call (. *) a b) would probably be better (and consistent with the current parsing of :(.*); then deal with that in lowering. Another irritation of this, however, is being consistent with the parsing of :.* which is just a symbol .* rather than parsed as (. *).

@c42f
Copy link
Member Author

c42f commented Oct 26, 2022

I chatted to @JeffBezanson about this.

Jeff commented that generally, parsing standalone dot-prefixed operators such as .* into the expression (. *) makes sense. And that having :.* parse by itself is vestigial and should ideally have been an error - given that it can be seen as applying the . to an atom, but is not an atom itself.

For comparison chains and prefix calls, this means we should adopt the following parsings:

  • a .< b < c ==> (comparison a (. <) b < c)
  • .*(a,b) ==> (call (. *) a b)

I guess in this we need to think about the precedence of prefix-.. Currently it's parsed as a single token so the precedence is the highest possible (effectively, can be seen like the @ macro prefix as in parse_atom()) This suggests it's necessary to handle inside parse_atom for now. (Currently dot splitting of constructs like :(.*) are handled within parse_unary_call() ... so in principle it might make sense to have the same precedence as syntax like $a... ie, be parsed inside parse_unary_prefix()?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request trees
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant