Rework handling of `.` tokenization and dotted operator calls #90

c42f · 2022-09-02T00:49:26Z

The handling of . in the tokenizer / parser is pretty wonky / inconsistent because the tokenization of . is context-dependent.

bump_split(), in particular is quite ugly and shouldn't exist:

... is tokenized into K"..." which is usually correct. But incorrect for import/using statements as in import ...A ==> (import (. . . . A)), necessitating the splitting of tokens with bump_split - this is ugly!
Dotted operators are tokenized with the dot as part of the operator. But sometimes we have to split them, as in standalone dotted operators .+ ==> (. +)

In addition, Expr is quite inconsistent about dotted infix calls vs dotted prefix calls. We should really fix this? Then we can remove the . from the operator names and treat it as separate syntax as it should be! (See also #88)

julia> dump(Meta.parse("a .+ b"))
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol .+
    2: Symbol a
    3: Symbol b

julia> dump(Meta.parse("f.(a, b)"))
Expr
  head: Symbol .
  args: Array{Any}((2,))
    1: Symbol f
    2: Expr
      head: Symbol tuple
      args: Array{Any}((2,))
        1: Symbol a
        2: Symbol b

The text was updated successfully, but these errors were encountered:

c42f · 2022-09-07T02:48:09Z

I chatted to @JeffBezanson about this. We concluded that the sensible parsing for these is to have a separate dotcall head. So something like

a .+ b ==> (dotcall-i a + b) (ie, with the infix flag set)
f.(a,b) ==> (dotcall f a b)

That is, just do the same thing as call but with dotcall.

Alternatively, we could use the existing call head, and add a DOT_FLAG. Probably that's a bad idea though - there's a limited number of flags and this flag couldn't apply to most expressions.

c42f · 2022-10-13T22:17:22Z

A few points after looking into this further

There's some related discussion and implementation plan for the syntax heads and flags in
AST inconsistency between parsing of = vs kw #99 (comment)
Changing tokenization is actually a pain for the parser. Might be better to leave as-is, and have the parser split tokens even though that's ugly

Also, syntactic operators like .&& and .= can't be dealt with using a dotcall kind, as they have their own kinds. Best option may be to leave these as-is with the DOTOP_FLAG flag, or to introduce additional dotted kinds for the small number of kinds involved here (this is the option used for Expr). The latter option might be preferable in some ways: forgetting to distinguish between K"=" and K".=" has been a source of several bugs in the parser already.

c42f · 2022-10-17T05:49:32Z

More difficulties which occur while looking into this:

Comparison chains

Comparison chains are a bit annoying to deal with. Currently we have a .< b < c parsing into (comparison a .< b < c) - here the dottedness is dealt with by lowering. Perhaps the best option here would be to do (comparison a (. <) b < c)? (This is pretty annoying in implementation, because this output is quite different from a .< b which will parse as (dotcall-i a < b) but they have the same prefix so we can't use plain bump/emit... Can be dealt with using K"TOMBSTONE" but ugh.)

Prefix calls

The best parsing of things like .*(a,b) is confusing. Should we emit (dotcall * a b)? But this form would seem to imply the syntax *.(a,b) (currently invalid syntax). Alternatively, the form (call (. *) a b) would probably be better (and consistent with the current parsing of :(.*); then deal with that in lowering. Another irritation of this, however, is being consistent with the parsing of :.* which is just a symbol .* rather than parsed as (. *).

c42f · 2022-10-26T03:47:18Z

I chatted to @JeffBezanson about this.

Jeff commented that generally, parsing standalone dot-prefixed operators such as .* into the expression (. *) makes sense. And that having :.* parse by itself is vestigial and should ideally have been an error - given that it can be seen as applying the . to an atom, but is not an atom itself.

For comparison chains and prefix calls, this means we should adopt the following parsings:

a .< b < c ==> (comparison a (. <) b < c)
.*(a,b) ==> (call (. *) a b)

I guess in this we need to think about the precedence of prefix-.. Currently it's parsed as a single token so the precedence is the highest possible (effectively, can be seen like the @ macro prefix as in parse_atom()) This suggests it's necessary to handle inside parse_atom for now. (Currently dot splitting of constructs like :(.*) are handled within parse_unary_call() ... so in principle it might make sense to have the same precedence as syntax like $a... ie, be parsed inside parse_unary_prefix()?)

c42f mentioned this issue Sep 2, 2022

Reworking some Julia AST forms #88

Closed

33 tasks

c42f changed the title ~~Rework handling of . tokenization and operator calls~~ Rework handling of . tokenization and dotted operator calls Sep 2, 2022

c42f added the enhancement New feature or request label Sep 7, 2022

c42f added the trees label Sep 21, 2022

c42f mentioned this issue Oct 7, 2022

AST inconsistency between parsing of = vs kw #99

Closed

c42f mentioned this issue Nov 2, 2022

.+ not defined in metaprogramming JuliaLang/julia#47408

Closed

c42f mentioned this issue Nov 20, 2022

Parse dotted calls with dotcall head #151

Merged

c42f closed this as completed in #151 Nov 22, 2022

c42f mentioned this issue Apr 21, 2023

Parsing standalone dotted operators as (. op) #240

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rework handling of `.` tokenization and dotted operator calls #90

Rework handling of `.` tokenization and dotted operator calls #90

c42f commented Sep 2, 2022

c42f commented Sep 7, 2022

c42f commented Oct 13, 2022

c42f commented Oct 17, 2022

c42f commented Oct 26, 2022

Rework handling of . tokenization and dotted operator calls #90

Rework handling of . tokenization and dotted operator calls #90

Comments

c42f commented Sep 2, 2022

c42f commented Sep 7, 2022

c42f commented Oct 13, 2022

c42f commented Oct 17, 2022

Comparison chains

Prefix calls

c42f commented Oct 26, 2022

Rework handling of `.` tokenization and dotted operator calls #90

Rework handling of `.` tokenization and dotted operator calls #90