Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some small grammar fixes #871

Merged
merged 5 commits into from
Dec 27, 2019
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 11 additions & 8 deletions standard/dhall.abnf
Original file line number Diff line number Diff line change
Expand Up @@ -223,9 +223,12 @@ label = ("`" quoted-label "`" / simple-label)
; / !builtin label
nonreserved-label = label

; An any-label is allowed to be one of the reserved identifiers.
; An any-label is allowed to be one of the reserved identifiers (but not a keyword).
any-label = label

; Allow specifically `Some` in record and union labels.
any-label-or-some = any-label / Some
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't None need to be included too?

Copy link
Member Author

@Nadrieril Nadrieril Dec 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Some is a keyword, but None is a builtin function, and builtins are already allowed as labels of records/unions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks! That distinction wasn't so clear to me.



; Dhall's double-quoted strings are similar to JSON strings (RFC7159) except:
;
Expand Down Expand Up @@ -295,9 +298,9 @@ unbraced-escape =
;
; See the `valid-non-ascii` rule for the exact ranges that are not allowed
braced-codepoint =
1*3HEXDIG ; %x000-FFF
("1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "A" / "B" / "C" / "D" / "E" / "F" / "10") unicode-suffix; (Planes 1-16)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless of the other changes, this tweak should be merged. I brainfarted when I wrote this and didn't think about optimal order to avoid backtracking..

/ unbraced-escape ; (Plane 0)
/ ("1" / "2" / "3" / "4" / "5" / "6" / "7" / "8" / "9" / "A" / "B" / "C" / "D" / "E" / "F" / "10") unicode-suffix; (Planes 1-16)
/ 1*3HEXDIG ; %x000-FFF

; Allow zero padding for braced codepoints
braced-escape = *"0" braced-codepoint
Expand Down Expand Up @@ -838,7 +841,7 @@ selector-expression = primitive-expression *(whsp "." whsp selector)

selector = any-label / labels / type-selector

labels = "{" whsp [ any-label whsp *("," whsp any-label whsp) ] "}"
labels = "{" whsp [ any-label-or-some whsp *("," whsp any-label-or-some whsp) ] "}"

type-selector = "(" whsp expression whsp ")"
; NOTE: Backtrack when parsing the first three alternatives (i.e. the numeric
Expand Down Expand Up @@ -883,13 +886,13 @@ empty-record-literal = "="
empty-record-type = ""

non-empty-record-type-or-literal =
any-label whsp (non-empty-record-literal / non-empty-record-type)
any-label-or-some whsp (non-empty-record-literal / non-empty-record-type)

non-empty-record-type = ":" whsp1 expression *(whsp "," whsp record-type-entry)
record-type-entry = any-label whsp ":" whsp1 expression
record-type-entry = any-label-or-some whsp ":" whsp1 expression

non-empty-record-literal = "=" whsp expression *(whsp "," whsp record-literal-entry)
record-literal-entry = any-label whsp "=" whsp expression
record-literal-entry = any-label-or-some whsp "=" whsp expression

union-type =
non-empty-union-type
Expand All @@ -902,7 +905,7 @@ non-empty-union-type =

; x : Natural
; x
union-type-entry = any-label [ whsp ":" whsp1 expression ]
union-type-entry = any-label-or-some [ whsp ":" whsp1 expression ]

non-empty-list-literal =
"[" whsp [ "," whsp ] expression whsp *("," whsp expression whsp) "]"
Expand Down
Original file line number Diff line number Diff line change
@@ -1 +1 @@
λ(x : Natural) → Natural/fold x Natural (λ(n : Natural) → n + 1) 0
λ(x : Natural) → Natural/fold x Natural (λ(x : Natural) → x + 1) 0
Binary file modified tests/parser/success/preferMissingNoSpacesB.dhallb
Binary file not shown.
2 changes: 1 addition & 1 deletion tests/parser/success/preferMissingNoSpacesB.diag
Original file line number Diff line number Diff line change
@@ -1 +1 @@
[3, 9, [24, null, 0, 7], ["foo", 0]]
["missing//foo", 0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how the Haskell implementation parses it. Could you explain why this change is correct (and why the Haskell implementation is wrong)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that both parses are technically allowed (i.e. the grammar is ambiguous). However, the text of the grammar states "prefer the first parse" and "prefer alternatives that parse as many repetitions as possible", which can be summed up as "parse greedily". Taking that into account, we get that when parsing whatever// foo, the whole of whatever// should be consumed as an identifier (since slashes are allowed in identifiers).

Moreover I'm quite confident that this is the correct answer since I derived a parser directly from the abnf (taking into account greediness) and that's the output I got.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, my reading of the grammar is that this should be parsed as a single label

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, seems like I just misread the grammar then! :)

If it had turned out to be ambiguous, I would have suggested that we change that, but fortunately it doesn't seem to be the case. :)

1 change: 1 addition & 0 deletions tests/parser/success/unit/RecordLitSomeA.dhall
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
{ Some = 0 }
Binary file added tests/parser/success/unit/RecordLitSomeB.dhallb
Binary file not shown.
1 change: 1 addition & 0 deletions tests/parser/success/unit/RecordLitSomeB.diag
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[8, {"Some": [15, 0]}]
1 change: 1 addition & 0 deletions tests/parser/success/unit/SelectionSomeA.dhall
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
record.{ Some }
Binary file added tests/parser/success/unit/SelectionSomeB.dhallb
Binary file not shown.
1 change: 1 addition & 0 deletions tests/parser/success/unit/SelectionSomeB.diag
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[10, ["record", 0], "Some"]
1 change: 1 addition & 0 deletions tests/parser/success/unit/UnionTypeSomeA.dhall
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
< Some: Natural >
1 change: 1 addition & 0 deletions tests/parser/success/unit/UnionTypeSomeB.dhallb
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
� �dSomegNatural
1 change: 1 addition & 0 deletions tests/parser/success/unit/UnionTypeSomeB.diag
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[11, {"Some": "Natural"}]