Formal specification of SILE grammar documentation #1435

Closed
baubleb opened this issue Jun 15, 2022 · 13 comments · Fixed by #1715
Labels: documentation, todo

@baubleb

baubleb commented Jun 15, 2022

Is the formal specification of the SILE grammar documented?

This would help in writing other tools, such as a language server, as per #1406.

@Omikhleia
Member

AFAIK, the LPEG grammar is probably the closest thing to a "formal" specification.

@alerque
Member

alerque commented Jun 15, 2022

No, it isn't. At the moment the LPEG parser source code is the de facto standard. It probably wouldn't be too hard to come up with an EBNF for it, but that has not been done yet. There are a couple of idiosyncrasies, such as the balanced-braces requirement in pass-through blocks, but mostly it's pretty straightforward.
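
As a rough illustration of the balanced-braces idiosyncrasy mentioned above, here is a minimal sketch in plain Lua. It is not the SILE parser: the helper name `passthrough_body` is hypothetical, and it relies only on the standard `%b{}` pattern item, which matches from an opening brace to its balancing closing brace.

```lua
-- Hypothetical helper, not part of SILE: return the body of a braced
-- pass-through block that starts at position `init` of `input`.
local function passthrough_body(input, init)
  -- "^" anchors the match at `init`; %b{} consumes a balanced {...} span.
  local block = input:match("^%b{}", init or 1)
  if not block then
    return nil -- no opening brace here, or the braces never balance
  end
  return block:sub(2, -2) -- strip the outer pair of braces
end

-- Inner braces are allowed as long as they are balanced.
print(passthrough_body("{ local t = { 1, 2 } } trailing text"))
-- An unbalanced body cannot be delimited, so nothing is returned.
print(passthrough_body("{ unbalanced { "))
```

The point of the sketch is only that a pass-through body is delimited by brace counting rather than by parsing its contents, so content with unbalanced braces cannot be expressed this way.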

alerque added the todo and documentation labels Jun 15, 2022
@alerque
Member

alerque commented Dec 6, 2022

The much ado about #105 is making me serious about writing out a formal grammar spec!

Is there a language that would be preferred for this? I have experience with reading several by eye (mostly EBNF) but not in writing them or parsing them automatically. Would Lark or something else be preferable if I were to dig in to this?

@nawordar

nawordar commented Jan 25, 2023

> Is there a language that would be preferred for this? I have experience with reading several by eye (mostly EBNF) but not in writing them or parsing them automatically. Would Lark or something else be preferable if I were to dig in to this?

Tree-sitter is getting popular. It has bindings to many languages, and some editors, e.g. Neovim, support using it for syntax highlighting. Having an official tree-sitter grammar would make it easy to create tools for SILE in (almost) any language. I wouldn't call it a "formal specification" though.

Zig uses a PEG grammar for both the documentation and the parser: https://github.com/ziglang/zig-spec/tree/master/grammar. Maybe we could do the same thing?

@alerque
Member

alerque commented Jan 25, 2023

Our parser is already written in LPEG so writing up a PEG variant probably wouldn't be that hard. But how does that help? Can a PEG grammar be converted to a tree-sitter grammar automatically?

@nawordar

> Can a PEG grammar be converted to a tree-sitter grammar automatically?

I don't think so. I mentioned PEG, because the issue is about formal specification, and tree-sitter uses a JavaScript DSL instead of a separate language (example). This DSL still looks pretty similar to PEG though, so maybe it's fine.

@alerque
Member

alerque commented Feb 14, 2023

Cf. unicode-org/message-format-wg#342 for a relevant discussion of grammars and tooling.

Cf. edubart/nelua-lang#193 for a discussion of generating railroad diagrams from grammars that touches on several possible grammar formats.

@alerque
Member

alerque commented Feb 15, 2023

I started work on an ABNF grammar (which seemed like the best candidate on the grounds that it can be converted to other grammars better than most and hence used in the widest range of tooling), but that may or may not turn out to be a good fit. So far it hasn't gone well.

@brynne8

brynne8 commented Jun 18, 2023

It would be easier to understand the grammar if the LPeg code were written in the LPeg.re format. Below is the math grammar in LPeg.re:

[Screenshot: math grammar]

It is worth noting that zyedidia/gpeg implemented a PEG parser similar to LPeg with incremental parsing support.

@alerque
Member

alerque commented Jan 11, 2024

@brynne8 Do you have that grammar coded up somewhere other than a screenshot? It would be useful to play around with if you have it to share.

@brynne8

brynne8 commented Jan 15, 2024

> @brynne8 Do you have that grammar coded up somewhere other than a screenshot? It would be useful to play around with if you have it to share.

local lpeg = require('lpeg')
local re = require('re')
local bits = require('parserbits')
local inspect = require('inspect')

local P, C, S = lpeg.P, lpeg.C, lpeg.S
local myID = C(bits.silidentifier) / 1

local wrapper = function (a) return type(a)=="table" and a or {} end
local specials = S"{}%\\"

local g = re.compile([=[
document         <- texlike_stuff !.
texlike_stuff    <- {: environment / comment / texlike_text / texlike_braced_stuff / texlike_command :}*

environment      <- '\begin' {:options: %parameters :}
                   ('{' {:command: passthrough_cmd :} '}' passthrough_env_stuff pass_end /
                    '{' {:command: %cmdID :} '}' texlike_stuff notpass_end)

comment          <- ('%' (!%eol .)* %eol ) -> ''
texlike_text     <- { (!%specials . / %escaped_specials)+ } -> unescapeSpecials
texlike_braced_stuff <- '{' texlike_stuff '}'
texlike_command  <- '\' ({:command: passthrough_cmd :} {:options: %parameters :}
                    passthrough_braced_stuff / {:command: %cmdID :} {:options: %parameters :}
                    texlike_braced_stuff)

passthrough_cmd  <- 'ftl' / 'lua' / 'math' / 'raw' / 'script' / 'sil' / 'use' / 'xml'

passthrough_stuff <- { {: passthrough_text / passthrough_debraced_stuff :} }
passthrough_env_stuff <- {: passthrough_env_text :}*
passthrough_text <- { [^{}]+ }
passthrough_env_text <- { (!('\end{' =command '}') .)+ }
passthrough_braced_stuff <- '{' passthrough_stuff '}'
passthrough_debraced_stuff <- { passthrough_braced_stuff }
notpass_end <- '\end{' =command '}' _
pass_end <- '\end{' =command '}' _

_   <- %s*
]=], {
  unescapeSpecials = function (str)
    -- Parenthesized so only the string is returned, not gsub's match count
    return (str:gsub('\\([{}%%\\])', '%1'))
  end,
  cmdID = myID - P"begin" - P"end",
  parameters = (P"[" * bits.parameters * P"]")^-1 / wrapper,
  eol = S"\r\n",
  specials = specials,
  escaped_specials = P"\\" * specials
})

print(inspect(g:match([[
% this is a sample comment
\begin[papersize=a6]{document}
Hello, world!
\end{document}
]])))

I used to think that LPeg.re supports Cb capture, but found no corresponding rule. So I slightly modified the grammar.
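
For readers unfamiliar with back-references such as the `=command` used in the grammar above, the idea is that the name after `\end` must repeat whatever was captured after `\begin`. The following is only an analogy in plain Lua patterns (using `%1`), not LPeg and not how SILE does it; the function name `match_environment` is hypothetical, and nested environments with the same name are not handled.

```lua
-- Hypothetical helper, not part of SILE: find one \begin{name}...\end{name}
-- pair in which both names agree.
local function match_environment(input)
  -- (%w+) captures the environment name, (.-) lazily captures the body,
  -- and %1 only matches text identical to the first capture.
  return input:match("\\begin{(%w+)}(.-)\\end{%1}")
end

-- Names agree: returns "document" and "Hello, world!".
print(match_environment("\\begin{document}Hello, world!\\end{document}"))
-- Names differ: the back-reference cannot match, so the result is nil.
print(match_environment("\\begin{raw}x\\end{document}"))
```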

alerque pushed a commit to alerque/sile that referenced this issue Jan 20, 2024
@alerque
Member

alerque commented Jan 20, 2024

Thanks @brynne8. I started setting that up as an alternative parser. It isn't quite all wired up yet, but it will be interesting to benchmark it against the epnf-based parser as well as to experiment with how useful parser errors could be. One of the reasons it's failing tests right now is that it throws different errors than epnf-based parsing. If we were to switch by default that would be one less Lua dependency, and I agree the grammar is somewhat easier to grok (at least for me, who loves regular expressions).

Tangentially I also have the ABNF grammar converting to EBNF, which also opens up a lot of options for converting it to other parsers. See this branch.

@alerque
Member

alerque commented Jan 30, 2024

For anybody following this issue, I've posted some of the formal grammar work, including railroad diagrams, on the website. The ABNF format grammar also works as input for apg-c, apg-js, and apg-py to generate parsers in those languages. I haven't put the generated parsers through their paces, but the input syntax is at least valid enough to generate them without errors. Some developer rules for generating parsers are currently in this branch.
