Formal specification of SILE grammar documentation #1435
Comments
AFAIK, the LPEG grammar is probably the closest thing to a "formal" specification |
No, it isn't. At the moment the LPEG parser source code is the de facto standard. It probably wouldn't be too hard to come up with an EBNF for it, but that has not been done yet. There are a couple of idiosyncrasies, such as the balanced braces requirement in pass-through blocks, but mostly it's pretty straightforward. |
The much ado about #105 is making me serious about writing out a formal grammar spec! Is there a language that would be preferred for this? I have experience with reading several by eye (mostly EBNF) but not in writing them or parsing them automatically. Would Lark or something else be preferable if I were to dig into this? |
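To make the balanced-braces idiosyncrasy concrete, here is a minimal pure-Lua sketch. It is not how SILE's parser is written: `passthrough_body` is a hypothetical helper, and Lua's built-in `%b{}` pattern stands in for the real grammar rule. It only illustrates why the raw content of a pass-through block has to keep its braces balanced.

```lua
-- Hypothetical helper, not part of SILE: return the raw body of the first
-- brace-delimited block in `input`, using Lua's balanced-match pattern.
local function passthrough_body (input, start)
  -- %b{} matches from an opening "{" up to the "}" that balances it.
  local first, last = input:find("%b{}", start or 1)
  if not first then return nil end
  -- Strip the outer braces and hand back the body untouched.
  return input:sub(first + 1, last - 1)
end

-- Nested braces are fine as long as they pair up.
local nested = passthrough_body([[\lua{ local t = { a = { 1 } } }]])

-- A lone "}" inside a string literal ends the block early, because the
-- matcher counts braces and knows nothing about the embedded language.
local truncated = passthrough_body([[\lua{ print("}") }]])

print(nested)
print(truncated)
```

In the second call the body is cut off at the brace inside the string literal, which is the kind of surprise a written grammar spec would need to spell out.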
Tree-sitter is getting popular. It has bindings to many languages and some editors, e.g. Neovim supports using it for syntax highlighting. Having an official tree-sitter grammar would make it easy to create tools for SILE in (almost) any language. I wouldn't call it a "formal specification" though. Zig is using a PEG grammar for both documentation and the parser: https://github.com/ziglang/zig-spec/tree/master/grammar. Maybe we could do the same thing? |
Our parser is already written in LPEG so writing up a PEG variant probably wouldn't be that hard. But how does that help? Can a PEG grammar be converted to a tree-sitter grammar automatically? |
I don't think so. I mentioned PEG, because the issue is about formal specification, and tree-sitter uses a JavaScript DSL instead of a separate language (example). This DSL still looks pretty similar to PEG though, so maybe it's fine. |
Cf. unicode-org/message-format-wg#342 for a relevant discussion of grammars and tooling. Cf. edubart/nelua-lang#193 for a discussion of generating railroad diagrams from grammars that touches on several possible grammars. |
I started work on an ABNF grammar (which seemed like the best candidate on the grounds that it can be converted to other grammars better than most and hence used in the widest range of tooling), but that may or may not turn out to be a good fit. So far it hasn't gone well. |
It would be easier to understand the grammar if the LPeg code were written in LPeg.re format. Below is math grammar in LPeg.re. It is worth noting that zyedidia/gpeg implemented a PEG parser similar to LPeg with incremental parsing support. |
@brynne8 Do you have that grammar coded up somewhere other than a screenshot? It would be useful to have to play around with if you had it to share. |
local lpeg = require('lpeg')
local re = require('re')
local bits = require('parserbits')
local inspect = require('inspect')
local P, C, S = lpeg.P, lpeg.C, lpeg.S
local myID = C(bits.silidentifier) / 1
local wrapper = function (a) return type(a)=="table" and a or {} end
local specials = S"{}%\\"
local g = re.compile([=[
document <- texlike_stuff !.
texlike_stuff <- {: environment / comment / texlike_text / texlike_braced_stuff / texlike_command :}*
environment <- '\begin' {:options: %parameters :}
('{' {:command: passthrough_cmd :} '}' passthrough_env_stuff pass_end /
'{' {:command: %cmdID :} '}' texlike_stuff notpass_end)
comment <- ('%' (!%eol .)* %eol ) -> ''
texlike_text <- { (!%specials . / %escaped_specials)+ } -> unescapeSpecials
texlike_braced_stuff <- '{' texlike_stuff '}'
texlike_command <- '\' ({:command: passthrough_cmd :} {:options: %parameters :}
passthrough_braced_stuff / {:command: %cmdID :} {:options: %parameters :}
texlike_braced_stuff)
passthrough_cmd <- 'ftl' / 'lua' / 'math' / 'raw' / 'script' / 'sil' / 'use' / 'xml'
passthrough_stuff <- { {: passthrough_text / passthrough_debraced_stuff :} }
passthrough_env_stuff <- {: passthrough_env_text :}*
passthrough_text <- { [^{}]+ }
passthrough_env_text <- { (!('\end{' =command '}') .)+ }
passthrough_braced_stuff <- '{' passthrough_stuff '}'
passthrough_debraced_stuff <- { passthrough_braced_stuff }
notpass_end <- '\end{' =command '}' _
pass_end <- '\end{' =command '}' _
_ <- %s*
]=], {
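  -- Parentheses drop gsub's second return value (the substitution count),
  -- which would otherwise come back as an extra capture.
  unescapeSpecials = function (str)
    return (str:gsub('\\([{}%%\\])', '%1'))
  end,
  cmdID = myID - P"begin" - P"end",
  parameters = (P"[" * bits.parameters * P"]")^-1 / wrapper,
  eol = S"\r\n",
  specials = specials,
  escaped_specials = P"\\" * specials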
})
print(inspect(g:match([[
% this is a sample comment
\begin[papersize=a6]{document}
Hello, world!
\end{document}
]])))
I used to think that LPeg.re supports |
As found in GH issue comment: sile-typesetter#1435 (comment)
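For anyone wanting to poke at one piece of the snippet without installing LPeg, the unescaping step can be tried in plain Lua. This is a minimal sketch: `unescape_specials` is a hypothetical standalone name, and the pattern is copied from the `unescapeSpecials` callback above. Since `string.gsub` returns the substitution count as a second value, the sketch wraps the call in parentheses to keep only the string.

```lua
-- Standalone copy of the unescaping step (hypothetical name). The pattern
-- matches a backslash followed by one of { } % \ and keeps only the
-- second character.
local function unescape_specials (str)
  -- Outer parentheses truncate gsub's results to the string alone.
  return (str:gsub('\\([{}%%\\])', '%1'))
end

local plain = unescape_specials([[50\% of \{braces\} and a \\ backslash]])
print(plain)
```

A backslash in front of any other character is left alone, so something like `\begin` passes through this step unchanged.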
Thanks @brynne8. I started setting that up as an alternative parser. It isn't quite all wired up yet, but it will be interesting to benchmark it against the epnf-based parser as well as mess around with how useful parser errors could be. One of the reasons it's failing tests right now is that it throws different errors than epnf-based parsing. If we were to switch by default, that would be one less Lua dependency, and I agree the grammar is somewhat easier to grok (at least for me, who loves regular expressions). Tangentially, I also have the ABNF grammar converting to EBNF, which also opens up a lot of options for converting it to other parsers. See this branch. |
For anybody following this issue, I've posted some of the formal grammar work including railroad diagrams on the website. The ABNF format grammar also works as input for |
Is the formal specification of the SILE grammar documented?
This would help write other tools such as a Language Server, as per #1406