-
Notifications
You must be signed in to change notification settings - Fork 15
Parser Combinators
A parser combinator takes one or more parsers and controls how, or whether, each parser is applied for the intended work. Kern's predefined combinators provide generic functionality like sequencing, repetition, selection, and bracketing.
We first load the Kern core
namespace.
(use 'blancas.kern.core)
skip-ws
skips all whitespace characters at the start of the input text.
(run (skip-ws letter) "\t \n x") ;; \x
<|>
allows for alternative parsers to be tried in order, as long as failing parsers don't consume any input. Stops and succeeds with the first parser that succeeds; fails if all alternatives fail or with the first failing parser that consumes input.
(run (<|> digit letter) "x") ;; \x
(run (<|> digit letter) "9") ;; \9
(run (<|> digit letter tab) "?")
;; line 1 column 1
;; unexpected \?
;; expecting digit, letter or tab
>>
applies one or more parsers; keeps only the last result.
(run (>> digit digit letter) "91x") ;; \x
<<
applies one or more parsers; keeps only the first result.
(run (<< upper (sym* \-) digit digit) "F-16") ;; \F
<$>
applies the given parser on the input; then applies the given function on the parsed value and returns the result.
(run (<$> #(* % %) dec-num) "12") ;; 144
<*>
Applies two or more parsers in sequence; returns the results in a vector.
(run (<*> dec-num tab oct-num tab hex-num) "737\t747\tA340") ;; [737 \tab 487 \tab 41792]
<:>
Backtracking parser; on failure it restores any consumed input.
(:input (parse (<:> float-num) "5005.a")) ;; (\5 \0 \0 \5 \. \a)
many
applies a parser zero or more times until it fails. Since this failure tells when to stop, many
still succeeds.
(run (many letter) "123") ;; []
(run (many letter) "xyz") ;; [\x \y \z]
many1
applies a parser one or more times until it fails. many1
will fail if it can't apply the supplied parser at least once; otherwise it succeeds.
(run (many1 letter) "xyz") ;; [\x \y \z]
(run (many1 letter) "123")
;; line 1 column 1
;; unexpected \1
;; expecting letter
<+>
applies one or more parsers; then flattens and concatenates their results into a string.
(run (<+> letter (many alpha-num)) "a177") ;; "a177"
optional
succeeds if the supplied parser succeeds or if it fails consuming no input.
(run (<+> (optional (sym* \-)) dec-num) "-45") ;; "-45" optional succeeds
(run (<+> (optional (sym* \-)) dec-num) "45") ;; "45" optional fails, consumes no input
(run (<+> (optional float-num) letter) "45.A") ;; optional fails, consumes input
;; line 1 column 4
;; unexpected \A
;; expecting digit
option
applies the given parser; if it fails without consuming input it returns a parser state whose value is the supplied default value.
(run (option 3.1416 float-num) "foo") ;; 3.1416
skip
applies one or more parsers and ignores their results; it just returns nil.
(run (skip letter digit letter digit) "A5E8") ;; nil
skip-many
applies a parser zero or more times and returns nil.
(run (skip-many letter) "xyz") ;; nil
skip-many1
applies a parser one or more times and returns nil.
(run (skip-many1 letter) "xyz") ;; nil
(run (skip-many1 letter) "123")
;; line 1 column 1
;; unexpected \1
;; expecting letter
sep-by
applies a parser zero or more times separated by the application of another parser.
(run (sep-by (sym* \&) (many1 letter)) "a&bb&ccc&ddd") ;; [[\a] [\b \b] [\c \c \c] [\d \d \d]]
sep-by1
applies a parser one or more times separated by the application of another parser.
(run (sep-by1 (sym* \&) (many1 letter)) "a&bb&ccc&ddd") ;; [[\a] [\b \b] [\c \c \c] [\d \d \d]]
(run (sep-by1 (sym* \&) (many1 letter)) "0&bb&ccc&ddd")
;; line 1 column 1
;; unexpected \0
;; expecting letter
end-by
applies a parser zero or more times separated and ended by the application of another parser.
(run (end-by (sym* \&) (many1 letter)) "a&bb&ccc&") ;; [[\a] [\b \b] [\c \c \c]]
end-by1
applies a parser one or more times separated and ended by the application of another parser.
(run (end-by1 (sym* \&) (many1 letter)) "a&bb&ccc&") ;; [[\a] [\b \b] [\c \c \c]]
(run (end-by1 (sym* \&) (many1 letter)) "0&bb&ccc")
;; line 1 column 1
;; unexpected \0
;; expecting letter
sep-end-by
applies a parser zero or more times separated and optionally ended by the application of another parser.
(run (sep-end-by (sym* \&) (many1 letter)) "0&bb&ccc") ;; []
(run (sep-end-by (sym* \&) (many1 letter)) "a&bb&ccc") ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by (sym* \&) (many1 letter)) "a&bb&ccc&") ;; [[\a] [\b \b] [\c \c \c]]
sep-end-by1
applies a parser one or more times separated and optionally ended by the application of another parser.
(run (sep-end-by1 (sym* \&) (many1 letter)) "a&bb&ccc") ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by1 (sym* \&) (many1 letter)) "a&bb&ccc&") ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by1 (sym* \&) (many1 letter)) "0&bb&ccc")
;; line 1 column 1
;; unexpected \0
;; expecting letter
between
applies a third parser between the first and the second.
(run (between (sym \() (sym \)) dec-num) "(1984)") ;; 1984
times
applies a parser the given number of times.
(run (times 3 letter) "xyz") ;; [\x \y \z]
not-followed-by
succeeds only if the supplied parser fails.
(run (<*> dec-num (not-followed-by letter)) "737-3") ;; [737 nil]
many-till
applies a parser zero or more times while trying a second one until it fails. many-till
, however, always succeeds.
(run (many-till letter digit) "abcdefg9") ;; [\a \b \c \d \e \f \g]
search
applies the supplied parser, repeatedly traversing the input, until it finds the target text or it finds the end of the input. It is intended for extracting data from unstructured text, like similar facilities for regular expressions.
(run (search dec-num) "Designated equiptment is a 787, coming soon.")
;; 787
Sometimes is necessary to use forward declarations for parsers. See a typical case in the evaluation of expressions. Using def
to define parser makes for a cleaner syntax but it makes the parsers evaluate immediately, which could cause an unbound var error. To avoid that error we wrap the parser with fwd
.
fwd
is a macro that delays the evaluation of a parser until runtime, as opposed to compile-time in def
declarations. This idiom is done in two parts: using declare p
and then (fwd p)
.
The JSON Sample simplifies the example in the Readme page and shows two uses of the forward-declared parser jvalue
.
The program uses a forward declaration for the recursive parser jvalue
.
(declare jvalue)
The Array production must just fwd
because the def
declarations are evaluated during compilation.
(def array
"Parses the rule: array := '[' (jvalue (',' jvalue)*)* ']'"
(brackets (comma-sep (fwd jvalue))))
But an exception can be made when such as parser is not the first one in a bind
expression, as it ends up in a nested function and thus we can write:
(def pair
"Parses the rule: pair := String ':' jvalue"
(bind [f string-lit _ colon v jvalue]
(return [f v])))