Skip to content

Parser Combinators

Armando Blancas edited this page May 26, 2018 · 16 revisions

A parser combinator takes one or more parsers and controls how, or whether, each parser is applied for the intended work. Kern's predefined combinators provide generic functionality like sequencing, repetition, selection, and bracketing.

We first load the Kern core namespace.

(use 'blancas.kern.core)

skip-ws skips all whitespace characters at the start of the input text.

(run (skip-ws letter) "\t \n x")  ;; \x

<|> allows for alternative parsers to be tried in order, as long as failing parsers don't consume any input. Stops and succeeds with the first parser that succeeds; fails if all alternatives fail or with the first failing parser that consumes input.

(run (<|> digit letter) "x")  ;; \x
(run (<|> digit letter) "9")  ;; \9
(run (<|> digit letter tab) "?")
;; line 1 column 1
;; unexpected \?
;; expecting digit, letter or tab

>> applies one or more parsers; keeps only the last result.

(run (>> digit digit letter) "91x")  ;; \x

<< applies one or more parsers; keeps only the first result.

(run (<< upper (sym* \-) digit digit) "F-16")  ;; \F

<$> applies the given parser on the input; then applies the given function on the parsed value and returns the result.

(run (<$> #(* % %) dec-num) "12")  ;; 144

<*> Applies two or more parsers in sequence; returns the results in a vector.

(run (<*> dec-num tab oct-num tab hex-num) "737\t747\tA340")  ;; [737 \tab 487 \tab 41792]

<:> Backtracking parser; on failure it restores any consumed input.

(:input (parse (<:> float-num) "5005.a"))  ;; (\5 \0 \0 \5 \. \a)

many applies a parser zero or more times until it fails. Since this failure tells when to stop, many still succeeds.

(run (many letter) "123")  ;; []
(run (many letter) "xyz")  ;; [\x \y \z]

many1 applies a parser one or more times until it fails. many1 will fail if it can't apply the supplied parser at least once; otherwise it succeeds.

(run (many1 letter) "xyz")  ;; [\x \y \z]
(run (many1 letter) "123")
;; line 1 column 1
;; unexpected \1
;; expecting letter

<+> applies one or more parsers; then flattens and concatenates their results into a string.

(run (<+> letter (many alpha-num)) "a177")  ;; "a177"

optional succeeds if the supplied parser succeeds or if it fails consuming no input.

(run (<+> (optional (sym* \-)) dec-num) "-45")  ;; "-45" optional succeeds
(run (<+> (optional (sym* \-)) dec-num) "45")   ;; "45"  optional fails, consumes no input
(run (<+> (optional float-num) letter) "45.A")  ;; optional fails, consumes input
;; line 1 column 4
;; unexpected \A
;; expecting digit

option applies the given parser; if it fails without consuming input it returns a parser state whose value is the supplied default value.

(run (option 3.1416 float-num) "foo")  ;; 3.1416

skip applies one or more parsers and ignores their results; it just returns nil.

(run (skip letter digit letter digit) "A5E8")  ;; nil

skip-many applies a parser zero or more times and returns nil.

(run (skip-many letter) "xyz")  ;; nil

skip-many1 applies a parser one or more times and returns nil.

(run (skip-many1 letter) "xyz")  ;; nil
(run (skip-many1 letter) "123")
;; line 1 column 1
;; unexpected \1
;; expecting letter

sep-by applies a parser zero or more times separated by the application of another parser.

(run (sep-by (sym* \&) (many1 letter)) "a&bb&ccc&ddd")  ;; [[\a] [\b \b] [\c \c \c] [\d \d \d]]

sep-by1 applies a parser one or more times separated by the application of another parser.

(run (sep-by1 (sym* \&) (many1 letter)) "a&bb&ccc&ddd")  ;; [[\a] [\b \b] [\c \c \c] [\d \d \d]]
(run (sep-by1 (sym* \&) (many1 letter)) "0&bb&ccc&ddd")
;; line 1 column 1
;; unexpected \0
;; expecting letter

end-by applies a parser zero or more times separated and ended by the application of another parser.

(run (end-by (sym* \&) (many1 letter)) "a&bb&ccc&")  ;; [[\a] [\b \b] [\c \c \c]]

end-by1 applies a parser one or more times separated and ended by the application of another parser.

(run (end-by1 (sym* \&) (many1 letter)) "a&bb&ccc&")  ;; [[\a] [\b \b] [\c \c \c]]
(run (end-by1 (sym* \&) (many1 letter)) "0&bb&ccc")
;; line 1 column 1
;; unexpected \0
;; expecting letter

sep-end-by applies a parser zero or more times separated and optionally ended by the application of another parser.

(run (sep-end-by (sym* \&) (many1 letter)) "0&bb&ccc")  ;; []
(run (sep-end-by (sym* \&) (many1 letter)) "a&bb&ccc")  ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by (sym* \&) (many1 letter)) "a&bb&ccc&")  ;; [[\a] [\b \b] [\c \c \c]]

sep-end-by1 applies a parser one or more times separated and optionally ended by the application of another parser.

(run (sep-end-by1 (sym* \&) (many1 letter)) "a&bb&ccc")  ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by1 (sym* \&) (many1 letter)) "a&bb&ccc&")  ;; [[\a] [\b \b] [\c \c \c]]
(run (sep-end-by1 (sym* \&) (many1 letter)) "0&bb&ccc") 
;; line 1 column 1
;; unexpected \0
;; expecting letter

between applies a third parser between the first and the second.

(run (between (sym \() (sym \)) dec-num) "(1984)")  ;; 1984

times applies a parser the given number of times.

(run (times 3 letter) "xyz")  ;; [\x \y \z]

not-followed-by succeeds only if the supplied parser fails.

(run (<*> dec-num (not-followed-by letter)) "737-3")  ;; [737 nil]

many-till applies a parser zero or more times while trying a second one until it fails. many-till, however, always succeeds.

(run (many-till letter digit) "abcdefg9")  ;; [\a \b \c \d \e \f \g]

search applies the supplied parser, repeatedly traversing the input, until it finds the target text or it finds the end of the input. It is intended for extracting data from unstructured text, like similar facilities for regular expressions.

(run (search dec-num) "Designated equiptment is a 787, coming soon.")
;; 787

Forward References

Sometimes is necessary to use forward declarations for parsers. See a typical case in the evaluation of expressions. Using def to define parser makes for a cleaner syntax but it makes the parsers evaluate immediately, which could cause an unbound var error. To avoid that error we wrap the parser with fwd.

fwd is a macro that delays the evaluation of a parser until runtime, as opposed to compile-time in def declarations. This idiom is done in two parts: using declare p and then (fwd p).

The JSON Sample simplifies the example in the Readme page and shows two uses of the forward-declared parser jvalue.

The program uses a forward declaration for the recursive parser jvalue.

(declare jvalue)

The Array production must just fwd because the def declarations are evaluated during compilation.

(def array
  "Parses the rule:  array := '[' (jvalue (',' jvalue)*)* ']'"
  (brackets (comma-sep (fwd jvalue))))

But an exception can be made when such as parser is not the first one in a bind expression, as it ends up in a nested function and thus we can write:

(def pair
  "Parses the rule:  pair := String ':' jvalue"
  (bind [f string-lit _ colon v jvalue]
    (return [f v])))