PEG, an Implementation of a Packrat Parsing Expression Grammar in Go

A Parsing Expression Grammar (hence peg) is a way to create grammars that are similar in principle to regular expressions but allow better code integration. Specifically, peg is an implementation of the Packrat parser generator originally implemented as peg/leg by Ian Piumarta in C. A Packrat parser is a recursive descent parser capable of backtracking and negative look-ahead assertions, both of which are problematic for regular expression engines.

Installation

go install github.com/pointlander/peg@latest

Usage

peg [<option>]... <file>

Usage of peg:
  -inline
      parse rule inlining
  -noast
      disable AST
  -output string
      specify name of output file
  -print
      directly dump the syntax tree
  -strict
      treat compiler warnings as errors
  -switch
      replace if-else if-else like blocks with switch blocks
  -syntax
      print out the syntax tree
  -version
      print the version and exit

Sample Makefile

This sample Makefile will convert any file ending with .peg into a .go file with the same name. Adjust as needed.

.SUFFIXES: .peg .go

.peg.go:
	peg -noast -switch -inline -strict -output $@ $<

all: grammar.go

Choose names carefully to avoid overwriting existing .go files. Since only one PEG grammar is currently allowed per Go package, naming the grammar file grammar.peg is the suggested convention:

grammar.peg
grammar.go

PEG File Syntax

First declare the package name and any import(s) required:

package <package name>

import <import name>

Then declare the parser:

type <parser name> Peg {
	<parser state variables>
}

Next declare the rules. The main rules are described below; they are based on the peg/leg rules, whose documentation covers additional details.

The first rule is the entry point into the parser:

<rule name> <- <rule body>

The first rule should probably end with !. to indicate no more input follows.

first <- . !.

The !. predicate is often given the name END to make PEG rules more readable:

END <- !.

A dot (.) matches any character. For zero or more character matches, use:

repetition <- .*

For one or more character matches, use:

oneOrMore <- .+

For an optional character match, use:

optional <- .?

If specific characters are to be matched, use single quotes:

specific <- 'a'* 'bc'+ 'de'?

This will match the string "aaabcbcde".

For choosing between different inputs, use alternates:

prioritized <- 'a' 'a'* / 'bc'+ / 'de'?

This will match "aaaa" or "bcbc" or "de" or "". The matches are attempted in order.
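The ordered-choice behavior can be illustrated with a small hand-written matcher. This is only a sketch and not the code that peg generates; the names matcher, lit, seq, choice, star, plus, opt, and matchAll are hypothetical helpers invented for this example. Note how "abc" is rejected: the first alternative succeeds on the leading "a", so the later alternatives are never tried.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher tries to match input at pos and returns the new position
// and whether the match succeeded.
type matcher func(input string, pos int) (int, bool)

// lit matches the literal string s exactly (like 'abc' in a grammar).
func lit(s string) matcher {
	return func(input string, pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], s) {
			return pos + len(s), true
		}
		return pos, false
	}
}

// seq matches every matcher in turn; on failure it backtracks to pos.
func seq(ms ...matcher) matcher {
	return func(input string, pos int) (int, bool) {
		p := pos
		for _, m := range ms {
			next, ok := m(input, p)
			if !ok {
				return pos, false
			}
			p = next
		}
		return p, true
	}
}

// choice tries the alternatives in order and commits to the first success.
func choice(ms ...matcher) matcher {
	return func(input string, pos int) (int, bool) {
		for _, m := range ms {
			if next, ok := m(input, pos); ok {
				return next, true
			}
		}
		return pos, false
	}
}

// star matches m zero or more times, greedily.
func star(m matcher) matcher {
	return func(input string, pos int) (int, bool) {
		p := pos
		for {
			next, ok := m(input, p)
			if !ok || next == p {
				return p, true
			}
			p = next
		}
	}
}

// plus matches m one or more times.
func plus(m matcher) matcher { return seq(m, star(m)) }

// opt matches m zero or one time.
func opt(m matcher) matcher {
	return func(input string, pos int) (int, bool) {
		if next, ok := m(input, pos); ok {
			return next, true
		}
		return pos, true
	}
}

// prioritized mirrors: prioritized <- 'a' 'a'* / 'bc'+ / 'de'?
var prioritized = choice(seq(lit("a"), star(lit("a"))), plus(lit("bc")), opt(lit("de")))

// matchAll reports whether m succeeds and consumes the whole input.
func matchAll(m matcher, input string) bool {
	end, ok := m(input, 0)
	return ok && end == len(input)
}

func main() {
	for _, in := range []string{"aaaa", "bcbc", "de", "", "abc"} {
		fmt.Printf("%q -> %v\n", in, matchAll(prioritized, in))
	}
}
```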

If the characters are case-insensitive, use double quotes:

insensitive <- "abc"

This will match "abc" or "Abc" or "ABc" and so on.

For matching a set of characters, use a character class:

class <- [a-z]

This will match "a" or "b" or all the way to "z".

For an inverse character class, start with a caret:

inverse <- [^a-z]

This will match anything but "a" or "b" or all the way to "z".

If the character class is case-insensitive, use double brackets:

insensitive <- [[A-Z]]

(Note that this is not available in regular expression syntax.)
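The three kinds of character class can be pictured as simple range checks on a single character. The sketch below is an illustration only; inRange and inRangeFold are hypothetical helpers, and treating the double-bracket form as "either case of the character falls in the range" is an assumption about its meaning, not a description of peg's internals.

```go
package main

import (
	"fmt"
	"unicode"
)

// inRange reports whether r lies in the inclusive range [lo, hi].
func inRange(r, lo, hi rune) bool { return lo <= r && r <= hi }

// inRangeFold is a case-insensitive range check: it accepts r if either
// its lower-case or upper-case form lies in [lo, hi].
func inRangeFold(r, lo, hi rune) bool {
	return inRange(unicode.ToLower(r), lo, hi) || inRange(unicode.ToUpper(r), lo, hi)
}

func main() {
	fmt.Println(inRange('q', 'a', 'z'))     // class       <- [a-z]
	fmt.Println(!inRange('q', 'a', 'z'))    // inverse     <- [^a-z]
	fmt.Println(inRangeFold('q', 'A', 'Z')) // insensitive <- [[A-Z]]
}
```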

Use parentheses for grouping:

grouping <- (rule1 / rule2) rule3

For a look-ahead match (predicate), use:

lookAhead <- &rule1 rule2

For inverse look ahead, use:

inverse <- !rule1 rule2
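The key property of both predicates is that they test the input without consuming it. The following sketch shows that idea with hand-written matchers; it is not the code peg generates, the helper names (matcher, lit, anyChar, and, not, seq) are hypothetical, and for simplicity it works on bytes rather than full characters.

```go
package main

import (
	"fmt"
	"strings"
)

// matcher tries to match input at pos and returns the new position
// and whether the match succeeded.
type matcher func(input string, pos int) (int, bool)

// lit matches the literal string s exactly.
func lit(s string) matcher {
	return func(input string, pos int) (int, bool) {
		if strings.HasPrefix(input[pos:], s) {
			return pos + len(s), true
		}
		return pos, false
	}
}

// anyChar matches any single byte (a simplified stand-in for the dot).
func anyChar(input string, pos int) (int, bool) {
	if pos < len(input) {
		return pos + 1, true
	}
	return pos, false
}

// and succeeds when m matches at pos but consumes nothing (like &m).
func and(m matcher) matcher {
	return func(input string, pos int) (int, bool) {
		_, ok := m(input, pos)
		return pos, ok
	}
}

// not succeeds when m does not match at pos and consumes nothing (like !m).
func not(m matcher) matcher {
	return func(input string, pos int) (int, bool) {
		_, ok := m(input, pos)
		return pos, !ok
	}
}

// seq matches every matcher in turn; on failure it backtracks to pos.
func seq(ms ...matcher) matcher {
	return func(input string, pos int) (int, bool) {
		p := pos
		for _, m := range ms {
			next, ok := m(input, p)
			if !ok {
				return pos, false
			}
			p = next
		}
		return p, true
	}
}

var (
	lookAhead = seq(and(lit("ab")), lit("a")) // lookAhead <- &'ab' 'a'
	inverse   = seq(not(lit("ab")), lit("a")) // inverse   <- !'ab' 'a'
	end       = not(anyChar)                  // END       <- !.
)

func main() {
	n, ok := lookAhead("ab", 0)
	fmt.Println(n, ok) // only the 'a' is consumed; the predicate consumed nothing
	n, ok = inverse("ab", 0)
	fmt.Println(n, ok) // fails because "ab" is present
	_, ok = end("", 0)
	fmt.Println(ok) // succeeds only at the end of input
}
```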

Use curly braces for Go code:

gocode <- { fmt.Println("hello world") }

For string captures, use less than and greater than:

capture <- <'capture'> { fmt.Println(text) }

This will print out "capture". The captured string is stored in buffer[begin:end].
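The relationship between begin, end, and the captured text can be sketched as follows. This is an illustration under assumptions, not peg's generated code: captureLiteral is a hypothetical helper, and the buffer here is a plain Go string.

```go
package main

import (
	"fmt"
	"strings"
)

// captureLiteral mimics  capture <- <'capture'> { ... }  for a single
// literal: it records the begin and end offsets around the match and
// returns buffer[begin:end] as the captured text.
func captureLiteral(buffer, literal string, pos int) (text string, next int, ok bool) {
	begin := pos
	if !strings.HasPrefix(buffer[pos:], literal) {
		return "", pos, false
	}
	end := begin + len(literal)
	return buffer[begin:end], end, true
}

func main() {
	// Start matching at offset 4, just past "say ".
	if text, _, ok := captureLiteral("say capture", "capture", 4); ok {
		fmt.Println(text)
	}
}
```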

Testing Complex Grammars

Testing a grammar usually requires more than ordinary unit tests, since it involves many inputs and expected outputs. Grammars are also often shared across more than one language implementation. Consider maintaining a list of inputs with expected outputs in a structured file format such as JSON or YAML and parsing that file in your tests, or use one of the available options for Go such as Rob Muhlestein's tinout package.

Development

Requirements

Go (see go.mod for the required version)
golangci-lint (latest version)
gofumpt, installed with:
```
go install mvdan.cc/gofumpt@latest
```

Build

go run build.go

or

go generate

Test

go run build.go test

Lint

golangci-lint run

Format

gofumpt -l -w .

Files

bootstrap/main.go - bootstrap syntax tree of peg
tree/peg.go - syntax tree and code generator
peg.peg - peg in its own language

Author

Andrew Snodgrass

Projects That Use `peg`

Here are some projects that use peg to provide further examples of PEG grammars:

https://github.com/tj/go-naturaldate - natural date/time parsing
https://github.com/gnames/gnparser - scientific names parsing
