A parser and interpreter written in Elixir for ABNF grammars.
ABNF is defined in the RFC2234, which is obsoleted by RFC4234, which in turn is obsoleted by the RFC5234. There's also an update in the RFC7405.
This library implements the latest definition (RFC5234) (with erratas #3076, and #2968), and RFC7405.
iex(1)> grammar = ABNF.load_file "test/resources/ipv4.abnf"
iex(2)> initial_state = %{}
iex(2)> ABNF.apply grammar, "ipv4address", '250.246.192.34', initial_state
%ABNF.CaptureResult{
input: '250.246.192.34',
rest: '',
state: %{ipv4address: '250.246.192.34'},
string_text: '250.246.192.34',
string_tokens: ['250', '.', '246', '.', '192', '.', '34'],
values: ["Your ip address is: 250.246.192.34"]
}
The result can be read as an %ABNF.CaptureResult{} where:
- input: The original input
- rest: The part of the input that didn't match.
- state: The state after running all the rules applied to the input.
- string_text: The rule value as a string (this might or might not be the same as the rule value, since you can return custom values when adding a reduce code to the rule).
- string_tokens: Each one of the values that compose the string (in this case, [octet, dot, octet, dot, octet, dot, octet]).
- values: The rule value. In this case the value comes from the reduce code in the grammar itself.
-
There's a small sample application at https://github.com/marcelog/ex_abnf_example. An article describing this application is located at http://marcelog.github.io/articles/abnf_grammars_in_elixir.html.
-
The unit tests use different sample RFCs to test the grammar parser and the interpreter
This is not a parser generator, but an interpreter. It will load up an ABNF grammar, and generate an (kind of) AST for it. Then you can apply any of the rules to an input and the interpreter will parse the input according to the rule.
To use it in your Mix projects, first add it as a dependency:
def deps do
[{:ex_abnf, "~> 0.2.8"}]
end
Then run mix deps.get to install it.
After a rule, you can add your own code, for example:
userinfo = *( unreserved / pct-encoded / sub-delims / ":" ) !!!
state = Map.put state, :userinfo, rule
{:ok, state}
!!!
The code in question will be packed together into a module that is created in runtime to speed up execution later on.
Your code can return:
-
{:ok, state}: The match continues, and the new state is used for future calls.
-
{:ok, state, rule_value}: Returns a new state but also the rule_value is used as the result of the match. In YACC terms, rule_value would be the equivalent of $$ = ...
-
{:error, error}: The whole match is aborted and this error is thrown.
And your code will be called with the following bindings:
-
state: This is the state that you can pass when calling the initial ABNF.apply function, and is a way to keep state through the whole match, it can be whatever you like and can mutate through calls as long as your code can handle it.
-
values: When a rule is composed of different tokens (e.g: path = SEGMENT "/" SEGMENT) this contains a list with all the values of those tokens in order. In YACC terms, this would be the equivalent of using $1, $2, $3, etc. Note that a value here can be a reduced value returned by your own code in a previous rule.
-
string_values: Just like
values
but each value is a nested list of lists with all the characters that matched (you will usually want to flatten the list to get each one of the full strings).
You can also start your grammar with code to write your own helper functions and module additions. For example:
!!!
require Logger
def return_value(ip) do
Logger.debug "Hello world"
"Your ip address is: #{ip}"
end
!!!
IPv4address =
dec-octet "."
dec-octet "."
dec-octet "."
dec-octet !!!
state = Map.put state, :ipv4address, rule
{:ok, state, return_value(rule)}
!!!
dec-octet = DIGIT ; 0-9
/ %x31-39 DIGIT ; 10-99
/ "1" 2DIGIT ; 100-199
/ "2" %x30-34 DIGIT ; 200-249
/ "25" %x30-35 ; 250-255
DIGIT = %x30-39
Note how the result of the IPv4address
rule is the result of a call to the
function return_value
.
- In the reduce code the rule value is no longer the rule name, but the
variable
rule
. - The grammar text no longer supports
cr
as the newline, one should always usecrlf
. - In the reduce code there are now available the following variables:
rule
: The rule valuestring_values
: Like the oldtokens
variable, but contains a nested list of lists with the parsed strings.values
: Like the oldtokens
variable, but with the reduced values (could be a mixed nested list of lists containing char_lists and/or other kind of values).- Original rule names are now preserverd and only downcased, no replacements
are done to chars (i.e:
-
to_
).
The source code is released under Apache 2 License.
Check LICENSE file for more information.