Skip to content

Yadriggy Syntax

Shigeru Chiba edited this page Oct 10, 2018 · 7 revisions

Syntax checker. This DSL is used for checking whether an abstract syntax tree obtained by calling Yadriggy::reify fits the given syntax. The syntax is specified in this DSL.

A typical hemi-parasitic DSL implemented with Yadriggy will use only limited syntax of Ruby. Hence the DSL compiler/interpreter has to report a syntax error when it receives unsupported Ruby code, or invalid code in that DSL. Valid Ruby code might be invalid in that DSL. The syntax checker helps DSL developers implement a syntax checker for their DSL.

How to check

The following code snippet makes a syntax checker syn that accepts only nil.

syn = Yadriggy::define_syntax do
  Reserved <= { name: 'nil' }
end

The syntax rules are given between do and end. In the case above, only one rule is given. The check method on the syntax checker returns true when the given abstract syntax tree is syntactically valid. Otherwise, it returns false.

syn.check(Yadriggy::reify { nil }.tree.body)  # true
syn.check(Yadriggy::reify { self }.tree.body) # false
syn.check(Yadriggy::reify { foo }.tree.body)  # false

For example, Yadriggy::reify { nil } returns an ASTree object containing an abstract syntax tree. ASTree#tree returns the abstract syntax tree. It is a Block object representing the block { nil }. Block#body returns the body of that block, which is a Reserved object representing an abstract syntax tree for the reserved word nil. The check method receives such an abstract syntax tree and determines whether it is syntactically valid or not.

The following is a somewhat larger example:

syn = Yadriggy::define_syntax do
  Binary <= { op: :+ | :-, left: expr, right: expr }
  expr   <= Binary | Number
  Block  <= { body: expr }
end

expr is not an AST-node type but a user type described later. This accepts only addition and subtraction expressions with numbers as their operands. It is compatible to the BNF definition below:

binary : expr '+' expr | expr '-' expr
expr   : binary | number
block  : expr

Note that the syntax rules above are not used for parsing. They are used for checking the validity. Hence their ambiguity does not matter. See the results of the syntax checking:

syn.check(Yadriggy::reify { 1 }.tree)           # true
syn.check(Yadriggy::reify { 1 + 2 }.tree)       # true
syn.check(Yadriggy::reify { 1 + 2 - 3 }.tree)   # true
syn.check(Yadriggy::reify { 1 * 2 - 3 }.tree)   # false
syn.check(Yadriggy::reify { 1 + a }.tree)       # false

Syntax rules

A syntax rule is written in a hemi-parasitic DSL that is also implemented with Yadriggy. It takes the following form:

type_name <= constraint

The left operand of <= is a type name of abstract syntax tree (AST) node such as Block and Reserved. All the node types are subclasses of ASTnode. The operator can be not only <= but also =.

The syntax rule specifies that an AST node has to satisfy the constraint in the right operand of <= when the type of that AST node matches the left operand of <=. The syntax checker first attempts to find a rule for the class type of that AST node. If the rule is not found, then it attempts to find a rule for the super class. If no syntax rule for the given AST node is found at all, a syntax error is reported.

Constraint

The syntax checker supports several kinds of constraints.

constraint specification
type name the AST node is a value of that type and it satisfies
the constraint for that type.
hash literal the attributes of the AST node satisfies the given constraints.
nil the AST node does never exist.
c1 + c2 Both constraints c1 and c2 are satisfied.
c1 | c2 Ordered choice. Either c1 or c2 is satisfied.
c1 is tested first.

Type name

For example,

Identifier <= Name

Identifier is a class for ASTnodes. It is a subclass of Name. The rule above specifies that an Identifier node satisfies the constraints for Name. The syntax checker looks for a rule whose left operand is Name. If it finds the rule, it checks whether the constraint of that rule is satisfied or not. If the constraint is not satisfied, a syntax error will be reported.

If the rule for the type name in the right operand is not found, no further syntax checking is performed. For the example above, if the syntax checker cannot find a rule whose left operand is Name, then the Identifier node satisfies the constraint; no syntax error is reported.

Hash literal

A hash literal specifies a constraint that the attributes of an AST node have to satisfy. For example,

Reserved <= { name: 'nil' }

This specifies that the name attribute of a Reserved AST node has to be a character string 'nil'. In other words, Reserved#name has to return 'nil'.

The attributes that are not included in the hash literal can be any value. So,

Reserved <= {}

This specifies that no constraint is given to Reserved; any AST node of type Reserved is acceptable.

The hash value for each key is the constraint for the corresponding attribute. It can be either a string literal, a symbol literal, nil, an array literal, or a type name. It can be surrounded with parentheses. The constraints can be combined with an ordered choice operator |.

hash value specification
string literal the value of the attribute matches that string literal.
symbol literal the value of the attribute matches that symbol literal.
For example, Binary#op (operator) is a symbol.
nil the attribute is nil or an empty array.
array literal the attribute is an array and the elements match the constraint.
type name the attribute is a value of that type.
( c ) the constraint c is optional. The attribute is nil or satisfies c.
c1 | c2 ordered choice. The attribute satisfies either c1 or c2.

The constraint surrounded with parentheses specifies that it is optional; the attribute is nil or it satisfies that constraint.

Array literal

An array literal specifies that the attribute is an array. It takes the following forms:

array literal specification
[c] all the array elements matches the constraint c.
[c0, c1] the first element matches c0 and the rest of
the elements match c1.
[c0, c1, c2] the first two elements match c0 and c1,
respectively. The rest of the elements match c2.
[c1 * c2] Each array element is an array with two AST nodes.
The first element of each array element matches c1 and
the second element matches c2.

The followings are examples.

Exprs       <= { expressions: [ Binary ] }
HashLiteral <= { pairs: [ Label * Number ] }

An Exprs node has the attribute expressions and this attribute has to be an array of Binary nodes. The pairs attribute of a HashLiteral node has to be an array. Each element of this array has to be an array whose first element is a Label node and whose second element is a Number object.

User type

In Ruby, a type name starts with a upper case. In the syntax rules, a type name starting with a lower case is also available. It is called a user type.

For example,

expr   <= Number | Identifier
Exprs  <= { expressions: [ expr ] }

The first rule specifies the user type expr. It is either Number or Identifier. The second rule specifies that an Exprs node has the attribute expressions. The value of expressions is an array of the expr type. Each array element is either a Number node or an Identifier node.

A user type can be used as a subtype of AST-node type:

add_expr <= Binary + { op: :+ }

A value of the user type add_expr is an AST node that is an instance of Binary and whose op attribute is :+.

When the syntax checker finds an AST node of a user type, or in other words, an AST node that satisfies the constraints given by the user type, then the user type is attached to that AST node. The attached user type can be obtained by calling ASTnode#usertype on the AST node after syntax checking. For the example above,

expr   <= Number | Identifier
Exprs  <= { expressions: [ expr ] }

When the syntax checker finds an Exprs node matching the rules above, then the expr type is attached to all the array elements of the expressions attribute.

When an AST node matches multiple user types, the attached type to that AST node is the innermost user type matching the last. For example,

expr     <= foo_name | VariableCall
foo_name <= VariableCall + { name: "foo" }

An AST for foo is a VariableCall node. It has the user type foo_name although it first matches the user type expr. Note that | is an ordered choice operator. The left operand of | is checked before the right operand. An AST for bar has the user type expr since its name attribute is not "foo" but "bar".

Checking attributes

A hash literal in a syntax rule specifies the constraint that the attributes of an AST node have to satisfy. If the hash value in the hash literal is a (not user-)type name, the value of the corresponding attribute has to be of that type. It also have to satisfy the constraint for that type.

For example,

Unary        <= { op: Symbol, operand: Name }
VariableCall <= Name
Reserved     <= nil

This syntax rules specifies that the operand of a Unary node is an instance of Name or its subclass except Reserved. Note that subclassing is considered.

The first rule specifies that the operand attribute of an Unary node has to be a value of Name, which is a subclass of ASTnode. If the attribute value is not Name, then a syntax error will be reported.

If the attribute value is Name, then the syntax checker recursively checks whether that attribute value satisfies the syntax rules. This syntax checking considers subclasses.

For example, the AST for -x is a Unary node with a VariableCall node for the operand attribute. This AST satisfies the first syntax rule above since VariableCall is a subclass of Name. Then the syntax checker checks whether this VariableCall node satisfies the syntax rules. It looks for a rule for VariableCall and, if it finds the rule, it checks whether the VariableCall node satisfies that rule. Note that it looks for a rule for Name only when it does not find the rule for VariableCall.

The AST for -Size is a Unary node with a Const node for the operand attribute. This AST satisfies the first syntax rule above. Then the syntax checker looks for a rule for Const. Since there is no rule for Const, that operand attribute of class Const passes the syntax checking.

The AST for -self is a Unary node with a Reserved node for the operand attribute since self is a reserved word. This AST satisfies the first syntax rule above but the syntax checker reports a syntax error when it checks the third syntax rule. Since the operand attribute is a Reserved node, the syntax checker selects the third rule for checking the attribute value. However, the right operand of this rule is nil. This specifies any AST containing a Reserved node causes a syntax error.

Clone this wiki locally