Add preliminary docs for queries
AntonyBlakey authored and Xanewok committed Apr 8, 2024
1 parent 1d3599a commit fea3714
Showing 6 changed files with 171 additions and 3 deletions.
33 changes: 30 additions & 3 deletions documentation/public/user-guide/concepts.md
@@ -25,7 +25,8 @@ Slang is capable of parsing the source code into a Concrete Syntax Tree (CST; al
which is a tree structure of the program that also includes things like punctuation or whitespace.

This is done by using the (standard) approach of lexical analysis followed by syntax analysis.
The source text as a sequence of characters is recognized into a sequence of
tokens (lexical analysis), which then in turn is _parsed_ into the CST.

The resulting CST is a regular tree data structure that you can visit.
The tree nodes are represented by the `Node` structure, which can be one of two kinds:
@@ -38,8 +39,9 @@ The tree nodes are represented by the `Node` structure, which can be one of two
For many code analysis tasks, it is useful to traverse the parse tree and visit each node.
The `Cursor` object allows callers to traverse the parse tree in an efficient pre-order manner.

It provides several `goTo*()` navigation functions, each returning `true` if the
cursor was successfully moved, and `false` otherwise. There are three main ways
to do it:

- According to the DFS order, i.e. `goToNext()` and `goToPrevious()`,
- According to the relationship between the current node and the next node, i.e. `goToParent()`, `goToFirstChild()`, `goToNextNonDescendent()`
@@ -48,10 +50,35 @@ There are three main ways to do it:
As such, the cursor is stateful and keeps track of the path it has taken through the CST.
It starts at the root node it was created at, and the traversal is complete once forward navigation brings it back to that root.
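
To make the navigation model concrete, here is a minimal sketch of a stateful pre-order cursor over a toy tree. `ToyNode` and `ToyCursor` are hypothetical and are not Slang's types; only the method names `goToNext()`, `goToParent()`, `goToFirstChild()` and `goToNextNonDescendent()` are borrowed from the text above, and their behavior here is an assumption made for illustration.

```typescript
// Toy model only: these types are hypothetical, not Slang's API.
interface ToyNode {
  kind: string;
  children: ToyNode[];
}

class ToyCursor {
  // Path from the root to the current node. Each entry remembers the node
  // and its index among its parent's children.
  private path: { node: ToyNode; index: number }[];

  constructor(root: ToyNode) {
    this.path = [{ node: root, index: 0 }];
  }

  get node(): ToyNode {
    return this.path[this.path.length - 1]!.node;
  }

  goToFirstChild(): boolean {
    const first = this.node.children[0];
    if (first === undefined) return false;
    this.path.push({ node: first, index: 0 });
    return true;
  }

  goToParent(): boolean {
    if (this.path.length < 2) return false; // already at the root
    this.path.pop();
    return true;
  }

  // Move to the next sibling, or to the next sibling of the nearest ancestor that has one.
  goToNextNonDescendent(): boolean {
    const saved = this.path.slice();
    while (this.path.length > 1) {
      const current = this.path.pop()!;
      const parent = this.path[this.path.length - 1]!.node;
      const next = parent.children[current.index + 1];
      if (next !== undefined) {
        this.path.push({ node: next, index: current.index + 1 });
        return true;
      }
    }
    this.path = saved; // no such node: restore the original position
    return false;
  }

  // One pre-order (DFS) step: descend if possible, otherwise skip sideways or upwards.
  goToNext(): boolean {
    return this.goToFirstChild() || this.goToNextNonDescendent();
  }
}

// Collect node kinds in the order the toy cursor visits them.
function preOrderKinds(root: ToyNode): string[] {
  const cursor = new ToyCursor(root);
  const kinds = [cursor.node.kind];
  while (cursor.goToNext()) kinds.push(cursor.node.kind);
  return kinds;
}

const sampleTree: ToyNode = {
  kind: "root",
  children: [
    { kind: "a", children: [{ kind: "b", children: [] }] },
    { kind: "c", children: [] },
  ],
};
```

Each `goTo*()` method returns `false` and leaves the cursor where it was when the move is not possible, which is what lets a caller drive the whole traversal with a plain `while` loop.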

## CST Queries

The `Cursor` API is a low-level API that allows you to traverse the CST in a
procedural manner. However, it is often more convenient to use the declarative
`Query` API. Queries allow you to express your intent more concisely, and also
allow you to reuse the same query in multiple places. Queries can largely
replace the need for both internal (cursor) and external (visitor) iterator
patterns.

The [query language](./query-language.md) is based on pattern matching, and the
execution semantics are closer to unification than to regular expression
matching, i.e. a query returns all possible matches, not just the
longest/shortest/first/last match. There is no concept of a 'greedy' operator,
for example.

Query execution is based on `Cursor`s, and the resulting matches and unification
bindings are returned as `Cursor`s as well. This allows you to mix and match
manual traversal, cursors, and queries.

Multiple queries can be executed in a batch, which traverses the tree
efficiently while looking for matches. This mode of operation can replace all
visitor patterns.
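
As a rough illustration of the batch idea (not Slang's implementation), a single pre-order pass can try several matchers at every node and record which one fired. `ToyNode`, `ToyQuery`, `ToyMatch` and `runBatch` are hypothetical names, and a "query" here is reduced to a predicate over one node.

```typescript
// Hypothetical toy types; not Slang's API.
interface ToyNode {
  kind: string;
  children: ToyNode[];
}

// Stand-in for a compiled query: just a predicate over a single node.
type ToyQuery = (node: ToyNode) => boolean;

interface ToyMatch {
  queryIndex: number; // which query in the batch produced this match
  node: ToyNode;
}

// Visit every node once, in pre-order, and try each query at each node.
function runBatch(root: ToyNode, queries: ToyQuery[]): ToyMatch[] {
  const found: ToyMatch[] = [];
  const visit = (node: ToyNode): void => {
    queries.forEach((query, queryIndex) => {
      if (query(node)) found.push({ queryIndex, node });
    });
    node.children.forEach((child) => visit(child));
  };
  visit(root);
  return found;
}

const sampleBlock: ToyNode = {
  kind: "block",
  children: [
    { kind: "comment", children: [] },
    { kind: "call_expression", children: [{ kind: "identifier", children: [] }] },
  ],
};

// Two "queries" answered by one traversal.
const batchResult = runBatch(sampleBlock, [
  (n) => n.kind === "comment",
  (n) => n.kind === "identifier",
]);
```

The caller dispatches on `queryIndex`, which plays the role that separate `visit*` methods play in a visitor.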

## Abstract Syntax Tree (AST)

AST types are a set of abstractions that provide a typed view of the untyped CST nodes.
You can convert any untyped CST node to its corresponding AST type using that type's constructor.

There is a corresponding type for each `RuleKind` (non-terminal) in the language. AST types are immutable.
Additionally, their fields are constructed lazily as they are accessed for the first time.

AST nodes can maintain a reference to the CST node they were constructed from,
and can be used to navigate to the corresponding CST node.
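
The sketch below illustrates the three properties described above (a typed view, lazily constructed fields, and a back-reference to the CST node) with a toy wrapper class. `ToyCstNode` and `ToyBinaryExpression` are hypothetical names and do not reflect Slang's generated AST types; the build counter exists only to make the laziness observable.

```typescript
// Hypothetical toy types; not Slang's API.
interface ToyCstNode {
  kind: string;
  text: string;
  children: ToyCstNode[];
}

// A typed view over one kind of untyped CST node.
class ToyBinaryExpression {
  // Reference back to the CST node this view was constructed from.
  readonly cst: ToyCstNode;
  private cachedLeft: ToyCstNode | undefined = undefined;
  private builds = 0; // instrumentation for this example only

  constructor(cst: ToyCstNode) {
    if (cst.kind !== "binary_expression" || cst.children.length === 0) {
      throw new Error("expected a binary_expression node with children");
    }
    this.cst = cst;
  }

  // The field is resolved on first access and cached afterwards.
  get left(): ToyCstNode {
    if (this.cachedLeft === undefined) {
      this.builds++;
      this.cachedLeft = this.cst.children[0]!;
    }
    return this.cachedLeft;
  }

  buildCount(): number {
    return this.builds;
  }
}

const sumExpr = new ToyBinaryExpression({
  kind: "binary_expression",
  text: "1+2",
  children: [
    { kind: "number_literal", text: "1", children: [] },
    { kind: "number_literal", text: "2", children: [] },
  ],
});
```

Constructing `sumExpr` does no field work; the first read of `sumExpr.left` resolves the child, and later reads reuse the cached value.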
1 change: 1 addition & 0 deletions documentation/public/user-guide/npm-package/NAV.md
@@ -1,4 +1,5 @@
- [Installation](./installation.md)
- [Using the Parser](./using-the-parser.md)
- [Using the Cursor](./using-the-cursor.md)
- [Using Queries](./using-queries.md)
- [Using the AST](./using-the-ast.md)
@@ -0,0 +1 @@
# Using Queries
137 changes: 137 additions & 0 deletions documentation/public/user-guide/query-language.md
@@ -0,0 +1,137 @@
# The Tree Query Language

## Query Syntax

A _query_ is a pattern that matches a
certain set of nodes in a tree. The expression to match a given node
consists of a pair of brackets (`[]`) containing two things: the node's kind, and
optionally, a series of other patterns that match the node's children. For
example, this pattern would match any `binary_expression` node whose children
are both `number_literal` nodes:

```scheme
[binary_expression [number_literal] [number_literal]]
```

The children of a node can optionally be named. The name is a property of the edge from
the node to the child, and is not a property of the child. For example, this pattern will match a `binary_expression`
node with two `number_literal` children, named `left` and `right`:

```scheme
[binary_expression [left:number_literal] [right:number_literal]]
```

You can also match a node's textual content using a string literal. For example, this pattern would match a
`binary_expression` with a `+` operator:

```scheme
[binary_expression [operator:"+"] [left:_] [right:_]]
```

If you don't care about the kind of a node, you can use an underscore '\_', which matches any kind.
For example, this pattern will match a `binary_expression`
node with two children: one of any kind named `left`, followed by one of any kind:

```scheme
[binary_expression [left:_] [_]]
```
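
The matching rules introduced so far (node kind, optional edge name, and the `_` wildcard) can be sketched as a small recursive function over a toy tree. All names here (`ToyNode`, `ToyEdge`, `ToyPattern`, `matchesPattern`) are hypothetical, and the sketch assumes that a pattern without child patterns places no constraint on the node's children, which may differ from Slang's actual semantics.

```typescript
// Hypothetical toy model; not Slang's API.
interface ToyNode {
  kind: string;
  children: ToyEdge[];
}

interface ToyEdge {
  label?: string; // the name lives on the edge, not on the child node
  node: ToyNode;
}

interface ToyPattern {
  label?: string; // required edge name, if any
  kind: string; // a node kind, or "_" for any kind
  children?: ToyPattern[]; // if present, must match the children one-to-one, in order
}

function matchesPattern(pattern: ToyPattern, edge: ToyEdge): boolean {
  if (pattern.label !== undefined && pattern.label !== edge.label) return false;
  if (pattern.kind !== "_" && pattern.kind !== edge.node.kind) return false;
  const childPatterns = pattern.children;
  if (childPatterns === undefined) return true; // assumption: children unconstrained
  const childEdges = edge.node.children;
  if (childPatterns.length !== childEdges.length) return false;
  return childPatterns.every((child, i) => matchesPattern(child, childEdges[i]!));
}

const lit = (kind: string): ToyNode => ({ kind, children: [] });

const mixedExpr: ToyEdge = {
  node: {
    kind: "binary_expression",
    children: [
      { label: "left", node: lit("number_literal") },
      { label: "right", node: lit("string_literal") },
    ],
  },
};

// Toy form of [binary_expression [left:_] [_]]
const leftThenAny: ToyPattern = {
  kind: "binary_expression",
  children: [{ label: "left", kind: "_" }, { kind: "_" }],
};

// Toy form of [binary_expression [number_literal] [number_literal]]
const bothNumbers: ToyPattern = {
  kind: "binary_expression",
  children: [{ kind: "number_literal" }, { kind: "number_literal" }],
};
```

Against `mixedExpr`, `leftThenAny` matches because the wildcard accepts any kind, while `bothNumbers` fails on the second child.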

Children can also be elided using `...`. For example, this pattern matches a
`binary_expression` in which at least _one_ of the children is a `string_literal` node. It produces
one match per `string_literal` child, so a node with several such children yields multiple matches:

```scheme
[binary_expression ... [string_literal] ...]
```
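
A toy reading of the elision pattern makes the "one match per child" behavior explicit. `ToyNode` and `matchAnyChild` are hypothetical names, and the function handles only the specific shape `[parentKind ... [childKind] ...]`.

```typescript
// Hypothetical toy type; not Slang's API.
interface ToyNode {
  kind: string;
  children: ToyNode[];
}

// Toy reading of `[parentKind ... [childKind] ...]`:
// every child of the right kind yields its own match.
function matchAnyChild(node: ToyNode, parentKind: string, childKind: string): ToyNode[] {
  if (node.kind !== parentKind) return [];
  return node.children.filter((child) => child.kind === childKind);
}

const concatExpr: ToyNode = {
  kind: "binary_expression",
  children: [
    { kind: "string_literal", children: [] },
    { kind: "string_literal", children: [] },
  ],
};

const mixedOperands: ToyNode = {
  kind: "binary_expression",
  children: [
    { kind: "number_literal", children: [] },
    { kind: "string_literal", children: [] },
  ],
};

const concatMatches = matchAnyChild(concatExpr, "binary_expression", "string_literal");
const mixedMatches = matchAnyChild(mixedOperands, "binary_expression", "string_literal");
```

Here `concatMatches` holds two entries (one per `string_literal` child) and `mixedMatches` holds one.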

### Capturing Nodes

When matching patterns, you may want to process specific nodes within the
pattern. Captures allow you to associate names with specific nodes in a pattern,
so that you can later refer to those nodes by those names. Capture names are
written _before_ the nodes that they refer to, and start with an `@` character.

For example, this pattern would match any assignment of a `function` to an
`identifier`, and it would associate the name `the-function-name` with the
identifier:

```scheme
[assignment_expression
@the-function-name [left:identifier]
[right:function]]
```

And this pattern would match all method definitions, associating the name
`the-method-name` with the method name and `the-class-name` with the containing
class name:

```scheme
[class_declaration
@the-class-name [name:identifier]
[body:class_body
[method_definition
@the-method-name [name:property_identifier]]]]
```
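
Conceptually, a match with captures is a set of name-to-node bindings. The sketch below shows one way to collect them while matching; `ToyNode`, `ToyPattern`, `Bindings` and `matchWithCaptures` are hypothetical, edge names are left out for brevity, and the representation of bindings in Slang itself (which returns `Cursor`s) will differ.

```typescript
// Hypothetical toy types; not Slang's API.
interface ToyNode {
  kind: string;
  text?: string;
  children: ToyNode[];
}

interface ToyPattern {
  kind: string;
  capture?: string; // capture name without the leading "@"
  children?: ToyPattern[]; // if present, matched one-to-one, in order
}

type Bindings = { [captureName: string]: ToyNode };

// Returns the capture bindings if `node` matches `pattern`, or null otherwise.
function matchWithCaptures(pattern: ToyPattern, node: ToyNode): Bindings | null {
  if (pattern.kind !== node.kind) return null;
  const bindings: Bindings = {};
  if (pattern.capture !== undefined) bindings[pattern.capture] = node;
  if (pattern.children !== undefined) {
    if (pattern.children.length !== node.children.length) return null;
    for (let i = 0; i < pattern.children.length; i++) {
      const childBindings = matchWithCaptures(pattern.children[i]!, node.children[i]!);
      if (childBindings === null) return null;
      // Merge the bindings collected further down the tree.
      for (const key in childBindings) bindings[key] = childBindings[key]!;
    }
  }
  return bindings;
}

const assignment: ToyNode = {
  kind: "assignment_expression",
  children: [
    { kind: "identifier", text: "f", children: [] },
    { kind: "function", children: [] },
  ],
};

// Toy form of the assignment pattern above.
const assignmentPattern: ToyPattern = {
  kind: "assignment_expression",
  children: [{ kind: "identifier", capture: "the-function-name" }, { kind: "function" }],
};

const captured = matchWithCaptures(assignmentPattern, assignment);
```

After a successful match, `captured["the-function-name"]` is the `identifier` node, so its text can be read without walking the tree again.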

### Quantification

You can surround a sequence of patterns in parentheses (`()`), followed
by a `?`, `*` or `+` operator. The `?` operator matches _zero or one_ repetitions
of a pattern, the `*` operator matches _zero or more_, and the `+` operator
matches _one or more_.

For example, this pattern would match a sequence of one or more comments:

```scheme
([comment])+
```

This pattern would match a class declaration, capturing all of the decorators if
any were present:

```scheme
[class_declaration
(@the-decorator [decorator])*
@the-name [name:identifier]]
```

This pattern would match all function calls, capturing a string argument if one was
present:

```scheme
[call_expression
@the-function [function:identifier]
[arguments:arguments (@the-string-arg [string])?]]
```
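
Because a query returns all possible matches rather than a single greedy or lazy one, a quantifier can be thought of as producing every admissible repetition count. The sketch below enumerates those possibilities for a run of sibling nodes; `ToyNode` and `repeatEnds` are hypothetical, and the way Slang actually enumerates and reports such matches is not specified here.

```typescript
// Hypothetical toy type; not Slang's API.
interface ToyNode {
  kind: string;
  children: ToyNode[];
}

// Enumerate every index at which a quantified `[kind]` pattern could stop
// when it starts consuming `nodes` at `start`. All alternatives are kept;
// none is preferred as "greedy" or "lazy".
function repeatEnds(nodes: ToyNode[], start: number, kind: string, min: number, max: number): number[] {
  const ends: number[] = [];
  let count = 0;
  let position = start;
  while (true) {
    if (count >= min) ends.push(position);
    if (count === max || position >= nodes.length || nodes[position]!.kind !== kind) break;
    position++;
    count++;
  }
  return ends;
}

const siblings: ToyNode[] = [
  { kind: "comment", children: [] },
  { kind: "comment", children: [] },
  { kind: "identifier", children: [] },
];

const optionalEnds = repeatEnds(siblings, 0, "comment", 0, 1); // ([comment])?
const starEnds = repeatEnds(siblings, 0, "comment", 0, Infinity); // ([comment])*
const plusEnds = repeatEnds(siblings, 0, "comment", 1, Infinity); // ([comment])+
```

For the two leading comments, `?` can stop after 0 or 1 nodes, `*` after 0, 1 or 2, and `+` after 1 or 2; each stopping point is a separate candidate for the rest of the pattern to continue from.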

### Alternations

An alternation is written as a sequence of patterns separated by '|' and surrounded by parentheses.

For example, this pattern would match a call to either a variable or an object property.
In the case of a variable, capture it as `@function`, and in the case of a property, capture it as `@method`:

```scheme
[call_expression
function: (
@function [identifier]
| [member_expression @method [property:property_identifier]]
)
]
```

This pattern would match a set of possible keyword tokens, capturing them as `@keyword`:

```scheme
@keyword (
"break"
| "delete"
| "else"
| "for"
| "function"
| "if"
| "return"
| "try"
| "while"
)
```
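
An alternation can be modeled as trying every branch and keeping each one that succeeds. The sketch below mirrors the keyword example with a toy combinator; `ToyNode`, `Bindings`, `ToyMatcher` and `alternation` are hypothetical names, and matching a token by its text is an assumption made for illustration.

```typescript
// Hypothetical toy types; not Slang's API.
interface ToyNode {
  kind: string;
  text: string;
  children: ToyNode[];
}

type Bindings = { [captureName: string]: ToyNode };
type ToyMatcher = (node: ToyNode) => Bindings | null;

// `(a | b | ...)`: try every alternative and keep each one that matches.
function alternation(alternatives: ToyMatcher[]): (node: ToyNode) => Bindings[] {
  return (node) => {
    const results: Bindings[] = [];
    alternatives.forEach((alternative) => {
      const bindings = alternative(node);
      if (bindings !== null) results.push(bindings);
    });
    return results;
  };
}

const keywordTexts = ["break", "delete", "else", "for", "function", "if", "return", "try", "while"];

// Toy form of `@keyword ("break" | "delete" | ...)`: each branch matches one literal text.
const matchKeyword = alternation(
  keywordTexts.map((word): ToyMatcher => (node) => (node.text === word ? { keyword: node } : null)),
);

const ifToken: ToyNode = { kind: "token", text: "if", children: [] };
const nameToken: ToyNode = { kind: "token", text: "counter", children: [] };
```

`matchKeyword(ifToken)` yields one set of bindings with `keyword` bound to the token, and `matchKeyword(nameToken)` yields none.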
1 change: 1 addition & 0 deletions documentation/public/user-guide/rust-crate/NAV.md
@@ -2,3 +2,4 @@
- [Using the CLI](./using-the-cli.md)
- [Using the Parser](./using-the-parser.md)
- [Using the Cursor](./using-the-cursor.md)
- [Using Queries](./using-queries.md)
@@ -0,0 +1 @@
# Using Queries