Add preliminary docs for queries

Xanewok · Apr 8, 2024 · fea3714
1 parent 1d3599a
commit fea3714
Showing 6 changed files with 171 additions and 3 deletions.
diff --git a/documentation/public/user-guide/concepts.md b/documentation/public/user-guide/concepts.md
@@ -25,7 +25,8 @@ Slang is capable of parsing the source code into a Concrete Syntax Tree (CST; al
 which is a tree structure of the program that also includes things like punctuation or whitespace.
 
 This is done by using the (standard) approach of lexical analysis followed by syntax analysis.
-The source text as a sequence of characters is recognized into a sequence of tokens (lexical analysis), which then in turn is _parsed_ into the CST.
+The source text as a sequence of characters is recognized into a sequence of
+tokens (lexical analysis), which then in turn is _parsed_ into the CST.
 
 The resulting CST is a regular tree data structure that you can visit.
 The tree nodes are represented by the `Node` structure, which can be one of two kinds:
@@ -38,8 +39,9 @@ The tree nodes are represented by the `Node` structure, which can be one of two
 For many code analysis tasks, it is useful to traverse the parse tree and visit each node.
 The `Cursor` object allows callers to traverse the parse tree in an efficient pre-order manner.
 
-It provides several `goTo*()` navigation functions, each returning `true` if the cursor was successfully moved, and `false` otherwise.
-There are three main ways to do it:
+It provides several `goTo*()` navigation functions, each returning `true` if the
+cursor was successfully moved, and `false` otherwise. There are three main ways
+to do it:
 
 -   According to the DFS order, i.e. `goToNext()` and `goToPrevious()`,
 -   According to the relationship between the current node and the next node, i.e. `goToParent()`, `goToFirstChild()`, `goToNextNonDescendent()`
@@ -48,10 +50,35 @@ There are three main ways to do it:
 As such, the cursor is stateful and keeps track of the path it has taken through the CST.
 It starts at the root it was created at and is completed when it reaches its root when navigating forward.
 
+## CST Queries
+
+The `Cursor` API is a low-level API that allows you to traverse the CST in a
+procedural manner. However, it is often more convenient to use the declarative
+`Query` API. Queries allow you to express your intent more concisely, and also
+allow you to reuse the same query in multiple places. Queries can largely
+replace the need for both internal (cursor) and external (visitor) iterator
+patterns.
+
+The [query language](./query-language.md) is based on pattern matching, and the
+execution semantics are closer to unification than to regular expression
+matching, i.e. a query returns all possible matches, not just the
+longest/shortest/first/last match. There is no concept of a 'greedy' operator,
+for example.
+
+Query execution is based on `Cursor`s, and the resulting matches and unification
+bindings are returned as `Cursor`s as well. This allows you to mix and match
+manual traversal, cursors, and queries.
+
+Multiple queries can be executed in a batch, efficiently traversing the tree
+in search of matches. This mode of operation can replace all visitor patterns.
+
 ## Abstract Syntax Tree (AST)
 
 AST types are a set of abstractions that provide a typed view of the untyped CST nodes.
 You can convert any untyped CST node to its corresponding AST type using their constructors.
 
 There is a corresponding type for each `RuleKind` (non-terminal) in the language. AST types are immutable.
 Additionally, their fields are constructed lazily as they are accessed for the first time.
+
+AST nodes can maintain a reference to the CST node they were constructed from,
+and can be used to navigate to the corresponding CST node.
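
To make the cursor behaviour described in `concepts.md` concrete, here is a minimal Python sketch of a stateful pre-order cursor. It is a toy model and not Slang's implementation: `Node`, `ToyCursor`, and the snake_case method names are hypothetical stand-ins for the `goTo*()` functions mentioned in the text, and the completion rule simply follows the prose (the cursor returns to its root once nothing follows in pre-order).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)


class ToyCursor:
    """Hypothetical stand-in for a CST cursor; not Slang's actual API."""

    def __init__(self, root):
        self.root = root
        # Path from the root to the current node, as (node, index-in-parent) pairs.
        self.path = [(root, 0)]
        self.completed = False

    @property
    def node(self):
        return self.path[-1][0]

    def go_to_first_child(self):
        if self.completed or not self.node.children:
            return False
        self.path.append((self.node.children[0], 0))
        return True

    def go_to_next_non_descendent(self):
        # Climb until the current node or one of its ancestors has a next
        # sibling, then move to that sibling.
        if self.completed:
            return False
        depth = len(self.path) - 1
        while depth > 0:
            parent = self.path[depth - 1][0]
            index = self.path[depth][1] + 1
            if index < len(parent.children):
                del self.path[depth:]
                self.path.append((parent.children[index], index))
                return True
            depth -= 1
        # Nothing follows in pre-order: return to the root and mark completion.
        del self.path[1:]
        self.completed = True
        return False

    def go_to_next(self):
        # Pre-order (DFS) step: descend if possible, otherwise skip the subtree.
        return self.go_to_first_child() or self.go_to_next_non_descendent()


# Walk a small tree; every successful move returns True, the final one False.
tree = Node("binary_expression", [
    Node("number_literal"),
    Node("operator"),
    Node("number_literal"),
])
cursor = ToyCursor(tree)
visited = [cursor.node.kind]
while cursor.go_to_next():
    visited.append(cursor.node.kind)
```

Keeping the whole path, rather than only the current node, is what lets the sketch move to a parent or a later sibling without parent pointers on the nodes themselves.
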
diff --git a/documentation/public/user-guide/npm-package/NAV.md b/documentation/public/user-guide/npm-package/NAV.md
@@ -1,4 +1,5 @@
 -   [Installation](./installation.md)
 -   [Using the Parser](./using-the-parser.md)
 -   [Using the Cursor](./using-the-cursor.md)
+-   [Using Queries](./using-queries.md)
 -   [Using the AST](./using-the-ast.md)
diff --git a/documentation/public/user-guide/npm-package/using-queries.md b/documentation/public/user-guide/npm-package/using-queries.md
@@ -0,0 +1 @@
+# Using Queries
diff --git a/documentation/public/user-guide/query-language.md b/documentation/public/user-guide/query-language.md
@@ -0,0 +1,137 @@
+# The Tree Query Language
+
+## Query Syntax
+
+A _query_ is a pattern that matches a
+certain set of nodes in a tree. The expression to match a given node
+consists of a pair of brackets (`[]`) containing two things: the node's kind, and
+optionally, a series of other patterns that match the node's children. For
+example, this pattern would match any `binary_expression` node whose children
+are both `number_literal` nodes:
+
+```scheme
+[binary_expression [number_literal] [number_literal]]
+```
+
+The children of a node can optionally be named. The name is a property of the edge from
+the node to the child, and is not a property of the child. For example, this pattern will match a `binary_expression`
+node with two `number_literal` children, named `left` and `right`:
+
+```scheme
+[binary_expression [left:number_literal] [right:number_literal]]
+```
+
+You can also match a node's textual content using a string literal. For example, this pattern would match a
+`binary_expression` with a `+` operator:
+
+```scheme
+[binary_expression [operator:"+"] [left:_] [right:_]]
+```
+
+If you don't care about the kind of a node, you can use an underscore '\_', which matches any kind.
+For example, this pattern will match a `binary_expression`
+node with two children, one of any kind named `left` and another of any kind:
+
+```scheme
+[binary_expression [left:_] [_]]
+```
+
+Children can also be elided using `...`. For example, this pattern would match a
+`binary_expression` where at least _one_ of the children is a `string_literal` node,
+producing a separate match for each `string_literal` child:
+
+```scheme
+[binary_expression ... [string_literal] ...]
+```
+
+### Capturing Nodes
+
+When matching patterns, you may want to process specific nodes within the
+pattern. Captures allow you to associate names with specific nodes in a pattern,
+so that you can later refer to those nodes by those names. Capture names are
+written _before_ the nodes that they refer to, and start with an `@` character.
+
+For example, this pattern would match any assignment of a `function` to an
+`identifier`, and it would associate the name `the-function-name` with the
+identifier:
+
+```scheme
+[assignment_expression
+  @the-function-name [left:identifier]
+  [right:function]]
+```
+
+And this pattern would match all method definitions, associating the name
+`the-method-name` with the method name and `the-class-name` with the containing
+class name:
+
+```scheme
+[class_declaration
+  @the-class-name [name:identifier]
+  [body:class_body
+    [method_definition
+      @the-method-name [name:property_identifier]]]]
+```
+
+### Quantification
+
+You can surround a sequence of patterns in parentheses (`()`), followed
+by a `?`, `*` or `+` operator. The `?` operator matches _zero or one_ repetitions
+of a pattern, the `*` operator matches _zero or more_, and the `+` operator
+matches _one or more_.
+
+For example, this pattern would match a sequence of one or more comments:
+
+```scheme
+([comment])+
+```
+
+This pattern would match a class declaration, capturing all of the decorators if
+any were present:
+
+```scheme
+[class_declaration
+  (@the-decorator [decorator])*
+  @the-name [name:identifier]]
+```
+
+This pattern would match all function calls, capturing a string argument if one was
+present:
+
+```scheme
+[call_expression
+  @the-function [function:identifier]
+  [arguments:arguments (@the-string-arg [string])?]]
+```
+
+### Alternations
+
+An alternation is written as a sequence of patterns separated by `|` and surrounded by parentheses.
+
+For example, this pattern would match a call to either a variable or an object property.
+A variable is captured as `@function`, and a property is captured as `@method`:
+
+```scheme
+[call_expression
+  function: (
+      @function [identifier]
+    | [member_expression @method [property:property_identifier]]
+  )
+]
+```
+
+This pattern would match a set of possible keyword tokens, capturing them as `@keyword`:
+
+```scheme
+@keyword (
+    "break"
+  | "delete"
+  | "else"
+  | "for"
+  | "function"
+  | "if"
+  | "return"
+  | "try"
+  | "while"
+)
+```
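
The matching semantics described in `query-language.md` (every possible match is returned, `...` elides any number of children, and `@name` binds a node) can be illustrated with a small Python model. This is a sketch under stated assumptions, not Slang's query engine: patterns are built as `Pat` objects instead of being parsed from query text, edge names, quantifiers, and alternations are left out, and `Node`, `Pat`, `match_node`, `match_children`, and `find_all` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class Pat:
    kind: str                        # node kind, or "_" for any kind
    children: Optional[list] = None  # None leaves the children unconstrained
    capture: Optional[str] = None    # name bound to the matched node, if any


ELLIPSIS = "..."  # stands for any number of elided children


def match_node(pat, node):
    """Return one bindings dict per distinct way `pat` matches `node`."""
    if pat.kind not in ("_", node.kind):
        return []
    if pat.children is None:
        results = [{}]
    else:
        results = match_children(pat.children, node.children)
    if pat.capture is not None:
        results = [{**bindings, pat.capture: node} for bindings in results]
    return results


def match_children(pats, nodes):
    """Match a sequence of child patterns against a sequence of child nodes."""
    if not pats:
        return [{}] if not nodes else []
    head, rest = pats[0], pats[1:]
    if head is ELLIPSIS:
        # Try every possible number of skipped children, keeping all outcomes.
        results = []
        for skipped in range(len(nodes) + 1):
            results.extend(match_children(rest, nodes[skipped:]))
        return results
    if not nodes:
        return []
    return [
        {**first, **others}
        for first in match_node(head, nodes[0])
        for others in match_children(rest, nodes[1:])
    ]


def find_all(pat, root):
    """Collect matches of `pat` at `root` and at every descendant, in pre-order."""
    found = match_node(pat, root)
    for child in root.children:
        found.extend(find_all(pat, child))
    return found


# Toy equivalent of: [binary_expression ... @lit [string_literal] ...]
expr = Node("binary_expression", children=[
    Node("string_literal", '"a"'),
    Node("operator", "+"),
    Node("string_literal", '"b"'),
])
query = Pat("binary_expression",
            [ELLIPSIS, Pat("string_literal", capture="lit"), ELLIPSIS])
matches = find_all(query, expr)
captured = [m["lit"].text for m in matches]
```

Because both children qualify, the sketch reports two matches, one per `string_literal`, rather than choosing the first or the longest; this is the "all possible matches" behaviour that distinguishes the query semantics from regular expression matching.
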
diff --git a/documentation/public/user-guide/rust-crate/NAV.md b/documentation/public/user-guide/rust-crate/NAV.md
@@ -2,3 +2,4 @@
 -   [Using the CLI](./using-the-cli.md)
 -   [Using the Parser](./using-the-parser.md)
 -   [Using the Cursor](./using-the-cursor.md)
+-   [Using Queries](./using-queries.md)
diff --git a/documentation/public/user-guide/rust-crate/using-queries.md b/documentation/public/user-guide/rust-crate/using-queries.md
@@ -0,0 +1 @@
+# Using Queries