diff --git a/docs/content/error-recover/_index.md b/docs/content/error-recover/_index.md new file mode 100644 index 00000000..fb59a0e9 --- /dev/null +++ b/docs/content/error-recover/_index.md @@ -0,0 +1,141 @@ +--- +date: 2024-12-20 00:00:00 +0900 +title: "Error recovering" +weight: 2 +--- + +Since v0.1.0, `memefish.ParseXXX` methods returns AST node(s) even if an error is reproted. +That is, if we try to parse incomplete SQL such as: + +```sql +SELECT (1 +) + (* 2) +``` + +Then, the following two errors are reported: + +```sql +syntax error: :1:12: unexpected token: ) + + 1: SELECT (1 +) + (* 2) + ^ + + +syntax error: :1:17: unexpected token: * + + 1: SELECT (1 +) + (* 2) + ^ +``` + +Hoever, the AST is also returned: + +```go {hl_lines=["10-31","36-57"]} +&ast.QueryStatement{ + Query: &ast.Select{ + Results: []ast.SelectItem{ + &ast.ExprSelectItem{ + Expr: &ast.BinaryExpr{ + Op: "+", + Left: &ast.ParenExpr{ + Lparen: 7, + Rparen: 11, + Expr: &ast.BadExpr{ + BadNode: &ast.BadNode{ + NodePos: 8, + NodeEnd: 11, + Tokens: []*token.Token{ + &token.Token{ + Kind: "", + Raw: "1", + Base: 10, + Pos: 8, + End: 9, + }, + &token.Token{ + Kind: "+", + Space: " ", + Raw: "+", + Pos: 10, + End: 11, + }, + }, + }, + }, + }, + Right: &ast.ParenExpr{ + Lparen: 15, + Rparen: 19, + Expr: &ast.BadExpr{ + BadNode: &ast.BadNode{ + NodePos: 16, + NodeEnd: 19, + Tokens: []*token.Token{ + &token.Token{ + Kind: "*", + Raw: "*", + Pos: 16, + End: 17, + }, + &token.Token{ + Kind: "", + Space: " ", + Raw: "2", + Base: 10, + Pos: 18, + End: 19, + }, + }, + }, + }, + }, + }, + }, + }, + }, +} +``` + +Thus, the places where the error occurred are filled with the `ast.BadXXX` nodes (`ast.BadExpr` in this example). + +## How méméfish performs error recovery + +This section explains how méméfish performs error recovery. + +In méméfish, a *recovery point* is set when parsing a syntax where some multiple types of AST nodes are the result. +For example, when parsing an parenthesized expression, the recovery point is set after the open parenthesis `(`. +If an error occurs in the parenthesized expression, the parser backtracks to the recovery point and skips the tokens until the parenthesized expression ends. +The skipped tokens are then collectively `ast.BadNode` and this node is wrapped up a specific `ast.BadXXX` node (e.g., `ast.BadExpr`). + +```sql +SELECT (1 + 2 *) + ^--- error point + ^---------- recovery point + |~~~~~| --- skipped tokens +``` + +Recovery points are set where: + +- the beginning of statements, queries, DDLs, DMLs, +- the beginning of expressions (e.g., after an open parenthesis `(`, `SELECT`, `WHERE` etc.), and +- the beginning of types. + +Token skipping is performed as follows. + +- For `ast.Statement`, `ast.DDL`, and `ast.DML`, + * skip tokens until a semicolon `;` appears. +- For `ast.QueryExpr`, + * skip tokens until a semicolon `;` appears, or + * skip tokens with counting the nest of parentheses `(` + + until the closing symbol (`)`) appears at no nestings, or + + until the symbol that is supposed to be the end of the expression (`UNION`, `INTERSECT`, `EXCEPT`) appears at no nestings. +- For `ast.Expr`, + * skip tokens until a semicolon `;` appears, or + * skip tokens with counting the nest of parentheses `(`, brackets `[`, `CASE` and `WHEN` + + until the closing symbol (`)`, `]`, `END`, `THEN`) appears at no nestings or + + until the symbol that is supposed to be the end of the expression (`,`, `AS`, `FROM`, `GROUP`, `HAVING`, `ORDER`, `LIMIT`, `OFFSET`, `AT`, `UNION`, `INTERSECT`, `EXCEPT`) appears at no nestings. +- For `ast.Type`, + * skip tokens until the semicolon `;` or the closing parenthesis `)` appears, or + * skip tokens with counting the nest of triangle brackets `<` + * until the closing symbol (`>`) appears at no nestings. + +Note that this skipping rules are just heuristics and may not be perfect. +In some cases, there is a possibility of skipping too many tokens. diff --git a/docs/content/example-parse/_index.md b/docs/content/example-parse/_index.md index 0d820e95..bcde9055 100644 --- a/docs/content/example-parse/_index.md +++ b/docs/content/example-parse/_index.md @@ -6,10 +6,6 @@ weight: 1 This example shows how to parse a Spanner SQL and unparse it. - - - ## Code - ```go package main diff --git a/docs/hugo.toml b/docs/hugo.toml index c881e529..d7d89cff 100644 --- a/docs/hugo.toml +++ b/docs/hugo.toml @@ -21,6 +21,8 @@ summaryLength = 30 startLevel = 2 endLevel = 6 ordered = false + [markup.highlight] + style = 'catppuccin-frappe' [params] description = "Spanner SQL parser for Go" diff --git a/tools/parse/main.go b/tools/parse/main.go index ff3c86b9..77307ab5 100644 --- a/tools/parse/main.go +++ b/tools/parse/main.go @@ -15,7 +15,7 @@ import ( "github.com/cloudspannerecosystem/memefish/ast" "github.com/cloudspannerecosystem/memefish/token" "github.com/cloudspannerecosystem/memefish/tools/util/poslang" - "github.com/k0kubun/pp" + "github.com/k0kubun/pp/v3" ) var usage = heredoc.Doc(` @@ -121,8 +121,11 @@ func main() { } fmt.Println("--- AST") - _, _ = pp.Println(node) + pprinter := pp.New() + pprinter.SetOmitEmpty(true) + _, _ = pprinter.Println(node) fmt.Println() + fmt.Println("--- SQL") fmt.Println(node.SQL())