Commit 4980bd2 (1 parent: dcfd3d7), showing 13 changed files with 680 additions and 4 deletions.

* Initial commit for the Logos Handbook
* A bit more adds to the book
* chore(ci): setup automated CI for book
* chore(ci): update branches
* fix(ci): remove extra needs
* chore(docs): adding brainfuck example
* Add missing `Debug` error type requirement (#298)
* chore(docs): create JSON example
* chore(ci): test code examples
* chore(docs): scrape examples and autodoc features
* Auto stash before rebase of "maciejhirsz/book"
* chore(book): typos and styling

Co-authored-by: Maciej Hirsz <[email protected]>
Co-authored-by: Marcin Wojnarowski <[email protected]>
# Summary

- [Intro](./intro.md)
- [Getting Started](./getting-started.md)
- [Examples](./examples.md)
  - [Brainfuck interpreter](./examples/brainfuck.md)
  - [JSON parser](./examples/json.md)
- [Attributes](./attributes.md)
  - [`#[logos]`](./attributes/logos.md)
  - [`#[error]`](./attributes/error.md)
  - [`#[token]` and `#[regex]`](./attributes/token_and_regex.md)
- [Token disambiguation](./token-disambiguation.md)
- [Using `Extras`](./extras.md)
- [Using callbacks](./callbacks.md)
- [Common regular expressions](./common-regex.md)
# Examples

The following examples are ordered by increasing level of complexity.

**[Brainfuck interpreter](./examples/brainfuck.md)**: Lexers are powerful tools for parsing programs into meaningful instructions. We show you how to build an interpreter for the Brainfuck programming language in under 100 lines of code!

**[JSON parser](./examples/json.md)**: We present a JSON parser written with Logos that provides nice error reporting when invalid values are encountered.
# Brainfuck interpreter

In most programming languages, commands can be made of multiple program tokens, where a token is simply a string slice with a particular meaning for the language. For example, in Rust, the function signature `pub fn main()` could be split by the **lexer** into the tokens `pub`, `fn`, `main`, `(`, and `)`. Then, the **parser** combines tokens into meaningful program instructions.
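To make that splitting concrete, here is a hand-rolled sketch in plain Rust (not using Logos; the `tokenize` helper is purely illustrative) that breaks `pub fn main()` into those tokens:

```rust
// Group alphanumeric runs into identifier tokens, emit punctuation as
// single-character tokens, and skip whitespace entirely.
fn tokenize(source: &str) -> Vec<String> {
    let mut tokens = Vec::new();
    let mut ident = String::new();
    for c in source.chars() {
        if c.is_alphanumeric() || c == '_' {
            ident.push(c);
        } else {
            if !ident.is_empty() {
                tokens.push(std::mem::take(&mut ident));
            }
            if !c.is_whitespace() {
                tokens.push(c.to_string());
            }
        }
    }
    if !ident.is_empty() {
        tokens.push(ident);
    }
    tokens
}

fn main() {
    let tokens = tokenize("pub fn main()");
    assert_eq!(tokens, ["pub", "fn", "main", "(", ")"]);
}
```

Logos generates a much faster, table-driven version of this idea for you, but the input/output shape is the same: a string slice in, a sequence of tokens out.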
However, some programming languages, such as Brainfuck, are so simple that each token can be mapped to a single instruction. The language has exactly 8 single-character tokens:

```rust,no_run,noplayground
{{#include ../../../logos/examples/brainfuck.rs:tokens}}
```

All other characters must be ignored.
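Since the snippet above is pulled in from the repository, here is a minimal sketch of the same mapping in plain Rust (without the Logos derive; the `Op` names are illustrative, not the example's actual variants):

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Op {
    IncPtr,    // >
    DecPtr,    // <
    Inc,       // +
    Dec,       // -
    Output,    // .
    Input,     // ,
    LoopStart, // [
    LoopEnd,   // ]
}

// Map each of the 8 meaningful characters to an instruction;
// every other character (comments, whitespace) is dropped.
fn lex(source: &str) -> Vec<Op> {
    source
        .chars()
        .filter_map(|c| match c {
            '>' => Some(Op::IncPtr),
            '<' => Some(Op::DecPtr),
            '+' => Some(Op::Inc),
            '-' => Some(Op::Dec),
            '.' => Some(Op::Output),
            ',' => Some(Op::Input),
            '[' => Some(Op::LoopStart),
            ']' => Some(Op::LoopEnd),
            _ => None,
        })
        .collect()
}

fn main() {
    let ops = lex("+[ - > . ] hello");
    assert_eq!(
        ops,
        [Op::Inc, Op::LoopStart, Op::Dec, Op::IncPtr, Op::Output, Op::LoopEnd]
    );
}
```

With Logos, the `match` above is replaced by `#[token]` attributes on the enum variants, and ignored characters are handled by the error/skip machinery.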
Once the tokens are obtained, a Brainfuck interpreter can easily be created using a [finite-state machine](https://en.wikipedia.org/wiki/Finite-state_machine). For the sake of simplicity, we collect all the tokens into one vector called `operations`.

Now, creating an interpreter becomes straightforward[^1]:

```rust,no_run,noplayground
{{#include ../../../logos/examples/brainfuck.rs:fsm}}
```

[^1]: There is a small trick to make it easy. As can be seen in the full code, we first check that every loop-opening `'['` has a matching closing `']'`. This way, we can create two maps, `pairs` and `pairs_reverse`, to easily jump back and forth between them.
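That bracket-matching pass can be sketched as follows (illustrative code, not the example's actual implementation): a single scan with a stack pairs each `'['` with its `']'`, so the interpreter can jump in either direction in constant time.

```rust
use std::collections::HashMap;

// Returns (pairs, pairs_reverse): forward and backward jump tables.
// Yields None if the brackets are unbalanced.
fn match_brackets(
    ops: &[char],
) -> Option<(HashMap<usize, usize>, HashMap<usize, usize>)> {
    let mut pairs = HashMap::new();
    let mut pairs_reverse = HashMap::new();
    let mut stack = Vec::new();
    for (i, &op) in ops.iter().enumerate() {
        match op {
            '[' => stack.push(i),
            ']' => {
                // Pop the most recent unmatched '[' — it closes here.
                let open = stack.pop()?;
                pairs.insert(open, i);
                pairs_reverse.insert(i, open);
            }
            _ => {}
        }
    }
    if stack.is_empty() {
        Some((pairs, pairs_reverse))
    } else {
        None // a '[' was never closed
    }
}

fn main() {
    let program: Vec<char> = "+[[-]>]".chars().collect();
    let (pairs, pairs_reverse) = match_brackets(&program).unwrap();
    assert_eq!(pairs[&1], 6); // outer loop
    assert_eq!(pairs[&2], 4); // inner loop
    assert_eq!(pairs_reverse[&6], 1);
}
```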
Finally, here is the full code, which you should be able to run with[^2]:

```bash
cd logos/logos
cargo run --example brainfuck examples/hello_word.bf
```

[^2]: You first need to clone [this repository](https://github.com/maciejhirsz/logos).

```rust,no_run,noplayground
{{#include ../../../logos/examples/brainfuck.rs:all}}
```
# JSON parser

JSON is a widely used, human-readable format for exchanging data between applications.

Possible values are defined recursively and can be any of the following:

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:values}}
```

Objects are delimited with braces `{` and `}`, arrays with brackets `[` and `]`, and values are separated by commas `,`. Newlines, tabs, and spaces should be ignored by the lexer.

Knowing that, we can construct a lexer with `Logos` that will identify all those cases:

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:tokens}}
```

> NOTE: the hardest part is defining valid regexes for the `Number` and `String` variants. The present solution was inspired by [this Stack Overflow thread](https://stackoverflow.com/questions/32155133/regex-to-match-a-json-string).

Once we have our tokens, we must parse them into actual JSON values. We will proceed by creating three functions:

+ `parse_value` for parsing any JSON value, without prior knowledge of its type;
+ `parse_array` for parsing an array, assuming we already matched `[`;
+ and `parse_object` for parsing an object, assuming we already matched `{`.

Starting with parsing an arbitrary value, we can easily handle the four scalar types, `Bool`, `Null`, `Number`, and `String`, and defer to the next two functions for parsing arrays and objects.

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:value}}
```

To parse an array, we simply loop over tokens, alternating between parsing values and commas, until a closing bracket is found.

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:array}}
```
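That alternating loop can be sketched in isolation like this (a simplified stand-in with hypothetical `Token` and `Value` types, numbers only and no error reporting; the real example is richer):

```rust
#[derive(Debug, PartialEq, Clone)]
enum Token {
    Number(f64),
    Comma,
    BracketClose,
}

#[derive(Debug, PartialEq)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
}

// Parse an array body, assuming the opening `[` was already consumed:
// alternate between expecting a value and expecting a comma until `]`.
fn parse_array<I: Iterator<Item = Token>>(tokens: &mut I) -> Option<Value> {
    let mut items = Vec::new();
    let mut expect_value = true;
    loop {
        match tokens.next()? {
            Token::BracketClose => return Some(Value::Array(items)),
            Token::Number(n) if expect_value => {
                items.push(Value::Number(n));
                expect_value = false;
            }
            Token::Comma if !expect_value => expect_value = true,
            _ => return None, // unexpected token: value/comma out of order
        }
    }
}

fn main() {
    // Tokens for `[1, 2]` after the opening bracket was consumed.
    let mut tokens = vec![
        Token::Number(1.0),
        Token::Comma,
        Token::Number(2.0),
        Token::BracketClose,
    ]
    .into_iter();
    let value = parse_array(&mut tokens).unwrap();
    assert_eq!(
        value,
        Value::Array(vec![Value::Number(1.0), Value::Number(2.0)])
    );
}
```

The flag flipping between "expect a value" and "expect a comma" is what enforces the grammar; running out of tokens before `]` is simply a `None`.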
A similar approach is used for objects, the only difference being that we expect (key, value) pairs separated by a colon.

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:object}}
```

Finally, here is the full code, which you should be able to run with[^1]:

```bash
cd logos/logos
cargo run --example json examples/example.json
```

[^1]: You first need to clone [this repository](https://github.com/maciejhirsz/logos).

```rust,no_run,noplayground
{{#include ../../../logos/examples/json.rs:all}}
```
# Getting Started

**Logos** can be added to your Rust project using the `cargo add logos` command, or by directly modifying your `Cargo.toml` file:

```toml
[dependencies]
logos = "0.13.0"
```

Then, you can automatically derive the [`Logos`](https://docs.rs/logos/latest/logos/trait.Logos.html) trait for your `enum` using the `Logos` derive macro:

```rust,no_run,noplayground
use logos::Logos;

#[derive(Logos, Debug, PartialEq)]
#[logos(skip r"[ \t\n\f]+")] // Ignore this regex pattern between tokens
enum Token {
    // Tokens can be literal strings, of any length.
    #[token("fast")]
    Fast,

    #[token(".")]
    Period,

    // Or regular expressions.
    #[regex("[a-zA-Z]+")]
    Text,
}
```

Then, you can use the `Logos::lexer` method to turn any `&str` into an iterator of tokens[^1]:

```rust,no_run,noplayground
let mut lex = Token::lexer("Create ridiculously fast Lexers.");

assert_eq!(lex.next(), Some(Ok(Token::Text)));
assert_eq!(lex.span(), 0..6);
assert_eq!(lex.slice(), "Create");

assert_eq!(lex.next(), Some(Ok(Token::Text)));
assert_eq!(lex.span(), 7..19);
assert_eq!(lex.slice(), "ridiculously");

assert_eq!(lex.next(), Some(Ok(Token::Fast)));
assert_eq!(lex.span(), 20..24);
assert_eq!(lex.slice(), "fast");

assert_eq!(lex.next(), Some(Ok(Token::Text)));
assert_eq!(lex.span(), 25..31);
assert_eq!(lex.slice(), "Lexers");

assert_eq!(lex.next(), Some(Ok(Token::Period)));
assert_eq!(lex.span(), 31..32);
assert_eq!(lex.slice(), ".");

assert_eq!(lex.next(), None);
```

[^1]: Each item is actually a [`Result<Token, _>`](https://docs.rs/logos/latest/logos/struct.Lexer.html#associatedtype.Item), because the lexer returns an error if some part of the string slice does not match any variant of `Token`.

Because [`Lexer`](https://docs.rs/logos/latest/logos/struct.Lexer.html), returned by [`Logos::lexer`](https://docs.rs/logos/latest/logos/trait.Logos.html#method.lexer), implements the `Iterator` trait, you can use a `for .. in` construct:

```rust,no_run,noplayground
for result in Token::lexer("Create ridiculously fast Lexers.") {
    match result {
        Ok(token) => println!("{:#?}", token),
        Err(e) => panic!("some error occurred: {}", e),
    }
}
```