Add an example on how to implement a parser for interspersed input #468

Open: wants to merge 2 commits into `main`.
`docs/index.md`: 45 changes (34 additions, 11 deletions)
@@ -262,7 +262,7 @@ We need to make this difference because the first type of error allows us to say

### Backtrack

`backtrack` converts an _arresting failure_ into an _epsilon failure_. It also rewinds the input to the offset at which parsing began. The resulting parser can still be combined with others. Let's look at an example:

```scala mdoc:reset
import cats.parse.Rfc5234.{digit, sp}
@@ -293,7 +293,7 @@ val p2 = sp *> digit

p1.backtrack.orElse(p2).parse(" 1")
// res0: Either[Error, Tuple2[String, Char]] = Right((,1))
(p1.backtrack | p2).parse(" 1")
// res1: Either[Error, Tuple2[String, Char]] = Right((,1))
```
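To see exactly what `backtrack` changes, here is a self-contained sketch that runs `orElse` with and without it. The parser `consumesThenFails` is a hypothetical stand-in for the `p1` defined in the part of the page not shown in this diff: on `" 1"` it consumes a space and then fails, which is an arresting failure.

```scala
import cats.parse.Parser
import cats.parse.Rfc5234.{digit, sp}

// Hypothetical parser: one space, then another space, then a digit.
// On " 1" it consumes the first space before failing (arresting failure).
val consumesThenFails: Parser[Char] = sp *> sp *> digit

// Fallback parser: a single space followed by a digit.
val spaceDigit: Parser[Char] = sp *> digit

// Without backtrack, orElse does not try the fallback after an arresting failure.
val withoutBacktrack = consumesThenFails.orElse(spaceDigit).parse(" 1")

// backtrack turns the failure into an epsilon failure and rewinds to offset 0,
// so orElse tries spaceDigit from the start of the input.
val withBacktrack = consumesThenFails.backtrack.orElse(spaceDigit).parse(" 1")
```

Here `withoutBacktrack` is a `Left`, while `withBacktrack` is `Right(("", '1'))`.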

@@ -344,13 +344,11 @@ p1.parse("The Wind Has Risen")
This error happens because we can't tell whether we are parsing a `fieldValue` until we reach a `:` character. We can handle this by writing two parsers: convert the first one's failure into an epsilon failure with `backtrack`, then provide a fallback parser with the `|` operator (which tries its right side only after an epsilon failure):

```scala mdoc
val p2 = (searchWord ~ sp.?).rep.string

(p1.backtrack | p2).parse("title:The Wind Has Risen")
// res0 = Right((,(Some((title,())),The Wind Has Risen)))
(p1.backtrack | p2).parse("The Wind Has Risen")
// res1 = Right((,The Wind Has Risen))
```
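A self-contained sketch of the same idea follows. `fieldValue` and `searchWord` are defined in a part of the page not shown here, so the definitions below are assumptions chosen to reproduce the outputs above: a field name is a run of letters followed by `:`, and a search word is a run of letters. The names `withField` and `wordsOnly` are hypothetical.

```scala
import cats.parse.Parser
import cats.parse.Rfc5234.{alpha, sp}

// Assumed definitions (hypothetical stand-ins for the ones earlier on the page).
val searchWord: Parser[String] = alpha.rep.string
val fieldValue: Parser[(String, Unit)] = alpha.rep.string ~ Parser.char(':')

// Optional "field:" prefix followed by space-separated words.
val withField = fieldValue.? ~ (searchWord ~ sp.?).rep.string
// Words only, used as the fallback.
val wordsOnly = (searchWord ~ sp.?).rep.string

// fieldValue consumes "The" before failing on ' ', so `.?` cannot
// fall back to None: this is an arresting failure.
val plain = withField.parse("The Wind Has Risen")

// backtrack rewinds to offset 0 and lets `|` try wordsOnly instead.
val recovered = (withField.backtrack | wordsOnly).parse("The Wind Has Risen")
```

`plain` is a `Left`, and `recovered` is `Right(("", "The Wind Has Risen"))`.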

@@ -359,16 +357,41 @@ But this problem might be resolved with `soft` method inside the first parser si
```scala mdoc
val fieldValueSoft = alpha.rep.string.soft ~ pchar(':')

val p3 = fieldValueSoft.? ~ (searchWord ~ sp.?).rep.string

p3.parse("title:The Wind Has Risen")
// res2 = Right((,(Some((title,())),The Wind Has Risen)))
p3.parse("The Wind Has Risen")
// res3 = Right((,(None,The Wind Has Risen)))
```

So when the _right side_ returns an epsilon failure, `soft` rewinds the input consumed by the left side, and parsing can continue with the next parser (without changing the parser itself!).
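The effect of `soft` can be isolated in a small sketch, assuming the same letters-then-`:` field name as above. With a plain `~`, an optional `name:` prefix fails once the letters are read and no `:` follows. With `soft`, the missing `:` becomes an epsilon failure at the start, so `.?` falls back to `None` without consuming anything. The names `hardField` and `softField` are hypothetical.

```scala
import cats.parse.Parser
import cats.parse.Rfc5234.alpha

// Both parsers read a run of letters followed by ':'.
val hardField = alpha.rep.string ~ Parser.char(':')
val softField = alpha.rep.string.soft ~ Parser.char(':')

// Input with no ':' after the word.
val input = "Risen"

// The letters are consumed before ':' fails, so this is an arresting
// failure and `.?` cannot recover.
val hardResult = hardField.?.parse(input)

// soft rewinds to offset 0 because the right side failed without consuming
// input; the result is an epsilon failure, so `.?` yields None.
val softResult = softField.?.parse(input)
```

`hardResult` is a `Left`, while `softResult` is `Right(("Risen", None))`, leaving the whole input for the next parser.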

Another common use case for `soft` is implementing a parser to find values interspersed with a separator. For example, if you want to extract `'a'`, `'b'` and `'c'` from `"a,b,c"`, you can use `soft`.

Naively, one may try the following:

```scala mdoc
val p4 = alpha
val naiveInterspersed = (p4 <* pchar(',')).rep
naiveInterspersed.parse("a,b,c")
// res4 = Left(Error(5,NonEmptyList(InRange(offset = 5, lower = ',', upper = ','),List())))
```

It fails because every element must be followed by a comma, so the parser only succeeds when the input ends with a trailing comma:

```scala mdoc
naiveInterspersed.parse("a,b,c,")
// res5 = Right(("", NonEmptyList('a', List('b', 'c'))))
```

Instead, you can combine `soft` with `|`, so that an element with no comma after it falls back to plain `p4`:

```scala mdoc
val interspersed = ((p4.soft <* pchar(',')) | p4).rep
interspersed.parse("a,b,c")
// res6 = Right(("", NonEmptyList('a', List('b', 'c'))))
```

Collaborator review comment on `val interspersed = ...`: this example would also parse `abc`, which may be what you want but might be a bit confusing for newcomers. Here is an example of how I parse lists in bosatsu:
https://github.com/johnynek/bosatsu/blob/c58b55785b6ac72e59b98afe3149a042d623a902/core/src/main/scala/org/bykn/bosatsu/Parser.scala#L302
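The `soft`-based parser also accepts `"abc"`, because each element may match the bare `p4` alternative with no separator after it. If the separator should be mandatory between elements, one common alternative (which does not use `soft`) is to parse a first element followed by zero or more separator-element pairs. This is a sketch; `comma`, `softInterspersed` and `commaSeparated` are hypothetical names, and `alpha` plays the role of `p4`.

```scala
import cats.data.NonEmptyList
import cats.parse.Parser
import cats.parse.Rfc5234.alpha

val comma: Parser[Unit] = Parser.char(',')

// soft-based version from the text: the comma after each element is optional.
val softInterspersed: Parser[NonEmptyList[Char]] =
  ((alpha.soft <* comma) | alpha).rep

// Stricter version: one element, then zero or more ",element" pairs.
val commaSeparated: Parser[NonEmptyList[Char]] =
  (alpha ~ (comma *> alpha).rep0).map { case (head, tail) =>
    NonEmptyList(head, tail)
  }
```

With `parseAll`, `softInterspersed` accepts `"abc"`, whereas `commaSeparated` rejects it (plain `parse` would stop after `a` and leave `bc` unconsumed). `commaSeparated` also rejects a trailing comma such as `"a,b,c,"`: a `,` that is not followed by an element is an arresting failure inside `rep0`.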

# JSON parser example

Below is most of a JSON parser (the string unescaping is elided). This example can give you a feel