Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to get terminator of AtLeastOnceUntil without lookahead and parsing twice? #121

Closed
chyyran opened this issue Jan 23, 2022 · 8 comments
Closed

Comments

@chyyran
Copy link

chyyran commented Jan 23, 2022

I'm trying to port a parser written with nom in Rust to a C# parser with Pidgin. There are some cases where I need the result of a terminator from AtLeastOnceUntil. With nom, I can extract both the first result, and the terminator like so

let (input, (title, version)) = take_up_to(
        alt((
            preceded(char(' '), parse_version_string).map(|t| Some(t)),
            peek(alt((char('('), char('[')))).map(|_| None)))
    )(input)?;

Somewhat equivalently I've written up this parser with Pidgin

from title in Any.AtLeastOnceUntil(Lookahead(
           Try(Char(' ').Before(ParseVersionString)).Or(
               Try(Char('(').Or(Char('[')))))).Select(string.Concat)
from version in Char(' ').Then(ParseVersionString).Optional()
select (title, version.Match(v => new RomTag[] { v }, () => Enumerable.Empty<RomTag>()));

While both of the terminator parsers have to run O(t) iterations where t is the length of title, it doesn't need to re-parse the version in the Rust version. Is it possible to achieve this with Pidgin or is this pretty much the best way to do it?

@benjamin-hodgson
Copy link
Owner

benjamin-hodgson commented Feb 23, 2022

Interesting! The code that you have looks like the best equivalent but I agree it’s ugly.

Feels like an API design hole on my part. I was using megaparsec’s endBy1 as a guide but I agree that it’s a shortcoming.

Changing AtLeastOnceUntil to return both results would break back compat, so we’d have to make it a new function. Call it, idk, AtLeastOnceUntil_ or AtLeastOnceThen or something. Not ideal either way. What do you think?

@chyyran
Copy link
Author

chyyran commented Feb 23, 2022

AtLeastOnceThen sounds better to me than AtLeastOnceUntil_, but there's only value in implementing it if it's able to return the terminator without needing to re-parse.

@benjamin-hodgson
Copy link
Owner

oh yeah, it shouldn't be a difficult code tweak - the current code throws away the terminatorResult, we'd just need to return it instead. Just need to bikeshed the API.

@atrauzzi
Copy link

atrauzzi commented Mar 31, 2022

Just gonna throw out SkipUntilWith SkipUntilThen.

In the scenario I'm facing, I could be encountering zero or more.

@benjamin-hodgson
Copy link
Owner

benjamin-hodgson commented Mar 31, 2022

Could also go with like …UntilAndReturn, idk maybe a bit of a mouthful

@chyyran
Copy link
Author

chyyran commented Jun 7, 2022

Any progress on the naming of this API? Might be good to just pick one and stick with it.

@benjamin-hodgson
Copy link
Owner

I haven't yet had time to think through the options carefully, sorry! If you're up for it, I'd be very happy to receive a PR with a concrete proposal which we can iterate. Otherwise it'll have to wait until I do have time

@benjamin-hodgson
Copy link
Owner

I ended up going with ...Then. So that's ManyThen, AtLeastOnceThen, SkipManyThen, SkipAtLeastOnceThen.

This has shipped in v3.2.0, which is working its way through the Nuget indexing pipeline as I write, should be up in about five mins.

Thanks for your interest!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants