Impossible DID URL ABNF using eager evaluation without backtracking? #344
BTW, this could be in the same class of "parser-ate-my-homework" behavior found in #333.
I got my grammar to parse
I am ensuring that |
@wyc I would like to thank you for this very detailed bug report! I am using parse2, and I have come across the same problem. The current ABNF doesn't seem to be "completely wrong", but average parsers that don't support backtracking have problems with it. I think I also agree with your proposed third way forward, i.e. figure out a restricted version of What do you think about doing something like this:
? |
You're right that it's not a matter of eagerness of evaluation but backtracking support! Thanks, I'll update the title. I think we should strive as much as possible to maintain interoperability with query parameters able to be used within HTTP URLs, and the proposed reduced set of characters would preclude being able to copy just any HTTP URL query parameters directly after the DID path. For example, the following direct transposition would be invalid without converting to percent encoding:
The I think the reason we're encountering this issue is that we're trying to tokenize the parameter names and values at the grammar layer, whereas HTTP URI implementations use the URL BNF only for query string validation, but handle the parsing of the query string with a different more sophisticated algorithm: If we look at |
@peacekeeper what do you think of this?
This should be logically compatible with all URL query strings while also giving us the |
I replicated this in the CDDL syntax and thus may need to revisit this for synergy.
@wyc Sorry for the late response. I think your latest version isn't 100% semantically correct, since the BTW, I agree in principle with preserving the name and value tokens in the ABNF!
@peacekeeper Yes, to be as permissive as possible, I did mean to say that
We can disallow If we disallowed, it would simplify it to something like your proposal with extra chars:
Do you think this evaluation of existing impls and usage is a good approach? |
@wyc I agree that maximum compatibility with query parameters in other URI schemes is desirable. After thinking this through a bit, I think you are right that either 1. we get rid of the I'm a bit surprised there doesn't seem to be an "official" reliable ABNF grammar anywhere for query parameters that are part of a URL, at least I couldn't find anything. What would your recommended solution be at this point? |
@peacekeeper It is with a heavy heart that I recommend we go with option 1 and remove the If you agree, we should ask for any objections in the next DID-WG call.
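To make the trade-off of option 1 concrete, here is a minimal sketch of the two-layer approach discussed above: the grammar only validates the query against RFC3986 `query`, and names and values are split afterwards by a separate algorithm. The function name `parse_did_query` is hypothetical, and Python's `urllib.parse.parse_qsl` is used only as a stand-in for "a separate query string algorithm"; nothing here comes from the spec text itself.

```python
import re
from urllib.parse import parse_qsl

# RFC 3986: query = *( pchar / "/" / "?" )
# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
QUERY = re.compile(r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@/?]|%[0-9A-Fa-f]{2})*")


def parse_did_query(query):
    """Validate with the grammar, then tokenize with a separate algorithm.

    Hypothetical helper: the name and the choice of parse_qsl are assumptions.
    """
    if QUERY.fullmatch(query) is None:
        raise ValueError(f"not a valid RFC 3986 query: {query!r}")
    # parse_qsl splits on "&" and then on the first "=" of each pair.
    return parse_qsl(query, keep_blank_values=True)


print(parse_did_query("p1=v1&p2=v2"))  # [('p1', 'v1'), ('p2', 'v2')]
print(parse_did_query("a=b=c"))        # [('a', 'b=c')]
```

Note that `parse_qsl` also percent-decodes and turns `+` into a space, which is form-encoding behavior and may not be what a DID method wants; a real implementation would need to choose its own splitting rules.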
I think I agree. Do you want to create a PR to make this change and then we see if there are any objections?
Sure, I'll do this today.
- Use `query` from RFC3986 (URI) instead of parsing param names and values
- Correct RFC5324 (MIB for Fibre-Channel Security) to RFC5234 (ABNF)
See discussion here for reasoning: w3c#344
Great, thanks for all the input.
Hi, I've tried to implement the ABNF DID URL found in section 3.2, but couldn't get it to work. However, DID Scheme is working fine. I am using the Rust pest library due to its rather direct representation of ABNF. One of the following is true: my implementation is incorrect, the ABNF notation is wrong, or the ABNF notation must be evaluated with some form of laziness. I have provided steps to reproduce this error, an explanation as to why it's possibly an error, an analysis of what could be wrong, and also how we might address this.
Steps to reproduce
Select "did_url" next to the Input box.
Notice how under `did_url`, `did_scheme` and `path_abempty` are correctly populated, yet `did_query` is nowhere to be found. It did not fully parse.
Why it's possibly an error
It's possibly an error because the parser did not recognize `?p1=v1&p2=v2` as a valid DID query string. If you would like to see it explicitly fail, you can change the grammar to require the `("?" ~ did_query)` rule instead of allowing it to be optional.
What could be wrong
I think what could be going wrong here is related to how `param_name` is defined. Specifically, `pchar` from RFC3986 contains the `sub_delims` rule, which includes `"="` as a valid atom. This may be crossing wires with the assumption of `param`, whose definition requires a literal `"="`. With pest, the required `"="` in `param` never has a chance to be parsed because it is already eaten by `param_name`. Therefore, the rule of `param` is impossible with the current ABNF notation if we assume eager evaluation, which is the default evaluation mode I've seen across BNF-like parsers including pest and ANTLR.
Ways forward
To remedy this (or not), we could consider one of the following:
- Use `query` from RFC3986 for `did_query`: `query = *( pchar / "/" / "?" )`. This means that we will lose the `param`, `param_name`, and `param_value` rules, which would be nice to have during token interpretation.
- Use a version of `pchar` excluding `"="` and possibly other atoms, to avoid the `"="`-eaten-by-`param_name` problem.
If I have some time soon, I will try to implement a parser combinator for this with lazy evaluation in Rust or Haskell to see if it works.
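As a quick check of the backtracking hypothesis before writing a full parser combinator, here is a rough sketch using Python's `re` module, whose engine does backtrack. It assumes `param_name = 1*pchar` and `param_value = *pchar` joined by a literal `"="` (my reading of the current ABNF), and it leaves out `pct-encoded` for brevity.

```python
import re
import string

# pchar from RFC 3986 without pct-encoded: unreserved / sub-delims / ":" / "@"
PCHAR = string.ascii_letters + string.digits + "-._~" + "!$&'()*+,;=" + ":@"
CLASS = "[" + re.escape(PCHAR) + "]"

# Assumed shape of the rule: param = 1*pchar "=" *pchar
PARAM = re.compile(f"({CLASS}+)=({CLASS}*)")

# The greedy name first swallows everything, then gives characters back
# until the literal "=" can match.
print(PARAM.fullmatch("p1=v1").groups())   # ('p1', 'v1')

# Backtracking makes the rule matchable, but the split is still ambiguous:
print(PARAM.fullmatch("a=b=c").groups())   # ('a=b', 'c')
```

So backtracking seems to rescue the rule, but the name can still absorb an `"="`, which is probably not what anyone intends.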
Can someone confirm that this might be a problem? If so, what do folks think is the best course of action here?
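For anyone who wants to confirm this without pest, here is a small sketch of an eager matcher with no backtracking. The helper names `eat_while` and `match_param` are made up for illustration, `pct-encoded` is omitted, and the rule shape (`param_name = 1*pchar`, then `"="`, then `param_value = *pchar`) is my reading of the current ABNF rather than a quote from it.

```python
import string

# Character classes from RFC 3986 (pct-encoded omitted for brevity).
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | set(":@")


def eat_while(text, pos, allowed):
    """Greedily consume characters in `allowed`; never gives any back."""
    while pos < len(text) and text[pos] in allowed:
        pos += 1
    return pos


def match_param(text, pos, chars):
    """Eager `param_name "=" param_value`; returns end position or None."""
    end_name = eat_while(text, pos, chars)
    if end_name == pos:
        return None  # param_name needs at least one character
    if end_name >= len(text) or text[end_name] != "=":
        return None  # the literal "=" was already eaten by param_name
    return eat_while(text, end_name + 1, chars)


print(match_param("p1=v1", 0, PCHAR))              # None
print(match_param("p1=v1", 0, PCHAR - set("=&")))  # 5
```

The first call fails for the reason described under "What could be wrong"; the second uses a restricted character set, which is the idea behind excluding `"="` (and here also `"&"`) from `pchar`.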