Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp/syntax: accept (?<name>...) in addition to (?P<name>...) #58458

Closed
rsc opened this issue Feb 10, 2023 · 9 comments
Closed

regexp/syntax: accept (?<name>...) in addition to (?P<name>...) #58458

rsc opened this issue Feb 10, 2023 · 9 comments

Comments

@rsc
Copy link
Contributor

rsc commented Feb 10, 2023

Based on the discussion on rust-lang/regex#955, it seems that many regexp sources refer only to (?expr) for capturing group names, and at least a couple regexp implementations only support that form. Go, following RE2, only supports (?Pexpr). The original comment I wrote for RE2 says:

  // Check for named captures, first introduced in Python's regexp library.
  // As usual, there are three slightly different syntaxes:
  //
  //   (?P<name>expr)   the original, introduced by Python
  //   (?<name>expr)    the .NET alteration, adopted by Perl 5.10
  //   (?'name'expr)    another .NET alteration, adopted by Perl 5.10
  //
  // Perl 5.10 gave in and implemented the Python version too,
  // but they claim that the last two are the preferred forms.
  // PCRE and languages based on it (specifically, PHP and Ruby)
  // support all three as well.  EcmaScript 4 uses only the Python form.
  //
  // In both the open source world (via Code Search) and the
  // Google source tree, (?P<expr>name) is the dominant form,
  // so that's the one we implement.  One is enough.

Given the widespread usage and documentation, I am now inclined to accept (?expr) as well. There seems to be very little usage of (?'name'expr) so that one would still be rejected.

There would be no change to the data structures, only to the syntax accepted. (Go's regexp/syntax does not guarantee to round trip back to the exact same string.)

I talked to RE2 maintainer @junyer and he agreed to make the change in RE2 if Go does, which will help keep Go, RE2, and RE2/J in sync.

It also sounds like @BurntSushi is moving toward making the same change for Rust. It is not a goal for Rust and Go to exactly match, but it is more evidence for the decision.

@gopherbot gopherbot added this to the Proposal milestone Feb 10, 2023
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Feb 10, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Feb 10, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
@ianlancetaylor ianlancetaylor moved this to Incoming in Proposals Feb 10, 2023
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Feb 18, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 2, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 4, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 5, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 15, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 15, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 15, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 20, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Mar 21, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Apr 15, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Apr 17, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Apr 17, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Apr 17, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
BurntSushi pushed a commit to rust-lang/regex that referenced this issue Apr 17, 2023
It turns out that both '(?P<name>...)' and '(?<name>...)' are rather
common among regex engines. There are several that support just one or
the other. Until this commit, the regex crate only supported the former,
along with both RE2, RE2/J and Go's regexp package. There are also
several regex engines that only supported the latter, such as Onigmo,
Onuguruma, Java, Ruby, Boost, .NET and Javascript. To decrease friction,
and because there is somewhat little cost to doing so, we elect to
support both.

It looks like perhaps RE2 and Go's regexp package will go the same
route, but it isn't fully decided yet:
golang/go#58458

Closes #955, Closes #956
@rsc
Copy link
Contributor Author

rsc commented Jun 28, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals Jun 28, 2023
@BurntSushi
Copy link

Ah I forgot to update this issue to say that as of regex 1.8, (?<name>...) is now supported syntax in Rust's regex crate.

@rsc
Copy link
Contributor Author

rsc commented Jun 28, 2023

Thanks @BurntSushi. It's all good, I forgot to move this proposal along too.
(It's not just non-Go-team proposals that get blocked for long periods of time!)

@rsc
Copy link
Contributor Author

rsc commented Jul 5, 2023

I still believe we should do this, to keep Go, RE2, and Rust aligned.
Have all concerns been addressed?

@junyer
Copy link
Contributor

junyer commented Jul 5, 2023

I see a couple of 👎 reactions, but no actual arguments against. Doing this still SGTM, FWIW. :)

@rsc rsc moved this from Active to Likely Accept in Proposals Jul 12, 2023
@rsc
Copy link
Contributor Author

rsc commented Jul 12, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals Jul 19, 2023
@rsc
Copy link
Contributor Author

rsc commented Jul 19, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: regexp/syntax: accept (?<name>...) in addition to (?P<name>...) regexp/syntax: accept (?<name>...) in addition to (?P<name>...) Jul 19, 2023
@rsc rsc modified the milestones: Proposal, Backlog Jul 19, 2023
mauri870 added a commit to mauri870/go that referenced this issue Jul 28, 2023
Currently the only named capture supported by regexp is (?P<name>a).

The syntax (?<name>a) is also widely used and there is currently an effort from
 the Rust regex and RE2 teams to also accept this syntax.

Fixes golang#58458
mauri870 added a commit to mauri870/go that referenced this issue Jul 28, 2023
Currently the only named capture supported by regexp is (?P<name>a).

The syntax (?<name>a) is also widely used and there is currently an effort from
 the Rust regex and RE2 teams to also accept this syntax.

Fixes golang#58458
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/513838 mentions this issue: regexp/syntax: accept (?<name>...) syntax as valid capture

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/515295 mentions this issue: regexp/syntax: add support for (?<name>expr)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants