Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Regex alternation source generator optimization for backtracking #62441

Closed
stephentoub opened this issue Dec 6, 2021 · 1 comment
Closed

Comments

@stephentoub
Copy link
Member

In the Regex source generator, we special-case alternations where every branch begins with a different char, preferring to generate a switch statement, e.g. "abc|def" yields a switch statement with a case for 'a' and a case for 'd', as the implementation sees that all branches begin with a different character and thus there's no need to fall through from one case to the next. However, while the alternation itself doesn't require backtracking (since if any branch matches, it's then known impossible for another to), it's possible that a branch expression itself employs backtracking (e.g. if a branch contains a loop). In such a case, we currently don't apply the optimization involving a switch, because the backtracking would result in a label being output in the branch's code and that label needing to be visible to later code... if the label were to be declared inside the braces/scope for the switch, that would lead to compilation failures. However, while we can't emit it as a switch (which enables the C# compiler to generate whatever efficient code it's able to, especially for larger alternations), we could still avoid cascading from branch to branch, e.g. even though we don't emit a switch, we can still make it so that failing to match a branch fails the whole alternation rather than cascading to try the next branch.

@stephentoub stephentoub added this to the 7.0.0 milestone Dec 6, 2021
@ghost
Copy link

ghost commented Dec 6, 2021

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

In the Regex source generator, we special-case alternations where every branch begins with a different char, preferring to generate a switch statement, e.g. "abc|def" yields a switch statement with a case for 'a' and a case for 'd', as the implementation sees that all branches begin with a different character and thus there's no need to fall through from one case to the next. However, while the alternation itself doesn't require backtracking (since if any branch matches, it's then known impossible for another to), it's possible that a branch expression itself employs backtracking (e.g. if a branch contains a loop). In such a case, we currently don't apply the optimization involving a switch, because the backtracking would result in a label being output in the branch's code and that label needing to be visible to later code... if the label were to be declared inside the braces/scope for the switch, that would lead to compilation failures. However, while we can't emit it as a switch (which enables the C# compiler to generate whatever efficient code it's able to, especially for larger alternations), we could still avoid cascading from branch to branch, e.g. even though we don't emit a switch, we can still make it so that failing to match a branch fails the whole alternation rather than cascading to try the next branch.

Author: stephentoub
Assignees: -
Labels:

area-System.Text.RegularExpressions, tenet-performance

Milestone: 7.0.0

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Dec 6, 2021
@joperezr joperezr removed the untriaged New issue has not been triaged by the area owner label Dec 6, 2021
@stephentoub stephentoub modified the milestones: 7.0.0, Future Jan 24, 2022
@stephentoub stephentoub self-assigned this Jan 11, 2024
@stephentoub stephentoub removed their assignment Apr 13, 2024
@stephentoub stephentoub closed this as not planned Won't fix, can't repro, duplicate, stale Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants