Use StartsWith(..., OrdinalIgnoreCase) in RegexCompiler / source generator #66339
Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
Issue Details
Fixes #66324
When we encounter a sequence of sets representing case-insensitive ASCII, we can simplify the generated code to just call StartsWith, which both makes it more readable and takes advantage of the new JIT optimization that lowers the call into efficient vectorized comparisons based on the supplied literal.
This also cleans up some formatting in the source generator emitted code to make things much more concise and less noisy.
Example:
if ((uint)slice.Length < 7 ||
((slice[0] | 0x20) != 'h') || // Match a character in the set [Hh].
((slice[1] | 0x20) != 't') || // Match a character in the set [Tt] exactly 2 times.
((slice[2] | 0x20) != 't') ||
((slice[3] | 0x20) != 'p')) // Match a character in the set [Pp].
{
return false; // The input didn't match.
}
// Match the string "://".
{
if (!global::System.MemoryExtensions.StartsWith(slice.Slice(4), "://"))
{
return false; // The input didn't match.
}
}
and with this PR looks like:
if ((uint)slice.Length < 7 ||
!global::System.MemoryExtensions.StartsWith(slice, "http", global::System.StringComparison.OrdinalIgnoreCase) || // Match the string "http" (ordinal case-insensitive)
!global::System.MemoryExtensions.StartsWith(slice.Slice(4), "://")) // Match the string "://".
{
return false; // The input didn't match.
}
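To make the before/after comparison concrete, here is a minimal sketch that runs both shapes of the "http" check side by side. The method names `MatchesHttpPerChar` and `MatchesHttpStartsWith` are hypothetical and not part of the generator output. The sketch only exercises a handful of sample inputs and is not a proof of equivalence for every possible character.

```csharp
using System;

// Sample inputs: mixed case, too short, and a different scheme.
foreach (string input in new[] { "http://a.com", "HTTP://a.com", "HtTp", "htt", "ftp://a.com" })
{
    Console.WriteLine($"{input}: perChar={MatchesHttpPerChar(input)} startsWith={MatchesHttpStartsWith(input)}");
}

// Old shape: one (c | 0x20) comparison per character, as in the "before" code.
// OR-ing with 0x20 maps 'H' (0x48) to 'h' (0x68) and leaves 'h' unchanged.
static bool MatchesHttpPerChar(ReadOnlySpan<char> slice) =>
    (uint)slice.Length >= 4 &&
    (slice[0] | 0x20) == 'h' &&
    (slice[1] | 0x20) == 't' &&
    (slice[2] | 0x20) == 't' &&
    (slice[3] | 0x20) == 'p';

// New shape: a single ordinal case-insensitive prefix check, as in the "after" code.
static bool MatchesHttpStartsWith(ReadOnlySpan<char> slice) =>
    slice.StartsWith("http", StringComparison.OrdinalIgnoreCase);
```

Both helpers should agree on each of the sample inputs. The second form is the one the JIT optimization mentioned above is meant to recognize.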
|
Could it instead generate
// Match the string "http://".
if (!global::System.MemoryExtensions.StartsWith(slice, "http://", global::System.StringComparison.OrdinalIgnoreCase))
{
return false; // The input didn't match.
}
And let |
It easily could. Happy to change that if @EgorBo says StartsWith will do the right thing and this will definitively be better. |
My question follows along the same lines as @MihaZupan's. Is the main reason why in your example it doesn't look for the whole |
Yes
It's easy. I just didn't, under the assumption that it would be worse. If Egor tells me it'll be better, it'll be a couple of lines to switch it. |
I'd say it should 🙂 #66095 handles strings of 0-32 chars. But I'm planning to file a follow-up PR to extend that to 64 (or even 128) for hot blocks. |
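As a rough illustration of the merge being discussed, the sketch below compares the two-call shape from the PR with a single case-insensitive call over the whole "http://" literal. The names `TwoCalls` and `OneCall` are hypothetical. The assumption is that "://" contains no cased characters, so comparing it case-insensitively should accept the same inputs; the sketch checks that only on a few samples and says nothing about relative performance.

```csharp
using System;

foreach (string input in new[] { "http://a.com", "HTTP://a.com", "http:/", "http:\\\\a", "https://a" })
{
    Console.WriteLine($"{input}: two={TwoCalls(input)} one={OneCall(input)}");
}

// Shape emitted by this PR: case-insensitive "http", then case-sensitive "://".
static bool TwoCalls(ReadOnlySpan<char> slice) =>
    (uint)slice.Length >= 7 &&
    slice.StartsWith("http", StringComparison.OrdinalIgnoreCase) &&
    slice.Slice(4).StartsWith("://");

// Suggested shape: one case-insensitive call over the full literal.
static bool OneCall(ReadOnlySpan<char> slice) =>
    slice.StartsWith("http://", StringComparison.OrdinalIgnoreCase);
```

With a 7-character literal, the single call stays within the 0-32 char range that #66095 is said to handle.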
Presumably it can omit the length check now - at least, in the case where there's only one call to StartsWith? |
I've left it in because it's more common than not to have a series of calls all guarded by the same length check, and it wasn't worth special-casing just the case where the only thing guarded is a single StartsWith. I would hope in that case the JIT gets rid of the redundant length check. |
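For the single-call case raised above, a small sketch shows why the explicit length guard is redundant from a correctness standpoint: `StartsWith` on a span shorter than the literal returns false rather than throwing. The names `Guarded` and `Unguarded` are hypothetical, and whether the JIT actually removes the extra check is not something this sketch demonstrates.

```csharp
using System;

foreach (string input in new[] { "", "http", "http:/", "http://", "HTTP://x" })
{
    Console.WriteLine($"'{input}': guarded={Guarded(input)} unguarded={Unguarded(input)}");
}

// Explicit length check followed by the prefix check, as the generator emits.
static bool Guarded(ReadOnlySpan<char> slice) =>
    (uint)slice.Length >= 7 &&
    slice.StartsWith("http://", StringComparison.OrdinalIgnoreCase);

// Prefix check alone; inputs shorter than the literal simply do not match.
static bool Unguarded(ReadOnlySpan<char> slice) =>
    slice.StartsWith("http://", StringComparison.OrdinalIgnoreCase);
```

The guard still earns its place when several checks share it, which is the common case described in the comment above.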
I didn't actually see any perf wins from this locally; we'll see if the perf lab disagrees. However, I didn't see any regressions, either, which means we can get the simpler/smaller/more readable code for the same throughput, which is a win, so I went ahead and merged it. |
Fixes #66324
Depends on #66095
Depends on #61048 (we have a partial temp solution in place, but that will provide the full one)
When we encounter a sequence of sets representing case-insensitive ASCII, we can simplify the generated code to just call StartsWith, which both makes it more readable and takes advantage of the new JIT optimization that lowers the call into efficient vectorized comparisons based on the supplied literal.
This also cleans up some formatting in the source generator emitted code to make things much more concise and less noisy.
Example:
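In http://\w+.com with RegexOptions.IgnoreCase, the generated code for the "http://" part had looked like:
if ((uint)slice.Length < 7 ||
((slice[0] | 0x20) != 'h') || // Match a character in the set [Hh].
((slice[1] | 0x20) != 't') || // Match a character in the set [Tt] exactly 2 times.
((slice[2] | 0x20) != 't') ||
((slice[3] | 0x20) != 'p')) // Match a character in the set [Pp].
{
return false; // The input didn't match.
}
// Match the string "://".
{
if (!global::System.MemoryExtensions.StartsWith(slice.Slice(4), "://"))
{
return false; // The input didn't match.
}
}
and with this PR looks like:
if ((uint)slice.Length < 7 ||
!global::System.MemoryExtensions.StartsWith(slice, "http", global::System.StringComparison.OrdinalIgnoreCase) || // Match the string "http" (ordinal case-insensitive)
!global::System.MemoryExtensions.StartsWith(slice.Slice(4), "://")) // Match the string "://".
{
return false; // The input didn't match.
}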
I've not measured perf yet and will wait for that until #66095 is merged.