Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use StartsWith in regex compiler / source gen for shorter strings #65222

Merged
merged 1 commit into from
Feb 15, 2022

Conversation

stephentoub
Copy link
Member

@stephentoub stephentoub commented Feb 11, 2022

The RegexCompiler and source generator currently special-case strings < 64 chars in length and unroll the loop, using a series of ulong and uint comparisons where possible. While efficient, this makes the generated code harder to read, and the source generator also has endianness issues when the compiled binary is then used on a machine with different endianness. The JIT is going to start doing such unrolling as part of StartsWith (#65288), so we can leave the optimization up to it; it'll be able to do it better, anyway, with its optimization applying to more uses, using vectors where applicable, etc.

cc: @EgorBo, @joperezr

The RegexCompiler and source generator currently special-case strings < 64 chars in length and unroll the loop, using a series of ulong and uint comparisons where possible.  While efficient, this makes the generated code harder to read, and the source generator also has endianness issues when the compiled binary is then used on a machine with different endianness.  The JIT is going to start doing such unrolling as part of StartsWith, so we can leave the optimization up to it; it'll be able to do it better, anyway, with its optimization applying to more uses, using vectors where applicable, etc.
@stephentoub stephentoub added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) area-System.Text.RegularExpressions labels Feb 11, 2022
@stephentoub stephentoub added this to the 7.0.0 milestone Feb 11, 2022
@ghost ghost assigned stephentoub Feb 11, 2022
@ghost
Copy link

ghost commented Feb 11, 2022

Tagging subscribers to this area: @dotnet/area-system-text-regularexpressions
See info in area-owners.md if you want to be subscribed.

Issue Details

The RegexCompiler and source generator currently special-case strings < 64 chars in length and unroll the loop, using a series of ulong and uint comparisons where possible. While efficient, this makes the generated code harder to read, and the source generator also has endianness issues when the compiled binary is then used on a machine with different endianness. The JIT is going to start doing such unrolling as part of StartsWith, so we can leave the optimization up to it; it'll be able to do it better, anyway, with its optimization applying to more uses, using vectors where applicable, etc.

cc: @EgorBo, @joperezr

Author: stephentoub
Assignees: -
Labels:

* NO MERGE *, area-System.Text.RegularExpressions

Milestone: 7.0.0

@stephentoub stephentoub removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Feb 12, 2022
Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Is the NoMerge label here just to wait while the StartsWith PR goes in so that we don't get a temporary regression?

@stephentoub
Copy link
Member Author

Is the NoMerge label here just to wait while the StartsWith PR goes in so that we don't get a temporary regression?

It had been, but then removed it and decided we can live with some temporary regressions. As it stands, the current code is incorrect with big endian in mind, so we'd need to do something here regardless... may as well delete :)

@stephentoub stephentoub merged commit a934bd8 into dotnet:main Feb 15, 2022
@stephentoub stephentoub deleted the removeunrolling branch February 15, 2022 02:02
@EgorBo
Copy link
Member

EgorBo commented Feb 17, 2022

Regression: dotnet/perf-autofiling-issues#3560 and dotnet/perf-autofiling-issues#3571 (supposed to be fixed via #65288)

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
@adamsitnik
Copy link
Member

@EgorBo it seems that the regression has not been solved yet:

https://pvscmdupload.blob.core.windows.net/reports/allTestHistory%2frefs%2fheads%2fmain_x64_ubuntu%2018.04%2fBenchmarksGame.RegexRedux_1.RunBench.html

image

Result Ratio Alloc Delta Operating System Bit Processor Name
Slower 0.83 +26667 Windows 10 X64 Intel Xeon CPU E5-1650 v4 3.60GHz
Slower 0.84 +26271 Windows 11 X64 AMD Ryzen Threadripper PRO 3945WX 12-Cores
Slower 0.82 +26007 Windows 11 X64 AMD Ryzen 9 5900X
Same 1.08 +27728 Windows 11 X64 Intel Core i5-4300U CPU 1.90GHz (Haswell)
Same 1.05 +24768 Windows 11 X64 Unknown processor
Slower 0.86 +26398 Windows 11 X64 Intel Core i7-8700 CPU 3.20GHz (Coffee Lake)
Slower 0.84 +26667 Windows 11 X64 Intel Core i9-9900T CPU 2.10GHz
Slower 0.81 +26960 alpine 3.13 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Slower 0.82 +26862 centos 7 X64 Intel Xeon CPU E5530 2.40GHz
Slower 0.83 +26467 debian 11 X64 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Same 0.92 +26960 pop 20.04 X64 Intel Core i7-6600U CPU 2.60GHz (Skylake)
Slower 0.86 +26901 ubuntu 18.04 X64 Intel Xeon CPU E5-1650 v4 3.60GHz
Faster 1.16 +27056 ubuntu 18.04 X64 Intel Core i7-2720QM CPU 2.20GHz (Sandy Bridge)
Slower 0.82 +26032 alpine 3.12 Arm64 Unknown processor
Slower 0.86 +27238 debian 11 Arm64 Unknown processor
Slower 0.82 +25552 ubuntu 18.04 Arm64 Unknown processor
Slower 0.86 +26876 ubuntu 20.04 Arm64 Unknown processor
Slower 0.85 +27080 Windows 10 Arm64 Microsoft SQ1 3.0 GHz
Slower 0.85 +26960 Windows 11 Arm64 Microsoft SQ1 3.0 GHz
Slower 0.87 +12832 Windows 10 X86 Intel Xeon CPU E5-1650 v4 3.60GHz
Slower 0.85 +11541 Windows 11 X86 AMD Ryzen Threadripper PRO 3945WX 12-Cores
Slower 0.55 +13438 Windows 11 X86 Intel Core i7-10510U CPU 1.80GHz
Slower 0.87 +12377 Windows 7 SP1 X86 Intel Core i7-7700 CPU 3.60GHz (Kaby Lake)
Slower 0.87 +19700 ubuntu 18.04 Arm ARMv7 Processor rev 3 (v7l)
Same 0.90 +12652 Windows 10 Arm Microsoft SQ1 3.0 GHz
Slower 0.85 +25566 macOS Monterey 12.2.1 X64 Intel Core i7-5557U CPU 3.10GHz (Broadwell)
Slower 0.87 +26368 macOS Monterey 12.3.1 X64 Intel Core i7-4870HQ CPU 2.50GHz (Haswell)

Please let me know if this is by design or not (I'll open a new issue)

@EgorBo
Copy link
Member

EgorBo commented Apr 11, 2022

@adamsitnik thanks, I'll take a look later today why

@stephentoub
Copy link
Member Author

Some other regex benchmarks have regressed as well.

Thanks. That's the same case... this is the code we generate now at the heart of the matching for that pattern:

                            // Match with 4 alternative expressions, atomically.
                            {
                                if (slice.IsEmpty)
                                {
                                    goto CharLoopBacktrack;
                                }
                                
                                switch (slice[0])
                                {
                                    case 'T':
                                        // Match the string "om".
                                        if (!slice.Slice(1).StartsWith("om"))
                                        {
                                            goto CharLoopBacktrack;
                                        }
                                        
                                        pos += 3;
                                        slice = inputSpan.Slice(pos);
                                        break;
                                        
                                    case 'S':
                                        // Match the string "awyer".
                                        if (!slice.Slice(1).StartsWith("awyer"))
                                        {
                                            goto CharLoopBacktrack;
                                        }
                                        
                                        pos += 6;
                                        slice = inputSpan.Slice(pos);
                                        break;
                                        
                                    case 'H':
                                        // Match the string "uckleberry".
                                        if (!slice.Slice(1).StartsWith("uckleberry"))
                                        {
                                            goto CharLoopBacktrack;
                                        }
                                        
                                        pos += 11;
                                        slice = inputSpan.Slice(pos);
                                        break;
                                        
                                    case 'F':
                                        // Match the string "inn".
                                        if (!slice.Slice(1).StartsWith("inn"))
                                        {
                                            goto CharLoopBacktrack;
                                        }
                                        
                                        pos += 4;
                                        slice = inputSpan.Slice(pos);
                                        break;
                                        
                                    default:
                                        goto CharLoopBacktrack;
                                }
                            }

Prior to this change, we would have manually unrolled all of those StartsWith in the generated code.

@EgorBo
Copy link
Member

EgorBo commented Apr 11, 2022

@stephentoub @adamsitnik ok I've just figured out why this regression wasn't addressed - it's because of this VM call -

if (IsDynamicScope(moduleHnd))
{
*length = GetDynamicResolver(moduleHnd)->GetStringLiteralLength(metaTOK);
}

for Dynamic Context this API only returns string literal's length and doesn't return actual string content so my optimization just gives up. I'll file a fix

@stephentoub
Copy link
Member Author

Interesting. Even before your fix, we should be able to confirm that's the issue by trying the same benchmark with the source generator instead of RegexOptions.Compiled, since the source generator doesn't use reflection emit / dynamic methods.

@EgorBo
Copy link
Member

EgorBo commented Apr 11, 2022

Yes, I don't see a reason why it'd not work with source-gen based Regexes (except too long string fed to StartsWith, currently it's limited with 32 chars on x64 and 16 on arm64)

@stephentoub
Copy link
Member Author

currently it's limited with 32 chars on x64 and 16 on arm64

You're planning to increase that via use of wider vector operations, right? Regardless, those limits should be plenty wide for these benchmarks. We also previously only unrolled up to 64 chars.

I'll try today with the source generator.

@EgorBo
Copy link
Member

EgorBo commented Apr 12, 2022

currently it's limited with 32 chars on x64 and 16 on arm64

You're planning to increase that via use of wider vector operations, right? Regardless, those limits should be plenty wide for these benchmarks. We also previously only unrolled up to 64 chars.

I'll try today with the source generator.

Yes, I already have a branch for it EgorBo@e28e5cc but wanted to do some general clean up as part of it

@stephentoub
Copy link
Member Author

Just as confirmation, I tried the perf test that regressed above, comparing RegexOptions.Compiled with the source generator, and the source generated version is indeed much faster:

Method Mean
Compiled 29.07 ms
SourceGen 13.66 ms

so hopefully your dynamic fix will address the problem.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants