-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Huge performance difference in string.StartsWith and EndsWith between Linux and Windows for non Invariant, non en-US. #6074
Comments
cc: @ellismg Interesting. I don't have that exact set up handy, but on Ubuntu 14.04 with CLI 1.0.0-preview2-002853 and relatively recent rc3 builds, I see:
so much faster than what you're seeing, and actually faster than what I'm seeing on the Windows host, e.g.
But... my LANG is en-US. If I do:
and then run it again, I get numbers much closer to yours:
Not a real solution, but as an experiment, @jskeet, if you change your LANG to en-US, do you see a perf improvement? |
If the problem is the LANG setting, then what's happening is that our fast paths for ASCII StartsWith/EndsWith are specific today to en-US and Invariant (see the code here.) Unlike Windows, having to do a full linguistic StartsWith or EndsWith is slow because we have to construct some ICU searching objects which we can't cache across runs (since the object is specific to a target string you are searching for, and we don't maintain a cache of searcher objects). For 1.1 we should look at expanding the fast path to work for ASCII strings if the collation rules for the current locale don't tailor anything in the ASCII range, which would allow this fast path to be hit for locales like en-GB. We could also consider trying to re-implement IndexOf in terms of some lower level ICU primitives that we might be able to cache across calls. |
@stephentoub: Bingo! Yes, with $ LANG=en-US dotnet run I get:
Applying the same workaround to running my Noda Time tests halves the total test time, too... (That's where all this started.) |
@jskeet Another option (and I'm not sure if this is possible via the interfaces NUnit exposes, a quick glance over the source doesn't give me much hope) is to use Ordinal or OrdinalIgnoreCase, which will ignore all the ICU gunk. |
@ellismg: The checks here are deep in the bowels of NUnit - but could be fixed very easily with a patch. It's unfortunate that it's necessary, but it feels like a good practical solution to a very real issue. Will file a feature request now... |
@jskeet Thanks! I very much expect we will do the extension of the ASCII fast paths for 1.1, but we may be in a world where linguistic StartsWith and EndsWith are slower for non ASCII strings or strings where collation for ASCII characters differs (e.g tr-TR) because of how this stuff is implemented in terms of ICU, so if we can upstream general goodness changes to force ordinal comparisions when you don't need linguistic behavior that would be great. |
Yup. Would be good if every call to |
Given how disgusting and unexpected such ICU regressions are likely to be, could there be a syntax added to .net core that lets you specify ie StartsWith(arg, forceAscii:true) ? |
@wpostma: What would |
Um, forget I suggested that. :-) |
@ellismg Is this actionable for 1.1.0? |
Anyone working on this issue? If not I have time to take a look. |
@mazong1123 that would be great! Thank you. No-one is assigned, so that means no-one is working on this. I've just assigned you. |
@mazong1123 have you had a chance to look further? |
@danmosemsft sorry I'm afraid I don't have so much time for working on this issue right now. I'm busy with my day job and for .NET Core I'm only focusing on dotnet/coreclr#14646 However, I can support anyone who wants to work on this issue in my spare time. |
I have noticed this issue when running netcore code along side mono:
The above code takes 6 seconds on my PC with 2.1.300-preview1-008174, and 370 milliseconds on mono. Interestingly, when I change my LANG to en-US, it takes 7 seconds (my default is en_AU.UTF-8). The fix as mentioned above is to add Ordinal as the string comparison option:
This brings it back down to an acceptable 31 milliseconds on my PC. |
Just scratched my head for hours on this. Was running a console app with a 30,000 record iteration with multiple CentOS 7.7-1908
|
@mikes-gh which version of .NET Core are you using? |
|
I think the issue should be re-reviewed since we moved to ICU on Windows. |
dotnet/coreclr#26759 and dotnet/coreclr#26621 combined together have fixed this issue. The PRs were merged in September and October 2019 so they did not make it to .NET Core 3.1, but they are part of the 5.0 release. I've run these benchmarks from the dotnet/performance repository to verify that: git clone https://github.com/dotnet/performance
cd performance
python3 ./scripts/benchmarks_ci.py -f netcoreapp3.1 --filter '*StringSearch*Is*'
python3 ./scripts/benchmarks_ci.py -f netcoreapp5.0 --filter '*StringSearch*Is*' BenchmarkDotNet=v0.12.1, OS=ubuntu 18.04
Intel Xeon CPU E5-1650 v4 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-preview.6.20318.15
[Host] : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
Job-OKAAXU : .NET Core 3.1.5 (CoreCLR 4.700.20.26901, CoreFX 4.700.20.27001), X64 RyuJIT
Job-QELOCB : .NET Core 5.0.0 (CoreCLR 5.0.20.30506, CoreFX 5.0.20.30506), X64 RyuJIT
The |
Very simple code:
The project.json is mostly the default from
dotnet new
:On both Linux (Ubuntu 15.10 and Ubuntu 16.04) the output is something like:
On Windows, on the same hardware, it's:
Version info: .NET Command Line Tools (1.0.0-preview1-002702) on Windows and Ubuntu 15.10; 1.0.0-preview2-002886 on Ubuntu 16.04.
Note that under NUnit, every equality assertion involves three
EndsWith
calls, making NUnit assertions basically horrifically expensive on Linux...The text was updated successfully, but these errors were encountered: