Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String.StartsWith slower on Linux with some characters #30716

Closed
kevingosse opened this issue Aug 29, 2019 · 20 comments
Closed

String.StartsWith slower on Linux with some characters #30716

kevingosse opened this issue Aug 29, 2019 · 20 comments
Assignees
Labels
Milestone

Comments

@kevingosse
Copy link
Contributor

string.StartsWith on Linux becomes 2 orders of magnitude slower when the string contains a dash (-).

On Linux:

BenchmarkDotNet=v0.11.3, OS=centos 7
Intel Xeon CPU E5-2630L v3 1.80GHz, 2 CPU, 32 logical and 16 physical cores
  [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
  Job-UBBGCZ : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT

Runtime=Core  Toolchain=netcoreapp3.0

         Method |        Mean |      Error |     StdDev |
--------------- |------------:|-----------:|-----------:|
     StartsWith |    35.79 ns |  0.1069 ns |  0.0948 ns |
 StartsWithDash | 4,411.13 ns | 35.0054 ns | 29.2311 ns |

On Windows (only for reference, the hardware is not the same):

BenchmarkDotNet=v0.11.3, OS=Windows 10.0.18362
Intel Xeon CPU E3-1271 v3 3.60GHz, 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.0.100-preview8-013656
  [Host]     : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT
  DefaultJob : .NET Core 3.0.0-preview8-28405-07 (CoreCLR 4.700.19.37902, CoreFX 4.700.19.40503), 64bit RyuJIT


         Method |     Mean |     Error |    StdDev |
--------------- |---------:|----------:|----------:|
     StartsWith | 69.42 ns | 0.2523 ns | 0.2236 ns |
 StartsWithDash | 69.47 ns | 1.4200 ns | 1.6904 ns |

Benchmark code:

    public class StartsWithBenchmark
    {
        private string _str1 = "aaaaaaaaaz";
        private string _str2 = "aaaaaaaaa-";

        [Benchmark]
        public bool StartsWith()
        {
            return _str1.StartsWith("i");
        }

        [Benchmark]
        public bool StartsWithDash()
        {
            return _str2.StartsWith("i");
        }
    }

The performance issue does not occur if using ordinal comparison.

@kevingosse
Copy link
Contributor Author

kevingosse commented Aug 29, 2019

It seems to affect characters outside of the [9], [11-12], [32-38], [40-44], and [47-126] ranges.

@danmoseley
Copy link
Member

@krwq that's an extraordinary multiplier. I am curious what this is about.

@krwq
Copy link
Member

krwq commented Aug 29, 2019

Yes this is interesting. @kevingosse what's you current culture set to? Looking at the code it behaves differently depending on that.

@tarekgh do you know if we (or ICU) have any special optimizations for some ranges?

@tarekgh
Copy link
Member

tarekgh commented Aug 29, 2019

I believe Linux treat '-' with some sort weights which can affect the operation with different cultures. on Windows, '-' is treated as normal ASCII and we optimize the call as ordinal at that time.

@jkotas
Copy link
Member

jkotas commented Aug 29, 2019

do you know if we (or ICU) have any special optimizations for some ranges?

We do have special optimization for specific ranges. Look for IsFastSort in https://github.com/dotnet/coreclr/blob/52651008a5fb72eb678467cd2eb42aac4e5334e8/src/System.Private.CoreLib/shared/System/Globalization/CompareInfo.Unix.cs#L242 . - prevents us from taking this fast path because of https://github.com/dotnet/coreclr/blob/ab2f9caaf35c96d029b96aa171ee65d04253cf7c/src/utilcode/util_nodependencies.cpp#L910

This "optimization" is used on Unix only. We used to have it on Windows as well, but removed it there because of it was not very helpful.

One way to fix this problem is to remove this "optimization" on Unix too (PR: dotnet/coreclr#24400) and replace it with fast path and slow path with smaller degradation.

@tarekgh
Copy link
Member

tarekgh commented Aug 29, 2019

we need to be careful here as there is some characters in ASCII range is treated differently than Windows in ICU. trying to change that will start deviate us from the ICU sorting behavior and may not be a good thing to do.

@tarekgh
Copy link
Member

tarekgh commented Aug 29, 2019

Also, in such corner cases, it is usually better to use the ordinal operations and not the linguistic operations.

@kevingosse
Copy link
Contributor Author

kevingosse commented Aug 29, 2019

Yes this is interesting. @kevingosse what's you current culture set to? Looking at the code it behaves differently depending on that.

en-US. With other cultures the performance is bad with any set of character but this is a known issue since 1.0 (https://github.com/dotnet/coreclr/issues/5612)

Also, in such corner cases, it is usually better to use the ordinal operations and not the linguistic operations.

Yes, we're currently adding StringComparer.Ordinal everywhere in our codebase as this performance drop is a blocker for us. The problem is that developers on .net framework or .net core/windows aren't really used to adding StringComparer.Ordinal, since the performance of culture-based string comparisons is pretty good on Windows (so it's mostly seen as a micro optimization). And then it breaks when we start using those bits on Linux.
I wish there was a way to default all string comparisons to ordinal. There is the invariant mode, but it also breaks explicit culture-based comparisons, which we need in some parts of our code.

@tarekgh
Copy link
Member

tarekgh commented Aug 29, 2019

Note that this issue happens only in case of using some specific characters in the ASCII range while the whole string is ASCII too. These characters are mainly control characters which not really used in main stream cases. The only characters that is interesting are the - and '. So we are talking about a corner case here and not really main stream scenarios.

You can look at the special characters here https://github.com/dotnet/coreclr/blob/ab2f9caaf35c96d029b96aa171ee65d04253cf7c/src/utilcode/util_nodependencies.cpp#L856

@mnmr
Copy link

mnmr commented Aug 30, 2019

@tarekgh It’s not truly helpful to simply label this as a corner case. People expect consistent performance and if you have a method that is suddenly magnitudes slower due to certain characters being used it’s like dropping time bombs into peoples code.

@tarekgh
Copy link
Member

tarekgh commented Aug 30, 2019

People expect consistent performance and if you have a method that is suddenly magnitudes slower due to certain characters being used it’s like dropping time bombs into peoples code.

How this is different than inserting non ASCII character in the string even on Windows? you will get the same performance hit. Again, this is why we provide Ordinal option to let users control the behavior.

The issue here is the correctness or sort behavior against the perf. we can fix the perf by not specializing the hyphen on Linux but we should be clear this for sure can be a problem for some expected sort behavior. we can try to find a middle ground here which may work but we need to detect the cases we need to allow ordinal even when hyphen exist. just need some more investigation.

@tarekgh
Copy link
Member

tarekgh commented Aug 30, 2019

Looking more, looks on Windows we don't even try to optimize for ASCII cases and always call the underlying OS. so what @jkotas mentioned before looks reasonable to try.

@filipnavara
Copy link
Member

On a related note, we are running dotnet/performance benchmarks on CoreCLR and Mono/netcore with LLVM JIT. I was investigating the spots where Mono performed especially poorly. One of them is the Perf_String.Contains benchmark with StringComparison.CurrentCultureIgnoreCase. It eventually hits fast-sort the optimization here.

The difference in the benchmark between Mono (which does not have this code path) and CoreCLR (which has it) was quite enormous:

Slower diff/base Base Median (ns) Diff Median (ns) Modality
System.Tests.Perf_String.StartsWith(size: 1000) 714.91 37.47 26784.84
System.Tests.Perf_String.StartsWith(size: 100) 311.68 26.41 8233.04
System.Tests.Perf_String.Contains(text: "This is a very nice sentence", value: " 41.56 190.40 7913.76
System.Tests.Perf_String.Contains(text: "This is a very nice sentence", value: " 27.03 276.06 7462.35

However, the numbers are actually lying because the IsFastSort/IsAscii optimization is computed during the benchmark initialization and not accounted for in the benchmark run. To accurately measure the impact we would likely have to get rid of the cache in the object header first and see what happens. Another question is whether people run StartsWith/Contains on the same string more than once.

@kevingosse
Copy link
Contributor Author

kevingosse commented Sep 2, 2019

Looking more, looks on Windows we don't even try to optimize for ASCII cases and always call the underlying OS. so what @jkotas mentioned before looks reasonable to try.

Unless I misunderstand, @jkotas's suggestion assumes that the perf disparity comes from the IsFastSort check before fallbacking to the slow path. I think the whole problem is how incredibly slow the ICU fallback actually is.

I've done a few more benchmarks to get more data. I'm measuring the cost of return string.Concat(new string('a', length), "-").StartsWith("i"); (length being the number appended to the name of the benchmark). I'm creating a new string every time to make sure we don't fall into the corner-case mentioned by @filipnavara. The CreateString benchmark just measures the cost of the creation of the string, to make sure it's negligible.

                Method |         Mean |      Error |     StdDev |
---------------------- |-------------:|-----------:|-----------:|
        CreateString_1 |     34.68 ns |  0.0902 ns |  0.0753 ns |
      CreateString_800 |    452.03 ns |  1.4944 ns |  1.3979 ns |
          StartsWith_1 |     76.85 ns |  0.4202 ns |  0.3725 ns |
        StartsWith_800 |  1,095.46 ns |  4.7791 ns |  4.4703 ns |
 StartsWithOrdinal_800 |  1,085.14 ns |  6.9067 ns |  6.1226 ns |
      StartsWithDash_1 |  4,269.29 ns | 29.9301 ns | 27.9966 ns |
     StartsWithDash_80 |  7,548.37 ns | 57.8701 ns | 51.3003 ns |
    StartsWithDash_160 | 10,363.72 ns | 45.1756 ns | 42.2573 ns |
    StartsWithDash_800 | 34,426.77 ns | 74.0593 ns | 65.6516 ns |

We see that StartsWith_800 and StartsWithOrdinal_800 are pretty much as fast (10ns absolute difference, I'd say 20ns if we take the worst of the confidence interval). So the cost of the prior IsFastSort is negligible.

On the other hand, we see how the performance degrades with the length of the string. We have an initial 4µs cost whenever we take the slow path, no matter the length of the string. Then we see the cost increasing linearly (3µs for 80, 6µs for 160, 30µs for 800).

So we really have two issues:

  • The initial cost of 4µs is ridiculously expensive. I see there has been a lot of optimizations on the ICU path, but clearly there's still something wrong
  • The performance shouldn't degrade linearly with the length of the string, since the first character doesn't match (so StartsWith should exit immediately). Looking at the implementation, I'd say it's because GlobalizationNative_StartsWith is implemented with the same logic as an IndexOf operation, and therefore will inspect the full string until the pattern is found. The ICU API is a mess, but I'm sure we can find a more efficient way

@filipnavara
Copy link
Member

The performance shouldn't degrade linearly with the length of the string, since the first character doesn't match (so StartsWith should exit immediately). Looking at the implementation, I'd say it's because GlobalizationNative_StartsWith is implemented with the same logic as an IndexOf operation, and therefore will inspect the full string until the pattern is found. The ICU API is a mess, but I'm sure we can find a more efficient way

I am sure there are some edge cases I didn't consider but it should be relatively easy to implement GlobalizationNative_StartsWith in terms of ucol_strcoll call.

@adamsitnik
Copy link
Member

I've profiled following app:

class Program
{
    static void Main()
    {
        while (true)
        {
            Consume(string.Concat(new string('a', 512), "-").StartsWith("i"));
        }
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void Consume<T>(in T _) { }
}

with PerfCollect and the most expensive methods are:

Name Exc % Exc Exc Ct Inc % Inc
libicui18n.so.60.2!unknown 22.1 6,811 6,811 22.2 6,834
libicui18n.so.60.2!icu_60::CollationElementIterator::next 14.2 4,383 4,383 14.3 4,388
libicui18n.so.60.2!icu_60::UCollationPCE::processCE 9.0 2,755 2,755 9.0 2,757
libicui18n.so.60.2!icu_60::UCollationPCE::nextProcessed 8.5 2,615 2,615 8.5 2,617
libicui18n.so.60.2!usearch_search_60 7.0 2,154 2,154 7.0 2,157
libicui18n.so.60.2!icu_60::UTF16CollationIterator::handleNextCE32 6.5 2,005 2,005 6.5 2,006
libicuuc.so.60.2!u_strlen_60 6.5 1,984 1,984 6.5 1,986
libc-2.27.so!unknown 3.3 1,029 1,029 3.4 1,054
libicui18n.so.60.2!icu_60::CollationElementIterator::getOffset 2.5 777 777 2.5 778
libicuuc.so.60.2!unknown 1.9 596 596 1.9 597
libicui18n.so.60.2!icu_60::UTF16CollationIterator::getOffset 1.7 513.291 514 1.7 514.291
libcoreclr.so!StringObject::InternalCheckHighChars 1.3 399 399 1.3 399
libicui18n.so.60.2!usearch_openFromCollator_60 1.1 349 349 1.1 349
libc-2.27.so!cfree 0.9 280 280 0.9 280
libc-2.27.so!malloc 0.9 263 263 0.9 263
libicui18n.so.60.2!ucol_primaryOrder_60 0.7 214 214 0.7 214
libicui18n.so.60.2!icu_60::PCEBuffer::reset 0.7 204 204 0.7 204
libicui18n.so.60.2!ucol_tertiaryOrder_60 0.6 196 196 0.6 197
libicui18n.so.60.2!ucol_secondaryOrder_60 0.6 190 190 0.6 190
libicuuc.so.60.2!icu_60::CharString::append 0.5 165 165 0.5 165

So IsFastSort is not a problem, but the ICU part is.

@adamsitnik
Copy link
Member

adamsitnik commented Sep 2, 2019

@tarekgh is there any reason why we should not implement StartsWith in the following way:

bool StartsWith(string source, string prefix, StringComparison stringComparison)
  => CompareString(source.AsSpan(start: 0, length: prefix.Length), prefix, stringComparison);

Edit: nevermind, I've got an answer from @kevingosse in dotnet/coreclr#26481 ;)

@adamsitnik adamsitnik self-assigned this Sep 3, 2019
@tarekgh
Copy link
Member

tarekgh commented Sep 3, 2019

is there any reason why we should not implement StartsWith in the following way:

@kevingosse is right this will not work except for ordinal cases.

@tarekgh
Copy link
Member

tarekgh commented Sep 3, 2019

@kevingosse thanks for you measurements. I believe @adamsitnik PR is going to help some with the StartsWith scenario.

@adamsitnik
Copy link
Member

dotnet/coreclr#26759 and dotnet/coreclr#26621 combined together have fixed this problem.

Fun fact: while working on improving the performance of StartsWith on Linux we have found and fixed an 18 year old bug in ICU unicode-org/icu#840 ;)

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

9 participants