Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Analyzer proposal: Consider using String.Equals instead of String.Compare #45552

Closed
xtqqczze opened this issue Nov 20, 2020 · 49 comments
Closed
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer
Milestone

Comments

@xtqqczze
Copy link
Contributor

xtqqczze commented Nov 20, 2020

Category: Microsoft.Usage

Fix is breaking or non-breaking: Non-breaking

Cause

This rule locates calls to Compare where the result is used to check for equality, and suggests using Equals instead, to improve readability.

Rule description

When Compare is used to check if the result is equal to 0, the call can be safely substituted with Equals.

Overload Suggested fix
String.Compare(string) String.Equals(string)
String.Compare(string, false) String.Equals(string, StringComparison.CurrentCulture)
String.Compare(string, true) String.Equals(string, StringComparison.CurrentCultureIgnoreCase)
String.Compare(string, StringComparison) String.Equals(string, StringComparison)

Examples

Code with Diagnostic

string.Compare(x, y) == 0
string.Compare(x, y) != 0
string.Compare(x, y, false) == 0
string.Compare(x, y, true) == 0
string.Compare(x, y, StringComparison.CurrentCulture) == 0

Code with Fix

string.Equals(x, y)
!string.Equals(x, y)
string.Equals(x, y, StringComparison.CurrentCulture)
string.Equals(x, y, StringComparison.CurrentCultureIgnoreCase)
string.Equals(x, y, StringComparison.CurrentCulture)

When to suppress warnings

It's safe to suppress a violation of this rule if improving code readability is not a concern.

Additional context

  • Inspiration came from RCS1235: Optimize method call.
  • Proposal was modelled on documentation for CA2249: Consider using String.Contains instead of String.IndexOf.
@Evangelink
Copy link
Member

@mavasani not sure who could confirm there is indeed a performance issue.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 2, 2020

Perhaps this a more of a code style issue?

@Therzok
Copy link
Contributor

Therzok commented Dec 3, 2020

Compare and Equals

From the looks of it, between the two, Equals fast-paths on a .Length check, Compare does a full string comparison.

@mavasani
Copy link

mavasani commented Dec 3, 2020

This is a .NET API usage analyzer suggestion. @jeffhandley @carlossanlop @buyaa-n can this please be ported to dotnet/runtime repo and triaged?

@buyaa-n buyaa-n transferred this issue from dotnet/roslyn-analyzers Dec 3, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Globalization untriaged New issue has not been triaged by the area owner labels Dec 3, 2020
@ghost
Copy link

ghost commented Dec 3, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

Issue Details

Code with Diagnostic

string.Compare(x, y, StringComparison.CurrentCulture) == 0

Code with Fix

string.Equals(x, y, StringComparison.CurrentCulture)

Additional context

See RCS1235

Author: xtqqczze
Assignees: -
Labels:

area-System.Globalization, untriaged

Milestone: -

@buyaa-n buyaa-n added code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Dec 3, 2020
@tarekgh
Copy link
Member

tarekgh commented Dec 3, 2020

I am not sure why we need this? can you explain more why we need that? both are valid options and both are preferable as long as providing the StringComparison options. I don't think we need that in the analyzers.

@tarekgh tarekgh removed the untriaged New issue has not been triaged by the area owner label Dec 3, 2020
@tarekgh tarekgh added this to the Future milestone Dec 3, 2020
@buyaa-n
Copy link
Contributor

buyaa-n commented Dec 7, 2020

I am not sure why we need this? can you explain more why we need that? both are valid options and both are preferable as long as providing the StringComparison options. I don't think we need that in the analyzers.

Sure, both are valid options, as @Therzok mentioned above we expect it will give a bit of performance benefit. From that point isn't string.Equals(x, y, StringComparison.CurrentCulture) is preferable than string.Compare(x, y, StringComparison.CurrentCulture) == 0? What you think @tarekgh

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2020

it will give a bit of performance benefit

Do we have any benchmark numbers prove that?

I am not sure we add any analyzer rules recommend the faster methods. do we?

@buyaa-n
Copy link
Contributor

buyaa-n commented Dec 7, 2020

Do we have any benchmark numbers prove that?

not sure, i don't think so

I am not sure we add any analyzer rules recommend the faster methods. do we?

We do, for example CA1836: Prefer IsEmpty over Count == 0

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2020

not sure, i don't think so

Ok, prove it is worth it to add such rule. I am not expecting there will be any perf difference and if it does, it will be something ignorable.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 7, 2020

    string x = "ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss";
    string y = "ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss";

    [Benchmark(Baseline = true)]
    public bool Compare()
    {
        return string.Compare(x, y, StringComparison.CurrentCulture) == 0;
    }

    [Benchmark]
    public bool Equals()
    {
        return string.Equals(x, y, StringComparison.CurrentCulture);
    }
| Method |     Mean |     Error |    StdDev |   Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|------- |---------:|----------:|----------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare| 2.908 ns | 0.1129 ns | 0.3293 ns | 3.047 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals| 1.872 ns | 0.1089 ns | 0.3124 ns | 1.914 ns |  0.65 |    0.13 |     - |     - |     - |         - |     448 B |

x and y and both 1872 chars.

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2020

@xtqqczze thanks. could you add cases with short strings? and with non-ASCII strings? I am afraid the test case you have used is not the main stream case uses here.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 7, 2020

    string x = "🌊🜷🣫🹇🬖🭗🙔🧪🛞🫜💛🃤🵷🊅🚢🐄🇁🂀🂗🳞🪝🔗🜞🚖🅜🟓🚯🳫🄩🌂🻙🁎🎂🮟🃭🊉🍆🊙🥅🩯🁴🌃💵🱻👍🔗🅿🝃🮆🺈🆂🹃🱳🬌🃡🶺📽🙫🍙🦂🔾💛🡻🉽";
    string y = "🌊🜷🣫🹇🬖🭗🙔🧪🛞🫜💛🃤🵷🊅🚢🐄🇁🂀🂗🳞🪝🔗🜞🚖🅜🟓🚯🳫🄩🌂🻙🁎🎂🮟🃭🊉🍆🊙🥅🩯🁴🌃💵🱻👍🔗🅿🝃🮆🺈🆂🹃🱳🬌🃡🶺📽🙫🍙🦂🔾💛🡻🉽";
| Method |     Mean |     Error |    StdDev |   Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|------- |---------:|----------:|----------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare| 3.165 ns | 0.1077 ns | 0.3037 ns | 3.251 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals| 2.094 ns | 0.0959 ns | 0.2797 ns | 2.231 ns |  0.67 |    0.12 |     - |     - |     - |         - |     448 B |

x and y are both 64 chars.

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2020

@xtqqczze thanks again. I am fine to add the analyzer rule and have it mention this preference is for perf reason.

side point, I am really surprised with this difference. I think this suggest String.Compare can be optimized. CC @adamsitnik

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 7, 2020

    string x = "🌊🡻";
    string y = "🌊🡻";
| Method |     Mean |     Error |    StdDev |   Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|------- |---------:|----------:|----------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare| 3.194 ns | 0.1102 ns | 0.3198 ns | 3.287 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals| 1.826 ns | 0.1056 ns | 0.3047 ns | 1.915 ns |  0.58 |    0.12 |     - |     - |     - |         - |     448 B |

@danmoseley
Copy link
Member

Some subtle codegen thing maybe? The codepaths are almost identical

https://source.dot.net/#System.Private.CoreLib/String.Comparison.cs,215
https://source.dot.net/#System.Private.CoreLib/String.Comparison.cs,640

Is there a similar difference for other StringComparison types?

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 8, 2020

    string x = "ff";
    string y = "ff";
| Method |     Mean |     Error |    StdDev |   Median | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|------- |---------:|----------:|----------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare| 2.705 ns | 0.0978 ns | 0.2305 ns | 2.796 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals| 1.892 ns | 0.0842 ns | 0.2484 ns | 2.005 ns |  0.72 |    0.12 |     - |     - |     - |         - |     448 B |

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 8, 2020

What is surprising, is that the absolute difference in time measurement is considerably greater when the strings are not the same:

    string x = "ff";
    string y = "fx";
| Method |     Mean |    Error |   StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|------- |---------:|---------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare| 88.83 ns | 1.809 ns | 1.692 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals| 82.60 ns | 1.668 ns | 2.284 ns |  0.92 |    0.03 |     - |     - |     - |         - |     448 B |

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 8, 2020

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 8, 2020

Equals method isn't always quicker if the strings are different:

    string x = "ffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff";
    string y = "fffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffx";
|  Method |     Mean |   Error |   StdDev | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated | Code Size |
|-------- |---------:|--------:|---------:|------:|--------:|------:|------:|------:|----------:|----------:|
| Compare | 218.1 ns | 4.33 ns |  7.59 ns |  1.00 |    0.00 |     - |     - |     - |         - |     457 B |
|  Equals | 223.5 ns | 4.52 ns | 11.50 ns |  1.02 |    0.06 |     - |     - |     - |         - |     448 B |

@adamsitnik
Copy link
Member

adamsitnik commented Dec 8, 2020

I think this suggest String.Compare can be optimized

I agree with @tarekgh .

I think it would be best to fix the perf issue in dotnet/runtime and simply have no perf issue and no need for the analyzer.

@xtqqczze could you please create a new issue in dotnet/runtime with the benchmark code and results attached? If possible, please do provide details about OS version, .NET Version, and culture used.

@xtqqczze
Copy link
Contributor Author

xtqqczze commented Dec 8, 2020

@adamsitnik Looking at the disassembly, I think performance of Compare would be covered by #12093 and #402.

In any case, I think an analyzer still makes sense as Equals is clearer as to the intent of the code. If we want to compare for equality we do not require in an integer indicating position in the sort order, a bool is all that is needed.

@buyaa-n
Copy link
Contributor

buyaa-n commented Dec 8, 2020

Sorry, i forgot to mentioned that I agree with @xtqqczze for the benefit of the rule:

an analyzer still makes sense as Equals is clearer as to the intent of the code.

I would prefer Equals over Compare, but it is just my personal opinion

I'll leave it to you to decide.

I am not the one who will decided it 😄, the final decision will be made in the API review, it might be declined too

@tarekgh
Copy link
Member

tarekgh commented Dec 8, 2020

the final decision will be made in the API review

You have to represent the case for the design committee. I mean you need to go and sell that there :-) of course you already know my recommendation here.

@buyaa-n
Copy link
Contributor

buyaa-n commented Dec 8, 2020

Yep, i got your point, it was a good discussion before API review, thanks!

@xtqqczze xtqqczze changed the title Prefer string.Equals to Compare Analyzer proposal: Prefer string.Equals to Compare Dec 10, 2020
@xtqqczze xtqqczze changed the title Analyzer proposal: Prefer string.Equals to Compare Analyzer proposal: Consider using String.Equals instead of String.Compare Dec 10, 2020
@xtqqczze
Copy link
Contributor Author

I've updated the issue description to propose a new analyzer in the Microsoft.Usage category.

@Evangelink
Copy link
Member

@xtqqczze You are mentionning that == 0 can be replaced by Equals but != 0 could also be replaced by not Equals.

@xtqqczze
Copy link
Contributor Author

@xtqqczze You are mentionning that == 0 can be replaced by Equals but != 0 could also be replaced by not Equals.

I've added some more examples to make this explicit.

@tarekgh
Copy link
Member

tarekgh commented Dec 10, 2020

One last question here.

does the analyzer will flag a code like:

int res = String.Compare(...);
if (res == 0)
{
   // ...
}

if it does, that will be very wrong except if analyzer will check the reset of the code that the res value is not used as res > 0 or res < 0.

@mavasani
Copy link

We can do either of the following:

  1. Only flag when == check is done in the same expression as Compare OR
  2. Track that res in above case only has single reference which compares it with 0

@pgovind did an implementation that matches (2) for CA2249: https://github.com/dotnet/roslyn-analyzers/blob/6dea9b867e32607bc02f958efe46d4b2d1220e45/src/NetAnalyzers/Core/Microsoft.NetCore.Analyzers/Runtime/PreferStringContainsOverIndexOfAnalyzer.cs#L19

@tarekgh
Copy link
Member

tarekgh commented Dec 10, 2020

@mavasani your proposal is reasonable. if yo go with #1 make sure this will not include.

if ((res = string.Compare(....)) == 0)
{
}

@buyaa-n
Copy link
Contributor

buyaa-n commented Dec 10, 2020

Sure @tarekgh those scenarios shouldn't be flagged, i would vote for the #1 from @mavasani's proposal

@xtqqczze
Copy link
Contributor Author

int res = String.Compare(...);
if (res == 0)
{
   // ...
}

I think it would be worthwhile to have a separate analyzer to flag redundant assignment, like RCS1212.

If we had such an analyzer, rules like CA2249 and the rule proposed in this issue could use the rule as a pre-processor to improve detection of violations.

@mavasani
Copy link

I think it would be worthwhile to have a separate analyzer to flag redundant assignment, like RCS1212.

@xtqqczze The assignment above is not redundant, RCS1212 seems to be worded incorrectly, it should probably say inline variable (which we already provide as a refactoring in the VS IDE). Also note that we do have https://docs.microsoft.com/dotnet/fundamentals/code-analysis/style-rules/ide0059 for real redundant assignments where the value written is never read on any control flow path.

@mavasani
Copy link

@buyaa-n Sounds good to me. I think it will cover most realistic user code.

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Feb 4, 2021
@buyaa-n buyaa-n added the help wanted [up-for-grabs] Good issue for external contributors label Feb 4, 2021
@NewellClark
Copy link
Contributor

I would like to be assigned to this, please.

@buyaa-n buyaa-n removed the help wanted [up-for-grabs] Good issue for external contributors label May 17, 2021
@Youssef1313
Copy link
Member

@buyaa-n This was already implemented. Should it be closed now?

@ghost ghost locked as resolved and limited conversation to collaborators Jul 12, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Globalization code-analyzer Marks an issue that suggests a Roslyn analyzer code-fixer Marks an issue that suggests a Roslyn code fixer
Projects
None yet
Development

No branches or pull requests