The result of full-width string comparison with InvariantCultureIgnoreCase has changed in .NET 5 #44789

Closed
Nukepayload2 opened this issue Nov 17, 2020 · 10 comments

Comments

@Nukepayload2

Description

The result of String.Equals("ＡＥ", "ae", StringComparison.InvariantCultureIgnoreCase) has changed to False in .NET 5.

Steps:

  1. Create a new .NET Core 3.1 or .NET Framework 4.8 VB console project
  2. Run the following code. The output is True.
Imports System

Module Program
    Sub Main()
        ' "ＡＥ" is full-width (U+FF21 U+FF25); "ae" is plain ASCII.
        Console.WriteLine(String.Equals("ＡＥ", "ae", StringComparison.InvariantCultureIgnoreCase))
    End Sub
End Module
  3. Set target framework to .NET 5.0, and run the project again.

Expected behavior
The output is True

Actual behavior
The output is False

Configuration

.NET SDK (reflecting any global.json):
Version: 5.0.100
Commit: 5044b93829

Runtime Environment:
OS Name: Windows
OS Version: 10.0.19041
OS Platform: Windows
RID: win10-x64
Base Path: C:\Program Files\dotnet\sdk\5.0.100\

Host (useful for support):
Version: 5.0.0
Commit: cf258a1

Regression?

Yes. This problem doesn't exist in .NET Framework 4.8 or .NET Core 3.1.

Other information

Probably caused by the switch to ICU. However, the documentation for that switch doesn't mention string comparison behavior changes for full-width strings.

We switched back to NLS because of this problem.

<ItemGroup>
  <RuntimeHostConfigurationOption Include="System.Globalization.UseNls" Value="true" />
</ItemGroup>
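
For reference, the same switch can also be set in the app's runtimeconfig.json instead of the project file. This is only a sketch of the equivalent fragment, assuming the same System.Globalization.UseNls setting used above:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.Globalization.UseNls": true
    }
  }
}
```
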
@Dotnet-GitSync-Bot added the area-System.Globalization and untriaged labels Nov 17, 2020
@ghost

ghost commented Nov 17, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

@tarekgh removed the untriaged label Nov 17, 2020
@tarekgh
Member

tarekgh commented Nov 18, 2020

@Nukepayload2 thanks for reporting the issue. I think your analysis is correct that this happens because of the switch to ICU, but the issue is not ICU itself; it is how we internally handle this full-width range. We'll work on fixing this issue.

@tarekgh added the bug label Nov 18, 2020
@tarekgh added this to the 6.0.0 milestone Nov 18, 2020
@tarekgh
Member

tarekgh commented Nov 20, 2020

@Nukepayload2 just to mention, the workaround for this issue is to do something like:

string.Compare("ＡＥ", "ae", CultureInfo.InvariantCulture, CompareOptions.IgnoreWidth | CompareOptions.IgnoreCase) == 0

This should give you the desired behavior, and you don't have to switch back to NLS.
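
A minimal sketch of wrapping the comparison above in a reusable helper, assuming the four-argument string.Compare overload shown in the workaround. The names WidthInsensitive and EqualsIgnoreCaseAndWidth are hypothetical and not part of .NET.

```csharp
using System;
using System.Globalization;

static class WidthInsensitive
{
    // Hypothetical helper: invariant-culture equality that ignores both
    // case and full-width/half-width differences.
    public static bool EqualsIgnoreCaseAndWidth(string a, string b) =>
        string.Compare(a, b, CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreWidth) == 0;
}

static class Program
{
    static void Main()
    {
        // "\uFF21\uFF25" is the full-width form of "AE".
        Console.WriteLine(WidthInsensitive.EqualsIgnoreCaseAndWidth("\uFF21\uFF25", "ae"));
    }
}
```

Passing CompareOptions.IgnoreWidth explicitly avoids relying on how StringComparison.InvariantCultureIgnoreCase treats the full-width range on a given runtime.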

@Nukepayload2
Author

@tarekgh
Thanks for providing the workaround. However, we still need to switch back to NLS. Because the comparison result of Japanese Hiragana and Katakana strings has changed.

Console.WriteLine(String.Compare("まりお", "マリオ", StringComparison.InvariantCultureIgnoreCase))

.NET Framework output: 1
.NET 5 output: -1

@tarekgh
Member

tarekgh commented Nov 23, 2020

@Nukepayload2 thanks again for the feedback.

However, we still need to switch back to NLS. Because the comparison result of Japanese Hiragana and Katakana strings has changed.

Could you please elaborate on your scenario? I mean, why would the new result of comparing まりお and マリオ break you? I understand it will change the sort order, but why does this break you?

@Nukepayload2
Author

Nukepayload2 commented Nov 24, 2020

@tarekgh

I understand it will change the sort order, but why does this break you?

Because our Japanese end users expect the same order as Excel when sorting Japanese strings.

@ewfian

ewfian commented Nov 24, 2020

@tarekgh

I understand it will change the sort order, but why does this break you?

Because our Japanese end users expect the same order as Excel when sorting Japanese strings.

FYI.
[image]
Refs:

@tarekgh
Member

tarekgh commented Nov 24, 2020

Because our Japanese end users expect the same order as Excel when sorting Japanese strings.

@Nukepayload2 do you expect this order with all cultures, or with Japanese cultures only?

@ewfian thanks for the references. The second one is from Unicode, which is what ICU implements, and I guess that is the behavior the complaint is about, no? Could you elaborate on what you are trying to point out with these references?

@ewfian

ewfian commented Dec 1, 2020

@tarekgh The new behavior based on ICU is what I expected. I was just providing some references about Japanese sort order.

@tarekgh
Member

tarekgh commented Jan 4, 2021

Closing this one per PR #45079.

@tarekgh closed this as completed Jan 4, 2021
@ghost locked as resolved and limited conversation to collaborators Feb 3, 2021