Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Trim method for string #102822

Closed
Mr0N opened this issue May 29, 2024 · 34 comments
Closed

[API Proposal]: Trim method for string #102822

Mr0N opened this issue May 29, 2024 · 34 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime

Comments

@Mr0N
Copy link

Mr0N commented May 29, 2024

Background and motivation

I propose to add the following overloads for the Trim, TrimEnd, and TrimStart methods. The ones that currently work only with the char type."
https://learn.microsoft.com/en/dotnet/api/system.string.trim?view=net-8.0
https://learn.microsoft.com/en-us/dotnet/api/system.string.trimend?view=net-8.0
https://learn.microsoft.com/en-us/dotnet/api/system.string.trimstart?view=net-8.0

API Proposal

class Example
{
    public string Trim(string trimText)
        => throw new NotSupportedException();
    public string TrimStart(string trimText)
        => throw new NotSupportedException();
    public string TrimEnd(string trimText)
        => throw new NotSupportedException();
}

API Usage

string text = "__32__".Trim("__");//Result:32
string textBegin = "__32__".TrimStart("__");//Result:32__
string textEnd = "__32__".TrimEnd("__");//Result:__32

Alternative Designs

No response

Risks

No response

@Mr0N Mr0N added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label May 29, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label May 29, 2024
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label May 29, 2024
@huoyaoyuan huoyaoyuan added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels May 29, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@KeterSCP
Copy link

.NET 9 already added overloads that accept ReadOnlySpan<char>:

@Mr0N Mr0N closed this as completed May 29, 2024
@dotnet-policy-service dotnet-policy-service bot removed the untriaged New issue has not been triaged by the area owner label May 29, 2024
@huoyaoyuan
Copy link
Member

.NET 9 already added overloads that accept ReadOnlySpan<char>

No, this is the one that's easily confused. Here ReadOnlySpan<char> means a collection of chars, not a string.

If we add a method to trim for strings, it should better be a different name.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

.NET 9 already added overloads that accept ReadOnlySpan<char>

No, this is the one that's easily confused. Here ReadOnlySpan<char> means a collection of chars, not a string.

If we add a method to trim for strings, it should better be a different name.

.NET 9 already added overloads that accept ReadOnlySpan<char>

No, this is the one that's easily confused. Here ReadOnlySpan<char> means a collection of chars, not a string.

If we add a method to trim for strings, it should better be a different name.

A string in other quotes, how can they be confused?

""//string
''//char

@KeterSCP
Copy link

easily confused

It is indeed very confusing. As an API consumer, I would expect that "abHELLOab".Trim("ba") would not change the input string. I suspect that when .NET 9 is released, this new API will cause a lot of false positive bug reports

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

Well, just replace ReadOnlySpan with string and everything will be clear.

@stephentoub
Copy link
Member

It is indeed very confusing

cc: @terrajobst, @bartonjs

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

Well, just replace ReadOnlySpan with string and everything will be clear.

Without considering this new attribute, for the reload rating.
Well, it's not great because you need to know the internal implementation of the class to call its methods.

@hez2010
Copy link
Contributor

hez2010 commented May 29, 2024

Maybe we need an attribute for the compiler to prevent implicit conversions happening:

string.Trim([NoImplicitConversionFrom(typeof(string))] ReadOnlySpan<char> cs);

@mikernet
Copy link
Contributor

I feel like forcing a cast there will not help clarity when reading code.

The new methods that take ROSpan should be the ones that are renamed, IMO, not the ones that take a string.

x.TrimChars("abc"); // trims all the chars in the input param
x.Trim("abc"); // trims the string

Seems a lot clearer to me than

x.Trim((ReadOnlySpan<char>)"abc"); // just gotta know that the cast somehow means chars treated individually
x.TrimString("abc"); // meh

@stephentoub
Copy link
Member

stephentoub commented May 29, 2024

The new methods that take ROSpan should be the ones that are renamed, IMO, not the ones that take a string.

There's already the long-standing method:

public string Trim(params char[]? trimChars)

It'd be really strange if:

public string Trim(params ReadOnlySpan<char> trimChars)

behaved differently. It would also defeat one of the purposes of adding this overload, which is that any existing usage of calls like:

... = str.Trim('a', 'b', 'c')

will compile to target the new overload and save the params array allocation.

@mikernet
Copy link
Contributor

I don't think string Trim(params ReadOnlySpan<char> trimChars) should be marked as params. I don't know though, this is a tough one. Yeah you'll lose the "automatic" optimization, but I think clarity in API usage is more important.

Usually ROSpan<char> is understood to be a string segment, not a list of individual chars. The method that treats them as individual chars should have a clear name to indicate it as such.

@mikernet
Copy link
Contributor

To be clear, string TrimChars(params ReadOnlySpan<char> trimChars) should be marked params, in my preferred API design.

@mikernet
Copy link
Contributor

mikernet commented May 29, 2024

The method that treats them as individual chars should have a clear name to indicate it as such.

And I think this would be good a good guideline to apply to any other future BCL APIs that could have ambiguous interpretation in the given context. If it is possible for an ROSpan<char> input to have sensible behavior treated as both a string segment and a list of chars, default to it being a string segment, and offer an xxxChars method that treats it as individual characters.

@stephentoub
Copy link
Member

Usually ROSpan is understood to be a string segment, not a list of individual chars

It's both.

@mikernet
Copy link
Contributor

I know it is both, but in the vast majority of contexts it is understood to be a string segment, so in the case of ambiguity where both make sense, that should be the default understanding.

@stephentoub
Copy link
Member

It depends on the context. Here, string.Trim already takes individual characters. This overload is exactly the same, just less allocating.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

If you change the interface from ReadOnlySpan to the type 'string', then the API seems fine, transparent

@stephentoub
Copy link
Member

stephentoub commented May 29, 2024

If you change the interface from ReadOnlySpan to the type 'string', then the API seems fine, transparent

I disagree.
a) It doesn't address the problem of removing allocation from callers.
b) String is often used for its list of characters nature, e.g.

private static readonly SearchValues<char> s_illegalChars = SearchValues.Create(
"\u0000\u0001\u0002\u0003\u0004\u0005\u0006\u0007\u0008\u0009\u000A\u000B\u000C\u000D\u000E\u000F" +
"\u0010\u0011\u0012\u0013\u0014\u0015\u0016\u0017\u0018\u0019\u001A\u001B\u001C\u001D\u001E\u001F" +
"\"*:<>?|");

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

b) String is often used for its sequence of characters nature, e.g.

a) It doesn't address the problem of removing allocation from callers.

You don't need to iterate through the string character by character in the method. Instead, need to find the indices (positions) of the characters that need to be deleted. For example, if we have a string: ||27777||, and we need to delete ||, the method should find the positions of the || chars. In this case, it would be 2, meaning the 2nd character from the beginning. Then, it searches for the characters from the end. The last character is 8. Then, it uses the substring method to extract the substring from the 2nd to the 8th character, and deletes it.

@stephentoub
Copy link
Member

b) String is often used for its sequence of characters nature, e.g.

a) It doesn't address the problem of removing allocation from callers.

You don't need to iterate through the string character by character in the method. Instead, need to find the indices (positions) of the characters that need to be deleted. For example, if we have a string: ||27777||, and we need to delete ||, the method should find the positions of the || chars. In this case, it would be 2, meaning the 2nd character from the beginning. Then, it searches for the characters from the end. The last character is 8. Then, it uses the substring method to extract the substring from the 2nd to the 8th character, and deletes it.

Sorry, I don't understand how that comment relates to what I said. Can you clarify?

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

b) String is often used for its sequence of characters nature, e.g.

a) It doesn't address the problem of removing allocation from callers.

You don't need to iterate through the string character by character in the method. Instead, need to find the indices (positions) of the characters that need to be deleted. For example, if we have a string: ||27777||, and we need to delete ||, the method should find the positions of the || chars. In this case, it would be 2, meaning the 2nd character from the beginning. Then, it searches for the characters from the end. The last character is 8. Then, it uses the substring method to extract the substring from the 2nd to the 8th character, and deletes it.

Sorry, I don't understand how that comment relates to what I said. Can you clarify?

Quote reply

Not very clear on the issue of allocation for callers.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

Well, it's like the difference between this and that overload: this one deletes consecutive characters, while that one is randoms deletes chars.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

"12|12".Trim('1','2')//result 2|1
"12|12".Trim("12")//result |

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

"12|13".Trim("12","13")//result |
"12|sasscsac13".Trim("12","13")//result |sasscsac
"13|sasscsac12."Trim("12","13")//result |sasscsac


@hez2010
Copy link
Contributor

hez2010 commented May 29, 2024

I think we can simply disallow passing a string to params ROS<char> to resolve this.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

"12|13".Trim("12","13")//result |
"12|sasscsac13".Trim("12","13")//result |sasscsac
"13|sasscsac12."Trim("12","13")//result |sasscsac

Well, it will be quite difficult to implement without using regex. So it's better without params.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

Well, a new method with such functionality would also be good if Trim doesn't fit for this.

@Mr0N
Copy link
Author

Mr0N commented May 29, 2024

An example implementation of a similar method.
image

using System;

Console.WriteLine("11 443 43743 11".TrimText("11"));
Console.WriteLine("11 443 43743 1".TrimText("11"));
Console.WriteLine("1 443 43743 11".TrimText("11"));
Console.ReadKey();

static class Example
{
    public static string TrimText(this string text, string trimText)
    {
        //int lengthText = text.Length;
        int? beginIndexSubstr = null;
        int? endIndexSubstr = null;
        int positionBegin = text.IndexOf(trimText, 0);
        if (positionBegin == 0)
            beginIndexSubstr = positionBegin;
        int endIndex = text.LastIndexOf(trimText);
        int lengthFindKey = text.Length - endIndex;
        if (lengthFindKey == trimText.Length)
            endIndexSubstr = endIndex;
        beginIndexSubstr = beginIndexSubstr  != null ? beginIndexSubstr+trimText.Length : 0;
        int? lengthCutText = endIndexSubstr != null ? endIndexSubstr-beginIndexSubstr : text.Length-beginIndexSubstr;
        return text.Substring(beginIndexSubstr.Value, lengthCutText.Value);

    }
}

@RenderMichael
Copy link
Contributor

RenderMichael commented Jun 3, 2024

While .NET 9 hasn’t shipped yet, Renaming the new Trim overloads to TrimAny sounds like a great way to cut down on the ambiguity without sacrificing IDE discoverability or requiring any unintuitive hacks.

@stephentoub
Copy link
Member

While .NET 9 hasn’t shipped yet, Renaming the new Trim overloads to TrimAny sounds like a great way to cut down on the ambiguity without sacrificing IDE discoverability.

The existing Trim methods that have shipped since .NET Framework 1.0 are all "trim any" but are called "Trim". Adding "TrimAny" methods suggests that the "Trim" methods aren't "Any".

@RenderMichael
Copy link
Contributor

The existing Trim methods that have shipped since .NET Framework 1.0 are all "trim any" but are called "Trim"

This is clear when writing out a char[] but less so with a string literal.

With full knowledge of the historical overloads this makes perfect sense. However, reading the code without an IDE to see the overloads and parameter name, trying to inuit the functionality from the method name might lead to some confusion (as it did in this thread).

@stephentoub
Copy link
Member

However, reading the code without an IDE to see the overloads and parameter name, trying to inuit the functionality from the method name might lead to some confusion (as it did in this thread).

I understand the concern. My expectation is that with an IDE / something that shows you completion lists, seeing both "Trim" and "TrimAny" suggests that "Trim" is not "Any", even though it is, and will cause confusion. Two different method names for the exact same operation is something we try hard to avoid, especially when they're on the same type.

@RenderMichael
Copy link
Contributor

In fact if there's interest in supporting the "trim substrings from start/end/both" functionality, that TrimAny would be a fitting name there:

public sealed class String
{
    public string TrimAny(params ReadOnlySpan<string> trimSubstrings);
}

@github-actions github-actions bot locked and limited conversation to collaborators Jul 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests

7 participants