Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix LF line-ending auto format bug #10802

Merged
merged 12 commits into from
Sep 4, 2024
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -83,8 +83,10 @@ public async Task<TextEdit[]> FormatAsync(
var filteredEdits = range is null
? result.Edits
: result.Edits.Where(e => range.LineOverlapsWith(e.Range));
var minimalEdits = GetMinimalEdits(originalText, filteredEdits);
minimalEdits = NormalizeLineEndings(originalText, minimalEdits);
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved

return GetMinimalEdits(originalText, filteredEdits);
return minimalEdits;
}

private static TextEdit[] GetMinimalEdits(SourceText originalText, IEnumerable<TextEdit> filteredEdits)
Expand Down Expand Up @@ -240,4 +242,44 @@ private static void UnwrapCSharpSnippets(TextEdit[] snippetEdits)
snippetEdit.NewText = unwrappedText;
}
}

// This method counts the occurrences of CRLF and LF line endings in the original text. If LF line endings are more
// prevalent, it removes any standalone CR characters from the text edits to ensure consistency with the LF style.
// This can be removed once we figure out how to get proper line ending information from the client.
private TextEdit[] NormalizeLineEndings(SourceText originalText, TextEdit[] minimalEdits)
{
var (crlfCount, lfCount) = CountLineEndings(originalText);
if (lfCount > crlfCount)
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved
{
minimalEdits = minimalEdits
.Where(edit => edit.NewText != "\r")
jordi1215 marked this conversation as resolved.
Show resolved Hide resolved
.ToArray();
}

return minimalEdits;
}

private (int crlfCount, int lfCount) CountLineEndings(SourceText sourceText)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This method could just return a bool and be called HasLFLineEndings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a couple of thoughts/questions about this method:

  1. Has SourceText.Lines already been populated at this point? I'd be surprised if it isn't, since this occurs after all of the formatting passes. Given that, would it be more efficient to just walk through the Lines collection and check the line ending? That would give you the exact span of each line ending, so you wouldn't need to walk every character in the SourceText.
  2. Should we be concerned about other legal "new line" characters? Here are the others that I know of:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this method does not access instance data and could be static. In fact, it's probably useful enough as a utility function to move it to SourceTextExtensions and make it an extension method on SourceText.

{
var crlfCount = 0;
var lfCount = 0;

for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}

return (crlfCount, lfCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an algorithm like this, I might do a few things differently to improve the efficiency. You can take these suggestions or leave them! Or, you can take an entirely different approach! I just wanted to share a couple of opportunities:

  1. I would extract SourceText.Length to a local variable to avoid accessing it each iteration of the loop. SourceText.Length is an abstract property and is overridden by every SourceText implementation. So, I wouldn't expect the JIT to do any smart inlining here.
  2. I would capture the first character before the loop and then start the loop at index 1. That way, I could avoid the extra length check inside the loop for \r.
  3. I would likely extract sourceText[i] to a local variable or use a switch statement to avoid indexing extra times. This allows you to walk characters in pairs -- the previous and current character.
  4. This doesn't help with efficiency so much, but C# tuples are mutable structs. At the method body level, it can be useful to think of them as little packs of variables. So, you could just declare a single local at the top of the method for the result tuple and just update the fields. This won't affect efficiency and is largely a stylistic choice in this case, but I thought it might be useful to mention.
Suggested change
var crlfCount = 0;
var lfCount = 0;
for (var i = 0; i < sourceText.Length; i++)
{
if (sourceText[i] == '\r')
{
if (i + 1 < sourceText.Length && sourceText[i + 1] == '\n')
{
crlfCount++;
i++; // Skip the next character as it's part of the CRLF sequence
}
}
else if (sourceText[i] == '\n')
{
lfCount++;
}
}
return (crlfCount, lfCount);
var result = (crlfCount: 0, lfCount: 0);
var length = sourceText.Length;
if (length == 0)
{
return result;
}
var previous = sourceText[0];
if (previous == '\n')
{
result.lfCount++;
}
for (var i = 1; i < length; i++)
{
var current = sourceText[i];
if (current == '\n')
{
if (previous == '\r')
{
result.crlfCount++;
// Skip ahead to avoid counting the '\n' again during the next iteration.
// However, we need to be careful not to index past the end of the SourceText!
if (++i < length)
{
// Set previous to the character *after* current. And, since we've already
// set previous, we can continue the loop. Otherwise, previous will get set
// to the wrong value below.
previous = sourceText[i];
continue;
}
}
else
{
result.lfCount++;
}
}
previous = current;
}
return result;

As I mentioned above, it might just be simpler (and more efficient) to walk through SourceText.Lines. However, I wanted to provide some suggestions in case you keep this algorithm. Also, please note that I have not tested this code! I wrote it directly into the GitHub comment field. So, please take the accuracy with a grain of salt. I've only had one cup of coffee this morning. 😄

}
}
Loading