Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacing \r\n line endings with \r line endings #3449

Merged
merged 2 commits into from
Nov 6, 2019

Conversation

d-bingham
Copy link
Contributor

Summary of the Pull Request

References

#1091
#1094
#2390
#3314

PR Checklist

  • Closes Pasted text includes CRLF pairs where unneeded #1091
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

Combination of the PRs #1094, #2390, and #3314, especially as discussed in #3314.

In short, this changes line endings from Windows-space \r\n to the more universal \r.

Validation Steps Performed

Copied and pasted text into the terminal without the patch, line endings were doubled.
With the patch, line endings weren't doubled.

@DHowett-MSFT
Copy link
Contributor

Does the std::erase solution not do it? Hmm

@d-bingham
Copy link
Contributor Author

Does the std::erase solution not do it? Hmm

It solves the problem if we're not concerned about stripping all \n characters; the solution here only changes \r\n pairs.

@DHowett-MSFT
Copy link
Contributor

ah, interesting. I wonder how common a degenerate clipboard is that contains \r\n and \n simultaneously, and how we should treat that. I also now wonder if all \n should become \r and all \r\r should become \r...

@d-bingham
Copy link
Contributor Author

ah, interesting. I wonder how common a degenerate clipboard is that contains \r\n and \n simultaneously, and how we should treat that. I also now wonder if all \n should become \r and all \r\r should become \r...

I tend to think it's warranted to err on the side of conservative character-stripping here. \r\r, for example, is probably legitimate -- for example, the sequence Line 1\r\rLine 2 is double-spaced text in Unix-land, and I'm not sure there's any other reasonable interpretation of the sequence or reason to convert it to anything else.

An argument could be made to convert \n to \r universally (after the \r\n to \r conversion, of course, to not just double up line-endings...), but I'm not sure it actually matters.

@d-bingham
Copy link
Contributor Author

Oops, misread the comment a bit. Anyway:

\n -> \r, then \r\r -> \r doesn't work because of the possibility of getting valid Unix-land data.
\r\n ->\r, then \n -> \r probably works? But if we're seeing a lone \n, we're not looking at standard Windows-land text, and it's not clear to me what assumptions can be made about it.

@DHowett-MSFT
Copy link
Contributor

Sounds good to me. Thanks for digging in. 😄

@fcharlie
Copy link
Contributor

fcharlie commented Nov 6, 2019

@d-bingham @DHowett-MSFT When the string is long and multiple lines, can the following scheme reduce the memory copy?

std::wstring ReplaceNewline(std::wstring_view str) {
  std::wstring s;
  s.reserve(str.size());
  while (!str.empty()) {
    auto pos = str.find(L"\r\n");
    if (pos == std::wstring::npos) {
      s.append(str);
      break;
    }
    s.append(str.substr(0, pos)).append(L"\n");
    str.remove_prefix(pos + 2);
  }
  return s;
}

Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@zadjii-msft zadjii-msft merged commit fee3fdf into microsoft:master Nov 6, 2019
@zadjii-msft zadjii-msft added Area-Input Related to input processing (key presses, mouse, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal. labels Nov 6, 2019
@zadjii-msft zadjii-msft mentioned this pull request Nov 19, 2019
@ghost
Copy link

ghost commented Nov 26, 2019

🎉Windows Terminal Preview v0.7.3291.0 has been released which incorporates this pull request.:tada:

Handy links:

@vvuk
Copy link

vvuk commented Dec 2, 2019

Err... apologies if I'm missing something, but:

In short, this changes line endings from Windows-space \r\n to the more universal \r.

for example, the sequence Line 1\r\rLine 2 is double-spaced text in Unix-land

Unix-land (and pretty much everything non-Windows -- "more universal"?), uses \n line endings. '\r' was only used as a line ending on MacOS 9 and earlier, and a few other much older places.

Shouldn't this patch/feature be changing \r\n to \n?

Edit: note if this was tested by pasting while in, say, a ssh session to a Linux host, most unix ttys have the icrnl stty mode enabled by default. This converts CR (\r) characters to LF (\n) on input, so this patch would have appeared to work fine, but is actually relying on icrnl to do the right translation.

@DHowett-MSFT
Copy link
Contributor

@vvuk you’re correct for textual data serialized to disk, but it appears (based on a couple popular terminal emulators available today) that the universal encoding used by terminal emulators for a newline is U+000D CARRIAGE RETURN, \r.

Windows Terminal’s backing pseudoterminal infrastructure transforms an incoming carriage return (and an incoming line feed, which incidentally is the cause of the problem this PR fixed) into a key event encoding Enter.

That is handled on the other end in an application-specific manner.

@DHowett-MSFT
Copy link
Contributor

DHowett-MSFT commented Dec 2, 2019

I guess, in short, the Windows PTY always acts in icrnl mode. Judging by the behavior in putty and xterm, they make the same assumption for data that transits the clipboard.

@vvuk
Copy link

vvuk commented Dec 2, 2019

Ahh, got it. I was indeed missing something. Thanks for the explanation :)

@noedelorme
Copy link

It is well more interesting to replace \r\n by \n. Indeed, thanks to this change we can paste several line in the terminal without executing them.
For example, I use Sagemath, and I always have to paste a whole python function but I can't because lines end with \r\n and the terminal execute thes lines after \r.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pasted text includes CRLF pairs where unneeded
6 participants