Should \R match \u001bE? #4

fstirlitz · 2022-06-21T07:43:27Z

One of the code points that are supposed to be matched by \R is <NL>, that is U+0085, which is the C1 control code NEXT LINE (NEL). The definition of <NL> is missing from the specification text, but is implied by the contents of the README.

However, C1 control codes also have a 7-bit representation as escape sequences built from ASCII code points: U+0085 can equally be written as U+001B U+0045 (ESC E). Terminal emulators that support the former as a line-ending character tend to also support the latter (e.g. VTE):

$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abc
def

Some, in fact, only support the latter (e.g. xterm, native Linux console subsystem):

$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abcdef

$ printf 'qwe\x1bErty\nabc\xc2\x85def\n'
qwe
rty
abc◈def

U+0085 can therefore be considered equivalent to (or at least no better than) U+001B U+0045, and it is inconsistent to recognise the former but not the latter. As such, U+001B U+0045 should be included as a recognised line-ending sequence.
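To make the inconsistency concrete, here is a rough sketch using today's syntax. Both patterns are hypothetical stand-ins written for illustration; neither is the proposal's actual definition of \R, and the set of other line endings they include is an assumption.

```javascript
// Illustrative only: these patterns are assumptions, not the proposal's definition of \R.

// Hypothetical pattern that recognises NEL only in its C1 form (U+0085).
const c1Only = /\r\n|[\n\r\u0085\u2028\u2029]/u;

// Hypothetical variant that also recognises the ESC E form (U+001B U+0045).
const withEscE = /\r\n|\u001bE|[\n\r\u0085\u2028\u2029]/u;

// Same input as the printf examples above.
const sample = "qwe\u001bErty\nabc\u0085def\n";

// Splits at LF and U+0085 but leaves ESC E inside the first line.
const linesC1Only = sample.split(c1Only);

// Splits at ESC E as well, so "qwe" and "rty" become separate lines.
const linesWithEscE = sample.split(withEscE);

console.log(linesC1Only.length);   // 4 (the last element is the empty string after the final LF)
console.log(linesWithEscE.length); // 5
```

Note that the ESC E form spans two code points, so it cannot be added to the character class and has to be a separate alternative, much like CR LF.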

On the other hand, the inclusion of NEL (in either form) makes the escape not align with ^ and $ in mu mode, despite the claim in the README. So perhaps removing NEL altogether is also an option.
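The mismatch with ^ and $ can be checked in any current engine. The sketch below reads "mu mode" as the m and u flags used together, which is an assumption about the README's wording; the variable names are made up for the example.

```javascript
// U+0085 (NEL) sits between "abc" and "def"; LF sits between "def" and "ghi".
const text = "abc\u0085def\nghi";

// Collect the first character of every line as seen by ^ with the m and u flags.
const lineStarts = [...text.matchAll(/^./gmu)].map((m) => m[0]);
console.log(lineStarts); // [ 'a', 'g' ]: LF starts a new line, NEL does not

// $ does not treat the position before NEL as an end of line...
const abcAtLineEnd = /abc$/mu.test(text);
console.log(abcAtLineEnd); // false

// ...but it does treat the position before LF as one.
const defAtLineEnd = /def$/mu.test(text);
console.log(defAtLineEnd); // true
```

In ECMAScript, ^ and $ in multiline mode only consider LF, CR, U+2028 and U+2029, so an \R that matches NEL would split "abc" from "def" where the anchors see a single line.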

Which is it going to be?
