[BUG] Regular Expressions: matching the dot . doesn't fully exclude all Unicode line terminator characters
#5415
Labels
bug
Describe the bug
When using the . wildcard, the plugin transpiles it to [^\r\n], because . in Java excludes line terminator characters, while cuDF does not (it matches any character). However, this does not exclude the other Unicode line terminator characters described in the Line terminators section of the Java Pattern documentation. We should also exclude the next-line (\u0085), line-separator (\u2028), and paragraph-separator (\u2029) characters.
Expected behavior
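For reference, the mismatch can be checked against Java's own regex engine. The following is a minimal sketch, not the plugin's code: the class name `DotLineTerminators`, the helper `matchesWhole`, and the `PROPOSED` character class are hypothetical names introduced for illustration. `CURRENT` is the class quoted in this issue, and `PROPOSED` is only one possible way to express the extra exclusions.

```java
import java.util.regex.Pattern;

public class DotLineTerminators {
    // Java's own dot wildcard with default flags (no DOTALL, no UNIX_LINES).
    static final Pattern JAVA_DOT = Pattern.compile(".");
    // The character class this issue says the plugin currently emits.
    static final Pattern CURRENT = Pattern.compile("[^\\r\\n]");
    // Hypothetical extended class (an assumption, not the actual fix):
    // also excludes U+0085, U+2028 and U+2029.
    static final Pattern PROPOSED =
        Pattern.compile("[^\\r\\n\\u0085\\u2028\\u2029]");

    // CR, LF, next-line, line-separator, paragraph-separator.
    static final String[] TERMINATORS = {"\r", "\n", "\u0085", "\u2028", "\u2029"};

    // True if the pattern matches the entire input string.
    static boolean matchesWhole(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : TERMINATORS) {
            System.out.printf("U+%04X  java-dot=%b  current=%b  proposed=%b%n",
                (int) s.charAt(0),
                matchesWhole(JAVA_DOT, s),
                matchesWhole(CURRENT, s),
                matchesWhole(PROPOSED, s));
        }
    }
}
```

Java's dot and the extended class reject all five characters, while [^\r\n] still accepts U+0085, U+2028 and U+2029. Those three are the cases where the transpiled pattern diverges from Java.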
The . wildcard should exclude these extra Unicode line terminator characters in addition to carriage-return and newline.
Additional context
None