From 25a7485eb752c66e042012e78f0832199ab20aeb Mon Sep 17 00:00:00 2001
From: David Wendt <45795991+davidwendt@users.noreply.github.com>
Date: Tue, 11 Jan 2022 17:28:30 -0500
Subject: [PATCH] Fix regex doc describing hexadecimal escape characters (#10009)

Fixes a documentation error found while diagnosing a hex regex pattern question.
The hex escape sequence only specifies a single character (not a single byte). So this means it can only be used to match ASCII characters (code-points 0-127) and not all UTF-8 characters. This is the same as for octal escape sequences.
Also, the example provided for hex in the documentation has been corrected to use a valid ASCII character.

Authors:
  - David Wendt (https://github.com/davidwendt)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - https://github.com/nvdbaranec

URL: https://github.com/rapidsai/cudf/pull/10009
---
 cpp/doxygen/regex.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/cpp/doxygen/regex.md b/cpp/doxygen/regex.md
index b721448b45a..76ebb48d195 100644
--- a/cpp/doxygen/regex.md
+++ b/cpp/doxygen/regex.md
@@ -30,7 +30,7 @@ The details are based on features documented at https://www.regular-expressions.
 | Literal character | Any character except `[\^$.⎮?*+()` | All characters except the listed special characters match a single instance of themselves | `a` matches `a` |
 | Literal curly braces | `{` and `}` | `{` and `}` are literal characters, unless they are part of a valid regular expression token such as a quantifier `{3}` | `{` matches `{` |
 | Backslash escapes a metacharacter | `\` followed by any of `[\^$.⎮?*+(){}` | A backslash escapes special characters to suppress their special meaning | `\*` matches `*` |
-| Hexadecimal escape | `\xFF` where `FF` are 2 hexadecimal digits | Matches the character at the specified position in the code page | `\xA9` matches `©` |
+| Hexadecimal escape | `\xFF` where `FF` are 2 hexadecimal digits | Matches the character at the specified position in the ASCII table | `\x40` matches `@` |
 | Character escape | `\n`, `\r` and `\t` | Match an line-feed (LF) character, carriage return (CR) character and a tab character respectively | `\r\n` matches a Windows CRLF line break |
 | Character escape | `\a` | Match the "alert" or "bell" control character (ASCII 0x07) | |
 | Character escape | `\f` | Match the form-feed control character (ASCII 0x0C) | |
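
Reviewer note: the following is a minimal sketch, not part of the patch and not cuDF code, of why the old `\xA9` example could not work when a hex escape is limited to ASCII code points (0-127), as the commit message states. It uses Python's standard `re` module on raw UTF-8 bytes purely for illustration; Python's own handling of `\x` in `str` patterns differs from the behavior described here. The helper `is_ascii_hex_escape` is a hypothetical name introduced for this example.

```python
import re

# '@' is ASCII code point 0x40, so UTF-8 stores it as the single byte 0x40.
assert "@".encode("utf-8") == b"\x40"

# '©' is code point U+00A9, outside the ASCII range; UTF-8 needs two bytes.
copyright_bytes = "©".encode("utf-8")
assert copyright_bytes == b"\xc2\xa9"

# On the raw UTF-8 bytes, the single value 0xA9 is only the trailing byte,
# so it cannot stand for the whole character by itself.
assert re.fullmatch(rb"\xA9", copyright_bytes) is None
assert re.fullmatch(rb"\xC2\xA9", copyright_bytes) is not None


def is_ascii_hex_escape(hex_digits: str) -> bool:
    """Hypothetical check: does a two-digit hex escape name an ASCII code point?

    Assumes hex_digits contains only valid hexadecimal digits.
    """
    return len(hex_digits) == 2 and int(hex_digits, 16) <= 0x7F


print(is_ascii_hex_escape("40"))  # the corrected example, `\x40`
print(is_ascii_hex_escape("A9"))  # the old example, `\xA9`
```

Under these assumptions the corrected example (`40`) passes the check and the old one (`A9`) does not, which matches the documentation change in the hunk above.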