-
Notifications
You must be signed in to change notification settings - Fork 153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IFS: UTF-8 support is incomplete #1372
Comments
@McDutchie This is definitely broken. I'm pretty sure I had commented earlier that IFS only works for single byte locales but I can't find it now. However, I'm seeing slightly different behavior than you documented. I suspect you made a copy/paste mistake and meant to say the first output you showed was from the current source; i.e., I took your script and modified it slightly to make understanding the behavior easier. Note that
Output from ksh93u+ included with macOS and ksh93v-:
Output from ksh built from the current source:
So somewhere along the line it appears we fixed the first test. Probably as a result of my replacing most of the AST locale code with the platform's locale support. P.S., Whomever fixes this should be certain to add appropriate unit tests. |
This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch. The modernish regression test suite now only reports eight test failures. src/cmd/ksh93/sh/macro.c: - Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for compatibility with C89). Explanation from att#737: Previously, the varsub method used for the macro expansion of $param, ${param}, and ${param op word} would incorrectly expand the internal field separator (IFS) if it was a multibyte character. This was due to truncation based on the incorrect assumption that the IFS would never be larger than a single byte. This change fixes this issue by carefully tracking the number of bytes that should be persisted in the IFS case and ensuring that all bytes are written during expansion and substitution. Bug report: att#13 - Fixed another bug that caused multibyte characters with the same initial byte to be treated as the same character by the IFS. This bug was occurring because the first byte of a multibyte character wasn't being written to the stack when the IFS delimiter had the same initial byte: $ IFS=£ $ v='§' $ set -- $v $ v="${1-}" $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug 00000000 a7 0a |..| 00000002 Bug report: att#1372 src/cmd/ksh93/tests/variables.sh: - Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs. - Add a regression test for att#1372 based on the reproducer.
This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch. The modernish regression test suite now only reports eight test failures. src/cmd/ksh93/sh/macro.c: - Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for compatibility with C89). Explanation from att#737: Previously, the varsub method used for the macro expansion of $param, ${param}, and ${param op word} would incorrectly expand the internal field separator (IFS) if it was a multibyte character. This was due to truncation based on the incorrect assumption that the IFS would never be larger than a single byte. This change fixes this issue by carefully tracking the number of bytes that should be persisted in the IFS case and ensuring that all bytes are written during expansion and substitution. Bug report: att#13 - Fixed another bug that caused multibyte characters with the same initial byte to be treated as the same character by the IFS. This bug was occurring because the first byte of a multibyte character wasn't being written to the stack when the IFS delimiter had the same initial byte: $ IFS=£ $ v='§' $ set -- $v $ v="${1-}" $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug 00000000 a7 0a |..| 00000002 Bug report: att#1372 src/cmd/ksh93/tests/variables.sh: - Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs. - Add a regression test for att#1372 based on the reproducer.
This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch. The modernish regression test suite now only reports eight test failures. src/cmd/ksh93/sh/macro.c: - Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for compatibility with C89). Explanation from att#737: Previously, the varsub method used for the macro expansion of $param, ${param}, and ${param op word} would incorrectly expand the internal field separator (IFS) if it was a multibyte character. This was due to truncation based on the incorrect assumption that the IFS would never be larger than a single byte. This change fixes this issue by carefully tracking the number of bytes that should be persisted in the IFS case and ensuring that all bytes are written during expansion and substitution. Bug report: att#13 - Fixed another bug that caused multibyte characters with the same initial byte to be treated as the same character by the IFS. This bug was occurring because the first byte of a multibyte character wasn't being written to the stack when the IFS delimiter had the same initial byte: $ IFS=£ $ v='§' $ set -- $v $ v="${1-}" $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug 00000000 a7 0a |..| 00000002 Bug report: att#1372 src/cmd/ksh93/tests/variables.sh: - Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs. - Add a regression test for att#1372 based on the reproducer.
Add support for multibyte characters to $IFS This commit fixes BUG_MULTIBIFS, which had two bug reports in the ksh2020 branch. src/cmd/ksh93/sh/macro.c: - Backport Eric Scrivner's fix for multibyte IFS characters (slightly modified for compatibility with C89). Explanation from att#737: Previously, the varsub method used for the macro expansion of $param, ${param}, and ${param op word} would incorrectly expand the internal field separator (IFS) if it was a multibyte character. This was due to truncation based on the incorrect assumption that the IFS would never be larger than a single byte. This change fixes this issue by carefully tracking the number of bytes that should be persisted in the IFS case and ensuring that all bytes are written during expansion and substitution. Bug report: att#13 - Fixed another bug that caused multibyte characters with the same initial byte to be treated as the same character by the IFS. This bug was occurring because the first byte of a multibyte character wasn't being written to the stack when the IFS delimiter had the same initial byte: $ IFS=£ $ v='§' $ set -- $v $ v="${1-}" $ echo "$v" | hd # The first byte should be c2, but it isn't due to the bug 00000000 a7 0a |..| 00000002 Bug report: att#1372 src/cmd/ksh93/tests/variables.sh: - Add (reworked) regression tests from ksh2020 for the multibyte IFS bugs. - Add a regression test for att#1372 based on the reproducer.
Test script:
Expected output (UTF-8 locale):
Output on
kshcurrent development version:Version AJM 93u+ 2012-08-01
In the second test, the § characters get mangled.
For reference, output on the latest release
Version AJM 93u+ 2012-08-01
:The text was updated successfully, but these errors were encountered: