Do not short-circuit decode_utf8 with utf8 flags #11
From the user's point of view, because Before this patch:
The last version, After this patch, all the calls to |
I checked
While I'm not sure what other issues have been fixed in 2.40, i'm confident Let me know if I should better make that above test as an actual unit test. I'm happy to. |
Do not short-circuit decode_utf8 with utf8 flags
This contradicts #10, which has been merged too.
Good catch. I would vote for reverting #10 then since it is not necessary?
Sent from Mailbox for iPhone On Thu, Aug 29, 2013 at 8:05 AM, Victor Efimov [email protected]
|
@vsespb can't see a test suite ( |
@miyagawa It's plaintext (not attachment) in first message https://rt.cpan.org/Public/Bug/Display.html?id=87267 |
done #12 |
Unit test for decoding behavior change in #11
$Revision: 2.54 $ $Date: 2013/08/29 16:47:39 $
! Encode.xs
+ t/cow.t
  Addressed: COW breakage with _utf8_on()
  https://rt.cpan.org/Ticket/Display.html?id=88230
! Encode.pm
  Reverted the document accordingly to #11
  dankogai/p5-encode#10
+ t/decode.t
  Unit test for decoding behavior change in #11
  dankogai/p5-encode#12

2.53 2013/08/29 15:20:31
! Encode.pm
  Merged: Do not short-circuit decode_utf8 with utf8 flags
  dankogai/p5-encode#11
  Merged: document decode_utf8 behaviour more precise
  dankogai/p5-encode#10
! Makefile.PL
  Added repository cpan metadata
  dankogai/p5-encode#9

2.52 2013/08/14 02:29:54
! ucm/*.ucm
  Addressed: Unicode Mappping tables are missing Unicode Inc. license notification
  All files including "as long as this notice remains attached"
  now have that notice attached in the comment section.
  (cp* and mac* do not since their source files do not include that notice)
  https://rt.cpan.org/Ticket/Display.html?id=87340
! lib/Encode/MIME/Header.pm t/mime-header.t
  Addressed: encoding "0" with MIME-Headers gets a blank string
  https://rt.cpan.org/Ticket/Display.html?id=87831
! Encode.pm
  Addressed: Documentation buglet
  https://rt.cpan.org/Ticket/Display.html?id=84992
! Byte/Makefile.PL CN/Makefile.PL EBCDIC/Makefile.PL Encode/Makefile_PL.e2x JP/Makefile.PL KR/Makefile.PL Symbol/Makefile.PL TW/Makefile.PL
  Applied: Patch to output #includes in deterministic order
  https://rt.cpan.org/Ticket/Display.html?id=86974

2.51 2013/04/29 22:19:11
! Encode.xs
  Addressed: Encode.xs doesn't compile with Microsoft C compiler
  https://rt.cpan.org/Public/Bug/Display.html?id=84920
! MANIFEST
  Addressed: t/taint.t missing
  https://rt.cpan.org/Public/Bug/Display.html?id=84919

2.50 2013/04/26 18:30:46
! Encode.xs Unicode/Unicode.xs lib/Encode/Unicode/UTF7.pm lib/CN/HZ.pm lib/Encode/GSM0338.pm t/taint.t
  Addressed: Encode::encode and Encode::decode gratuitously launders tainted data
  Taintedness now propagates as it should.
  https://rt.cpan.org/Ticket/Display.html?id=84879
! encoding.pm
  Addressed: 5.18 deprecation
  https://rt.cpan.org/Ticket/Display.html?id=84709
! bin/piconv
  Applied: Update piconv documentation
  https://rt.cpan.org/Ticket/Display.html?id=84695

2.49 2013/03/05 03:12:49
! Encode.xs
  Addressed: Encoding objects leak memory if decoding fails
  dankogai/p5-encode#8

2.48 2013/02/18 02:23:56
! encoding.pm t/Mod_EUCJP.pm t/enc_data.t t/enc_eucjp.t t/enc_module.t t/enc_utf8.t t/encoding.t t/jperl.t
  [PATCH] Deprecate encoding.pm
  https://rt.cpan.org/Ticket/Display.html?id=81255
! Encode/Supported.pod
  Fixed: Pod errors
  https://rt.cpan.org/Ticket/Display.html?id=81426
! Encode.pm t/Encode.t
  [PATCH] Fix for shared hash key scalars
  https://rt.cpan.org/Ticket/Display.html?id=80608
! Encode.pm
  Fixed: Uninitialized value warning from Encode->encodings()
  https://rt.cpan.org/Ticket/Display.html?id=80181
! Makefile.PL
  Install to 'site' instead of 'perl' when perl version is 5.11+
  https://rt.cpan.org/Ticket/Display.html?id=78917
! Encode/Makefile_PL.e2x
  find enc2xs.bat if it works on windows.
  dankogai/p5-encode#7
! t/piconv.t
  Fix finding piconv in t/piconv.t
  dankogai/p5-encode#6
Somewhere between 2.51-1 and 2.54-1, perl-Encode changed from leaving ñ (n-tilde) untouched to converting it to &#241; (ampersand-pound-241-semicolon). I think this is why: dankogai/p5-encode#11 ...and I think the new behavior is actually correct: invoking decode_utf8() with known good UTF-8 is a bad idea. Solution: test before calling decode_utf8(). If our input string is valid UTF-8, use it as-is.
For the record, this totally broke Unicode support in at least one Perl application I use (ikiwiki), see http://ikiwiki.info/bugs/garbled_non-ascii_characters_in_body_in_web_interface/ for the fun stuff there... Maybe keeping a "safe_decode_utf8" function for backwards compatibility could be useful for all those tools out there that made the mistake of surviving with the hack?
@anarcat sorry to hear that, but I'm sure that was due to the broken model of string handling in the application, which only happened to work because of the bug in Encode.pm. If you really liked the previous behavior of Encode#decode_utf8, you can get it back by providing a wrapper function that checks
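A wrapper of the kind suggested above might look roughly like the sketch below. The exact check is not preserved in the comment, so this assumes it is the internal UTF8 flag as reported by utf8::is_utf8; the name safe_decode_utf8 is hypothetical, borrowed from the earlier comment, and is not part of Encode.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Encode): approximates the old
# short-circuit by skipping the decode when the UTF8 flag is already on.
sub safe_decode_utf8 {
    my ($str, $check) = @_;

    # Assumption: a flagged string is treated as "already decoded".
    # undef is passed through untouched, as decode_utf8 itself returns undef.
    return $str if !defined $str || utf8::is_utf8($str);

    return Encode::decode_utf8($str, $check || 0);
}

# Usage: octets are decoded once; feeding the result back in is a no-op.
my $text = safe_decode_utf8("\xC3\xB1");   # UTF-8 octets for U+00F1
my $same = safe_decode_utf8($text);        # flagged, returned unchanged
printf "length=%d ord=U+%04X\n", length($same), ord($same);
```

Note that the flag only describes Perl's internal storage, so this sketch inherits the inconsistency the patch set out to remove: the result depends on how the input string happened to be produced.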
I know it was probably a "bug" in the application, but the reality is that this is a major change in the way this function actually works. Perl 5.20 now ships with this, and it will probably break hundreds of Perl programs that worked because of this handler that is now removed. I also know how to make my own wrapper, but it seems to me this should have been part of the perldelta for 5.20, at the very least, and ideally there should be backwards-compatible wrappers for people. I am not saying this change is wrong, mind you, I am just saying that things that used to work are now broken and that porting old applications to this new paradigm is a significant amount of work that seems to have been overlooked here.
that's sad. sorry for your trouble.
I hope no. I hope people use things not just because "that works", but only when it's documented (or, at least sane) feature that they use.
probably yes, you're right. |
The documentation about decode_utf8 says:

which is not true, because it bypasses the whole decoding process if $octets already has utf-8 flags.

I think it's incorrect to bypass the decoding depending on the utf-8 flags, since a) it will emit unreliable, inconsistent results depending on how the byte string has been generated, and b) it's not consistent with decode("utf8", $octets) in any way.

This patch eliminates the check, and passes all tests.
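The inconsistency described in point a) can be illustrated with a small script. This is only a sketch, not the test from the pull request: it feeds decode_utf8 the same two octets twice, once as a plain byte string and once after utf8::upgrade, which turns on the internal flag without changing the string's value. The variable names are made up for illustration. With the short-circuit removed (Encode 2.53 or later, as far as I can tell), all three calls should agree; with the old check, the flagged copy would have come back undecoded.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8);

# UTF-8 octets for U+00E9 (e with acute accent).
my $octets = "\xC3\xA9";

# Same two characters, but stored internally with the UTF8 flag on.
# As a Perl string, $flagged is still equal to $octets.
my $flagged = $octets;
utf8::upgrade($flagged);

my $from_octets  = decode_utf8($octets);
my $from_flagged = decode_utf8($flagged);
my $via_decode   = decode("utf8", $flagged);

# Print lengths rather than the characters to stay terminal-independent.
printf "octets:  length %d\n", length $from_octets;
printf "flagged: length %d\n", length $from_flagged;
printf "decode:  length %d\n", length $via_decode;
print  "consistent\n"
    if $from_octets eq $from_flagged && $from_flagged eq $via_decode;
```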