Default options for Text (encoded) export filter with LibreOffice #98
Comments
Upon further inspection I found out that unoconv v0.5 used "UTF8,LF" as the default filter options. This was changed in commit ad3c68d. I guess this change was based on information on the OpenOffice wiki (http://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options). The Text (encoded) output filter, however, is a document filter, not a spreadsheet filter, so the same rules need not apply.
OK, this is likely a regression. However, I would like to understand how it is supposed to work, because we likely document it incorrectly in the manual page as well. So before I change it back, I would like an authoritative source confirming this, so that I can modify the manual page accordingly and make sure we are not breaking something else along the way.
I'm seeing the same thing with LibreOffice 4.0.4.2. Using an explicit FilterOptions=UTF8,LF fixes things for me.
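A minimal sketch of how the explicit setting mentioned above could be passed on the command line. It assumes that unoconv's `-e`/`--export` option accepts a `FilterOptions=...` pair; the thread does not show the exact invocation, and the helper name `build_unoconv_command` is hypothetical. The sketch only builds and prints the command, it does not run it.

```python
import shlex


def build_unoconv_command(source, filter_options="UTF8,LF", unoconv="unoconv"):
    """Return the argv for a text export with explicit filter options.

    Assumption: -e/--export hands name=value pairs to the export filter,
    so FilterOptions=UTF8,LF overrides unoconv's built-in default.
    """
    return [unoconv, "-f", "txt", "-e", f"FilterOptions={filter_options}", source]


command = build_unoconv_command("test.fodt")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` on a machine where unoconv and LibreOffice are installed.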
Sorry for not getting back to this sooner. I did some tests to understand what is going on and whether this is still relevant. Here are my results:
Current unoconv 0.6 behavior:
[dag@moria unoconv]$ /opt/libreoffice5.0/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.4/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.3/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.2/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.1/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice4.0/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.6/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.5/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.4/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
Patched unoconv behavior:
[dag@moria unoconv]$ /opt/libreoffice5.0/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.4/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.3/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.2/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: UTF-8 Unicode (with BOM) English text, with very long lines, with overstriking
[dag@moria unoconv]$ /opt/libreoffice4.1/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice4.0/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.6/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.5/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
[dag@moria unoconv]$ /opt/libreoffice3.4/program/python ./unoconv -f txt test.fodt
[dag@moria unoconv]$ file test.txt
test.txt: data
So there seems to be absolutely no difference between the two. Comparing the two generated files (which are identical in size, BTW):
[dag@moria unoconv]$ diff -u <(hexdump -C test-76.txt) <(hexdump -C test-utf8.txt)
--- /dev/fd/63 2015-07-05 11:53:51.483061227 +0200
+++ /dev/fd/62 2015-07-05 11:53:51.483061227 +0200
@@ -113,7 +113,7 @@
00000700 20 6f 72 61 63 6c 65 69 69 6d 62 2d 72 67 2e 20 | oracleiimb-rg. |
00000710 49 74 73 20 70 72 65 66 65 72 72 65 64 20 73 65 |Its preferred se|
00000720 72 76 65 72 20 69 73 20 73 67 75 66 63 20 69 6e |rver is sgufc in|
-00000730 20 04 43 53 4d 05 2e 0a 54 68 69 73 20 72 65 73 | .CSM...This res|
+00000730 20 07 43 53 4d 08 2e 0a 54 68 69 73 20 72 65 73 | .CSM...This res|
00000740 6f 75 72 63 65 20 67 72 6f 75 70 20 61 6c 73 6f |ource group also|
00000750 20 68 6f 73 74 73 20 74 68 65 20 4d 51 20 71 75 | hosts the MQ qu|
00000760 65 75 65 20 6d 61 6e 61 67 65 72 20 6f 6e 20 74 |eue manager on t|
@@ -273,10 +273,10 @@
00001100 74 65 72 73 0a 0a 50 61 72 61 6d 65 74 65 72 0a |ters..Parameter.|
00001110 56 61 6c 75 65 0a 44 69 73 61 73 74 65 72 20 73 |Value.Disaster s|
00001120 65 72 76 65 72 20 61 6e 64 20 6c 6f 63 61 74 69 |erver and locati|
-00001130 6f 6e 0a 73 67 75 66 63 20 40 20 04 43 53 4d 05 |on.sgufc @ .CSM.|
+00001130 6f 6e 0a 73 67 75 66 63 20 40 20 07 43 53 4d 08 |on.sgufc @ .CSM.|
00001140 0a 46 61 69 6c 6f 76 65 72 20 73 65 72 76 65 72 |.Failover server|
-00001150 20 61 6e 64 20 6c 6f 63 61 74 69 6f 6e 0a 04 73 | and location..s|
-00001160 67 75 67 63 05 20 40 20 04 4d 61 72 6e 69 78 05 |gugc. @ .Marnix.|
+00001150 20 61 6e 64 20 6c 6f 63 61 74 69 6f 6e 0a 07 73 | and location..s|
+00001160 67 75 67 63 08 20 40 20 07 4d 61 72 6e 69 78 08 |gugc. @ .Marnix.|
00001170 0a 0a 43 6f 6e 66 69 67 75 72 61 74 69 6f 6e 0a |..Configuration.|
00001180 0a 43 6c 75 73 74 65 72 20 53 65 72 76 65 72 0a |.Cluster Server.|
00001190 4c 6f 63 61 74 69 6f 6e 0a 49 50 0a 73 67 75 66 |Location.IP.sguf|
@@ -290,43 +290,43 @@
00001210 43 6c 6f 76 65 72 6c 65 61 66 0a 6f 72 61 63 6c |Cloverleaf.oracl|
00001220 65 69 69 6d 62 2d 72 67 0a 63 67 75 69 69 6d 69 |eiimb-rg.cguiimi|
00001230 69 6d 62 0a 31 30 2e 36 36 2e 31 32 30 2e 31 33 |imb.10.66.120.13|
-00001240 0a 73 67 75 66 63 0a 04 43 53 4d 05 0a 53 47 49 |.sgufc..CSM..SGI|
+00001240 0a 73 67 75 66 63 0a 07 43 53 4d 08 0a 53 47 49 |.sgufc..CSM..SGI|
00001250 49 4d 42 0a 0a 63 67 75 69 69 6d 69 78 66 62 2d |IMB..cguiimixfb-|
00001260 72 67 0a 63 67 75 69 69 6d 69 78 66 62 0a 31 30 |rg.cguiimixfb.10|
00001270 2e 36 36 2e 31 32 30 2e 31 36 35 0a 73 67 75 66 |.66.120.165.sguf|
-00001280 63 0a 04 43 53 4d 05 0a 2d 0a 54 69 76 6f 6c 69 |c..CSM..-.Tivoli|
+00001280 63 0a 07 43 53 4d 08 0a 2d 0a 54 69 76 6f 6c 69 |c..CSM..-.Tivoli|
00001290 20 54 4d 46 0a 73 63 69 69 6d 74 6d 66 61 2d 72 | TMF.sciimtmfa-r|
000012a0 67 0a 63 67 75 69 69 6d 74 6d 66 61 0a 31 30 2e |g.cguiimtmfa.10.|
000012b0 36 36 2e 31 32 30 2e 31 34 0a 73 67 75 67 63 0a |66.120.14.sgugc.|
-000012c0 04 4d 61 72 6e 69 78 05 0a 2d 0a 46 69 6e 61 6e |.Marnix..-.Finan|
+000012c0 07 4d 61 72 6e 69 78 08 0a 2d 0a 46 69 6e 61 6e |.Marnix..-.Finan|
000012d0 63 65 20 4b 69 74 0a 63 67 75 69 69 6d 73 79 62 |ce Kit.cguiimsyb|
000012e0 62 2d 72 67 0a 63 67 75 69 69 6d 73 79 62 62 0a |b-rg.cguiimsybb.|
...
The files differ only in bytes 0x04 and 0x05 versus 0x07 and 0x08, respectively, and that is the only difference between LibreOffice 4.1 and older and LibreOffice 4.2 and newer. I don't mind changing the default. More info in commit 3b25f64.
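The byte comparison can also be done without reading a hexdump diff by eye. The sketch below is an illustration only, not part of unoconv: it takes the row at offset 0x730 from each side of the diff in this thread and lists which byte values were substituted. The names `byte_substitutions`, `row_76` and `row_utf8` are hypothetical.

```python
def byte_substitutions(old: bytes, new: bytes) -> dict:
    """Map each (old_byte, new_byte) pair to the offsets where it occurs."""
    if len(old) != len(new):
        raise ValueError("inputs must have the same length")
    changes = {}
    for offset, (a, b) in enumerate(zip(old, new)):
        if a != b:
            changes.setdefault((a, b), []).append(offset)
    return changes


# Row 0x00000730 as shown in the diff: "-" side (test-76.txt), "+" side (test-utf8.txt).
row_76 = bytes.fromhex("20 04 43 53 4d 05 2e 0a 54 68 69 73 20 72 65 73")
row_utf8 = bytes.fromhex("20 07 43 53 4d 08 2e 0a 54 68 69 73 20 72 65 73")

for (a, b), offsets in byte_substitutions(row_76, row_utf8).items():
    print(f"0x{a:02x} -> 0x{b:02x} at offsets {offsets}")
# prints:
# 0x04 -> 0x07 at offsets [1]
# 0x05 -> 0x08 at offsets [5]
```

To compare whole files, read both with `open(path, "rb").read()` and pass the results to the same function; the length check matches the observation that the two files are identical in size.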
I closed the ticket. If anyone finds that this makes a difference, please reopen this ticket and add the info needed to reproduce it.
Unoconv's (version 0.6) default filter options for Text (encoded) output filter are "76,LF" (UTF-8, line feeds for paragraph breaks). With LibreOffice (3.4.5 and 3.5.7, don't know about other versions) the output is not encoded in UTF-8.
Setting FilterOptions="UTF8,LF" seems to render the desired result. Seems that LibreOffice guys changed the encoding options mapping.
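Whether a converted file really is UTF-8 can be checked with a few lines of Python. This is a rough sketch, not a reimplementation of file(1): it only looks for a UTF-8 byte order mark and tries a strict decode, so pure ASCII input is also reported as UTF-8. The function name `describe_encoding` is hypothetical.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Roughly classify bytes as UTF-8 (with or without BOM) or not UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    try:
        # Strict decoding raises on any byte sequence that is not valid UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    return "UTF-8 with BOM" if has_bom else "UTF-8 without BOM"


print(describe_encoding(codecs.BOM_UTF8 + "Text".encode("utf-8")))  # UTF-8 with BOM
print(describe_encoding("é".encode("latin-1")))  # not valid UTF-8
```

For a converted file, pass the raw bytes, for example `describe_encoding(open("test.txt", "rb").read())`.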