“encoding” Problem #195
Comments
I found that if old.tex and new.tex are encoded as UTF-8 with BOM, diff.tex is output with the correct UTF-8 characters but is encoded as UTF-16, which can easily be re-encoded to UTF-8.
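For anyone who wants to script the re-encoding step mentioned above, here is a minimal sketch in Python. It assumes the UTF-16 file starts with a BOM (so the codec can pick the byte order) and that the characters in it were not already garbled. The function name and file names are hypothetical.

```python
def reencode_utf16_to_utf8(src: str, dst: str) -> None:
    """Read a UTF-16 text file and write the same text as UTF-8 (no BOM)."""
    # The "utf-16" codec uses the BOM to detect endianness and strips it.
    # newline="" leaves the line endings exactly as they are in the file.
    with open(src, "r", encoding="utf-16", newline="") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(text)

# Hypothetical usage:
# reencode_utf16_to_utf8("diff.tex", "diff_utf8.tex")
```

This only changes the container encoding. If the characters were already mis-decoded before they were written, re-encoding will not bring them back.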
So is it solved? What is a BOM?
The UTF-8 BOM is a sequence of bytes at the start of a text stream (0xEF, 0xBB, 0xBF) that allows the reader to guess more reliably that a file is encoded in UTF-8. I re-encoded the files with VS Code's "Save with Encoding" function. I think there is something wrong with the variable $encoding, but I haven't learned Perl.
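To make the BOM description concrete, here is a small Python sketch that checks for, adds, and strips the three BOM bytes. The helper names are hypothetical; `codecs.BOM_UTF8` is the standard-library constant for 0xEF, 0xBB, 0xBF.

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the BOM unless it is already there."""
    return data if has_utf8_bom(data) else codecs.BOM_UTF8 + data

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading BOM if present; otherwise return the data unchanged."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data
```

Reading a file in binary mode and passing its contents through `add_utf8_bom` should give roughly what an editor's "UTF-8 with BOM" save option produces, assuming the file was valid UTF-8 to begin with.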
Thanks for this report. The encoding is mostly handled by Perl and (as you can see from my question) I have no real insight into encodings. So I will not tackle this anytime soon, but I will leave the issue open in case anyone has an insight.
I have just encountered this issue. You should use the good old CMD on Windows, or PowerShell 6.2+, because the default PowerShell in Windows 10/11 writes output files encoded as UTF-16 when you use
Edit: The command below works, but also breaks UTF-8 characters. I will stick with
You can use the following in PowerShell to get a UTF-8 output file, but it will still break when there are non-standard characters in the .tex files.
latexdiff a.tex b.tex | Out-File output.tex -Encoding utf8
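A possible way to avoid the shell's pipe and redirection handling is to let another program start the command and write its raw output bytes directly to a file. The following is only a sketch, assuming Python is installed and that latexdiff can be started as an executable found on PATH (that may not hold for every TeX distribution). `run_to_file` is a hypothetical helper, not part of latexdiff.

```python
import subprocess
from typing import Sequence

def run_to_file(cmd: Sequence[str], out_path: str) -> int:
    """Run cmd and store its stdout bytes in out_path without re-encoding them."""
    # The file is opened in binary mode and handed to the child process,
    # so no shell decodes or re-encodes the output on the way.
    with open(out_path, "wb") as out:
        result = subprocess.run(list(cmd), stdout=out)
    return result.returncode

# Hypothetical usage (assumes latexdiff is directly executable):
# run_to_file(["latexdiff", "a.tex", "b.tex"], "output.tex")
```

Because the command is passed as a list, no shell is involved at all; whether the bytes latexdiff itself emits are correct UTF-8 is a separate question.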
Edit: The bigger issue seems to be that PowerShell does not use Unicode to pipe the output from one command into another, see https://markw.dev/unicode_powershell/. I was able to get latexdiff to work in PowerShell using the following:
> [Console]::OutputEncoding = [System.Text.Encoding]::UTF8
> latexdiff .\latex_test_files\utf8_a.tex .\latex_test_files\utf8_b.tex | Out-File -Encoding utf8 out.tex
I would still recommend using
Original text
Addendum: It appears that this is a known problem with Perl in general under Windows. See e.g. https://stackoverflow.com/a/66281302 and StrawberryPerl/Perl-Dist-Strawberry#18. See also https://stackoverflow.com/q/4942305; many other languages like Python and Node.js have since solved this issue. I messed around a bit in Perl and tried some things, but it seems like there is no working pure-Perl solution. It seems like the Perl developers cannot easily change this either, as it would break legacy code.
Solution for now: It seems to be best to just use
Future: I have two ideas for how one could mitigate this problem:
The version of latexdiff is
Working on Windows 10 1909.
When I run latexdiff with a command like "latexdiff old.tex new.tex > diff.tex" or "latexdiff --encoding=utf8 old.tex new.tex > diff.tex", "diff.tex" is encoded as UTF-16 LE, although "old.tex" and "new.tex" are encoded as UTF-8. UTF-8 characters such as Chinese and Japanese are garbled.
For example,
"old.tex"
"new.tex"
"diff.tex"
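The kind of garbling described in this report can be imitated in a few lines of Python. This is a sketch under assumptions, not a trace of what latexdiff or PowerShell actually do: it assumes the UTF-8 bytes are decoded with a legacy console code page (cp437 is picked here only because it maps every byte; the real code page depends on the system locale) and that the resulting text is then written as UTF-16 LE.

```python
def garble(text: str, wrong_codepage: str = "cp437") -> str:
    """Decode the UTF-8 bytes of text with the wrong code page (mojibake)."""
    return text.encode("utf-8").decode(wrong_codepage)

def ungarble(garbled: str, wrong_codepage: str = "cp437") -> str:
    """Undo garble(); only works when the wrong code page maps every byte."""
    return garbled.encode(wrong_codepage).decode("utf-8")

original = "中文"                           # two characters, six UTF-8 bytes
garbled = garble(original)                  # six unrelated characters
as_written = garbled.encode("utf-16-le")    # what ends up in the file (assumed)
```

With cp437 the damage is reversible through `ungarble`, because every byte has a mapping. Many real code pages leave some bytes undefined, and in that case the original characters are lost.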