Test: encoding auto detection #23322
I've created this repository with several encoding samples:
Will these changes also address #21146? Or is there still more to be done to get this into the diff editor?
There is still a persistent issue where VS Code ignores a file's existing encoding. Example: I create a new Shift-JIS encoded file with corresponding characters, save it with Shift-JIS encoding, close it, and open it again. VS Code opens the file as UTF-8. It happens with other encodings as well, such as Latin-1 (ISO-8859-1). When I reopen an encoded file, it sometimes picks the wrong encoding instead of using the existing one.
Here is a GIF of the bug in action:
You can see that the files are encoded as Shift-JIS and ISO-8859-1, but when I reopen them, they are opened as Windows 1252 and UTF-8 respectively, which breaks some characters. If I then save the file, it is saved with the wrong encoding and the characters remain broken.
@duanehutchins that is why it is called "guessing". There is no such thing as detection with 100% certainty. I suggest you report your samples to the library we are using:
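Why detection can only ever be a guess: some byte sequences are valid in more than one encoding. The sketch below is plain TypeScript with no detector library (the variable names are illustrative); it decodes the same two bytes as strict UTF-8 and as ISO-8859-1. Both decodings succeed, so the bytes alone cannot say which one the author meant, and a detector can only pick the more likely reading.

```typescript
// The bytes C3 A9 are valid under more than one encoding.
const bytes = new Uint8Array([0xc3, 0xa9]);

// Strict UTF-8 decode: `fatal: true` makes malformed input throw; here it succeeds.
const asUtf8 = new TextDecoder("utf-8", { fatal: true }).decode(bytes);

// ISO-8859-1 maps each byte to the Unicode code point with the same value.
const asLatin1 = Array.from(bytes, (b) => String.fromCharCode(b)).join("");

console.log(asUtf8);   // é
console.log(asLatin1); // Ã©
```

This is the same ambiguity as in the report above: an ISO-8859-1 file whose bytes also happen to form valid UTF-8 can plausibly be read either way.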
Hopefully it isn't a problem that this doesn't apply to search.
1. Some jschardet charset names do not correspond to VS Code's charset aliases. For example: test sample:
2. Autodetection does not work in search. The fix will likely be in searchworker.ts:
private readlinesAsync(filename: string, perLineCallback: (line: string, lineNumber: number) => void, options: ReadLinesOptions): TPromise<void> {
    ...
    ...
    // Check for BOM offset
    switch (mimeAndEncoding.encoding) {
        case UTF8:
            pos = i = bomLength(UTF8);
            options.encoding = UTF8;
            break;
        case UTF16be:
            pos = i = bomLength(UTF16be);
            options.encoding = UTF16be;
            break;
        case UTF16le:
            pos = i = bomLength(UTF16le);
            options.encoding = UTF16le;
            break;
        // Proposed fix: also honor any other detected (non-BOM) encoding
        default:
            if (mimeAndEncoding.encoding) {
                pos = i = 0; // no BOM, so no bytes to skip
                options.encoding = mimeAndEncoding.encoding;
            }
            break;
    }
    ...
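The BOM check in the switch above can be sketched as a standalone function. This is a minimal sketch, assuming a plain Uint8Array input; `sniffBom` and `BomResult` are hypothetical names, not VS Code's API. The byte sequences are the standard BOMs; when no BOM is present the offset stays 0, so no content bytes are skipped and any other encoding has to come from guessing or configuration.

```typescript
// Hypothetical helper: detect a byte-order mark and report how many bytes
// to skip before decoding. Not VS Code's actual implementation.
type BomResult = { encoding: "utf8" | "utf16be" | "utf16le" | null; bomLength: number };

function sniffBom(buf: Uint8Array): BomResult {
  // UTF-8 BOM: EF BB BF
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return { encoding: "utf8", bomLength: 3 };
  }
  // UTF-16 big-endian BOM: FE FF
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return { encoding: "utf16be", bomLength: 2 };
  }
  // UTF-16 little-endian BOM: FF FE
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return { encoding: "utf16le", bomLength: 2 };
  }
  // No BOM: nothing to skip.
  return { encoding: null, bomLength: 0 };
}

// Usage: start scanning lines at the returned offset.
const sample = new Uint8Array([0xef, 0xbb, 0xbf, 0x68, 0x69]); // BOM + "hi"
const { encoding, bomLength } = sniffBom(sample);
console.log(encoding, bomLength); // utf8 3
```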
@buzzzzer I fixed that case with CP866; if there are more cases, let me know. Encoding detection is only for files, not for search right now.
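One way to handle mismatches between the detector's charset names and VS Code's encoding ids (the CP866 case above) is to normalize the name and consult an explicit alias table for the exceptions. This is a minimal sketch: `normalizeCharset`, the alias table, and the set of known ids are illustrative assumptions, not the lookup VS Code actually uses. The sample inputs ("IBM866", "windows-1251", "SHIFT_JIS") are examples of detector-style names.

```typescript
// Illustrative only: map detector charset names onto VS Code-style encoding ids.
// Both tables are assumptions chosen for the example.
const KNOWN_IDS = new Set(["utf8", "utf16le", "utf16be", "windows1251", "cp866", "shiftjis", "iso88591", "koi8r"]);

// Names whose normalized form does not match an id directly.
const ALIASES: Record<string, string> = {
  ibm866: "cp866",
  ascii: "utf8", // ASCII is a subset of UTF-8, so open it as UTF-8
};

function normalizeCharset(detected: string): string | undefined {
  // Lowercase and drop punctuation: "windows-1251" -> "windows1251", "SHIFT_JIS" -> "shiftjis"
  const key = detected.toLowerCase().replace(/[^a-z0-9]/g, "");
  const id = ALIASES[key] ?? key;
  // Unknown names return undefined so the caller can fall back to files.encoding.
  return KNOWN_IDS.has(id) ? id : undefined;
}

console.log(normalizeCharset("IBM866"));       // cp866
console.log(normalizeCharset("windows-1251")); // windows1251
console.log(normalizeCharset("SHIFT_JIS"));    // shiftjis
console.log(normalizeCharset("x-unknown"));    // undefined
```

Returning undefined for unrecognized names, rather than passing them through, keeps an unmapped guess from being applied as an encoding the editor cannot open.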
It's a pity.
Sometimes simple things need years to get right.
@buzzzzer Search is provided by ripgrep in 1.11, not searchWorker.ts, and ripgrep only does encoding autodetection for UTF-8/16. You can still set
@roblourens So to be clear: ripgrep lets you set the encoding, but only for all the files it searches, not per file or folder?
Yes, pre-ripgrep search worked the same way.
And what should files.encoding be set to?
I have been using pre-ripgrep search (searchworker) with autodetection for about 4 months, and it works for me. Is there any plan to support autoGuessEncoding in ripgrep search? Will pre-ripgrep search be retained or removed in future versions?
Encoding autodetection has existed for less than a month, and it doesn't apply to non-ripgrep search, so I'm not sure how that was any different for you. I think it's unlikely that ripgrep will get fancier encoding detection, since that would slow it down and their focus is entirely on speed. I'll keep the pre-ripgrep search in the short term. If there is a use case that just can't be handled by ripgrep, I might keep it longer.
I build VS Code from source for every stable release, with #10013 applied and modified for search.
Auto-detection doesn't seem to work for "windows1251" encoding. It thinks that it is UTF-8...
Refs: #5388
We are now using the jschardet library to try to guess the encoding from the contents of a file. To prepare for testing, gather some text files in the supported encodings. There is a shift-jis file checked in that is known to work, but other encodings would also be interesting.
From @joaomoreno
There are two scenarios to test:
From the new files.autoGuessEncoding setting:
- the files.encoding setting is still being used as long as files.autoGuessEncoding is not enabled
- files.autoGuessEncoding is enabled and you try with files that use a specific encoding not set as the configured workspace encoding (e.g. shift-jis)
- … (utf8 by default) if the detection is not returning any more specialized encoding (in particular, opening an ASCII file should show you utf8)

From the encoding selector picker:
- … files.encoding to shift-jis)