Add option to allow column position report in characters #45
Comments
I don't really want to implement any encodings other than utf8, so I prefer |
I personally don't use Lua files encoded in anything other than utf8. And I prefer |
I think it's OK to just always use UTF8, without any options. Then this feature won't have strange interactions with caching, and can be implemented right away. |
There is also another use case: there is value in being able to check that input files are purely ASCII. E.g. it ensures that there are no comments in some other alphabet in codebase that's supposed to use purely English comments. So there should be an option to use ascii encoding. Then it makes sense to have
An option to use UTF8 but ignore decoding errors does not seem to be very useful. |
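The ASCII-only use case above amounts to scanning for any byte above 0x7F. A minimal Python sketch of that check follows; the function name `first_non_ascii` is hypothetical and this is not how luacheck implements it.

```python
def first_non_ascii(data: bytes):
    """Return (line, byte_column), both 1-based, of the first non-ASCII byte.

    Returns None when the input is pure ASCII. Hypothetical helper for
    illustration only; not part of luacheck.
    """
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte > 0x7F:  # any byte outside 0x00..0x7F is not ASCII
                return line_no, col
    return None


clean = b"-- ok\nlocal x = 1\n"
mixed = "local x = 1\n-- привет\n".encode("utf-8")
```

Because ASCII is a strict subset of UTF-8, a file that passes this check decodes identically under either encoding, so byte columns and character columns coincide.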
Maybe it would be useful to have another option: try to use UTF8 but fall back to Or another option: detect UTF8 by BOM. It would come in handy if we have to check a mixed set of files. |
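Both suggestions in the comment above (BOM detection, and trying UTF-8 with a fallback) can be sketched in a few lines of Python. The fallback encoding is not named in the thread, so the `fallback` parameter and its `"latin-1"` default are placeholders chosen purely for illustration; `detect_encoding` is a hypothetical name.

```python
import codecs


def detect_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Guess an encoding name for `data`.

    Assumption: `fallback` is a placeholder; the thread does not say
    which encoding should be used when UTF-8 decoding fails.
    """
    # A UTF-8 byte order mark (EF BB BF) is taken as an explicit signal.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    try:
        data.decode("utf-8")  # strict decoding: raises on invalid sequences
        return "utf-8"
    except UnicodeDecodeError:
        return fallback
```

Note that a BOM-less file can still be valid UTF-8, which is why the sketch attempts a strict decode before giving up.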
It will be possible to set encoding per file in config. Autodetection can be implemented separately as |
I've got a particularly fun job with respect to this one. Visual Studio Code, and then by extension anybody that wants to implement an LSP server, uses the word "character" to mean utf16 code units. I don't realistically expect you to implement this, but it means to match spec I need to go from bytes -> Unicode code points -> utf16 code units. I am doing this using the Lua API, so I prefer working in bytes so the built-in |
@Alloyed weird cases like that could be implemented as a separate encoding in luacheck, or maybe luacheck could load custom encodings from external modules, like with formatters. And yes, ideally columns should be reported in grapheme clusters and not just code points. Grapheme clusters seem to be pretty clearly defined in the Unicode spec. |
Just a bit late but on |
@Alloyed could you clarify the LSP issue? I'm not sure what |
VSCode stores its text buffers internally as utf-16, even if the file that gets saved and loaded is saved as utf-8. This means (by intention or by accident), that when the LSP protocol wants to communicate a position in a text file, it sends it as See |
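The bytes -> code points -> UTF-16 code units chain described in this exchange can be sketched as follows. This is a Python illustration under stated assumptions, not code from luacheck or from any LSP server: the function name is hypothetical, offsets are 0-based, and the byte offset is assumed to fall on a character boundary of valid UTF-8.

```python
def byte_offset_to_utf16(line: bytes, byte_offset: int) -> int:
    """Convert a 0-based UTF-8 byte offset into a 0-based UTF-16 code unit offset.

    Hypothetical helper. Raises UnicodeDecodeError if `byte_offset`
    splits a multi-byte character.
    """
    # Step 1: bytes -> code points (decode everything before the offset).
    prefix = line[:byte_offset].decode("utf-8")
    # Step 2: code points -> UTF-16 code units (2 bytes per unit, no BOM).
    return len(prefix.encode("utf-16-le")) // 2


# U+2665 is 3 bytes in UTF-8 and 1 UTF-16 unit;
# U+1F600 is 4 bytes in UTF-8 and 2 UTF-16 units (a surrogate pair).
sample = "\u2665\U0001F600x".encode("utf-8")
x_byte_offset = sample.index(b"x")
```

For this sample, `x` sits at byte offset 7, code point offset 2, and UTF-16 offset 3, so all three measures disagree once a character outside the Basic Multilingual Plane appears.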
Unicode support merged into master, closing this, new release coming soon. @Alloyed if there are any issues with this please open a new one. |
Currently, column positions are reported in bytes.
E.g.:
luacheck --ranges test.lua
reports the following warning:
In Atom with linter-luacheck, it displays:
The problem is that ♥ takes 3 bytes in UTF-8, which causes a 2-character offset in Atom.
Is it possible to include an option like --characters in luacheck to report column positions in characters rather than bytes? Maybe a related option such as --encoding is needed (or having --characters treat input as UTF-8 by default is enough).
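To make the 2-character offset concrete, here is a small Python sketch that converts a 1-based byte column into a 1-based character (code point) column. The sample line and the function name `byte_col_to_char_col` are made up for illustration and are not taken from the original test.lua or from luacheck.

```python
def byte_col_to_char_col(line: bytes, byte_col: int) -> int:
    """Convert a 1-based byte column into a 1-based code point column.

    Assumes `line` is valid UTF-8 and `byte_col` points at the first
    byte of a character; otherwise decoding raises UnicodeDecodeError.
    """
    prefix = line[: byte_col - 1]  # bytes strictly before the column
    return len(prefix.decode("utf-8")) + 1


# Hypothetical line containing one 3-byte character before `x`.
line = "local s = '\u2665' .. x".encode("utf-8")
byte_col = line.index(b"x") + 1  # column as counted in bytes
char_col = byte_col_to_char_col(line, byte_col)
```

Here `byte_col` is 20 while `char_col` is 18: the single ♥ contributes two extra bytes, which is the offset seen in the editor.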