You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have some files with a bad encoding. I think they came from a FAT32 pendrive using the latin-1 encoding, and are now in a EXT4 filesystem. Once I try to see the directory in rover, those files appear in a empty line; not even the size is shown.
I can replicate the behaviour by creating a bogus file (I'm using spanish locale, but any UTF-8 should work):
The problem is that the \355 character (í in latin encoding) is 0xED and hence a 2 multibyte starting byte in UTF-8. As the next character does't continue the 2 multibyte encoding, is an incorrect UTF-8 string.
The functions mbstowcs() and swprintf() are failling silently, returning -1, as they cannot deal with the string. So nothing gets copied to the WBUF buffer, and the row remains empty.
If you create a bogus directory, the behavior is even more interesting. The WBUF gets reused from the last usage, and the filename seems to be named as the CWD or the previous directory.
$ mkdir bad-$'\355'
$ rover
I was thinking in how to solve the issue, perhaps some workaround like the ls(1) program does, replacing the spurious character with an ? symbol or similar. Deletion and other operations work fine.
The text was updated successfully, but these errors were encountered:
I have some files with a bad encoding. I think they came from a FAT32 pendrive using the latin-1 encoding, and are now in a EXT4 filesystem. Once I try to see the directory in rover, those files appear in a empty line; not even the size is shown.
I can replicate the behaviour by creating a bogus file (I'm using spanish locale, but any UTF-8 should work):
The problem is that the \355 character (í in latin encoding) is 0xED and hence a 2 multibyte starting byte in UTF-8. As the next character does't continue the 2 multibyte encoding, is an incorrect UTF-8 string.
The functions mbstowcs() and swprintf() are failling silently, returning -1, as they cannot deal with the string. So nothing gets copied to the WBUF buffer, and the row remains empty.
If you create a bogus directory, the behavior is even more interesting. The WBUF gets reused from the last usage, and the filename seems to be named as the CWD or the previous directory.
I was thinking in how to solve the issue, perhaps some workaround like the ls(1) program does, replacing the spurious character with an ? symbol or similar. Deletion and other operations work fine.
The text was updated successfully, but these errors were encountered: