Display errors with certain characters #234

Open
whiteplastic opened this issue Apr 23, 2012 · 33 comments · May be fixed by #1289

Comments

@whiteplastic

I use a custom irssi theme that contains the UTF-8 "Fleur de Lys" symbol (U+269C - ⚜). While this character is displayed just fine when I use ssh, it just disappears in mosh. Also, there are display errors in irssi: random characters just disappear or get swapped by other characters. This only occurs when I use my custom theme so there might be a connection.

@keithw
Member

keithw commented Apr 23, 2012

On Linux, this works fine for me, but on Mac OS X 10.7.3, the system does not know about this character and wcwidth() returns -1 (unprintable), so mosh does not know how many columns the character will occupy.

Assuming you are using a Mac, that unfortunately is the answer. We will report this to Apple.

@whiteplastic
Author

Yes, I'm on OSX 10.7.3. It seems like the system does know about this character. ssh and any other application I use knows and displays it, the only application that seems not to know it is mosh.

@kmcallister
Contributor

SSH doesn't need to know about characters; it just conveys a stream of bytes from one end to the other. Mosh has a terminal state object which is synchronized between server and client, so it needs the character metadata on both machines.

What outer terminal emulator are you using; is it OS X's standard Terminal.app? And do you have any other terminal emulators in the mix, e.g. screen or tmux?

You can compile and run this C program on both machines to check if wcwidth knows about U+269C.

#define _XOPEN_SOURCE
#include <wchar.h>
#include <locale.h>
#include <stdio.h>

int main() {
    setlocale(LC_ALL, "");
    printf("%d\n", wcwidth(0x269C));
    return 0;
}

(I didn't test this on OS X, so it's possible it will fail to compile for some reason.)

It will print a positive number iff the character is known. Make sure to run it in a Unicode locale. If you don't have one by default, you can do something like

gcc -o foo foo.c;  LANG=en_US.UTF-8 ./foo

If you get a positive number on both server and client, and yet Mosh does not work correctly, then there's a bug in Mosh and we can investigate further.

(In the long run I would like to use a dedicated Unicode library, and drop our dependence on the system locale libraries, which have caused no end of trouble. See discussion on #74.)

@keithw
Member

keithw commented May 5, 2012

I think officially speaking, a Unicode app is supposed to use the "default" properties of the code point range (including width) if it doesn't know about the particular character. Unfortunately there doesn't seem to be a way to get these default properties in POSIX. A dedicated Unicode library would help with this.

@EdSchouten
Contributor

Hi Keith,

Just checking. I think you can't assume wchar_t is ISO 10646. It is just an implementation-defined "wide character". If you are working with ISO 10646 inside Mosh explicitly (not wide characters), then you shouldn't use wcwidth(). In the past I needed a compact implementation of wcwidth() explicitly for use with ISO 10646. Markus Kuhn has an implementation that seems to work quite nicely:

http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

Maybe it is of some use to you? Otherwise, I'm pretty sure IBM's ICU should be of use:

http://site.icu-project.org/

Ed

@keithw
Member

keithw commented Jun 23, 2012

Hi Ed,

At configure time we check for __STDC_ISO_10646__, which the C library is supposed to define if wchar_t is ISO/IEC 10646 / UTF-32. (We used to assert it, but in practice only GNU libc seems to define it, even though OS X and FreeBSD do also obey it in practice. We print a warning on configure on these systems.)

We may have to ship our own Unicode library eventually. ICU is kind of a monstrous beast though.

-Keith

@EdSchouten
Contributor

Hi Keith,

Thanks for the explanation!

@lilyball
Contributor

I just commented this on #361, but since it's about OS X it seems a bit more relevant to this ticket (although both tickets appear to be virtually the same thing):


I keep periodically hitting situations where various characters don't render in Mosh, because wcwidth() doesn't support them (OS X client, Ubuntu server). As documented, some characters are because OS X's wcwidth() returns -1, but I also see a bunch of characters (notably, emoji like U+1F4A9 PILE OF POO) that OS X supports but Ubuntu's doesn't (curiously, __STDC_ISO_10646__ on Ubuntu claims that Unicode 6.0 is supported, and the code chart for Unicode 6.0 does list this character, so I don't know why wcwidth() is returning -1).

At this point I'm thinking the only real solution to this problem is for Mosh to calculate character widths itself. Perhaps it could fall back to its own calculation if the platform-provided wcwidth() returns -1, thus allowing the platform's idea of width to take precedence for all characters it knows about. The only real issue with this that comes to mind is if the calculated width disagrees with how the rendering terminal thinks the character should display, but I did some research earlier today and it seems that all characters (including reserved ones) outside of the already-defined East_Asian_Width blocks are assumed to be "Neutral", which basically means they'll never have a width of 2. Assuming a width of 1 for any reserved characters seems reasonable, because if the OS disagrees it will provide an explicit 0 instead of -1 (and I'm suggesting you use this calculation only when the OS version returns -1).


Or as suggested in this ticket you could just ship your own unicode library entirely. My concern is that if Mosh thinks a character has a width of 1 but the terminal emulator thinks it has a width of 2, that will presumably render incorrectly. I'm assuming that the terminal emulator agrees with wcwidth() (for all characters where wcwidth() returns a non-negative value; Terminal.app on OS X renders e.g. U+26A1 HIGH VOLTAGE SIGN as one cell but wcwidth() on OS X returns -1). That assumption is why I suggested above to use the return value of wcwidth() whenever it's non-negative and fall back to a custom implementation otherwise.

@lilyball
Contributor

Addendum: Apparently glibc uses Unicode 6.0 but its LC_CTYPE support is still stuck at Unicode 5.0 (and wcwidth() uses LC_CTYPE).

@jhrmnn

jhrmnn commented Oct 3, 2014

⚡, U+26A1 seems to be problematic, for example. Mosh under Terminal.app displays it as a zero-width character in Vim, leading to very strange behaviour in a shell...

The left terminal is mosh/tmux/fish, right ssh/tmux/fish in the same tmux session.

When the mosh terminal is smaller than ssh, mosh is off by one character on the command-line. But if the ssh terminal is bigger, mosh is by some miracle right even though skipping ⚡.

This is probably not worth any work, I guess, but it might be useful to mention this problem in the documentation, so one can find it when searching for unicode or utf-8. I spent a good two hours on this :)

@cgull
Member

cgull commented May 26, 2015

My current thinking on Unicode issues:

Mosh is a virtual terminal, split across client and server, and it
uses normal terminal datastreams between client and server.
Therefore, it must be consistent between client and server, and should
be as advanced with its Unicode version as it can be. If we are up to
date on Unicode, there's no need to match the server application's
notion of Unicode: if a server application outputs a Unicode character
that it doesn't know about, then it has already lost: if it's doing
any formatting of the output, it doesn't know how wide the character
is and may be feeding us corrupt line or full screen formatting to
begin with.

This argument dictates that Mosh must have its own internal wcwidth
implementation for its virtual terminal, because client & server may
have different host wcwidth implementations. If mosh receives a
character known by the server's wcwidth but not the client's, then
its placement of subsequent characters on the line will be wrong in
our virtual terminal, and we will lose badly, because Mosh quite
efficiently avoids redisplaying characters it doesn't think have
changed.

Mosh then sends the character off to the client's terminal, where it
can be correctly formatted and displayed. Now we have the problem
that the display terminal may have a lower version of Unicode than
Mosh does, and may therefore corrupt output if its notion of character
width differs from ours.

This is in general a hard problem: Most current terminal emulators
either depend on a system's GUI environment (gnome, kde) for i18n, or
have their own implementation to escape the vagaries of host OS
implementation. So most terminal emulators actually do something
better than the host OS's wcwidth implementation, which also means
that the host wcwidth does not usefully tell us what the terminal
will actually do. Mosh cannot know what version of Unicode the
display terminal is using; the only thing it can even begin to do is
output characters and check the cursor position after output. There
is one heuristic that we can check for: most terminal emulators set
environment variables to indicate their presence and sometimes even
their version. Using this heuristic means maintaining tables of
programs/versions against Unicode versions they support, though.

But the user can legitimately ssh into a remote host and run
mosh-client there, in which case these variables have been discarded
and we have no clue. We can't handle that. At all.

My current best idea for handling this is to offer the user two
options:

  • Pass through all the Unicode characters Mosh knows about,
    naively expecting the client terminal emulator to handle it all. This
    is not at all what ssh does by blindly passing bytes through, but is
    similar in spirit and will be similar in behavior.
  • Offer a --restricted-unicode option that implements, say, Unicode
    5.0 or the client host's wcwidth, and translates all characters
    unknown to that wcwidth to U+FFFD REPLACEMENT CHARACTER (�), padded with
    a space if our internal wcwidth tells us it's double-wide. This
    would probably happen at the point of final output from the client's
    virtual terminal.

One unfortunate thing here is that Unicode will continue to grow
with new versions. When that happens, if we upgrade our internal
wcwidth, we are back to the current situation of differing client and
server Unicode versions-- but if we have an up-to-date wcwidth
implementation, we are doing better than using the system
implementation.

Perhaps we need to design a scheme where the client gets a character
width table from the server. I think this idea has been mentioned before.

About the Markus Kuhn wcwidth implementation: It's
been brought up several times in Mosh discussion. It's an excellent
easy-to-understand sample implementation, but it has a number of
unpredictable branches, and then an expensive binary search through
its tables. The copies of it commonly available around the
net are now out of date, and it is slow. It has significant
performance impact when coupled with my performance code; I have
benchmarked it against the FreeBSD wcwidth and the musl wcwidth,
both are much better (but a lot less readable). Also, I offer you
this tidbit:

http://osdir.com/ml/internationalization.linux/2001-01/msg00191.html

Mosh is an application that uses wcwidth heavily, and can spend
significant time in slower wcwidth implementations, slowing down
character handling noticeably.

Separately, Google shows me a discussion on GNU libc that its wcwidth
calls an expensive linear search to determine which locale it's in.
That will no doubt get fixed, but.

I have not looked at it as closely, or in a while, but if I remember right ICU does not directly offer a wcwidth function, and in general it's a heavyweight featureful implementation not suited to be called for individual characters as often as we do.

@zuzak

zuzak commented Jul 1, 2015

This doesn't appear to be a mac-specific issue: I have this problem in gnome-terminal on Ubuntu. Emoji don't render in an irssi screen session over mosh 1.2.4a, but do on the same screen session over SSH.

@rapha8l

rapha8l commented Jul 16, 2015

Hi,
Also on Linux ⮂ and ⮀ do not display at all with mosh 1.2.4a with any terminal and utf-8 set on both sides
Thanks

@chenkaie

Yep, I think powerline is a well-known package among heavy terminal users.
However, certain symbols/patched fonts are used to make it look fancy, like all these symbols: ⭠ ⭡ ⭢⭣ ⭤ ⮀ ⮁ ⮂ ⮃ ⋅ ⋮ ❐
If this issue can be handled, that would be awesome 👍

@raine

raine commented Nov 20, 2015

I have the same problem: emojis are not rendered when connecting with mosh, but they are when using just ssh.

@andrey-str

I have the same issue as @raine: mosh does not display Unicode emoji symbols (🏠 in my case), but ssh does. I tried with iTerm2 and iTerm3 Beta on OS X.

@NHDaly

NHDaly commented Apr 12, 2016

Bump to resurrect this thread. I'm having the same issue as above, also for emojis (🏠, 🖥, 🚀, 👾 in my case, coming from the hostnames file in my dotfiles).

Is there a plan to move forward with @cgull's proposal?

@bhamiltoncx

If you don't want to bring in the beast that is ICU, you can just ship the EastAsianWidth.txt file:

http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt

It's pretty easy to parse this and transform it into whatever form you want.

@daviddias

Is there any update with a solution for this? Especially for the chars mentioned here: #234 (comment)? Thank you!

@tombh

tombh commented Apr 2, 2017

I've just been down the rabbit hole of this problem. There are so many places that could take responsibility for it:

  • glibc (on Linux)
  • mosh
  • tmux
  • powerline
  • individual terminal clients
  • compiling without utf8proc on OSX
  • compiling with utf8proc on Linux

In summary it just seems like a subtle problem, that can't be easily fixed in one place. So for now I'm just going to remove any special characters from my setup.

@rwuwon

rwuwon commented Jun 27, 2018

Edit: After writing all this, I've gone over the earlier comments again and they make more sense to me now. Please disregard if all of the following is already well understood and has no fix.

I've been trying to troubleshoot this over the past few days and believe I've started to make some progress in narrowing this down as far as the 🤔 emoji/utf-8 display goes (it's UTF/unicode, but I'm testing with the thinking face emoji so I'll refer to it as that here). By the way, don't try to copy the emoji from here on GitHub because they turn it into an image - instead, head to emojipedia to copy & paste into your own terminals.

I don't think there's significant relevance to what (modern) terminal program is being used (gnome-terminal, macOS Terminal, iTerm2, JuiceSSH, etc - they all default quite well these days). I also don't think tmux or even irssi has anything to do with it - but to be clear, I've been testing with only plain bash and fish; no tmux, no powerline - no other user-complications to the best of my knowledge.

What's working in CentOS 7, Ubuntu Server 14.04.5 LTS, Ubuntu Server 18.04 LTS, Fedora 28:

  • Emojis work through plain ssh - any terminal, any server. Default en_AU.UTF-8 configurations set by initial system installation through location selection ("locale" displays the exact same thing in every instance I've tested). All my locales are the same everywhere. US keyboard layout.

What's not working in CentOS 7.5.1804 (including one non-test install; mosh 1.3.0), Ubuntu Server 14.04.5 LTS (mosh 1.3.2):

  • Pasting emojis right after connecting using mosh (mosh --ssh 'ssh -p 22222' localhost or mosh localhost --ssh 'ssh -p 22222' for the fresh server-install test VMs on my machine). Again, identical locales as far as I know.

The two cases where emojis through a mosh connection do work:

  • My Fedora 28 desktop as server (1.3.0) - connecting from macOS with mosh 1.3.2. I'm still working on trying to get the Mac firewall to open up properly but I anticipate that unicode will work on it as a mosh server.
  • The test Ubuntu 18.04 server edition running inside VirtualBox (mosh 1.3.2)

Suggestion for all in this thread:
Please note these aren't intended as workarounds and are only to help eliminate what I believe are some red herrings (tmux, irssi, terminal emulators, etc).

  1. See if you can all reproduce this issue by installing a basic server/minimal install of CentOS 7.5, Ubuntu 14.04.5 in VirtualBox (or qemu-kvm if you prefer, but make sure you understand how to SSH/Mosh to it from the host) - I think it's likely you will, should you set up CentOS 7.5 or Ubuntu 14.04 (and maybe 16.04??).
  2. Set up port forwarding so you can SSH into it (I've written up some quick VirtualBox/network port forwarding tips in a gist here - let me know if you need more help).
  3. Also try Ubuntu Server 18.04 - that should work. I haven't tried 16.04 or other distros yet. With the set-ups that work, emojis will also display inside tmux (both ssh and mosh) but again, I don't think we're dealing with a tmux issue here when bash under Mosh isn't displaying the emoji types of unicode either.

What I haven't tried:

  • Setting everything to a completely en_US.UTF-8 locale. If there's anyone who does use en_US everywhere, I'd love to know what results you get with the three suggested steps I've outlined above.
  • macOS 10.13.5 and Ubuntu 16.04 as a server, or other current distros.
  • Apologies about the inconsistencies in versions - I think I've covered most permutations anyway?

Please let me know if this gets us any closer to where the problem might be.

Edit 20180711: As per some of the other closed issues above, I only have glibc 2.17 on the server. I'm now considering a migration away from CentOS 7.5.1804 to sort this.

Edit 20180808: I've just completed a migration from CentOS 7.5 (glibc 2.17) to Debian 9.5 Stable (glibc 2.24) and am satisfied with the results. Also expecting to have something like glibc 2.27 with Debian 10 next year. Those who need or wish to remain with CentOS, hopefully version 8 isn't too far away.

@mpolden

mpolden commented Aug 15, 2018

I'm having a similar issue with zero width spaces (U+200B). Printing U+200B typically causes some kind of display corruption.

The likely cause seems to be that my client and server disagree about the width of this particular character (locale en_US.UTF-8):
Server: wcwidth(0x200B) == 0
Client: wcwidth(0x200B) == 1

Server is Debian stable (stretch) and mosh 1.2.6, client is macOS 10.13.6 and mosh 1.3.2.

@jshort

jshort commented Jul 19, 2019

Same issue with an OMZ theme that displays a 'gear' character if you have background processes in your shell. Works fine with a raw ssh session but not with mosh.

@jquast

jquast commented Jun 8, 2020

@tombh regarding your "rabbit hole", I think you may be pleased to find my article, "Offering a solution for Terminal Wide Character issues" https://jeffquast.com/post/terminal_wcwidth_solution/

I have authored a demonstration CLI utility that is able to automatically detect the version of Unicode supported by the Terminal emulator, https://github.com/jquast/ucs-detect/ and a new release of python wcwidth library https://github.com/jquast/wcwidth that supports all versions of unicode by selection using the exported environment variable.

@nferch

nferch commented Jul 12, 2021

Apologies in advance for bumping this thread, have been affected by this issue and finally was able to identify mosh as the culprit. I have annoying text alignment issues similar to @mikaabra in #361 (in a TUI email client, nonetheless!).

Am not a Unicode expert by any means, so cannot begin to fathom the complexity of a fix of the root cause, but curious if there's been any other workarounds?

I'm using mosh 1.3.2 from Homebrew on OS X 11.4 "Big Sur", connecting to an Ubuntu 18.04 "Bionic Beaver" box using mosh 1.3.2 and libc6 2.27-3ubuntu1.4. I'm having trouble displaying the ⚾ character.

I wonder if upgrading to 20.04 "Focal Fossa" would help? That seems to be using glibc 2.31-0ubuntu9.2. Although it seems like the simple passing of time hasn't done much to fix this issue :/

@Casandro

The bug also seems to exist with the "symbols for legacy computing"
https://en.wikipedia.org/wiki/Symbols_for_Legacy_Computing
Test setup:
Server: Debian 10 (mosh 1.3.2, tmux 2.8)
Client: Debian 11 (mosh 1.3.2) and Xubuntu 21.10 (mosh 1.3.2)
When loading https://github.com/Casandro/teletext_ng/blob/main/tools/dump_tta_text_colour.c in vim the special mosaic characters are there via ssh, but just missing via mosh.

@lifei

lifei commented May 16, 2023

[4 images]

I did some debugging on mosh under MSYS2. I think something is wrong with the conversion in the Cell class.

[2 images]

I found some code that may be related.
[image]

I also did some research on their code and found something wrong.
[2 images]

@lifei

lifei commented May 16, 2023

Well, I spent ten hours trying to find what's wrong. I then found that the following code does not work correctly in MSYS2.

#define _XOPEN_SOURCE
#include <stdlib.h>
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string>

int main() {
    setlocale(LC_ALL, "zh_CN.UTF-8");
    wprintf(L"wcwidth of 0x269C is %d\n", wcwidth(0x269C));
    std::wstring in = L"📁💕😘😒🤦";
    // size() returns size_t, so print it with %zu rather than %d
    wprintf(L"length of string in is %zu\n", in.size());
    // Print the width the C library reports for each wchar_t unit
    for (std::wstring::const_iterator i = in.begin(); i != in.end(); ++i)
    {
        wprintf(L"wcwidth = %d\n", wcwidth(*i));
    }
    wprintf(L"%ls", in.c_str());
    return 0;
}

Result in msys2

wcwidth of 0x269C is 1
length of string in is 10
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
wcwidth = -1
📁💕😘😒🤦

Result in debian or WSL

wcwidth of 0x269C is 1
length of string in is 5
wcwidth = 2
wcwidth = 2
wcwidth = 2
wcwidth = 2
wcwidth = 2
📁💕😘😒🤦

but there is no libc in msys2.

$ ldd a.exe
        ntdll.dll => /c/WINDOWS/SYSTEM32/ntdll.dll (0x7fff75af0000)
        KERNEL32.DLL => /c/WINDOWS/System32/KERNEL32.DLL (0x7fff743a0000)
        KERNELBASE.dll => /c/WINDOWS/System32/KERNELBASE.dll (0x7fff72f10000)
        ADVAPI32.DLL => /c/WINDOWS/System32/ADVAPI32.DLL (0x7fff75980000)
        msvcrt.dll => /c/WINDOWS/System32/msvcrt.dll (0x7fff75350000)
        sechost.dll => /c/WINDOWS/System32/sechost.dll (0x7fff74470000)
        RPCRT4.dll => /c/WINDOWS/System32/RPCRT4.dll (0x7fff74f60000)
        msys-stdc++-6.dll => /usr/bin/msys-stdc++-6.dll (0x526840000)
        msys-gcc_s-seh-1.dll => /usr/bin/msys-gcc_s-seh-1.dll (0x5e8160000)
        msys-2.0.dll => /usr/bin/msys-2.0.dll (0x180040000)

I suggest that using another way to split the string into cells would give better compatibility.

@lifei

lifei commented May 19, 2023

OK, everyone.
I have spent more than 100 hours figuring out how to render emoji in mosh on MSYS2.
I finally found a way. Here are the screenshots.
[2 images]

@lifei

lifei commented May 19, 2023

Here is the PR: #1271

@JRGonz

JRGonz commented Apr 19, 2024

Ok so it isn't just me. I am noticing this as well and just spent forever trying to figure out what it was in the chain. I have artifacts all over the place when using mosh+tmux+iamb. I guess I can just write this off as a mosh issue?

Edit: Forgot to add that I'm seeing this same behavior in blackbox (terminal) on my Fedora desktop using Gnome. ssh on its own renders just fine but when mosh connects then I get artifacts all over the place where there are emoji (tend to see this when moving around in iamb, gomuks, weechat)

@cbean

cbean commented Jul 24, 2024

Same issue on Gentoo with UTF-8: when using the ohmyzsh agnoster theme as root, the thunder symbol (⚡) is not visible.

mosh-1.4.0
