
Attach add-on sometimes reorders output #1893

Closed
markspeters opened this issue Jan 12, 2019 · 9 comments

@markspeters
The attach add-on sometimes reorders data despite the frames being received in-order over the socket.

Details

  • Browser and browser version: Chrome 71.0.3578.98
  • OS version: OSX 10.13.4
  • xterm.js version: 3.10.1 (not positive, but I don't think the add-on has changed since)

Steps to reproduce

  1. Attach the terminal to a websocket.
  2. Send the websocket two consecutive messages where the first message's .data is a Blob and the second's is an ArrayBuffer or string.

Expected: The data from these two messages to be displayed in order.
Actual: They're sometimes reordered.

My hack fix

The issue appears to be that FileReader's readAsArrayBuffer is asynchronous, so subsequent messages can be received/decoded/displayed while the FileReader is still loading.
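
The race can be sketched without a real socket or FileReader; this is a minimal, hypothetical simulation (the names are illustrative) where a resolved Promise stands in for FileReader's async 'load' event and the string path writes synchronously:

```javascript
// Simulated message handler: the "blob-like" path decodes asynchronously
// (one microtask later), the string path writes immediately.
const written = [];

function handleMessage(data) {
  if (typeof data === 'string') {
    written.push(data);              // sync path: written right away
  } else {
    Promise.resolve().then(() => {   // async path: written a tick later,
      written.push(data.payload);    // standing in for FileReader's 'load'
    });
  }
}

handleMessage({ payload: 'first (blob-like)' }); // arrives first
handleMessage('second (string)');                // arrives second

// Once the microtask queue drains, `written` holds the messages in the
// wrong order: the string overtook the still-loading blob.
setTimeout(() => console.log(written), 0);
```

This is exactly the desync described above: arrival order over the socket is preserved, display order is not.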

I'm able to work around this by attaching an ID to each displayable message as it comes in, and using that in displayData to check whether there are outstanding messages (buffering them if there are). I'm patching the compiled JS, not the TS, but hopefully the idea comes across:

var nextId = 1;          // id for the next incoming message
var lastDisplayedId = 0; // highest id already written to the terminal
var msgsToWrite = {};    // decoded chunks waiting for an earlier id
var myTextDecoder;
addonTerminal.__getMessage = function (ev) {
	var str;
	if (typeof ev.data === 'object') {
		var msgId = nextId++;
		if (!myTextDecoder) {
			myTextDecoder = new TextDecoder();
		}
		if (ev.data instanceof ArrayBuffer) {
			// Synchronous path: decode and display immediately.
			str = myTextDecoder.decode(ev.data);
			displayData(msgId, str);
		}
		else {
			// Blob path: FileReader is async, so this message's id
			// reserves its slot until the 'load' callback fires.
			var fileReader_1 = new FileReader();
			fileReader_1.addEventListener('load', function () {
				str = myTextDecoder.decode(fileReader_1.result);
				displayData(msgId, str);
			});
			fileReader_1.readAsArrayBuffer(ev.data);
		}
	}
	else if (typeof ev.data === 'string') {
		displayData(nextId++, ev.data);
	}
	else {
		throw Error("Cannot handle \"" + typeof ev.data + "\" websocket message.");
	}
};
// Flush messages in id order; anything behind a still-pending id stays
// buffered. (`buffered` comes from the surrounding attach-addon scope.)
function displayData(msgId, str, data) {
	msgsToWrite[msgId] = (str || data);
	for (var i = lastDisplayedId + 1; typeof msgsToWrite[i] !== 'undefined'; i++) {
		if (buffered) {
			addonTerminal.__pushToBuffer(msgsToWrite[i]);
		}
		else {
			addonTerminal.write(msgsToWrite[i]);
		}
		delete msgsToWrite[i];
		lastDisplayedId = i;
	}
}

This seems to be working great; the only concern I have is recoverability if the FileReader fails to decode a chunk, since with this change it'll buffer all subsequent data indefinitely.
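
One hypothetical way to address that concern (names mirror the workaround above but are illustrative, not the actual addon API): on decode failure, record an empty chunk under the failed id so the flush loop can advance past it instead of holding everything back forever.

```javascript
// Same id-ordered queue as the workaround, reduced to its core so it
// runs standalone; `output` stands in for addonTerminal.write.
var nextId = 1;
var lastDisplayedId = 0;
var msgsToWrite = {};
var output = [];

function displayData(msgId, str) {
  msgsToWrite[msgId] = str;
  for (var i = lastDisplayedId + 1; typeof msgsToWrite[i] !== 'undefined'; i++) {
    if (msgsToWrite[i] !== '') {
      output.push(msgsToWrite[i]); // write non-empty chunks only
    }
    delete msgsToWrite[i];
    lastDisplayedId = i;
  }
}

// Call this from a FileReader 'error' listener: the failed id's slot is
// filled with an empty chunk, so later messages are not held back.
function markFailed(msgId) {
  displayData(msgId, '');
}

var a = nextId++, b = nextId++, c = nextId++;
displayData(c, 'third'); // decoded first, buffered behind a and b
markFailed(b);           // second message failed to decode
displayData(a, 'first'); // now 'first' then 'third' flush in order
```

The failed chunk is dropped rather than displayed, but the terminal keeps working instead of silently buffering until the end of time.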

Thanks for your work on this fantastic software.

@jerch

jerch commented Jan 12, 2019

@markspeters I suggest setting the binary type to arraybuffer and avoiding the blob type altogether. Mixing synchronous and asynchronous code at this stage unfortunately desyncs the websocket chunks, which are guaranteed by the protocol to arrive in order. Imho introducing another sync semantic on top of the websocket chunks (which already do this with frame ids) is a waste of bandwidth.
Since you already did the nasty work: how many positions are the chunks off in your ids when loaded as a blob between sync string messages? I'd expect them to be off by one event loop cycle only (meaning a sync string message is not more than one message ahead); at least for nodejs this is a common pattern, fixable with .nextTick.
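
The effect of that suggestion can be sketched as follows (a minimal, hypothetical handler, shown against plain values so it runs without a live socket): once `socket.binaryType = 'arraybuffer'` is set, every message arrives as a string or an ArrayBuffer, so one shared TextDecoder handles both paths synchronously and order is preserved.

```javascript
// With socket.binaryType = 'arraybuffer', ev.data is only ever a string
// or an ArrayBuffer; both decode synchronously, no FileReader detour.
const decoder = new TextDecoder();

function toDisplayString(data) {
  return typeof data === 'string' ? data : decoder.decode(data);
}

// Stand-ins for two in-order websocket frames:
const binary = Uint8Array.from([104, 101, 108, 108, 111, 32]).buffer; // 'hello '
const text = 'world';

const out = toDisplayString(binary) + toDisplayString(text);
// out === 'hello world': decoded in arrival order, no async hop.
```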

@Tyriar Any thoughts on this? Is there a specific reason for blob support here which I don't see? I think we should drop blob support completely (I already removed it in my upcoming UTF8 support, which works fine without it).

@markspeters

markspeters commented Jan 14, 2019

Thanks @jerch: Just for context, I'm not using this for in-band syncing. Every message I send, I send the same way and synchronously. The backend is Tomcat's implementation of JSR 356, and in my case I'm transmitting these packets the exact same way, through RemoteEndpoint.Basic's sendBinary method, which is synchronous. They arrive in order in the client (as far as the Chrome debugger shows, anyway); it's just that the shorter message happens to be decoded (or maybe encoded?) as an ArrayBuffer while the longer one arrives as a Blob. I'll have to do some digging into the websocket spec and Tomcat's implementation to see if I have any control over this. Hopefully just setting the binaryType on the client side works as you suggest.

@markspeters

markspeters commented Jan 14, 2019

Hopefully just setting the binaryType on the client side works as you suggest.

Yep, that was sufficient; thanks for saving me from my own workaround. Looks like the different versions of Chrome I was testing on defaulted to different binaryTypes. I must've been wrong about it giving me both ArrayBuffers and Blobs; it must've just been loading the later, smaller Blob first.

I had taken the WS spec to mean that binaryType was only ever a hint, but that passage is about using it as a hint for storage characteristics. Given this is under the client's control, I get your argument for just dropping Blob support rather than fixing this issue. Maybe warn when attaching to a websocket whose binaryType is blob?

@Tyriar

Tyriar commented Jan 14, 2019

I don't really know anything about the blob binary type. @jerch do you think we should add a warning to the code or just to the jsdoc?

@jerch

jerch commented Jan 15, 2019

Hmm, I'm not that familiar with that part of the code; it just happens that I had to deal with this for the upcoming UTF8 support too. I had to read the spec myself to get an idea of what's going on there. This is what the spec says:

  • Blob should be the default type for incoming binary messages
  • string should be the default type for incoming string messages
  • the binary type can be switched to ArrayBuffer by setting WebSocket.binaryType to "arraybuffer" on client side

The main difference between the two binary types is that an ArrayBuffer provides all the primitives needed to access the data directly from JS, while a Blob is a black box to JS holding arbitrary data. To get JS access to the data, a Blob needs further action, like an async file read.
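
That difference can be shown side by side (a small sketch, using the same bytes both ways; `blob.arrayBuffer()` is used here in place of the older FileReader for brevity):

```javascript
const bytes = Uint8Array.from([104, 105]); // the bytes of 'hi'

// ArrayBuffer/typed-array path: the bytes are directly accessible,
// so decoding is synchronous.
const fromBuffer = new TextDecoder().decode(bytes); // 'hi', immediately

// Blob path: the same bytes, but only reachable through an async call,
// which is exactly the hop that let messages overtake each other.
const blob = new Blob([bytes]);
blob.arrayBuffer().then((ab) => {
  const fromBlob = new TextDecoder().decode(ab); // 'hi', one tick later
  console.log(fromBuffer === fromBlob);          // true
});
```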

Now in our xterm.js demo we have the following requirements:
Currently our data input for xterm.js is meant to be string only, which means the websocket transport should be string. For this the server must send the data as strings (our demo correctly sends strings from node-pty). Whenever the server sends binary, it gets surfaced as a Blob on the client side, since we do not set the binaryType yet. @markspeters You might want to check if you can send string messages instead of binary data. If binary data is found, the attach addon currently tries to decode it to JS strings. This is error prone, as the binary data is UTF8 encoded and chunks might be missing bytes, leading to decoding errors. Therefore I want to disable that completely; only strings would be allowed as message types.

In a second step I want to reenable UTF8 binary data transport, but this time with our custom UTF8 decoder, which works streamlined and can correctly decode half-transmitted unicode chars. For direct access from JS, Blob is a bad choice, therefore I want to force it to ArrayBuffer.

To sum this up:

  • Currently it's a bad idea to send binary data at all, as the data might get scrambled. Always go with string messages. We should disable binary input and remove the binary fallback code in the attach addon.
  • Later on we can reenable binary input with UTF8, but only with ArrayBuffers, as Blobs are more complicated.
  • We should clearly document the input ways with pros and cons, yeah.

Edit: Not sure yet how the zmodem addon will deal with this; I think it relies on arbitrary binary data being sent, so a string or UTF8-only transport will fail. Hmm, this needs some further thinking.

@markspeters

@jerch As I understand it, your current concern w.r.t. ArrayBuffer is binary frames that are split in the middle of a multibyte UTF-8 char, so they can't be decoded properly? I don't think that's a situation I'll hit given the nature of the backend I'm talking to, but I understand the concern for the general case. Switching to text frames would be an option for us, though.

Just doing some digging in the TextDecoder docs, it looks like it does support decoding in chunks via the stream option:

const msg1 = new ArrayBuffer(3);
const msg2 = new ArrayBuffer(3);

// 'A' plus the first two bytes of 😀 (U+1F600, 0xF0 0x9F 0x98 0x80)...
new Uint8Array(msg1).set([65, 0xF0, 0x9F]);
// ...then its last two bytes, plus 'B'.
new Uint8Array(msg2).set([0x98, 0x80, 66]);

const decoder = new TextDecoder("utf-8");
const output1 = decoder.decode(msg1, {stream: true});
const output2 = decoder.decode(msg2, {stream: true});

console.log(output1);
console.log(output2);

If you run this, the smiley will be correctly decoded as part of the second output. So maybe this concern would be mitigated by keeping the TextDecoder instance around and using the stream option?
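
Wrapping that into a small per-connection helper (a sketch; `decodeChunk` is an illustrative name, not an existing API) shows the idea of keeping one decoder alive across frames:

```javascript
// One TextDecoder per connection; {stream: true} makes it hold back an
// incomplete multibyte sequence until the remaining bytes arrive.
const decoder = new TextDecoder('utf-8');

function decodeChunk(arrayBuffer) {
  return decoder.decode(arrayBuffer, { stream: true });
}

// The split smiley from the example above:
const part1 = Uint8Array.from([65, 0xF0, 0x9F]).buffer; // 'A' + half of 😀
const part2 = Uint8Array.from([0x98, 0x80, 66]).buffer; // rest of 😀 + 'B'

const out1 = decodeChunk(part1); // 'A' - the partial char is buffered
const out2 = decodeChunk(part2); // '😀B' - completed on the next chunk
```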

@jerch

jerch commented Jan 20, 2019

@markspeters
Ah ok, I was not aware of the stream flag. If you find some time, maybe you can try out #1904. It introduces raw UTF8 support to xterm.js, which omits several conversions along the way and is way faster than the current string input.

@Tyriar

Tyriar commented Oct 7, 2019

@jerch I think we can close this one off?

@jerch

jerch commented Oct 7, 2019

Yes - this should not be the case anymore with v4. The code responsible for the reordering got removed (the culprit was the file API with its event-loop-triggered read callback).

jerch closed this as completed Oct 7, 2019