Handle fragmented client responses #63

starbelly · 2023-04-10T02:30:29Z

This commit resolves #62. Specifically, the number of chunks a response from a server will be pushed up the stack to the socket is non-deterministic, therefore we must loop until we've got response that is exactly an header, some data, and a trailer or a header, some data, trailer, and some trailing junk.

Note that the latter should never happen with a sane MLLP server implementation, but we can't disregard the possibility either. With that said, it's best to return it all to the caller to let them deal with it vs making assumptions in regards to whether it's safe to truncate or not.

lib/mllp/client.ex

starbelly · 2023-04-10T20:08:07Z

test/client_test.exs

+      )
+    end
+
+    test "when replies are fragmented and the last fragment is not received" do


This is unfortunately a brittle test. If we keep this approach, then we might want to tag it as such and not run it by default.

lib/mllp/client.ex

starbelly · 2023-04-10T23:25:00Z

@vikas15bhardwaj One other change we may want to consider is changing the default reply_timeout from :infinity to 60 or 120 seconds. I think that's definitely adequate, we will still accept :infinity, but I think 60 seconds as an example makes for a better debug experience.

lib/mllp/client.ex

vikas15bhardwaj · 2023-04-11T13:36:32Z

test/client_test.exs

+      |> expect(:recv, fn ^socket, 0, 1 -> {:ok, ack_frag2} end)
+
+      {:ok, client} =
+        Client.start_link(address, port, tcp: MLLP.TCPMock, use_backoff: true, reply_timeout: 2)


On this test case, if we add another expect or may be stub of :recv, then it may not fail if we don't timeout before 3-4 calls to recv are made.

I think I've resolved this, if I understand correctly, but let me know.

vikas15bhardwaj · 2023-04-11T13:41:59Z

@vikas15bhardwaj One other change we may want to consider is changing the default reply_timeout from :infinity to 60 or 120 seconds. I think that's definitely adequate, we will still accept :infinity, but I think 60 seconds as an example makes for a better debug experience.

We may break existing applications if they are relying on default infinity timeouts if we change this. I see your point, but applications today have option to change timeout if required anyway.

starbelly · 2023-04-11T14:16:42Z

lib/mllp/client.ex

+          <<@header, _rest::binary>> ->
+            recv_ack(
+              state,
+              maybe_decrement_timeout(timeout, time_spent + elapsed),


Had to adjust how we're doing this. If a client supplies a 5 millisecond timeout (as an example), and the ack/nack comes in 6 segments; in the previous implementation we would have bailed after about 20 microseconds.

This version should be more accurate.

I am wondering, if we should keep reply_timeout in microsecond for calculation, and only convert it before :recv call. Wouldn't this make your calculation more simple?
case recv_ack(state, options1.reply_timeout) do call should be case recv_ack(state, System.convert_time_unit(options1.reply_timeout), :millisecond, :microsecond)) do

I think I'm following you 🤔 I'm not sure which part it would simplify though. Maybe it's the timeout - div(spent, 1000) ?

If so, we'd have to do that anyway, the reason exemplified is :

iex(1)> micros = System.convert_time_unit(5000, :millisecond, :microsecond) 5000000 iex(2)> micros - 1001 4998999 iex(3)> updated = micros - 1001 4998999 iex(4)> System.convert_time_unit(updated, :microsecond, :millisecond) 4998

So, we lose 2 ms vs 1, that big of decrement in the case of small reply_timeout values is a big difference I believe.

But I may be off here and you might be pointing at a simplification I'm not seeing or thinking of.

This is the simplification, I was talking about. I don't seem to loose time in it. Every time we try to recv and get a response, we reduce our timeout in microsecond from elapsed time. if timeout becomes zero anytime, we timeout otherwise carry on.

defp recv_ack(state, timeout) do recv_ack(state, timeout, <<>>) end defp recv_ack(_state, time_left, _buffer) when is_integer(time_left) and time_left <= 0 do {:error, :timeout} end defp recv_ack(state, time_left, buffer) do {res, elapsed} = do_recv(state, 0, time_left) case res do {:ok, reply} -> new_buf = buffer <> reply check = byte_size(new_buf) - 3 case new_buf do <<@header, _ack::binary-size(check), @trailer>> -> {:ok, new_buf} <<@header, _rest::binary>> -> recv_ack(state, time_left - elapsed, new_buf) _ -> {:error, :invalid_reply} end {:error, _} = err -> err end end

iex(8)> micros = System.convert_time_unit(5000, :millisecond, :microsecond) 5000000 iex(9)> elapsed = 1000 1000 iex(10)> updated = micros - elapsed 4999000 iex(11)> System.convert_time_unit(updated, :microsecond, :millisecond) 4999

RIght, it's when the elapsed is say 1001, or even 1 microsecond. I did adjust to pass around microseconds though as it's easier to follow :)

how about if we calculate elapsed time in millisecond itself. Worst case, we will over weight a bit but would never bail out early

Based on the discussion, leaving this with microsecond to keep it more precise in timeout

starbelly · 2023-04-11T14:18:14Z

We may break existing applications if they are relying on default infinity timeouts if we change this. I see your point, but applications today have option to change timeout if required anyway.

I'm not worried about that in that we are not 1.0, thus you should expect breaking changes, that said we could bump the minor segment of the version too.

vikas15bhardwaj

starbelly requested a review from vikas15bhardwaj April 10, 2023 02:30