Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle fragmented client responses #63

Merged
merged 1 commit into from
Apr 12, 2023
Merged

Conversation

starbelly
Copy link
Contributor

This commit resolves #62. Specifically, the number of chunks a response from a server will be pushed up the stack to the socket is non-deterministic, therefore we must loop until we've got response that is exactly an header, some data, and a trailer or a header, some data, trailer, and some trailing junk.

Note that the latter should never happen with a sane MLLP server implementation, but we can't disregard the possibility either. With that said, it's best to return it all to the caller to let them deal with it vs making assumptions in regards to whether it's safe to truncate or not.

lib/mllp/client.ex Outdated Show resolved Hide resolved
lib/mllp/client.ex Outdated Show resolved Hide resolved
lib/mllp/client.ex Outdated Show resolved Hide resolved
)
end

test "when replies are fragmented and the last fragment is not received" do
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately a brittle test. If we keep this approach, then we might want to tag it as such and not run it by default.

lib/mllp/client.ex Outdated Show resolved Hide resolved
@starbelly
Copy link
Contributor Author

@vikas15bhardwaj One other change we may want to consider is changing the default reply_timeout from :infinity to 60 or 120 seconds. I think that's definitely adequate, we will still accept :infinity, but I think 60 seconds as an example makes for a better debug experience.

lib/mllp/client.ex Outdated Show resolved Hide resolved
|> expect(:recv, fn ^socket, 0, 1 -> {:ok, ack_frag2} end)

{:ok, client} =
Client.start_link(address, port, tcp: MLLP.TCPMock, use_backoff: true, reply_timeout: 2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On this test case, if we add another expect or may be stub of :recv, then it may not fail if we don't timeout before 3-4 calls to recv are made.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've resolved this, if I understand correctly, but let me know.

@vikas15bhardwaj
Copy link
Contributor

@vikas15bhardwaj One other change we may want to consider is changing the default reply_timeout from :infinity to 60 or 120 seconds. I think that's definitely adequate, we will still accept :infinity, but I think 60 seconds as an example makes for a better debug experience.

We may break existing applications if they are relying on default infinity timeouts if we change this. I see your point, but applications today have option to change timeout if required anyway.

<<@header, _rest::binary>> ->
recv_ack(
state,
maybe_decrement_timeout(timeout, time_spent + elapsed),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to adjust how we're doing this. If a client supplies a 5 millisecond timeout (as an example), and the ack/nack comes in 6 segments; in the previous implementation we would have bailed after about 20 microseconds.

This version should be more accurate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering, if we should keep reply_timeout in microsecond for calculation, and only convert it before :recv call. Wouldn't this make your calculation more simple?
case recv_ack(state, options1.reply_timeout) do call should be case recv_ack(state, System.convert_time_unit(options1.reply_timeout), :millisecond, :microsecond)) do

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm following you 🤔 I'm not sure which part it would simplify though. Maybe it's the timeout - div(spent, 1000) ?

If so, we'd have to do that anyway, the reason exemplified is :

iex(1)> micros = System.convert_time_unit(5000, :millisecond, :microsecond)
5000000
iex(2)> micros - 1001
4998999
iex(3)> updated = micros - 1001
4998999
iex(4)> System.convert_time_unit(updated, :microsecond, :millisecond)
4998

So, we lose 2 ms vs 1, that big of decrement in the case of small reply_timeout values is a big difference I believe.

But I may be off here and you might be pointing at a simplification I'm not seeing or thinking of.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the simplification, I was talking about. I don't seem to loose time in it. Every time we try to recv and get a response, we reduce our timeout in microsecond from elapsed time. if timeout becomes zero anytime, we timeout otherwise carry on.

  defp recv_ack(state, timeout) do
    recv_ack(state, timeout, <<>>)
  end

  defp recv_ack(_state, time_left, _buffer)
       when is_integer(time_left) and time_left <= 0 do
    {:error, :timeout}
  end

  defp recv_ack(state, time_left, buffer) do
    {res, elapsed} = do_recv(state, 0, time_left)

    case res do
      {:ok, reply} ->
        new_buf = buffer <> reply
        check = byte_size(new_buf) - 3

        case new_buf do
          <<@header, _ack::binary-size(check), @trailer>> ->
            {:ok, new_buf}

          <<@header, _rest::binary>> ->
            recv_ack(state, time_left - elapsed, new_buf)

          _ ->
            {:error, :invalid_reply}
        end

      {:error, _} = err ->
        err
    end
  end
iex(8)> micros = System.convert_time_unit(5000, :millisecond, :microsecond)
5000000
iex(9)> elapsed = 1000                                                     
1000
iex(10)> updated = micros - elapsed                                         
4999000
iex(11)> System.convert_time_unit(updated, :microsecond, :millisecond)      
4999

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RIght, it's when the elapsed is say 1001, or even 1 microsecond. I did adjust to pass around microseconds though as it's easier to follow :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about if we calculate elapsed time in millisecond itself. Worst case, we will over weight a bit but would never bail out early

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the discussion, leaving this with microsecond to keep it more precise in timeout

@starbelly
Copy link
Contributor Author

We may break existing applications if they are relying on default infinity timeouts if we change this. I see your point, but applications today have option to change timeout if required anyway.

I'm not worried about that in that we are not 1.0, thus you should expect breaking changes, that said we could bump the minor segment of the version too.

Copy link
Contributor

@vikas15bhardwaj vikas15bhardwaj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Fragmented Ack
2 participants