EIO: Getting `Tls Failure (Fatal MACMismatch) #457

Closed
bikallem opened this issue Dec 1, 2022 · 9 comments

Comments

@bikallem
Contributor

bikallem commented Dec 1, 2022

I am getting intermittent TLS errors such as TLS failure: (Fatal MACMismatch) or TLS failure: (Fatal (RecordOverflow 26678)) when using tls-eio in my repo: https://github.com/bikallem/ocaml-dns/tree/eio-tls

This happens after the TLS connection is established: the handshake succeeds and I can send application data over the connection. The first couple of data packets are sent and received fine, but anything after that fails with the errors above. Could you please advise whether I am using tls-eio correctly?

@bikallem
Contributor Author

bikallem commented Dec 2, 2022

I made another branch which uses eio-ssl instead of tls-eio and I am unable to reproduce the error I get with tls-eio. https://github.com/bikallem/ocaml-dns/tree/eio-ssl

/cc @talex5

Update: Here's the diff of the two branches (https://github.com/bikallem/ocaml-dns/compare/eio-tls...bikallem:ocaml-dns:eio-ssl?expand=1)

@talex5
Contributor

talex5 commented Dec 2, 2022

Can't see anything obviously wrong (what are you using the ticket_cache for though?).

I suggest:

  1. Use tcpdump or similar to get a pcap dump of a case where it goes wrong.
  2. Make a test case that just replays that dump (will probably need to control the RNG too).
  3. Keep simplifying the test-case as long as it keeps failing.
  4. Turn on debug logging for TLS.

Maybe you're doing overlapping reads or overlapping writes somewhere? You could put a mutex around those operations to check.
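
As a rough illustration of the mutex check, here is a minimal sketch assuming the `eio` library. The names `guarded`, `guard`, `guarded_read` and `guarded_write` are hypothetical and do not come from tls-eio or the linked repository. If the errors disappear once every read and write goes through such a wrapper, overlapping operations become the likely suspect.

```ocaml
(* Diagnostic sketch, assuming the `eio` library. [guarded], [guard],
   [guarded_read] and [guarded_write] are hypothetical names. *)
type 'flow guarded = {
  flow : 'flow;
  read_lock : Eio.Mutex.t;   (* held for the duration of each read *)
  write_lock : Eio.Mutex.t;  (* held for the duration of each write *)
}

let guard flow =
  { flow; read_lock = Eio.Mutex.create (); write_lock = Eio.Mutex.create () }

(* [use_ro] releases the lock without poisoning the mutex if the callback
   raises, so the underlying exception stays visible to the caller. *)
let guarded_read t buf =
  Eio.Mutex.use_ro t.read_lock (fun () -> Eio.Flow.single_read t.flow buf)

let guarded_write t data =
  Eio.Mutex.use_ro t.write_lock (fun () -> Eio.Flow.copy_string data t.flow)
```

Separate locks are used so that a read may still run alongside a write; only read/read and write/write overlaps are serialised.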

Finally, does it work with the lwt+tls version of Dns?

@bikallem
Contributor Author

bikallem commented Dec 2, 2022

> Can't see anything obviously wrong (what are you using the ticket_cache for though?)

It is the same with or without.

@hannesm
Member

hannesm commented Dec 2, 2022

Dear @bikallem, thanks for the report. Could you say in a bit more detail which host (IP/hostname and port number) you're trying to connect to?

Does a connection to the same host using tls-lwt or tls-mirage lead to the same issues, or is it only the tls-eio package that shows them? If the latter, would you mind telling us which versions of ocaml, eio, and tls you're using?

@bikallem
Contributor Author

bikallem commented Dec 2, 2022

I went ahead and created a similar executable (ohost.exe) as the one in dns-client-eio. The error doesn't manifest in the lwt version. As verified above, it works correctly if tls-eio is replaced by eio-ssl usage.

  1. eio - https://github.com/bikallem/ocaml-dns/blob/eio-tls/eio/client/ohost.ml
  2. lwt - https://github.com/bikallem/ocaml-dns/blob/eio-tls/lwt/client/ohost.ml

Both versions connect to the same nameserver IP and port: https://github.com/bikallem/ocaml-dns/blob/eio-tls/eio/client/dns_client_eio.ml#L78 (eio) and https://github.com/bikallem/ocaml-dns/blob/eio-tls/lwt/client/dns_client_lwt.ml#L156 (lwt).

> Does a connection to the same host using tls-lwt or tls-mirage lead to the same issues, or is it only the tls-eio package that shows them? If the latter, would you mind telling us which versions of ocaml, eio, and tls you're using?

Yes, I only experience errors with the tls-eio package (version 0.15.5).
I am using the master branch of eio (as I need ocaml-multicore/eio#360) and OCaml 5.0.0~beta2.

@hannesm
Member

hannesm commented Dec 2, 2022

Thanks for your investigation. Maybe the path forward is to set the log level of tls.tracing to debug, and compare what is done for lwt and eio, and what is different.
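
One way to raise a single log source to debug, sketched with only the `logs` library. The helper name `enable_debug` is hypothetical; `tls.tracing` is the source name mentioned above, and the sketch assumes that source has already been registered by the time the helper runs.

```ocaml
(* Sketch assuming only the `logs` library. [enable_debug] is a hypothetical
   helper, not part of tls or logs. *)
let enable_debug name =
  (* Print log messages with the default formatter-based reporter. *)
  Logs.set_reporter (Logs.format_reporter ());
  (* Raise the level of every registered source with a matching name and
     return how many sources were changed. *)
  List.fold_left
    (fun n src ->
      if String.equal (Logs.Src.name src) name then begin
        Logs.Src.set_level src (Some Logs.Debug);
        n + 1
      end else n)
    0 (Logs.Src.list ())
```

Calling `enable_debug "tls.tracing"` at startup in both the lwt and the eio executable should make the two traces comparable. A return value of 0 means no source with that name was found.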

This was referenced Dec 12, 2022
hannesm changed the title from "Getting `Tls Failure (Fatal MACMismatch)" to "EIO: Getting `Tls Failure (Fatal MACMismatch)" on Dec 16, 2022

@talex5
Contributor

talex5 commented Feb 17, 2023

I had a go at reproducing this using your eio-tls branch. It sometimes works and sometimes crashes with an end-of-file error for me. That seems to be in some custom parsing code that no longer exists (or has changed a lot) on your main eio branch.

Does this still happen with the current code? It would be good to have step-by-step instructions to see the error with the latest version if so. Perhaps a Dockerfile that builds and runs the test?

In any case, I suggest using Eio.Buf_read to parse things. That should ensure that data will be buffered correctly and handle EOF, at least. e.g. the current code has things like

    match Eio.Flow.single_read ctx.ns_connection ctx.recv_buf with
    | got ->
      let recv_data = append_recv_buf ctx got recv_data in
      handle_data recv_data
    | exception End_of_file ->

which seems unnecessarily low-level.
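
For comparison, a minimal sketch of the Eio.Buf_read approach, assuming the `eio` library and the usual DNS-over-TCP/TLS framing (a 2-byte big-endian length followed by the message). `read_message` and `read_all` are hypothetical helpers, not code from the dns-client-eio branch.

```ocaml
(* Sketch assuming the `eio` library. [read_message] and [read_all] are
   hypothetical helpers. Framing: 2-byte big-endian length, then the body. *)
let read_message (r : Eio.Buf_read.t) : string =
  let len = String.get_uint16_be (Eio.Buf_read.take 2 r) 0 in
  (* [take] buffers until [len] bytes are available and raises
     [End_of_file] if the stream ends first. *)
  Eio.Buf_read.take len r

(* Read whole messages until a clean end of stream. A stream that ends in
   the middle of a message still raises [End_of_file]. *)
let read_all (r : Eio.Buf_read.t) : string list =
  let rec loop acc =
    if Eio.Buf_read.at_end_of_input r then List.rev acc
    else loop (read_message r :: acc)
  in
  loop []
```

With a real connection the reader would be created once, for example with `Eio.Buf_read.of_flow ~max_size:0x10000 flow`, and reused for every message so that bytes read past one message are kept for the next.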

@bikallem
Contributor Author

I attempted to reproduce the issue over the last Christmas break but couldn't pin down an exact reproduction. I believe this bit triggered the issue intermittently:

            Eio.Fiber.first
              (fun () -> recv_packet ctx ctx.ns_connection request_id)
              (fun () -> Eio.Promise.await response_p)

https://github.com/mirage/ocaml-dns/blob/2fa2f980b44b04bdd8bf6f4ed10c8dbdd072979e/eio/client/dns_client_eio.ml#L290-L292

However, I couldn't pinpoint the exact issue because the actual error is intermingled with Eio errors (Mutex poisoned, effect cancelled, etc.). That is also why I suggested that we only handle/catch the tls-eio specific exception in the Tls_eio.read function.

> Does this still happen with the current code? It would be good to have step-by-step instructions to see the error with the latest version if so. Perhaps a Dockerfile that builds and runs the test?

I have since reworked the dns-client-eio PR to remove the use of Eio.Fiber.first, and the issue has not appeared since.
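
The thread does not show what replaced Eio.Fiber.first, and it does not establish that cancellation was the cause. As background, Eio.Fiber.first cancels whichever fiber loses the race, so a read that is in progress can be interrupted. One possible shape that never cancels a read is a single reader fiber that hands each response to the waiting caller through a promise. The sketch below is an assumption for illustration only, with hypothetical names throughout, and it assumes all fibers run in one domain since `Hashtbl` is not domain-safe.

```ocaml
(* Sketch assuming the `eio` library; all names here are hypothetical.
   Maps a request id to the resolver of the promise its caller waits on. *)
type pending = (int, string Eio.Promise.u) Hashtbl.t

let create_pending () : pending = Hashtbl.create 16

(* Called by a requester before sending: returns the promise to await. *)
let register (pending : pending) (id : int) : string Eio.Promise.t =
  let promise, resolver = Eio.Promise.create () in
  Hashtbl.replace pending id resolver;
  promise

(* Called only by the single reader fiber for each decoded response.
   Returns [false] if nobody is waiting for this id. *)
let dispatch (pending : pending) (id : int) (payload : string) : bool =
  match Hashtbl.find_opt pending id with
  | None -> false
  | Some resolver ->
    Hashtbl.remove pending id;
    Eio.Promise.resolve resolver payload;
    true
```

A requester would call `register`, send its query, and then `Eio.Promise.await` the returned promise, while the reader fiber loops over incoming messages and calls `dispatch` for each one.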

@bikallem
Contributor Author

I couldn't pin down the reproduction steps for this bug. I have now reworked the ocaml-dns PR that motivated this report, and the issue no longer occurs, so I am closing it.
