DTLS error when using 4096 key/cert #252

Closed
saghul opened this issue Jun 4, 2015 · 7 comments

@saghul
Contributor

saghul commented Jun 4, 2015

I'm getting this:

[1675940553] DTLS timeout on component 1 of stream 1, retransmitting
[WARN] [1675940553] The DTLS stack is trying to send a packet of 2236 bytes, this may be larger than the MTU and get dropped!

This happens after switching to a 4096-bit key to avoid #251.

Eventually all components time out and fail.

@saghul
Contributor Author

saghul commented Jun 4, 2015

I checked, and 2048-bit certs have the same problem. 1024-bit certs work well.

@lminiero
Member

lminiero commented Jun 4, 2015

That's a well known issue, and has been discussed several times on the google group. The reason for this is that the DTLS stack in OpenSSL does not fragment packets that exceed the MTU when needed, when a BIO is used instead of UDP directly.

Normally, the DTLS stack should try sending the "huge" packet and, when the timeout fires (because the message never reached the other side), fragment the original packet into smaller ones and send those instead. This apparently only works when you use the DTLS stack over UDP directly, that is, when OpenSSL has more control over the transport. When a BIO is used, as in Janus, where we need to handle the transport ourselves (libnice), this never happens. As a result, the message containing the oversized certificate is dropped somewhere in the network and never reaches the destination, which leads to a handshake failure.

I tried investigating ways to fix this and force the right behaviour, but never managed to get it to work as expected. As such, the only solution as of now is to rely on "smaller" certificates. Hopefully someone will find a proper solution in the near future. I'm not sure, for instance, whether BoringSSL handles this properly, as I never managed to use it as a stack in place of OpenSSL.

I was documenting this guideline in an additional .md file in the certs folder, also to account for the feedback in #251. It will basically say that yes, as of now, certificates must be 1024 bits. Any additional text or clarification (or even example if you have any) is more than welcome!

@saghul
Contributor Author

saghul commented Jun 4, 2015

> That's a well known issue, and has been discussed several times on the google group. The reason for this is that the DTLS stack in OpenSSL does not fragment packets that exceed the MTU when needed, when a BIO is used instead of UDP directly.

Oh, sorry about that! I looked through the open issues and didn't find anything related. Will search the group next time!

> I was documenting this guideline in an additional .md file in the certs folder, also to account for the feedback in #251. It will basically say that yes, as of now, certificates must be 1024 bits. Any additional text or clarification (or even example if you have any) is more than welcome!

Cool, I'll have a look once it lands!

@lminiero
Member

lminiero commented Jun 4, 2015

Just to add some more details to what I explained in my previous posts, the DTLS stack in OpenSSL does indeed take care of fragmenting the packets according to what is assumed to be the MTU (1472 by default). The problem is that the mem BIO ignores that fragmentation info completely, so when you do a BIO_read, it makes the whole message available to the application anyway. This results in the whole buffer being passed to nice_agent_send, which is the same as not fragmenting anything. You can verify this by using, e.g., a 4096-bit certificate and capturing the DTLS traffic with Wireshark: you'll see that the message is recognized as composed of not only multiple messages, but also fragments.

My guess is that the mem BIO simply isn't smart enough to inspect the actual messages being transported: it probably doesn't care if it's DTLS, TLS or whatever else, and just acts as an opaque transport. This means that when the internal stack writes the fragmented packets in one go, that's what you get back when you read the pending data to send. Not sure if this means we'll have to inspect the payload ourselves, e.g., do a BIO_read, process the packet to see if there are fragments (length+offset), and if so send each of them separately through libnice. This would probably do it, although it sounds a bit silly that the application is required to do so, especially considering that the application is not assumed to be aware of the protocol specifics in the first place (that's why you usually rely on a library).
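
As a rough illustration of the "inspect the payload ourselves" idea, here is a minimal Python sketch that splits a buffer of back-to-back DTLS records into individual records, using the 2-byte length field at the end of the 13-byte DTLS record header. It is not Janus or OpenSSL code: `split_dtls_records`, `make_record`, and `send` are hypothetical names, the payloads are fake, and it assumes the stack emits each handshake fragment in its own record so that one record per datagram stays under the MTU.

```python
import struct
from typing import List

# DTLS record header: type(1) + version(2) + epoch(2) + sequence(6) + length(2)
DTLS_RECORD_HEADER_LEN = 13


def split_dtls_records(buffer: bytes) -> List[bytes]:
    """Split a buffer of concatenated DTLS records into individual records."""
    records = []
    offset = 0
    while offset < len(buffer):
        if len(buffer) - offset < DTLS_RECORD_HEADER_LEN:
            raise ValueError("truncated DTLS record header")
        # The payload length is the last two header bytes, big-endian.
        (length,) = struct.unpack_from("!H", buffer, offset + 11)
        end = offset + DTLS_RECORD_HEADER_LEN + length
        if end > len(buffer):
            raise ValueError("truncated DTLS record payload")
        records.append(buffer[offset:end])
        offset = end
    return records


def make_record(payload: bytes, seq: int) -> bytes:
    """Build a fake handshake record (type 22, version bytes 0xFEFF) for the demo."""
    header = (
        struct.pack("!BHH", 22, 0xFEFF, 0)
        + seq.to_bytes(6, "big")
        + struct.pack("!H", len(payload))
    )
    return header + payload


# Hypothetical stand-in for handing one datagram to the transport.
sent = []


def send(datagram: bytes) -> None:
    sent.append(datagram)


# Two records read back as one blob, as a memory buffer would return them.
pending = make_record(b"a" * 1200, 0) + make_record(b"b" * 1010, 1)
for record in split_dtls_records(pending):
    send(record)  # each record goes out as its own datagram
```

Here the 2236-byte blob goes out as two datagrams of 1213 and 1023 bytes instead of one oversized packet.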

@lminiero
Member

lminiero commented Jun 5, 2015

I asked about this on the OpenSSL mailing list, and I already received useful feedback:

https://mta.openssl.org/pipermail/openssl-users/2015-June/001503.html

They basically confirm that the mem BIO does not have enough knowledge to handle this, specifically with respect to datagram semantics. A suggestion they made is to write a BIO filter that wraps the mem BIO in order to handle fragmentation automatically. I'll try to do that ASAP.
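
To make "datagram semantics" concrete, here is a small Python sketch of the general idea behind such a wrapper: remember where each write ended, and hand back at most one original write per read. This is only a conceptual model under stated assumptions, not the OpenSSL BIO API and not necessarily how the fix in this repository works; `DatagramQueue` and its methods are hypothetical, and it assumes the DTLS stack issues one write per MTU-sized fragment.

```python
from collections import deque


class DatagramQueue:
    """Byte queue that preserves write boundaries.

    A plain memory buffer concatenates writes; this one returns exactly
    one previously written chunk per read, like a datagram socket would.
    """

    def __init__(self) -> None:
        self._chunks = deque()

    def write(self, data: bytes) -> int:
        # Store each write as its own chunk instead of appending to one blob.
        if data:
            self._chunks.append(bytes(data))
        return len(data)

    def pending(self) -> int:
        """Size of the next datagram, or 0 if nothing is queued."""
        return len(self._chunks[0]) if self._chunks else 0

    def read(self) -> bytes:
        """Return the oldest queued chunk, or empty bytes if none."""
        return self._chunks.popleft() if self._chunks else b""


queue = DatagramQueue()
queue.write(b"x" * 1213)  # first fragment written by the stack
queue.write(b"y" * 1023)  # second fragment

sizes = []
while queue.pending():
    sizes.append(len(queue.read()))  # one transport send per original write
print(sizes)  # [1213, 1023]
```

Compared with parsing the records after the fact, this approach keeps the application unaware of DTLS framing: it only relies on the stack already having split the data at write time.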

lminiero pushed a commit that referenced this issue Jun 5, 2015
@lminiero
Member

lminiero commented Jun 5, 2015

@saghul can you check if #254 works for you?

@saghul
Contributor Author

saghul commented Jun 5, 2015

Great, will check it!
