Fix DTLS fragment exceeds MTU size if the certificate is large #1343

ibc · 2024-02-22T11:32:56Z

Details

UPDATE: Despite the PR description looks like "I don't know what I'm doing" finally it makes sense.

I cannot believe that what I've done works. So let me first explain what the problem is (which is perfectly explained in issue #1100), I hope I'm right:

As commented in #1100 (comment), the problem we had is:

When it comes to send DTLS data to the remote client, DtlsTransport::SendPendingOutgoingDtlsData() in mediasoup calls read = BIO_get_mem_data(...), and obtained read maybe higher than our DtlsMtu = 1350 (for example in DTLS Server Hello message if our certificate is very large). Let's say that read is 3000 bytes which is higher than 1350 so problems here.
But we enable all the MTU related stuff in OpenSSL DTLS in DtlsTransport so we know (and this is proven, see DTLS fragment exceeds MTU size if the certificate is large. #1100 (comment)) that those 3000 bytes contain N fragments of a DTLS packet (in this case Server Hello).
But we don't know how long each fragment is so we cannot read and send them separately. And we don't know it because read = BIO_get_mem_data(...) returns the total size as explained above.

What does this PR?

In DtlsTransport::SendPendingOutgoingDtlsData() we call read = BIO_get_mem_data(...) as usual.
If read <= DtlsMtu then nothing changes, we notify the listener with the full data.
If read > DtlsMtu then we split the total data into fragments of max DtlsMtu bytes and notify the parent separately with each of them.
So the parent (WebRtcTransport) will send each fragment into a separate UDP datagram.
And for whatever reason it WORKS!

DtlsTransport::SendPendingOutgoingDtlsData() | data to be sent is longer than DTLS MTU, fragmenting it [len:6987, DtlsMtu:1350] +248ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:0, len:1350] +0ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:1350, len:1350] +0ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:2700, len:1350] +0ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:4050, len:1350] +0ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:5400, len:1350] +0ms
DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:6750, len:237] +0ms

An interesting fact is that, if I remove SSL_OP_NO_QUERY_MTU flag in the call to SSL_CTX_set_options(), then things don't work and the client/browser keeps sending DTLS Client Hello forever, meaning that the DTLS data it's receiving from mediasoup is invalid.

So this is magic. How is possible that this works? AFAIU OpenSSL is generating valid DTLS fragments but it doesn't expose them to the application separately (see the problem description above). So does this PR work by accident??? Maybe not! Probably OpenSSL is generating DTLS fragments of exactly our given DtlsMtu size (which BTW makes sense since we call SSL_set_mtu(this->ssl, DtlsMtu) and DTLS_set_link_mtu(this->ssl, DtlsMtu)), so if we split the total DTLS data to be sent into fragments of DtlsMtu exact sizes, then we are indeed taking valid DTLS fragments, so we can send them into separate UDP datagrams to the client. IS IT??? DOES IT MAKE ANY SENSE??? IMHO it does, but I did NOT expect this to work.

NOTE: I've tried to split (and send to client in individual UDP datafgrams) the big DTLS data into chunks of size 1 byte bigger or smaller than the configured DtlsMtu, and things fail. this confirms that the rationale exposed above is right so we are good.

Full rationale and one concern

Here we have data containing one or more DTLS messages with total size higher than our DtlsMtu value. These DTLS messages are, in fact, DTLS message fragments (various fragments conform a DTLS message). Each DTLS message fragment must be sent in a single UDP datagram or TCP framed message (although various DTLS message fragments can be sent together because they are framed). So the question is: How to split this big data we have here into valid DTLS message fragments?

Here the trick:

We called SSL_CTX_set_options() with SSL_OP_NO_QUERY_MTU (among other flags).
We called SSL_set_mtu(this->ssl, DtlsMtu).
We called DTLS_set_link_mtu(this->ssl, DtlsMtu).

So we know that OpenSSL will split DTLS messages bigger than DtlsMtu into DtlsMtu bytes long chunks (of course the last chunk maybe smaller). So assuming that (and it behaves that way), we can follow the reverse logic here and split this big data into chunks of DtlsMtu (except the last one, of couse), so it's guaranteed that each chunk will contain a valid DTLS message fragment. So we notify the parent by passing each chunk to it, so it will encapsulate it into a single UDP datagram or framed TCP message.

There is an scenario in which this logic would fail:

DtlsMtu is 1350 bytes.
OpenSSL generates 2 sequential DTLS messages to be sent to client.
First message is 2500 bytes long. So OpenSSL splits it into 2 DTLS message fragments:
1. First DTLS message fragment is 1350 (DtlsMtu) bytes long.
2. Second DTLS message fragment is 1150.
Second message is 500 bytes long (so no fragments needed).
So in SendPendingOutgoingDtlsData() we have 3000 bytes to send in total and there are 3 DTLS messages:
1. The first fragment of the first message: 1350 bytes.
2. The second fragment of the first message: 1150 bytes.
3. The second DTLS message: 500 bytes.
Following the above logic, SendPendingOutgoingDtlsData() will split those 3000 bytes as follows:
1. First 1350 bytes: Here we are exactly taking the first fragment of the first message, so all good.
2. Next 1350 bytes: Here we are reading the 1150 bytes of the second fragment of the first message plus the first 200 bytes of the second message. This is NOT good.
3. Next 300 bytes: The remaining 300 bytes of the second message. This is not good.
Client will reject the second and third UDP packets since they do not contain one or more valid DTLS full messages or DTLS message fragments.

However, by experimenting I see that OpenSSL doesn't generate messages like those and, anyway, we may only need to send DTLS messages bigger than DtlsMtu during the handshake and only if our certificate is big, and in this scenario the problematic above sequence doesn't happen.

Fixes #1100 ### Details I cannot believe that what I've done works. So let me first explain what the problem is (which is perfectly explained in issue #1100), I hope I'm right: As commented in #1100 (comment), the problem we had is: - When it comes to send DTLS data to the remote client, `DtlsTransport::SendPendingOutgoingDtlsData()` in mediasoup calls `read = BIO_get_mem_data(...)`, and obtained `read` maybe higher than our `DtlsMtu = 1350` (for example in DTLS Server Hello message if our certificate is very large). Let's say that `read` is 3000 bytes which is higher than 1350 so problems here. - But we enable all the MTU related stuff in OpenSSL DTLS in `DtlsTransport` so we **know** (and this is proven, see #1100 (comment)) that those 3000 bytes contain **N fragments** of a DTLS packet (in this case Server Hello). - But we don't know how long each fragment is so we cannot read and send them separately. And we don't know it because `read = BIO_get_mem_data(...)` returns the **total** size as explained above. What does this PR? - In `DtlsTransport::SendPendingOutgoingDtlsData()` we call `read = BIO_get_mem_data(...)` as usual. - If `read <= DtlsMtu` then nothing changes, we notify the listener with the full data. - If `read > DtlsMtu` then we split the total `data` into fragments of max `DtlsMtu` bytes and notify the parent separately with each of them. - So the parent (`WebRtcTransport`) will send each fragment into a separate UDP datagram. - And for whatever reason it **WORKS!** ``` DtlsTransport::SendPendingOutgoingDtlsData() | data to be sent is longer than DTLS MTU, fragmenting it [len:6987, DtlsMtu:1350] +248ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:0, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:1350, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:2700, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:4050, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:5400, len:1350] +0ms DtlsTransport::SendPendingOutgoingDtlsData() | sending fragment [offset:6750, len:237] +0ms ``` An interesting fact is that, if I remove `SSL_OP_NO_QUERY_MTU` flag in the call to `SSL_CTX_set_options()`, then things don't work and the client/browser keeps sending DTLS Client Hello forever, meaning that the DTLS data it's receiving from mediasoup is invalid. So this is magic. How is possible that this works? AFAIU OpenSSL is generating valid DTLS **fragments** but it doesn't expose them to the application separately (see the problem description above). So does this PR work by accident??? Maybe not! Probably OpenSSL is generating DTLS fragments of **exactly** our given `DtlsMtu` size (which BTW makes sense since we call `SSL_set_mtu(this->ssl, DtlsMtu)` and `DTLS_set_link_mtu(this->ssl, DtlsMtu)`), so if we split the total DTLS data to be sent into fragments of `DtlsMtu` **exact** sizes, then we are **indeed** taking **valid** DTLS fragments, so we can send them into separate UDP datagrams to the client. **IS IT???** **DOES IT MAKE ANY SENSE???** IMHO it does, but I did **NOT** expect this to work.

jmillan · 2024-02-22T11:41:38Z

wow

jmillan · 2024-02-22T11:43:15Z

It certainly makes sense 👍

ibc · 2024-02-22T11:52:26Z

NOTE: It looks that BoringSSL has a more elegant solution for this (I may be wrong):

// BIO_mem_contents sets \|*out_contents\| to point to the current contents of
// \|bio\| and \|*out_len\| to contain the length of that data. It returns one on
// success and zero otherwise.
OPENSSL_EXPORT int BIO_mem_contents(const BIO *bio,
                                    const uint8_t **out_contents,
                                    size_t *out_len);

// BIO_get_mem_data sets \|*contents\| to point to the current contents of \|bio\|
// and returns the length of the data.
//
// WARNING: don't use this, use \|BIO_mem_contents\|. A return value of zero from
// this function can mean either that it failed or that the memory buffer is
// empty.
OPENSSL_EXPORT long BIO_get_mem_data(BIO *bio, char **contents);

My understanding is that BIO_mem_contents() may tell the app how many valid DTLS fragments are in the BIO, but I'm not sure. And I don't care much because there is no BIO_mem_contents() in OpenSSL anyway.

gkrol-katmai · 2024-02-22T14:40:00Z

If you can't believe it works - it may not keep working forever. This seems like a brittle solution.

I think I do understand the problem now, and why the previous attempt worked. SSL will internally write the packets with the correct size, but then all appends them into the memory bio, and we again lose this information.

The previous fix intercepted the write operations, and when implemented carefully can be made correct. It just needs to clear the buffer after the intercepted write.

An alternative solution is to implement our own BIO, which OpenSSL will then will call 'write' on. I did find some recommendations on the internet not to implement your own, but it merits some investigation. A big advantage would be is that we lose a memory copy, and we can take the data directly from the buffer OpenSSL uses and send it out over the network. Performance improvements will probably be slim, but it may save some space in the processor cache. It also doesn't use anything in a way it was not designed for.

gkrol-katmai · 2024-02-22T14:50:15Z

The very clear comment in the code explains the problem:

			// However, by experimenting I see that OpenSSL doesn't generate messages
			// like those and, anyway, we may only need to send DTLS messages bigger
			// than DtlsMtu during the handshake and only if our certificate is big,
			// and in this scenario the problematic above sequence doesn't happen.

If OpenSSL ever changes the way they fragment packets, this code will break. And they never made any promises on how they would fragment packets, only that the packets are <= than the MTU.

I'm not a reviewer, I would recommend not to merge this PR as-is. It would be sad if we at some point update OpenSSL and the support for large certificates breaks.

If there's a test-suite that would catch this problem then we could give it a try, but I don't think Mediasoup has such a test?

ibc · 2024-02-22T14:58:15Z

The very clear comment in the code explains the problem:
			// However, by experimenting I see that OpenSSL doesn't generate messages
			// like those and, anyway, we may only need to send DTLS messages bigger
			// than DtlsMtu during the handshake and only if our certificate is big,
			// and in this scenario the problematic above sequence doesn't happen.
If OpenSSL ever changes the way they fragment packets, this code will break. And they never made any promises on how they would fragment packets, only that the packets are <= than the MTU.

True.

I'm not a reviewer

Actually you are doing a perfect and helpful review.

I would recommend not to merge this PR as-is. It would be sad if we at some point update OpenSSL and the support for large certificates breaks.

Or worse, it may happen that other things break not only related to the scenario with a big certificate.

I definitely think that we have to explore the approach of having a custom buffer in which OpenSSL writes. Just a concern here:

We don't need to send a DTLS message immediately once OpenSSL has written it into such a buffer. for example, when the certificate is small, mediasoup sends a single UDP datagram containing the DTLS Server Hello message, the DTLS Certificate message, a DTLS Server Key Exchange message, etc, all them into the same UDP datagram (because they fit into it). If we send them in separate UDP datagrams that's worse, so whatever solution we came with, must respect that current behavior.

ibc · 2024-02-22T14:59:13Z

BTW I've written a Full rationale and one concern section in the PR description. It's a copy&paste of the inline doc.

And I'm turning this PR into draft PR since I prefer to explore better and safer alternatives.

ibc · 2024-02-22T15:41:44Z

I definitely think that we have to explore the approach of having a custom buffer in which OpenSSL writes. Just a concern here:

We don't need to send a DTLS message immediately once OpenSSL has written it into such a buffer. for example, when the certificate is small, mediasoup sends a single UDP datagram containing the DTLS Server Hello message, the DTLS Certificate message, a DTLS Server Key Exchange message, etc, all them into the same UDP datagram (because they fit into it). If we send them in separate UDP datagrams that's worse, so whatever solution we came with, must respect that current behavior.

Auto reply:

This should not be a big deal. Whatever bio/buffer we use in a new approach, we don't need to rely on buffer write operations of OpenSSL in that buffer. In current code (in v3) we don't do it. We know when we should check that bio/buffer and call SendPendingOutgoingDtlsData() in various legitimate places. So the very same if we use another custom or whatever bio/buffer.

ibc · 2024-02-22T15:44:33Z

I'm closing this PR as per rationale above. If further discussions should take place about the DTLS MTU issue let's do them in the associated and open issue #1100.

ibc requested review from jmillan and nazar-pc February 22, 2024 11:32

ibc mentioned this pull request Feb 22, 2024

OpenSSL send buffer growing without bounds in DtlsTransport (worker memory leak) #1340

Closed

Use MS_WARN_TAG

82efb1a

jmillan approved these changes Feb 22, 2024

View reviewed changes

ibc added 3 commits February 22, 2024 13:05

Add a comment

35e9316

Oppps

8e13c4a

Add proper inline documentation

0dc7dce

Improve inline doc

60103c4

ibc marked this pull request as draft February 22, 2024 14:59

ibc mentioned this pull request Feb 22, 2024

DTLS fragment exceeds MTU size if the certificate is large. #1100

Closed

ibc closed this Feb 22, 2024

ibc deleted the fix-dtls-mtu-issue-1100 branch March 8, 2024 09:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix DTLS fragment exceeds MTU size if the certificate is large #1343

Fix DTLS fragment exceeds MTU size if the certificate is large #1343

ibc commented Feb 22, 2024 •

edited

Loading

jmillan commented Feb 22, 2024

jmillan commented Feb 22, 2024

ibc commented Feb 22, 2024

gkrol-katmai commented Feb 22, 2024

gkrol-katmai commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

Fix DTLS fragment exceeds MTU size if the certificate is large #1343

Fix DTLS fragment exceeds MTU size if the certificate is large #1343

Conversation

ibc commented Feb 22, 2024 • edited Loading

Details

Full rationale and one concern

jmillan commented Feb 22, 2024

jmillan commented Feb 22, 2024

ibc commented Feb 22, 2024

gkrol-katmai commented Feb 22, 2024

gkrol-katmai commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024

ibc commented Feb 22, 2024 •

edited

Loading