This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

HTTP request ends too early #6143

Closed
satazor opened this issue Aug 28, 2013 · 13 comments

Comments

@satazor

satazor commented Aug 28, 2013

It looks like the request is firing the end event too early, even before the whole download ends. This happens only on unstable connections. I've detected this issue while investigating the causes of bower/bower#824.

var https = require('https');

var size = 0;
var total;

https.get('https://codeload.github.com/jashkenas/backbone/tar.gz/1.0.0')
.on('response', function (res) {
    total = Number(res.headers['content-length']);

    res.on('data', function (data) {
        size += data.length;
        console.log('Downloaded ' + size + ' of ' + total);
    })
    .on('end', function () {
        console.log('end!');
        console.log('total: ', total);
        console.log('downloaded: ', size);
        console.log('percentage: ', Math.floor(size / total * 100) + '%');

        if (size === total) {
            console.log('ok');
        } else {
            console.log('not ok');
        }
    })
    .on('error', function (e) {
        console.error(e);
    });

})
.on('error', function (e) {
    console.error(e);
});

This simple script, without third-party code, replicates the issue with the following NLC (Network Link Conditioner) settings:

Downlink 
Bandwidth: 780 kbps,
Packets Dropped: 5%
Delay: 100 ms

Uplink: 
Bandwidth: 330 kbps
Packets Dropped: 1%
Delay: 100 ms

Output of the program with those NLC settings:

end!
total:  8502074
downloaded:  707977
percentage:  8%
not ok

So basically, the end event gets fired too early. If I switch off NLC, everything is fine:

end!
total:  8502074
downloaded:  8502074
percentage:  100%
ok

I was expecting the end event to fire only when the download completes, or, failing that, an error event to be emitted.

@bnoordhuis
Member

How is the connection terminated at the TCP level? (tcpdump or wireshark will tell you.) As you're not getting an error, I suspect that it's a regular FIN/ACK sequence?

@satazor
Author

satazor commented Aug 28, 2013

@bnoordhuis I don't have the time to debug it right now. If really necessary, I will do it over the weekend.

@bnoordhuis
Member

'If really necessary' depends on how important you think it is that this gets looked into.

I have no easy way to replicate your set-up. I can set up a netem qdisc but that probably doesn't work quite like Apple's NLC so then we won't be testing the same thing.

On the subject of FIN/ACK: if that's indeed what gets sent over the wire (be it a virtual or a physical wire), then there is no bug, just business as usual.

@satazor
Author

satazor commented Sep 1, 2013

[Wireshark screenshot: screen shot 2013-09-01 at 5.49.10 pm]

Indeed there's a FIN sent over the wire, with a fatal error:

[Reassembly error, protocol TCP: New fragment overlaps old data (retransmission?)]
Message: New fragment overlaps old data (retransmission?)
Severity level: Error
Group: Malformed

Wireshark detected that the FIN packet occurred because of an error, could node do the same and fire an error event?

@bnoordhuis
Member

Indeed there's a FIN sent over the wire, with a fatal error

It's a protocol error but not necessarily a fatal one. Corrupted packets and retransmits happen all the time.

Apart from the PSH flag, it looks like regular connection termination. Not that the PSH flag is harmful, it's just mildly uncommon.

Wireshark detected that the FIN packet occurred because of an error, could node do the same and fire an error event?

Not if the operating system doesn't report it. Which it probably won't because, as mentioned, it's not a fatal error.

Sorry, but I don't see anything here that suggests that it's a node.js issue. Thanks for following up though.

@satazor
Author

satazor commented Sep 1, 2013

@bnoordhuis I've tried to download with curl under the same NLC settings and I get the same behavior, but curl compares against the content-length to detect such issues:

curl https://codeload.github.com/jashkenas/backbone/tar.gz/1.0.0 > bleh.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
 12 8302k   12 1003k    0     0  14293      0  0:09:54  0:01:11  0:08:43 25325
curl: (18) transfer closed with 7474745 bytes remaining to read

I know that some servers might not send the content-length header, but for those that do, could we add the same check and fire an error event? Or do you see it implemented in a higher-level module, like mikeal/request?

@bnoordhuis
Member

Or you see it implemented in an higher level module, like mikeal/request?

I think so. The core http library is deliberately low-level. Not as low-level as I would like but that's another story.

Throwing errors on bad Content-Length headers would also be quite disruptive because it's not that uncommon. I think most browsers regard it as a size hint, nothing more.

@satazor
Author

satazor commented Sep 5, 2013

@bnoordhuis after talking with an engineer at GitHub, I think that nodejs should be smarter in the way it handles the transfer:

Another thing worth mentioning is that chunked encoding should help you detect if the download finished abruptly. That is, if the connection closes in the middle of a chunk, that should be a real error. I've seen curl detect it, but perhaps not all http libraries are as smart.

Their servers are indeed sending chunked responses. Could node detect if the connection was closed in the middle of a chunk and fire an error in that situation?

@bnoordhuis
Member

Could node detect if the connection was closed in the middle of a chunk and fire an error in that situation?

It does, it emits an 'aborted' event on the response object.

@satazor
Author

satazor commented Sep 9, 2013

@bnoordhuis I've searched the documentation for the abort/aborted event and couldn't find any mention of it. Is that for .abort(), or a general event that can be used in the case I mentioned above?

@bnoordhuis
Member

You're right, sorry. 'aborted' is an internal-ish event; you should also get the documented 'close' event.

@jfirebaugh

If the aborted event is not a public API, then how is it possible to distinguish a truncated chunked download from a successful one? Won't both emit a close event?

@ghost

ghost commented Dec 24, 2016

@jfirebaugh @satazor @bnoordhuis I know it's been a while, but to answer your questions and anyone else who finds this ticket via Google:

  • HTTP has only one proper mechanism to detect truncated downloads: The "Content-Length" header.
  • If you use chunked HTTP downloads, it is possible to detect if a download ends in the middle of a chunk, but your downloading library needs to be coded to take advantage of that method, and not all servers support chunked transfers.
  • If the file transmission ends prematurely (and isn't chunked), the HTTP client will simply interpret that as a closed connection and think that the transmission is complete, because HTTP is an old protocol in which a closed connection is synonymous with the end of the data.
  • As an alternative to the "chunking" trick, someone could theoretically write a program which requests a "keep-alive" HTTP connection, performs a download, and then tries doing a read (such as a HEAD request on the same file) after the file is "complete". If the read fails, the connection has been dropped incorrectly. But nobody does this check, because it'd be an ugly hack.
  • The only protocol that fully protects you against dropped connections during file transfers is HTTPS (SSL/TLS): As a SECURITY feature, it REQUIRES that a close_notify message be sent when the connection is about to be closed naturally. This is to prevent connection truncation attacks. But it also inadvertently protects us against truncated downloads. Because if we don't get a "close_notify", and we try to read from the socket and fail to read, then we know that the download is broken. Libraries like cURL implement this check properly, and make it trivial to retry the download when the "SSL read error" happens.
  • Furthermore, and most importantly of all, SSL/TLS mostly uses AES which has a 16-byte "bulk encryption" block size, so if we get a partial AES chunk it means we can't decrypt it, and our download library would detect that as a truncated download too, regardless of whether "close_notify" is used or not. :-) That's safe against 15 in every 16 closed connections (it wouldn't work if the truncation happens exactly after the proper end of an AES block).

So no, there's no reliable way to protect against truncation of plain-old HTTP. But on the other hand, HTTPS is on the rise and has two mechanisms that protect very well against truncation. And things like chunked transfers can be used on top of HTTPS to get even more security.

Reference: https://tools.ietf.org/html/rfc5246#section-7.2.1
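The mid-chunk detection mentioned in the list above can be sketched as a scanner over the raw chunked framing. `chunkedBodyComplete` is illustrative only: a real parser (node's included) works incrementally and must also handle chunk extensions and trailers.

```javascript
// Returns true if the string contains a complete chunked body
// (terminated by a zero-size chunk), false if it ends mid-chunk or
// mid-size-line -- i.e. the connection closed too early.
function chunkedBodyComplete(raw) {
    var i = 0;
    for (;;) {
        var lineEnd = raw.indexOf('\r\n', i);
        if (lineEnd === -1) return false;      // size line truncated
        var size = parseInt(raw.slice(i, lineEnd), 16);
        if (isNaN(size)) return false;         // malformed size line
        if (size === 0) return true;           // terminating zero chunk
        i = lineEnd + 2 + size + 2;            // skip size line, data, CRLF
        if (i > raw.length) return false;      // data truncated mid-chunk
    }
}

console.log(chunkedBodyComplete('5\r\nhello\r\n0\r\n\r\n')); // true
console.log(chunkedBodyComplete('5\r\nhel'));                // false
```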
