This repository has been archived by the owner on May 30, 2023. It is now read-only.

Get amount of transfered bytes #10156

Closed
ariya opened this issue Jul 2, 2011 · 31 comments

Comments

@ariya
Owner

ariya commented Jul 2, 2011

[email protected] commented:

There is no way to reliably get the number of transferred bytes for a request.

bodySize is not available for responses with stage==end, and the Content-Length header is not very reliable, especially as it seems to be unset for requests with Content-Encoding "gzip".

I guess the bodySize has to be summed up for each batch of data received and made available in a response with stage==end.

Disclaimer:
This issue was migrated on 2013-03-15 from the project's former issue tracker on Google Code, Issue #156.
🌟   9 people had starred this issue at the time of migration.

@ariya
Owner Author

ariya commented Jul 3, 2011

[email protected] commented:

 

 
Metadata Updates

  • Label(s) removed:
    • Type-Defect
  • Label(s) added:
    • Type-Enhancement
  • Milestone updated: FutureRelease (was: ---)
  • Status updated: Accepted

@marcelduran
Contributor

[email protected] commented:

Any reason why Content-Length header isn't available for Content-Encoding:gzip responses?

While issue 158 is still open, there's no way to get both the compressed and uncompressed sizes of gzip responses, so the netsniff.js example that generates a HAR file is a bit misleading:

https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js#L51

According to HAR spec (http://www.softwareishard.com/blog/har-12-spec/#content), content.size:
"... should be equal to response.bodySize if there is no compression and bigger when the content has been compressed."

@marcelduran
Contributor

[email protected] commented:

Another gzip/raw content issue:

By running:
phantomjs netsniff.js http://search.yahoo.com

The generated HAR shows that the response headers for the main HTML document contain Content-Encoding: gzip and that the bodySize is 12726.

However, running curl with compression gives a different result:

curl search.yahoo.com -H "Accept-Encoding:gzip" | wc -c
4328

And without compression, it gets a size similar to what phantomjs returns:

curl search.yahoo.com | wc -c
12120
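
The gap between the two curl results is the difference between the compressed and uncompressed body, which the HAR spec quoted earlier in this thread expects to be reported separately. Below is a minimal sketch of that relationship, assuming Node.js and its built-in zlib module rather than PhantomJS. The sample HTML and the helper name harContent are made up for illustration; the `size` and `compression` fields follow the HAR 1.2 `content` object.

```javascript
var zlib = require('zlib');

// Made-up, repetitive HTML so that gzip has something to compress.
var html = '<html><body>' +
    new Array(501).join('<p>hello world</p>') +
    '</body></html>';
var raw = Buffer.from(html, 'utf8');
var gzipped = zlib.gzipSync(raw);

// Hypothetical helper: fill the HAR 1.2 "content" size fields.
// size        = length of the decoded (uncompressed) body
// compression = number of bytes saved by the content encoding
function harContent(uncompressedBytes, transferredBytes) {
    return {
        size: uncompressedBytes,
        compression: uncompressedBytes - transferredBytes
    };
}

var content = harContent(raw.length, gzipped.length);
console.log('uncompressed bytes:', raw.length);
console.log('gzip bytes on the wire:', gzipped.length);
console.log('HAR content:', JSON.stringify(content));
```

A HAR that only knows the uncompressed size, as discussed here, cannot fill in `compression`, which is why both numbers are needed.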

@strtok

strtok commented Jun 12, 2013

I see this was migrated to 'feature enhancement', but I think this should be considered a bug. Anyone using the HAR output from netsniff.js is seeing uncompressed bytes only, and is getting an inaccurate representation of the actual bytes transferred.

Is this data not easily accessible from Qt?

@sveisvei

+1 on this. Any suggestion as to where the extra bytes are coming from?

@fwebdev

fwebdev commented Nov 8, 2013

For me, all byte sizes of CSS/JS files are shown as significantly smaller than they are in reality (according to Chrome DevTools and Firebug).
They are also shown as too small compared to the gzipped file sizes.

Image sizes are all shown correctly. Does anybody else have this kind of problem?

@zackw
Contributor

zackw commented Apr 19, 2015

Seems to still be an issue in 2.0. I get the impression Qt/Webkit changes might be needed?

@djberriman

I believe that if you are talking to a chunking server, Content-Length is not set. Instead, the size of each chunk is passed before the data itself, and when a size of zero is returned the resource is complete. That may explain why Content-Length is sometimes not present.

Looking at networkaccessmanager.cpp, NetworkAccessManager::handleStarted() sets bodySize to reply->size(). NetworkAccessManager::handleFinished() does not set bodySize, so presumably it is left as is and holds the size of the content (when not chunking) or of the first chunk.

QNetworkReply has a downloadProgress signal which reports bytesReceived and bytesTotal. Perhaps that could be used.

NetworkAccessManager::handleFinished() could set bodySize to the Content-Length where it is available.

It's a pity there does not appear to be a signal for each chunk (unless downloadProgress provides that), as it would then be possible to determine the downloaded size correctly by simply adding each chunk size to bodySize.
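
The chunked framing described in this comment (a hexadecimal size line before each chunk, ending with a zero-size chunk) can be sketched as follows. This is an illustration only, not PhantomJS or Qt code: chunkedBodySize is a hypothetical helper, it assumes the raw body is an ASCII string so that characters equal bytes, and it does not parse trailers.

```javascript
// Sum the declared chunk sizes of an HTTP/1.1 chunked body.
// Assumption: `raw` is an ASCII string holding the still-encoded body.
function chunkedBodySize(raw) {
    var total = 0;
    var pos = 0;
    while (pos < raw.length) {
        var lineEnd = raw.indexOf('\r\n', pos);
        if (lineEnd === -1) {
            throw new Error('missing chunk-size line');
        }
        // The size is hexadecimal; anything after ';' is a chunk extension.
        var sizeText = raw.slice(pos, lineEnd).split(';')[0].trim();
        var size = parseInt(sizeText, 16);
        if (isNaN(size)) {
            throw new Error('invalid chunk size: ' + sizeText);
        }
        if (size === 0) {
            return total; // a zero-size chunk marks the end of the body
        }
        total += size;
        // Skip the size line's CRLF, the chunk data, and its trailing CRLF.
        pos = lineEnd + 2 + size + 2;
    }
    throw new Error('no terminating zero-size chunk');
}

console.log(chunkedBodySize('4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n')); // 9
```

Because no total is announced up front, the only way to know the body size of such a response is to add up the chunks as they arrive.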

@djberriman

I did some more research and it appears QT must be removing the content-length header when gzip is used. I did the same request via telnet and via phantomjs, note chunking is not in use.

telnet response:-

Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Encoding: gzip
Vary: Accept-Encoding
Server: Microsoft-IIS/8.0
Set-Cookie: ASP.NET_SessionId=onq34pudvbwazeh04ksylpfs; path=/; HttpOnly
X-AspNetMvc-Version: 4.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
X-Frame-Options: SAMEORIGIN
Date: Wed, 29 Apr 2015 14:54:17 GMT
Content-Length: 22767

phantom response:-

Cache-Control = private
Content-Type = text/html; charset=utf-8
Content-Encoding = gzip
Vary = Accept-Encoding
Server = Microsoft-IIS/8.0
Set-Cookie = ASP.NET_SessionId=e02yvkniwvolblo31qyt42ia; path=/; HttpOnly
X-AspNetMvc-Version = 4.0
X-AspNet-Version = 4.0.30319
X-Powered-By = ASP.NET
X-Frame-Options = SAMEORIGIN
Date = Wed, 29 Apr 2015 14:51:42 GMT

It would appear QT is for some reason removing the header.

@djberriman

Change networkaccessmanager.cpp from
data["bodySize"] = reply->size();
to
data["bodySize"] = reply->header(QNetworkRequest::ContentLengthHeader);
This means that bodySize is correct whenever Content-Length is passed.
It won't work (but then neither does the current code) when chunking is in use, or when Qt does not pass Content-Length on, such as when gzip is used. Disabling gzip works around the issue in the second case.
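
On the script side, the same caveat applies: Content-Length may simply be missing. Here is a minimal sketch of a defensive lookup, assuming the headers arrive in the shape PhantomJS passes to onResourceReceived (an array of objects with `name` and `value`). The helper name contentLengthOf is hypothetical.

```javascript
// Return the Content-Length as a number, or null when the header is
// absent (for example chunked or gzip responses) or not numeric.
function contentLengthOf(headers) {
    for (var i = 0; i < headers.length; i++) {
        // Header names are case-insensitive, so compare in lower case.
        if (String(headers[i].name).toLowerCase() === 'content-length') {
            var n = parseInt(headers[i].value, 10);
            return isNaN(n) ? null : n;
        }
    }
    return null;
}

console.log(contentLengthOf([{ name: 'Content-Length', value: '22767' }])); // 22767
console.log(contentLengthOf([{ name: 'Content-Encoding', value: 'gzip' }])); // null
```

Callers can then treat `null` as "size unknown" and fall back to another measurement instead of silently recording zero.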

@djberriman

From what I can see, size() is just the size of the QByteArray.

@djberriman

For gzip, you need to set the header yourself to accept gzip, as there is a bug in Qt:

https://bugreports.qt.io/browse/QTBUG-41840

Content-Length is then returned. Unfortunately, you then run across the bug described at https://forum.qt.io/topic/2308/content-encoding-gzip-with-qt-webkit/9 and the content is not decompressed.

@gmetais

gmetais commented Apr 30, 2015

Great digging work @djberriman! Go on!
Thousands of people are supporting you!

👍 👍 👍

@djberriman

QT does indeed specifically remove the content-length header on gzip data

void QHttpNetworkReplyPrivate::removeAutoDecompressHeader()
{
    // The header "Content-Encoding = gzip" is retained.
    // Content-Length is removed since the actual one send by the server is for compressed data
    QByteArray name("content-length");
    QList<QPair<QByteArray, QByteArray> >::Iterator it = fields.begin(),
                                                   end = fields.end();
    while (it != end) {
        if (qstricmp(name.constData(), it->first.constData()) == 0) {
            fields.erase(it);
            break;
        }
        ++it;
    }
}

@djberriman

From what I can see in the Qt source code, it may well be worth using the QNetworkReply downloadProgress signal, which reports bytesReceived and bytesTotal. I believe this will also mean chunked data will work correctly, as it will fire for each chunk.

@djberriman

I appear to have a fix for this. I'm not sure how to submit it, so I will work on that in a moment.

Basically, phantomjs is not handling one of the signals emitted by Qt, so the size returned is that of the first read. We need to add another stage alongside 'start' and 'end', which I have called 'data'. If you cater for this in your onResourceReceived function and add up the res.bodySize returned each time it is triggered for a particular resource ('end' will return 0), then you will have the true size of the content. I believe this should work regardless of whether Content-Length is passed, or whether gzip or chunking is in use. Do not rely on Content-Length.

Replace handleStarted() in networkaccessmanager.cpp with the following code.

void NetworkAccessManager::handleStarted()
{
    QNetworkReply *reply = qobject_cast<QNetworkReply*>(sender());
    if (!reply)
        return;

    QVariantList headers;
    foreach (QByteArray headerName, reply->rawHeaderList()) {
        QVariantMap header;
        header["name"] = QString::fromUtf8(headerName);
        header["value"] = QString::fromUtf8(reply->rawHeader(headerName));
        headers += header;
    }

    QVariantMap data;
    // The first notification for a reply is reported as 'start',
    // every later one as the new 'data' stage.
    if (!m_started.contains(reply)) {
        m_started += reply;
        data["stage"] = "start";
    } else {
        data["stage"] = "data";
    }
    data["id"] = m_ids.value(reply);
    data["url"] = reply->url().toEncoded().data();
    data["status"] = reply->attribute(QNetworkRequest::HttpStatusCodeAttribute);
    data["statusText"] = reply->attribute(QNetworkRequest::HttpReasonPhraseAttribute);
    data["contentType"] = reply->header(QNetworkRequest::ContentTypeHeader);
    data["bodySize"] = reply->size();
    data["redirectURL"] = reply->header(QNetworkRequest::LocationHeader);
    data["headers"] = headers;
    data["time"] = QDateTime::currentDateTime();

    emit resourceReceived(data);
}

@sveisvei

This is awesome @djberriman

@djberriman

Just be aware that the total size returned appears to be the uncompressed size, not the Content-Length, when gzip is being used. I ran one test allowing gzip and one not allowing gzip and got the same results.

@ktilcu

ktilcu commented Jun 16, 2015

@djberriman Any thoughts on getting the gzip sizes?

@atwenzel

@djberriman Thanks so much for this fix, this is exactly what I need for my project.

Can anyone give a general idea of the changes that should be made to the onResourceReceived function, especially in the context of the netsniff.js example (https://github.com/ariya/phantomjs/blob/master/examples/netsniff.js)? I've built phantomjs with this fix, but I'm a little unsure how to use it in a script. Thanks!

EDIT: I seem to have solved my issue. For anyone else with as little phantomjs experience as I have who finds this thread: in the above example, you can change

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

to

page.onResourceReceived = function (res) {
    if (res.stage === 'start') {
        page.resources[res.id].startReply = res;
    }
    if (res.stage === 'data') {
        page.resources[res.id].startReply.bodySize += res.bodySize;
    }
    if (res.stage === 'end') {
        page.resources[res.id].endReply = res;
    }
};

And it should work with @djberriman's change.

@tufandevrim

@ariya @djberriman what's the resolution on this one?

@djberriman

@tufandevrim Just waiting for @ariya to put it in the main line

erikdubbelboer added a commit to erikdubbelboer/phantomjs that referenced this issue Dec 27, 2015
@EFF

EFF commented Mar 11, 2016

@ariya @djberriman ... did we finally merge this one in 2.1.1? The fix looks good to me.

@vargasj

vargasj commented Mar 30, 2016

Has this been solved? Thanks.

@adeelraza

+1

@djberriman

onResourceReceived function should read more like:-

if (res.stage == 'start') {
    urlRequestedBytes[res.id] = res.bodySize;
}
else {
    if (res.bodySize != undefined) {
        urlRequestedBytes[res.id] += res.bodySize;
    }
}

During my testing I found that both 'data' and 'end' could return a size, depending on whether chunking is in use, and that bodySize can also be returned as undefined. To get the correct size in all cases, you need to add the bodySize value returned in each of 'start', 'data' and 'end'.
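
Putting that advice together, a self-contained accumulator might look like the sketch below. It is not part of PhantomJS: createByteCounter is a hypothetical helper, the events fed to it here are made up, and the 'data' stage only exists in a build that includes the handleStarted() patch from this thread.

```javascript
// Track the summed bodySize per resource id across all stages.
function createByteCounter() {
    var totals = {};
    return {
        // Call this with every object passed to onResourceReceived.
        record: function (res) {
            if (!Object.prototype.hasOwnProperty.call(totals, res.id)) {
                totals[res.id] = 0;
            }
            // bodySize can be undefined, so only add real numbers.
            if (typeof res.bodySize === 'number') {
                totals[res.id] += res.bodySize;
            }
        },
        // Bytes seen so far for one resource (0 if the id is unknown).
        total: function (id) {
            return totals[id] || 0;
        }
    };
}

// Made-up events standing in for what a patched build would deliver.
var counter = createByteCounter();
counter.record({ id: 1, stage: 'start', bodySize: 100 });
counter.record({ id: 1, stage: 'data', bodySize: 250 });
counter.record({ id: 1, stage: 'data' }); // bodySize undefined
counter.record({ id: 1, stage: 'end', bodySize: 0 });
console.log(counter.total(1)); // 350
```

In a PhantomJS script this would be wired up with something like `page.onResourceReceived = counter.record;`. As noted earlier in the thread, the resulting total reflects the uncompressed size, not the gzip size.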

@djberriman

Just a quick update on Content-Length with an encoded (gzip) response. The lack of a Content-Length header was due to a feature of Qt whereby the header was deliberately removed if the response was compressed. Following proof of the bug/feature and some discussion, the code that does this will now be removed from Qt, which means Content-Length will always be passed if the server returns it (chunking servers, for instance, don't return a length).

@stephanebachelier

@djberriman For a gzipped response you will probably have no Content-Length header, as the content will be streamed; you can verify this by checking for the header "Transfer-Encoding: chunked".
If the content was already gzipped beforehand (cache, disk, ...), the server will set the Content-Length header, as it will know the length of the gzip archive.

@jsut

jsut commented Aug 25, 2016

@djberriman With regard to the content length: the current version of Qt will emit the Content-Length header through 'downloadMetaData', but I'm not convinced the value of the Content-Length header is really the best thing to use if you actually want the number of bytes transferred. It omits the size of the headers, which can be significant if you have a lot of cookies, especially across all the requests required to render a web page.

It seems like using downloadProgress, which you mentioned earlier, might be a better approach, depending on what your use case is. Better yet would be if the Qt library had something like reply->bytes_transferred. Based on the documentation of downloadProgress [1], though, it does seem like that is the best approach. I also think Qt removing the Content-Length header is kind of dumb.

[1] http://doc.qt.io/qt-5/qnetworkreply.html#downloadProgress
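
To put a rough number on the header overhead mentioned in this comment, the sketch below estimates the size of a response's header block. It rests on several assumptions: the headers come in the PhantomJS shape (an array of objects with `name` and `value`), they are ASCII, and they were serialized HTTP/1.x style as `Name: value` plus CRLF, followed by one blank line. The status line is not counted, and the real bytes on the wire may differ. The helper name estimateHeaderBytes is hypothetical.

```javascript
// Estimate the serialized size of an HTTP/1.x header block in bytes.
// Assumes ASCII header names and values, so characters equal bytes.
function estimateHeaderBytes(headers) {
    var total = 0;
    for (var i = 0; i < headers.length; i++) {
        var name = String(headers[i].name);
        var value = String(headers[i].value);
        // "Name" + ": " + "value" + "\r\n"
        total += name.length + 2 + value.length + 2;
    }
    return total + 2; // the blank line (CRLF) that ends the header block
}

console.log(estimateHeaderBytes([
    { name: 'Content-Type', value: 'text/html' }
])); // 27
```

Adding such an estimate to the summed body size gives a closer, though still approximate, figure for the bytes transferred per response.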

@tomgallagher

Strangely, I need only the Content-Length header. What's the state of play on this? Has this been resolved in a later version of Phantom? I'm using 2.1.1.

@abbasharoon

+1

@ghost ghost closed this as completed in 545b03c Jan 8, 2018
This issue was closed.