Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileResponse failed to prepare proper headers for Range requests #2844

Closed
spcharc opened this issue Mar 17, 2018 · 14 comments
Closed

FileResponse failed to prepare proper headers for Range requests #2844

spcharc opened this issue Mar 17, 2018 · 14 comments

Comments

@spcharc
Copy link
Contributor

spcharc commented Mar 17, 2018

Long story short

Please describe your problem and why the fix is important.

Users may pause their downloads of large static resources.
Many download managers including those browser-builtin ones check for Content-Range headers. Missing this header would cause downloads to fail.
I checked the source code and found out static resources are handled by FileResponse in web_fileresponse.py. But {0}.format(title)

Expected behaviour

What is the behaviour you expect?

Assume you have a BIG file big_file
big_file.size() == 10000 (9.77kb)
A user downloaded 500 bytes (0-499) and connection lost.
And then the user resumes download with request: "Range:bytes=500-"
Expected header: "Content-Range:bytes 500-9999/10000" <- this one is missing

Actual behaviour

What's actually happening?

The downloads just failed immediately after clicking resume button.
Actually aiohttp handles Range requests very well. HTTP 206 and Content-Length are both correctly returned to users.

Steps to reproduce

Please describe steps to reproduce the issue.
If you have a script that does that please include it here within
markdown code markup

Check the web_fileresponse.py source code, or try a download like this

    from aiohttp import web
    import asyncio
    loop = asyncio.get_event_loop()
    app = web.Application(loop = loop)
    app.add_static('/', folder_containing_large_file, show_index = True)
    web.run_app(app, host = '0.0.0.0', port = 4321)

goto http://your_addr:4321/
click the big_file

Your environment

Describe the environment you have that lead to your issue.
This includes aiohttp version, OS, proxy server and other bits that
are related to your case.

 IMPORTANT: aiohttp is both server framework and client library.
 For getting rid of confusing please put 'server', 'client' or 'both'
 word here.

Server
Python/3.6.4
aiohttp/3.0.9

Temporary workaround

in file web_fileresponse.py:
line 228 == "if count:"
in line 227 I added:
self.headers[hdrs.ACCEPT_RANGES] = 'bytes'
(Accept-Ranges header, which is to indicate that the server supports Range requests, is also missing. But I found out that usually web clients just ignore it. They still send Range requests even though this is missing. Maybe an Accept-Ranges:none header is needed to prevent Range requests?)
in line 229 I added:
if start is not None: self.headers[hdrs.CONTENT_RANGE] = f'bytes {start}-{start+count-1}/{file_size}'
(After this line is added, download managers work like a charm)
These two lines could be a temporary workaround. It has problems left listed as below

Other things to check

Some download managers use this header:

    If-Range:xxxxxx
    Range:bytes=yyyy-zzzz

And some use this (including Chrome browser):

    If-Unmodified-Since:xxxxxx
    Range:bytes=yyyy-zzzz

Both are not properly handled by FileResponse. Nothing is checked and HTTP 206 is returned with requested partial content
As far as I know, what is expected:

if condition:
    return HTTP 206, Content-Length=len(requested_part), Content-Range=start-end/full_len
else:
    return HTTP 200, Content-Length=full_len

Besides, multipart/byterange is not supported
A request with Range:bytes=0-10,20-30 will cause HTTP 416

(EOF. It's my first time using github, please tell me if anything wrong. English is not my first language, please point out if any mistake. Sorry about that)

Edit:
Only If-Range works in that way, but If-Unmodified-Since is like this:

if condition:
    return HTTP 206, Content-Length=len(requested_part), Content-Range=start-end/full_len
else:
    return HTTP 412 Precondition Failed,
@asvetlov
Copy link
Member

Thank you for report.
I agree that Content-Range / Accept-Ranges should be supported.
The issue is waiting for a champion for Pull Request :)

@spcharc
Copy link
Contributor Author

spcharc commented Mar 17, 2018

What is a pull request? I've seen it a couple of times on github though.
You can fix it nice and easy as you have already made the handler for If-Modified-Since xxxxxx (expected HTTP 304 if unmodified else HTTP 200. FileResponse can handle this occasion already.)
The only problem would be multipart/byterange response for multiple ranges which is really rare. Most of the download managers don't use that kind of Range header

@asvetlov
Copy link
Member

Pull Request is a way to making a patch for merging.
Quick googling suggest https://gist.github.com/Chaser324/ce0505fbed06b947d962 or https://yangsu.github.io/pull-request-tutorial/
There are a lot of resources describing how GitHub works with pull requests, what is workflow etc.
Please note: the change should contain tests for covering a new functionality.

@spcharc
Copy link
Contributor Author

spcharc commented Mar 17, 2018

Thank you for your explanation. I'm trying to figure out how to use git command now.
I've checked this page
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Unmodified-Since
and found out if condition is False If-Unmodified-Since expects HTTP 412
(unlike If-Range expects HTTP 200 when False)

@hubo1016
Copy link
Contributor

If-Unmodified-Since should be processed together with If-Modified-Since, It is not required to be used together with range requests. If-Range could be processed together too, if not matched, remove the range header.

I don't have time for this now. If no one is working on it, I can do It later.

@hubo1016
Copy link
Contributor

If ETag is generated, client will use If-Match instead.

There is an implementation I created years ago:

https://github.com/hubo1016/vlcp/blob/master/vlcp/service/web/static.py#L65

It did not process If-Match header, But works for normal range requests. Only ETag is supported, Last-Modified is not returned.

@spcharc
Copy link
Contributor Author

spcharc commented Mar 18, 2018

Thanks for reply. aiohttp does provide Last-Modified header and it doesn't have Etag support yet. So If-Match and If-None-Match handler isn't necessary as clients won't make requests with them. (They won't get any ETag beforehand)
ETag is better than Last-Modified on these occasions:

  1. a file whose last-modified is changed a lot without changing its content.
  2. a file with its content being changed serveral times in a single second. (minimum unit for Last-Modified header is sec)
  3. you cannot get the accurate last-modified from a file.

Usually we don't change static resources a lot. And generating strong ETags requires some caching and hashing work. It's not cheap. Especially when dealing with large files.

@hubo1016
Copy link
Contributor

@spcharc It is not that different. In fact, usually we generate a ETag based on: modification time, inode, file size. They all come from fstat(). HMAC (sha256, md5) are NOT used. At least Apache and nginx uses this kind of ETag.

@spcharc
Copy link
Contributor Author

spcharc commented Mar 18, 2018

I see. Thank you!
Actually I just want a fix because this problem is really annoying in my daily use
If ETag doesn't make my computer slower then I'm totally OK with it
I'm trying to make a pull request in about 2 days in this project, providing a simple fix for this issue. I'm learning to how to pull, feel free to close it

@asvetlov
Copy link
Member

Fixed by #2847

@spcharc
Copy link
Contributor Author

spcharc commented Mar 21, 2018

Today I thought about why ETag is better than date check again.
I searched on the web and I found out people mentioned that files extracted from compressed files would have a Last-Modification-Date in the past (before they were packed). If I extract files into the /static/ folder and overwrite something with the same name, it would cause a problem.
Besides, some people said sometimes a file of the same address could be different files on the server (depends on user settings)
For instance:
If user setting of color theme on a website changed from ‘red’ to ‘green’, then ‘/pics/background.jpg’ may changed from actually returning ‘/pics/red/backgroud.jpg’ to ‘/pics/green/backgroud.jpg’. If-Modified-Since would cause problem in this case, and ETag will work as expected
Maybe ETag support would be a necessary one?
But I don’t know how to implement a ETag support. (If someone share me the ETag algorithm I’m happy to help)
Fortunately we have @hubo1016 here who knows the implementation. We could expect a patch from him. He said that ETag would not impact performance, and does not involve hashing. I believe it would be a good choice to have it.

@hubo1016
Copy link
Contributor

@spcharc Currently I don't have enough time, but you can look at an implementation I wrote and tested before:
https://github.com/hubo1016/vlcp/blob/7d22c818a3056fa4265aaf9c309e63357483e439/vlcp/service/web/static.py#L139-L143

I've also modified it to a suitable version (removing Python 2.x compatibility)

def safehex(v):
    h = hex(v)[2:]
    return h

def _createetag(stat_info):
    etag = 'aiohttp-' + safehex(stat_info.st_ino) + '-' + safehex(int(stat_info.st_mtime)) + '-' + safehex(int(stat_info.st_size))
    # return etag.encode('ascii') # if bytes are wanted
    return etag

Call os.stat() on the file path to get the f_stat structure, use it to generate the ETag.

Whether to support ETag or not, and how to implement it is a decision of design. It should be discussed in a separated issue. And adding ETag should be a feature request rather than a bug fix which is not very urgent.

@spcharc
Copy link
Contributor Author

spcharc commented Mar 21, 2018

@hubo1016 Thank you for your detailed explanation and suggestion. I will open another issue to discuss it.

@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants