-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Etag support in HTTP::StaticFileHandler #6145
Add Etag support in HTTP::StaticFileHandler #6145
Conversation
It's certainly a good idea to clean up and reorder some of the logic in that class. But I don't see the point in having a It's particularly odd that this PR adds support for In my opinion, |
Thanks for quick feedback! We can pack logic internally and still keep it extendable - I thought why not making it more configurable out of the box. As for using both simultaneously - I will give it another go; by (quickly :x) looking at rfc7232 before submitting that proposal I thought it made no sense because of:
But I guess we still have to deal with some older clients and support both at the same time on server side 🤔 good point! Will ping you hopefully this week for further discussion and I'm still hoping for more feedback ;). |
Yes, some clients or intermediaries might only use one of them so we should support both. And |
Another quick go, please lemme know wdyt - I'm still collecting feedback here (will rebase whole thing anyway) Note/Fun (?) fact: Implementation in Rails contradicts specification linked here, but I guess this should follow rfc 🙊 |
@@ -6,6 +6,8 @@ require "uri" | |||
class HTTP::StaticFileHandler | |||
include HTTP::Handler | |||
|
|||
ALLOWED_METHODS = ["GET", "HEAD"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALLOWED_METHODS = %w(GET HEAD)
@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler | |||
end | |||
|
|||
def call(context) | |||
unless context.request.method == "GET" || context.request.method == "HEAD" | |||
request_method = context.request.method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Useless assign.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks really good overall 👍 I've added a few line comments about simple improvements.
A few notes regarding general design:
- I don't see a benefit from isolating the behaviour in an additional
Cache
type. These methods should just be (private) instance methods inStaticFileHandler
. They're of no use outside of that class. - Combining methods
etag?
andmodified_since?
inmatch?
would avoid lot's of boiler plate and would even make it better understandable what's going on. IMHO it doesn't make sense to separate these methods.
Support for a list of etags in If-None-Match
header and validating each of them would be great, but that can be added when the general design is settled (should be fairly easy).
@@ -6,7 +6,10 @@ require "uri" | |||
class HTTP::StaticFileHandler | |||
include HTTP::Handler | |||
|
|||
ALLOWED_METHODS = ["GET", "HEAD"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why we need to expose this as a constant. But it should be a tuple if so.
if @fallthrough | ||
call_next(context) | ||
else | ||
context.response.status_code = 405 | ||
context.response.headers.add("Allow", "GET, HEAD") | ||
context.response.headers.add("Allow", ALLOWED_METHODS.join(", ")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generating that string dynamically is useless as the allowed methods are not gonna change. Better leave it as before and scrap ALLOWED_METHODS
.
response.status_code.should eq(200) | ||
response.headers["Last-Modified"].should eq(HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time)) | ||
response.body.should eq(File.read("#{__DIR__}/static/test.txt")) | ||
end | ||
end | ||
|
||
context "with If-None-Match header" do | ||
it "should return 304 Not Modified if header matches etag" do | ||
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't generate a etag manually. How it's generated is irrelevant to this spec.
Just send a prior request to get the etag.
context "with both If-None-Match and If-Modified-Since headers" do | ||
it "ignores If-Modified-Since as specified in RFC 7232" do | ||
headers = HTTP::Headers.new | ||
etag = %{W/"#{File.info("#{__DIR__}/static/test.txt").modification_time.epoch.to_s}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
end | ||
|
||
context "with both If-None-Match and If-Modified-Since headers" do | ||
it "ignores If-Modified-Since as specified in RFC 7232" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this spec test if If-Modified-Since
is ignored? The date needs to be before modifcation time of the file for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of current 1s padding that I did not touched I don't think it's correct, see spec
it "should return 304 Not Modified if file mtime is equal" do
update: ouh, I think I can see it now ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it seems there is something off... We need more integration tests for this.
ref_response = handle HTTP::Request.new("GET", "/test.txt")
headers = HTTP::Headers.new
headers["If-Modified-Since"] = ref_response.headers["Last-Modified"]
response = handle HTTP::Request.new("GET", "/test.txt")
This should result in 304 Not Modified (both with and without this PR), but it doesn't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, this works as intended.
headers["If-None-Match"] = "some random etag" | ||
response = handle HTTP::Request.new("GET", "/test.txt", headers) | ||
response.status_code.should eq(200) | ||
response.headers["Etag"].should match(/W\/"\d+"$/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation of the etag format and should be a separate spec.
response = handle HTTP::Request.new("GET", "/test.txt", headers) | ||
response.status_code.should eq(200) | ||
response.headers["Etag"].should match(/W\/"\d+"$/) | ||
response.body.should eq(File.read("#{__DIR__}/static/test.txt")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this expectation is necessary. Checking the status code should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually re-added body assertion after I removed it because of this comment 😅 as first (original) spec does it and I guess it won't hurt, right? 🤔
@@ -24,12 +26,13 @@ class HTTP::StaticFileHandler | |||
end | |||
|
|||
def call(context) | |||
unless context.request.method == "GET" || context.request.method == "HEAD" | |||
request_method = context.request.method | |||
unless ALLOWED_METHODS.includes?(request_method) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{"GET", HEAD"}.includes?(context.request.method)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should support OPTIONS
as well, but that's for another PR.
@@ -0,0 +1,48 @@ | |||
class HTTP::StaticFileHandler::Cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be a module since there are no instances for this type.
if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]? | ||
return false unless if_none_match | ||
|
||
if_none_match == etag(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to recompute the etag, it's already available as context.response.headers[HTTP_ETAG]
. Ditto below.
Thanks for comments guys! ❤️ Assuming this is acceptable way I will wrap it up (cleanup and address comments) @straight-shoota I tried to keep it in a single file/class as suggested - but damn it seemed cluttered - I will give it another go tho and let's decide one more important issues are addressed ;-) |
end | ||
|
||
private def self.etag(file_path) | ||
%{W/"#{File.info(file_path).modification_time.epoch.to_s}"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need .to_s
here. In fact, it results in a additional, useless string allocation.
|
||
# According to RFC 7232: | ||
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field | ||
return if_none_match == context.response.headers[HTTP_ETAG] if if_none_match |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggestion for better readable control flow:
# According to RFC 7232:
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
if_none_match == context.response.headers[HTTP_ETAG]
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
header_time = HTTP.parse_time(if_modified_since)
# ...
else
false
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using case
would be even more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case
by using somehow by using the implicit-object syntax ❓
Please move the methods from |
def self.match?(context : HTTP::Server::Context, file_path : String) : Bool | ||
# According to RFC 7232: | ||
# A recipient must ignore If-Modified-Since if the request contains an If-None-Match header field | ||
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case
when if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
# ...
when if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
# ...
else
# ...
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably reads better than:
if if_none_match = context.request.headers[HTTP_IF_NONE_MATCH]?
# ...
elsif if_modified_since = context.request.headers[HTTP_IF_MODIFIED_SINCE]?
# ...
else
# ...
end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be honest by first glance I don't see how one is better than other (and vice versa) 😅 I thought at first with crystal's implicit-object syntax there might be some more sophisticated approach, maybe let's just leave it as-it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clearly a case for if
. Yes, it could be written as case
but that's a bit different semantically. And it's more to type. Theoretically, you could rewrite every if ... elseif
as case ... when
, but what would be the point?
um, from |
Yes, I mean We could certainly generalize Currently, these methods are only internal implementation and as such, don't need isolated unit tests at all. |
632b72d
to
83ddbfd
Compare
Resolved all comments (I think) and rebased whole thing; gonna eat some breakfast and make sure I didn't messed smtg up, I don't trust myself 😅 again thanks for all the feedback! |
any ideas why edit: this #5890 ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- spec
add Etag header
should also test the etag is different when mtime changes. - Would you rewrite the other existing examples to use
initial_response
as well? Name suggestions:returns 304 Not Modified for younger than Last-Modified
andserves content for older than Last-Modified
. You can useHTTP.parse_time
to parse theLast-Modified
header for calculations.
@@ -21,29 +21,94 @@ describe HTTP::StaticFileHandler do | |||
response.body.should eq(File.read("#{__DIR__}/static/test.txt")) | |||
end | |||
|
|||
context "with header If-Modified-Since" do | |||
it "should add Etag header" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit-bit: don't use should
. Just it "adds Etag header" do
is fine. Less noise.
I know, some of the existing examples in this file use should
but it doesn't add anything. Of course all specs describe something that should work. Feel free to remove all of them.
headers = HTTP::Headers.new | ||
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time) | ||
headers["If-Modified-Since"] = initial_response.headers["Last-Modified"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also add a expectation response.headers["Last-Modified"].should eq initial_response.headers["Last-Modified"]
later to ensure the header doesn't change between requests.
headers["If-Modified-Since"] = initial_response.headers["Last-Modified"] | ||
|
||
response = handle HTTP::Request.new("GET", "/test.txt", headers), ignore_body: true | ||
response.headers["Content-Type"]?.should eq(nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this needs to be a separate example. Just add this expectation to the previous one.
s/eq(nil)/be_nil/
end | ||
|
||
it "should serve file if file mtime is younger" do | ||
headers = HTTP::Headers.new | ||
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour) | ||
response = handle HTTP::Request.new("GET", "/test.txt") | ||
response = handle HTTP::Request.new("GET", "/test.txt", headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add ignore_body: false
to make it explicit that this is expecting content.
headers = HTTP::Headers.new | ||
headers["If-None-Match"] = "some random etag" | ||
|
||
response = handle HTTP::Request.new("GET", "/test.txt", headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto ignore_body: false
initial_response = handle HTTP::Request.new("GET", "/test.txt") | ||
|
||
headers = HTTP::Headers.new | ||
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time - 1.hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also benefit from using Last-Modified
:
HTTP.parse_time(initial_response.headers["Last-Modified"]) - 1.hour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but you would have to parse it back with rfc1123_date
(not mentioning that parse_time
returns Time?
, so not_nil!
would have to be added as well) - is it really worth it? 🤔
|
||
it "should serve a file if header does not match etag" do | ||
headers = HTTP::Headers.new | ||
headers["If-Modified-Since"] = HTTP.rfc1123_date(File.info("#{__DIR__}/static/test.txt").modification_time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
response.status_code.should eq(304) | ||
end | ||
|
||
it "should serve a file if header does not match etag" do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add even If-Modified-Since is fresh
.
83ddbfd
to
64b23e5
Compare
end | ||
end | ||
|
||
private def etag(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should take a modification_time
so that we only call File.info(path).modification_time
once, this saves performance and avoids race conditions where the etag and last modified are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race condition doesn't matter, but performance and reusability 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could even move this as a property of Response
for even more reusability.
HTTP_IF_MODIFIED_SINCE = "If-Modified-Since" | ||
HTTP_IF_NONE_MATCH = "If-None-Match" | ||
HTTP_ETAG = "Etag" | ||
HTTP_LAST_MODIFIED = "Last-Modified" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these consts should be private, or maybe even just removed and inlined. We don't typically agressively extract literals in crystal.
5d540a0
to
c5fcce7
Compare
How about now? 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small improvement, apart from that: 👍
# An exact comparison might be slightly off, so we add 1s padding. | ||
# Static files should generally not be modified in subsecond intervals, so this is perfectly safe. | ||
# This might be replaced by a more sophisticated time comparison when it becomes available. | ||
last_modified = modification_time(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this to the call
method and pass modification_time
to both add_cache_header
and cache_request?
to avoid another duplicate file stat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point! rebased PR once again 😅
c5fcce7
to
2163d40
Compare
This is a great improvement over the previous |
Hi all
I was trying to extract some logic in Kemal - kemalcr/kemal#440 - it seems at this point static file handler is kinda hard to re-use as it's quite coupled - a lot of stuff is going on there at the moment.
Maybe injecting conditional into handle arguments might be a good idea? So you can basically plug-in anything you want there (you could also remove caching by injecting some class that will simply return false for
matches?
class method).It's just a totally random proposal / proof of concept (wip) - please let me know what you think about it so we can proceed with it / change the approach or simply reject it 💣
Keep up great work, I'm really enjoying having 'ruby experience' in a compiled language - it's truly awesome ❤️