-
Notifications
You must be signed in to change notification settings - Fork 307
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for ETag and Last-Modified headers for summary and summary.sig #2205
Add support for ETag and Last-Modified headers for summary and summary.sig #2205
Conversation
One possible improvement to think about after this (if it gets merged) is that the separate HTTP requests to get That would mean a shift to using inline signatures to protect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall idea makes sense to me. Though worth noting there was a suggestion from @alexlarsson recently to have deltas for summary files as well which I think can affect the discussion here. E.g. if we were to support summary deltas, then presumably they'd be in some form which makes querying for "new summary wrt what I have locally" cheap, which intersects with what's going on here.
Though support for cache semantics makes sense on its own too and is compatible with existing OSTree webservers today.
src/libostree/ostree-fetcher-curl.c
Outdated
* Wed, 21 Oct 2015 07:28:00 GMT | ||
*/ | ||
static GDateTime * | ||
parse_rfc2616_date_time (const char *buf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, is there really no helper function in any of glib, libsoup, or libcurl for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There’s soup_date_new_from_string()
, but it’s in libsoup and this code is in the libcurl backend, so I assume libsoup is not an option there. Have I misunderstood?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you're right, I missed that this is specific to libcurl and not in common with both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another big thing to keep in mind is that when updating host systems at least ostree usually runs as unrestricted root (I'd eventually like to fix this) and things like the string parsing we're doing here are an attack surface and need careful auditing.
To be clear did you write this code from scratch or is it inspired by anything else?
Let's at least add some unit tests for this?
We've already copied in some libsoup code for URIs, maybe we could copy in the libsoup code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear did you write this code from scratch or is it inspired by anything else?
It’s written from scratch.
Let's at least add some unit tests for this?
I’ll have a go.
We've already copied in some libsoup code for URIs, maybe we could copy in the libsoup code?
Maybe, but it was rewritten recently, so is in basically the same position as this code (except more general and has support for multiple RFC date formats tied together in the same code). The older version of the code was tied to SoupDate
and also supported multiple date formats in one code path.
I think having a minimal in-house implementation makes sense, given that we only need to support the Last-Modified
header, which only uses one specific HTTP date format (RFC 2616). If we were supporting more headers with other formats then maybe things would be different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I’ve split the RFC 2616 date parsing code out into a separate file so that it can be built into a test program, and added a test program which gives 100% coverage of lines (and, I think, branches — although the lcov setup in ostree doesn’t enable branch coverage). How’s that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the added tests!
src/libostree/ostree-repo-pull.c
Outdated
g_autoptr(GBytes) etag_bytes = NULL; | ||
|
||
etag_bytes = glnx_fgetxattr_bytes (fd, "user.etag", NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: (here and elsewhere), coding style is declaration and initialization at the same time (or e.g. in the case of fd
above, declaration would be right before the openat
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So declarations not necessarily at the start of code blocks? I thought there were problems with doing that with autocleanup, where the autocleanup would happen but the initialisation might not (if not reached by the time the block is exited)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you initialize it to the "null" value, it's fine. When mixing it with the goto out
pattern it gets tricky, but nowadays compilers will yell at you if you try to do something funky like jumping over an initialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Tweaked a couple of declarations to move them to where they’re used, but most declarations in the PR are assigned by pointer so can’t be changed.
char *summary_etag; | ||
guint64 summary_last_modified; /* seconds since the epoch */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How necessary do you feel is ETag
support? Feels like Last-Modified
should be enough for our purposes, and it's nice that we don't have to dabble in user xattrs. But I'm not an expert in that field -- are there webservers out there which support ETag
but not Last-Modified
? (I'd expect the opposite.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ETag
support is actually more important than Last-Modified
support because it doesn’t have race conditions around updates/checks in the same second, and doesn’t have issues with clock synchronisation. Both require some level of support on the server (typically, aiui, servers are configured to serve everything as if it’s not cacheable, as that’s the safest option). I would expect that all web servers would support both headers, though. They’ve been around a long time.
If xattr support isn’t available then ETags won’t work, but should degrade nicely, so that shouldn’t cause a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have a basic test for this where we check that we avoid pulling the summary. Something like just doing another ostree pull
and sanity-checking that the inode for the summary didn't change? (Since we always create a new file when caching.) Hmm, though on the ostree-trivial-httpd
side, briefly looking at SoupServer
, it seems like we'd have to implement that logic ourselves like this test here.
src/libostree/ostree-fetcher-curl.c
Outdated
* Wed, 21 Oct 2015 07:28:00 GMT | ||
*/ | ||
static GDateTime * | ||
parse_rfc2616_date_time (const char *buf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No you're right, I missed that this is specific to libcurl and not in common with both.
src/libostree/ostree-repo-pull.c
Outdated
g_autoptr(GBytes) etag_bytes = NULL; | ||
|
||
etag_bytes = glnx_fgetxattr_bytes (fd, "user.etag", NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as you initialize it to the "null" value, it's fine. When mixing it with the goto out
pattern it gets tricky, but nowadays compilers will yell at you if you try to do something funky like jumping over an initialization.
src/libostree/ostree-fetcher-curl.c
Outdated
const char *end_ptr = NULL; | ||
gint saved_errno = 0; | ||
|
||
g_return_val_if_fail (n_digits == 2 || n_digits == 4, FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused by the role n_digits
plays here. Doesn't look like it's actually used outside of this precondition check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, good catch. It was used before I refactored the function a bit, and then I forgot to plumb it back in. Will update.
src/libostree/ostree-fetcher-curl.c
Outdated
if (!is_file && response == 304) | ||
{ | ||
/* Version on the server is unchanged from the version we have | ||
* cached locally; report this as an out-argument, a zero-length | ||
* response buffer, and no error. */ | ||
req->out_not_modified = TRUE; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional: should we check here that at least one of if_none_match
or if_modified_since
was provided? That way, if somehow a HTTP server returns 304 even though we didn't provide either of these, we still error out as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion, will tweak things to do that (and in the libsoup implementation too).
Working on that now; will hopefully have something ready tomorrow. |
Updated. I’ve fixed a couple of bugs and added a unit test (although see its commit message: it doesn’t actually fail with the ETag/Last-Modified changes reverted). All my changes are as fixups for ease of review, and can be squashed later. |
src/libostree/ostree-fetcher-curl.c
Outdated
strncasecmp (buffer, "ETag: ", strlen ("ETag: ")) == 0) | ||
{ | ||
g_clear_pointer (&req->out_etag, g_free); | ||
req->out_etag = g_strstrip (g_strdup (buffer + strlen ("ETag: "))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_strstrip()
is dangerous here because if the buffer has leading whitespace we lose track of the real malloc pointer. I think you just want g_strchomp()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? g_strstrip(x)
is just g_strchug(g_strchomp(x))
, and g_strchug()
doesn’t increment the input pointer, it uses memmove()
to shift the string contents down over any leading whitespace. The pointer is unchanged.
src/libostree/ostree-fetcher-curl.c
Outdated
* Wed, 21 Oct 2015 07:28:00 GMT | ||
*/ | ||
static GDateTime * | ||
parse_rfc2616_date_time (const char *buf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another big thing to keep in mind is that when updating host systems at least ostree usually runs as unrestricted root (I'd eventually like to fix this) and things like the string parsing we're doing here are an attack surface and need careful auditing.
To be clear did you write this code from scratch or is it inspired by anything else?
Let's at least add some unit tests for this?
We've already copied in some libsoup code for URIs, maybe we could copy in the libsoup code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great overall! Really just minor things.
Could you squash the commits? We don't use the Homu bot here anymore which means that there is no autosquashing.
Add support in the soup and curl fetchers to send the `If-None-Match` and `If-Modified-Since` request headers, and pass on the `ETag` and `Last-Modified` response headers. This currently introduces no functional changes, but once call sites provide the appropriate integration, this will allow HTTP caching to happen with requests (typically with metadata requests, where the data is not immutable due to being content-addressed). That should reduce bandwidth requirements. Signed-off-by: Philip Withnall <[email protected]>
As `summary` and `summary.sig` aren’t immutable, HTTP requests to download them can be optimised by sending the `If-None-Match` and `If-Modified-Since` headers to avoid unnecessarily re-downloading them if they haven’t changed since last being checked. Hook them up to the new support for that in the fetcher. The `ETag` and `Last-Modified` for each file in the cache are stored as the `user.etag` xattr and the mtime, respectively. For flatpak, for example, this affects the cached files in `~/.local/share/flatpak/repo/tmp/cache/summaries`. If xattrs aren’t supported, or if the server doesn’t support the caching headers, the pull behaviour is unchanged from before. Signed-off-by: Philip Withnall <[email protected]>
This test would have actually passed before the summary file caching changes (in the previous few commits) were added, as the `summary.sig` essentially acted as the ETag for the summary file, and itself wasn’t updated on disk if it didn’t change when querying the server. Actually testing that the HTTP caching headers are working to reduce HTTP traffic would require test hooks into the pull code or the trivial-httpd server, neither of which I have the time to add at the moment. Signed-off-by: Philip Withnall <[email protected]>
This is basic support for the Last-Modified/ETag/If-Modified-Since/If-None-Match headers. It’s not high performance, and doesn’t support all of the related caching features (like the If-Match header, etc.). Signed-off-by: Philip Withnall <[email protected]>
This makes it testable, and increases its test coverage too 100% of lines, as measured by `make coverage`. Signed-off-by: Philip Withnall <[email protected]>
4aafdc1
to
0974a7f
Compare
All review comments addressed (or commented on) and all the fixups squashed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
src/libostree/ostree-fetcher-curl.c
Outdated
* Wed, 21 Oct 2015 07:28:00 GMT | ||
*/ | ||
static GDateTime * | ||
parse_rfc2616_date_time (const char *buf, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the added tests!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: cgwalters, jlebon, pwithnall The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Looks like Travis is stuck, hmm. Not sure how to retrigger that one without another force push. Let's give it a bit and see if it comes back, otherwise we can override. |
See the commit messages. This adds support (in the soup and curl backends) for the
Last-Modified
/ETag
/If-None-Match
/If-Modified-Since
HTTP headers and hooks them up for requests forsummary
andsummary.sig
so they don’t have to be re-downloaded if unchanged from last time. In particular, this will shave 590B off each response forsummary.sig
in the cached (common) case.This PR is related to #2167 in that it’s a network efficiency improvement, but can be reviewed/merged independently.