Compression Fairing in contrib crate #550
Conversation
contrib/src/compression.rs
Outdated
for enc in headers.into_iter()
    .filter(|x| x.name() == "Content-Encoding" && x.value() != to_delete) {
    let header = enc.clone();
    println!("{:?}", header);
Remove this debugging print?
Sorry for the println!; I have removed it. Thanks for finding it, @messense! Regards.
The error in the build seems to be caused by an error compiling TestOpts on Travis; it is not related to the last commit or the pull request.
Regards.
The build has been fixed. Regards.
First, I love the idea of a fairing for compression support and would probably use one if it was merged. I haven't looked at the whole PR comprehensively, but there is at least one very serious bug: the fairing uses a single AtomicBool flag that is shared across all requests, so concurrent requests can race and see each other's state. Reading the Accept-Encoding header from the Request inside on_response would make that shared state unnecessary. On a separate note, some projects might want a choice of whether or not to support brotli, gzip, or other compression methods, either at compile time with feature selection or at runtime with different configs.
Hello @jebrosen, and thanks for your comment. Firstly, I don't understand why I didn't see that the request is available in the on_response function. My first idea was to use only on_response, but then I found that the Accept-Encoding header is only in the request, so I added the on_request function without noticing that the request is available in on_response 😖😖😖😖... my fault. I expect to fix it today. I also found it useful to add features to choose the compression method, but then I found that brotli is not supported in all browsers and is only supported over encrypted (https) connections, so gzip must always be supported. Currently, the only useful option would be ignoring the brotli compression method because you always want to use gzip. As the compression method is chosen in cascade (if brotli is supported use brotli, else if gzip is supported use gzip, etc.), this adds too much complexity to the fairing, so I decided it wasn't worth it. As a workaround, you can force gzip in your responses by adding the Content-Encoding header, but then you are downloading and including the brotli fairing in your executable without using it. However, I will study the features option while fixing the on_response function. Thanks and regards.
Hello. The accept headers are checked in the on_response function now. I have found an easy way to provide separate brotli_compression and gzip_compression features, and they have been implemented too. The compression feature remains available to enable all compression methods. Tests pass, and I have checked it in a project of my own with the compression, gzip_compression and brotli_compression features. Thanks and regards.
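As a rough sketch of how the three features described above interact (the feature names are from this PR; the helper function itself is hypothetical, not code from the PR):

// `compression` enables both `brotli_compression` and `gzip_compression`;
// cfg!(...) compiles to a constant, so disabled branches are optimized out.
fn supported_encodings() -> Vec<&'static str> {
    let mut encodings = Vec::new();
    if cfg!(feature = "brotli_compression") {
        encodings.push("br");
    }
    if cfg!(feature = "gzip_compression") {
        encodings.push("gzip");
    }
    encodings
}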
An early, incomplete review.
In general, the code is far too complicated for what it's doing, and it won't be accepted unless it's significantly simplified. There are also significantly more allocations than necessary. I don't think we need any non-compression-related allocation here. I started to indicate ideas for simplifying the code, but unfortunately I've run out of time. Still, I hope it's helpful.
Finally, the code appears to be checking that the response includes a Content-Encoding header to actually perform the compression. This seems pretty unintuitive. The user already attached the fairing, so clearly they want compression. The fairing should just perform the compression whenever the client allows it via the Accept-Encoding header. If necessary, you can add configuration to the fairing so that the user can select which additional content-types they don't want the fairing to perform compression on.
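One possible shape for that configuration, as a hedged sketch (the field and method names below are invented for illustration, not part of this PR):

// Hypothetical: the fairing carries a list of top-level content types to
// skip (e.g. "image", "video"), consulted in on_response before compressing.
pub struct Compression {
    exempt_top_types: Vec<&'static str>,
}

impl Compression {
    fn is_exempt(&self, top_type: &str) -> bool {
        self.exempt_top_types.iter().any(|&t| t == top_type)
    }
}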
contrib/src/compression.rs
Outdated
}

fn on_response(&self, request: &Request, response: &mut Response) {
    let accept_headers: Vec<&str> = request.headers().get("Accept-Encoding").collect();
This line and the series of booleans below are unnecessary.
The first thing this should do is check whether this is an image. The ContentType header is cached in the request, so you can do this really easily and correctly with:
// We don't compress images, so bail early. (content_type() returns an
// Option, so handle the absent case explicitly.)
if request.content_type().map_or(false, |c| c.top() == "image") {
    return;
}
Then we simply want to check whether br or gzip are accepted content-encodings and perform GZIP or BR compression if so, preferring brotli compression if it's accepted. This can be done more simply as well:
// A reusable check keeps this allocation-free and lets each branch query
// the header independently.
let accepts = |enc: &str| request.headers()
    .get("Accept-Encoding")
    .flat_map(|e| e.split(','))
    .any(|v| v == enc);

#[cfg(feature = "brotli_compression")]
{
    if accepts("br") {
        // Do brotli compression and return.
    }
}

#[cfg(feature = "gzip_compression")]
{
    if accepts("gzip") {
        // Do GZIP compression and return.
    }
}
contrib/src/compression.rs
Outdated
let mut compressed = Cursor::new(Vec::<u8>::new());
let mut params = brotli::enc::BrotliEncoderInitParams();
params.quality = 2;
let content_type = headers
What is this trying to do? You can get the raw content type string with request.headers().get_one("Content-Type"), but I don't think you need it. Just use request.content_type().
Here I collect and join in case the request had several Content-Type headers and some of them were text/*. Once more, this corner case could maybe be omitted.
contrib/src/compression.rs
Outdated
.get("Content-Type") | ||
.collect::<Vec<&str>>() | ||
.join(", "); | ||
if content_type.contains("text/") { |
Use top().
Hello @SergioBenitez. My idea was that the user could disable compression in specific requests (usually for performance reasons) by forcing a Content-Encoding header on the response.
Hello. The code has been simplified, and choosing the compression algorithm via the Content-Encoding header of the response has been removed. Nevertheless, some of your suggestions could not be implemented exactly as proposed, because the #[cfg] attribute could not be used in those positions, so the cfg! macro is used instead. Regards.
I have not used the "changes requested" review flow before, but I think that @SergioBenitez should approve the changes before the pull request is merged, once he has reviewed them. Or should I do anything more? I think that all requested changes have been implemented and the pull request is ready to be merged. Regards.
At present, the entire response body is read into a vector and then compressed. This is unacceptable as response bodies are streams; there's no way to prevent a self-DOS here. That must be changed for this to be merged.
contrib/src/compression.rs
Outdated
    .flat_map(|e| e.trim().split(","))
    .any(|e| e.trim() == "br" || e.trim() == "brotli")
{
    #[cfg(feature = "brotli_compression")]
This is unnecessary. The entire branch will be false if the feature isn't enabled, so this will be compiled out anyway.
contrib/src/compression.rs
Outdated
&& request
    .headers()
    .get("Accept-Encoding")
    .flat_map(|e| e.trim().split(","))
Do we actually need all of these trims?
I use trim in case any browser sets the header with spaces (for instance: "Accept-Encoding: deflate, gzip, br"). However, since the elements are trimmed after the split, the trim in the flat_map can be safely removed; I will do it.
In Firefox and Chrome this does not happen, but I think it makes the fairing more robust.
contrib/src/compression.rs
Outdated
// the corresponding feature is enabled
if cfg!(feature = "brotli_compression")
    && request
        .headers()
This check should be factored out into a function or method. Then, this whole thing becomes:
if cfg!(feature = "brotli_compression") && has_encoding(&request, &["br", "brotli"]) {
    ....
}
This applies to the else if below as well.
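A minimal sketch of such a helper (later commits in this PR call it accepts_encoding; the exact parsing below is an assumption):

use rocket::Request;

// True if any value in `encodings` appears in the request's
// Accept-Encoding headers, ignoring surrounding whitespace.
fn accepts_encoding(request: &Request, encodings: &[&str]) -> bool {
    request
        .headers()
        .get("Accept-Encoding")
        .flat_map(|header| header.split(','))
        .any(|enc| encodings.contains(&enc.trim()))
}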
contrib/src/compression.rs
Outdated
{
    #[cfg(feature = "brotli_compression")]
    {
        let body = response.body_bytes();
We're reading the entire body here. This is really, really bad. What if it's a 1GiB file, or an endless stream? The compression has to operate on a stream.
contrib/src/compression.rs
Outdated
{
    #[cfg(feature = "gzip_compression")]
    {
        let body = response.body_bytes();
Same issue here.
I am making changes here, but I am not sure they are fine.
I'm trying to use the response.body() function to get the streamed body, but the only way I have found to get at the Read inside is using body.into_bytes() and then passing the result as a parameter to the corresponding compression function.
Is this OK?
Doing this has also forced me to use an auxiliary boolean, so that the mutable reference to response is used only once in the same block:
let mut success = false;
let mut compressed = Cursor::new(Vec::<u8>::new());
{
let plain = response.body();
if let Some(plain) = plain {
let body_bytes = &plain.into_bytes();
if let Some(body_bytes) = body_bytes {
success = GzEncoder::new(&mut compressed, flate2::Compression::default())
.write_all(&body_bytes)
.is_ok();
}
}
}
if success {
Compression::set_body_and_header(response, compressed, "gzip");
}
Is this acceptable?
I don't understand how the streaming interface works.
If I set a sized body on the response, then everything works:
fn set_body_and_header<'h, B>(response: &mut Response<'h>, body: B, header: &'h str)
where
B: Read + Seek + 'h,
{
response.remove_header("Content-Encoding");
response.adjoin_header(Header::new("Content-Encoding", header));
response.set_sized_body(body);
}
However, replacing the set_sized_body call in this function with set_streamed_body makes the response body empty, in spite of passing the same bytes as body to the function.
Is there a special function to call when the streaming has finished?
Does the chunk_size parameter of set_chunked_body have special requirements?
What am I doing wrong?
Sorry if my question is stupid.
Your question isn't stupid at all! I really appreciate all your work here. :)
You can get the actual body with response.body().into_inner() or response.take_body().into_inner(). The values returned from these implement Read. The compression libraries should have some kind of wrapper over any Read type that returns another Read type that, when read, compresses on the fly. This is what we need. Once you have that type, use response.set_streamed_body(compression_wrapper) to set the new body on the response.
Yes, I was doing that: in my Fairing, the compression libraries read the body as a Cursor<Vec<u8>> and write the compressed bytes into another Cursor<Vec<u8>>.
I have done some tests and found that response.set_streamed_body works fine using a repeat().take() as in the documentation example. Nevertheless, when a Cursor<Vec<u8>> is passed to this function, the response body is empty.
This is the code and the results that I have used to test it.
Setting a Cursor fails:
response.remove_header("Content-Encoding");
response.adjoin_header(Header::new("Content-Encoding", header));
response.set_streamed_body(body);
println!("{:?}", repeat(87).take(5));
// println!("{:?}", body);
println!("{:?}", response.body().unwrap().into_bytes());
Output:
Take { inner: Repeat { .. }, limit: 5 }
Some([])
Setting a repeat().take() works; the Cursor has 16 bytes.
response.remove_header("Content-Encoding");
response.adjoin_header(Header::new("Content-Encoding", header));
response.set_streamed_body(repeat(87).take(5));
println!("{:?}", repeat(87).take(5));
println!("{:?}", body);
println!("{:?}", response.body().unwrap().into_bytes());
Output:
Take { inner: Repeat { .. }, limit: 5 }
Cursor { inner: [139, 5, 128, 72, 101, 108, 108, 111, 32, 119, 111, 114, 108, 100, 33, 3], pos: 16 }
Some([87, 87, 87, 87, 87])
As you can see, the response body has bytes when a repeat().take() is passed to set_streamed_body, but when a Cursor<Vec<u8>> is passed the body is empty. Using set_sized_body with the same Cursor<Vec<u8>> works fine.
I don't know if you can figure out where the problem is.
Thanks.
I have found the error: my misunderstanding of the Cursor struct. Seeking body to the start before calling set_streamed_body makes everything work.
All tests pass, and I have tested it in a project of my own with JS and CSS files, getting response times between 0 and 2 ms 😊.
I think that the pull request is now ready to be merged; I await your comments.
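For illustration, a minimal sketch of that fix (the helper name and its surroundings are assumed, not the PR's actual code):

use std::io::{Cursor, Seek, SeekFrom};
use rocket::Response;

// Rewind the cursor before streaming it; without the seek, reads start at
// the cursor's current (end) position and the body looks empty.
fn stream_compressed<'r>(response: &mut Response<'r>, mut compressed: Cursor<Vec<u8>>) {
    compressed.seek(SeekFrom::Start(0)).expect("seek to start");
    response.set_streamed_body(compressed);
}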
@SergioBenitez, have you had any chance to review the last changes?
contrib/src/compression.rs
Outdated
        params.mode = BrotliEncoderMode::BROTLI_MODE_FONT;
    }
}
if plain.into_inner().read_to_end(&mut plain_buffer).is_ok() {
read_to_end is the problematic part of this: what if the response was streaming a 3GB file from disk? Without compression, it will be streamed to the client piece by piece, keeping only a bit in memory at a time. With compression, it now has to read all 3GB into memory, compress it (keeping both the uncompressed and compressed versions in memory for a short time), then send the compressed version to the client.
A safe approach would involve something like this (modulo parameters and error handling):
let reader = response.take_body().into_inner();
let new_body = brotli::CompressorReader::new(reader, buffer_size, q, lgwin);
Compression::set_body_and_header(response, new_body, "br");
(The same reasoning applies to gzip below)
I will try your suggestion when #646 is resolved and I can compile Rocket.
However, the flate2 API only supports writing &[u8], so I can't find any way of converting the Read into a &[u8] without read_to_end or something similar (I guess take has the same problem).
Thanks for your help.
flate2::read::GzEncoder looks like it would work for this purpose (instead of flate2::write::GzEncoder).
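A minimal sketch of that read-side encoder, assuming the flate2 crate's documented API (the wrapper compresses on the fly as it is read, so nothing is buffered up front):

use std::io::Read;
use flate2::read::GzEncoder;
use flate2::Compression;

// Wrap any plain-body reader so that reading from the result yields
// gzip-compressed bytes.
fn gzip_body<R: Read>(plain: R) -> GzEncoder<R> {
    GzEncoder::new(plain, Compression::default())
}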
I understand that I could then use read to stream the compressed bytes into a buffer of my desired size, no?
It seems weird to me that read writes compressed data xD; that is why I hadn't found this API before.
Thanks again.
I have finally made the changes needed for streaming compression of responses.
I have also done some refactoring of the code, removing unnecessary variables and making it more readable.
@jebrosen, @SergioBenitez, I kindly await your reviews and comments.
@jebrosen Could you check this out?
@Martin1887 Please rebase.
The plain body is no longer being read into a buffer, but the compressed body still is. That has to be fixed for the same reasons.
contrib/src/compression.rs
Outdated
if cfg!(feature = "brotli_compression")
    && Compression::accepts_encoding(request, &["br", "brotli"])
{
    let mut compressed = Cursor::new(Vec::<u8>::new());
This Vec<u8> to temporarily hold compressed data is still an issue. The fairing should not do any reading/writing of bytes or buffer management itself, on either the plain or compressed side; it should instead be left up to the Response/Body machinery to stream the response.
contrib/src/compression.rs
Outdated
        params.mode = BrotliEncoderMode::BROTLI_MODE_FONT;
    }
}
brotli::BrotliCompress(&mut plain.into_inner(), &mut compressed, &params)
The use of brotli should read similarly to the code below to stream the response. I don't know what the appropriate values of q or lgwin might be, but the brotli documentation has some guidance.
let body = brotli::CompressorReader::new(plain.into_inner(), buffer_size, q, lgwin);
Compression::set_body_and_header(response, body, "br");
q is the quality; I am using 2 in this crate to get really fast compression with a compression ratio similar to gzip. lgwin is the size of the window, whose default in the crate is 20. Recommended values are 20 and 22.
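Plugging those numbers into the snippet above gives a sketch like this (the buffer size is an assumption):

use std::io::Read;

// Wrap the plain body in brotli's streaming encoder: quality 2 for fast
// compression at a roughly gzip-like ratio, window size (lgwin) 20.
fn brotli_body<R: Read>(plain: R) -> brotli::CompressorReader<R> {
    let buffer_size = 4096; // assumed; any reasonable chunk size works
    brotli::CompressorReader::new(plain, buffer_size, 2, 20)
}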
contrib/src/compression.rs
Outdated
    flate2::Compression::default(),
);
if match response.body() {
    Some(plain) => gz.write(&plain.into_bytes().unwrap()).is_ok(),
Same as the brotli case above: a flate2::read::GzEncoder should be used as the body instead of writing into a buffer.
contrib/src/lib.rs
Outdated
#[cfg(feature = "json")] | ||
#[cfg_attr(feature = "json", macro_use)] | ||
#[doc(hidden)] | ||
pub mod json; | ||
|
||
#[cfg(feature = "json")] | ||
pub use json::{Json, SerdeError, JsonValue}; | ||
pub use json::{Json, JsonValue, SerdeError}; |
Was this done automatically by an editor/IDE? It's not related to the compression feature, so it could confuse someone going through the commit history later. Same with the templates:: imports below.
Yes, RLS did it. I will undo those changes.
Would a commit undoing the changes be enough, or is there a git command to discard the changes made to that file in a specific commit (I have not found one)?
For now, a commit to undo them would be enough; the history can be cleaned up later.
contrib/src/compression.rs
Outdated
fn set_body_and_header<'h, B>(response: &mut Response<'h>, mut body: B, header: &'h str)
where
    B: Read + Seek + 'h,
This function should not use Seek, to align with the other changes.
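A sketch of the helper adjusted per that suggestion, assuming Rocket 0.4-era APIs (set_streamed_body replaces set_sized_body, so only Read is required):

use std::io::Read;
use rocket::http::Header;
use rocket::Response;

// Replace any existing Content-Encoding header and stream the already
// wrapped compressed body.
fn set_body_and_header<'h, B>(response: &mut Response<'h>, body: B, coding: &'h str)
where
    B: Read + 'h,
{
    response.remove_header("Content-Encoding");
    response.adjoin_header(Header::new("Content-Encoding", coding));
    response.set_streamed_body(body);
}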
… low performance hit, set compression mode to TEXT or FONT regarding the Content-Type of the response, images now are not compressed because it is not useful, fixed error when Accept-Encoding was a only header with all values in the same header
…med to Compression
…est removing AtomicBool flags
…n features, in addition to compression feature for all compression methods
…on algorithm choose using the Content-Encoding header of the response
…ompression instead of read and write the whole response at once
Adds the following to the list of known media types: * audio/webm (WEBA) * image/tiff (TIFF) * audio/aac (AAC) * text/calendar (Calendar) * video/mpeg (MPEG) * application/x-tar (TAR) * application/gzip (GZIP) * video/quicktime (MOV) * video/mp4 (MP4) Also adds the following to the list of known extensions: * .weba (WEBA) * .ogv (OGG) * .mp4 (MP4) * .mpeg4 (MP4) * .aac (AAC) * .ics (Calendar) * .bin (Binary) * .mpg (MPEG) * .mpeg (MPEG) * .tar (TAR) * .gz (GZIP) * .tif (TIFF) * .tiff (TIFF) * .mov (MOV)
The new 'JsonError' type allows users to inspect the raw string that failed to parse as a given JSON value. Resolves rwf2#772.
The new 'FromData' trait allows an implementor to instruct the caller to maintain state on its stack and later pass a borrow for processing. Among other things, it greatly simplifies the 'Form' type, removing a use of unsafe, and allows references in deserialized data guards.
My pull request has been merged and the new crates.io release will be launched next week. Then I will commit and push my changes after testing them.
I have committed the last changes and rebased against the latest master. All tests pass. I hope that this pull request is finally ready to be merged 🙂.
Please rebase or reorder commits again against the latest master. I only noticed one problem with the latest implementation: any response that already has a Content-Encoding header set would be compressed again. Everything else looks reasonable to me, just nitpicks and stylistic changes.
I have added the check for an existing Content-Encoding header. Sorry, but I am not very skillful with git: after my rebase attempts, the commits in this pull request contain changes from master. Thanks.
I'm not sure exactly what happened, but I did try rebasing it myself and things got pretty hairy. Would you be able to make a new branch off of the upstream master and apply your changes on top of it?
Closing in favor of #781.
Hello.
This pull request provides compression support for all responses using the optional 'compression' feature, as requested in #528.
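For reference, a minimal sketch of attaching such a fairing (the module path and the Compression::fairing() constructor are assumptions, not necessarily this PR's final API):

extern crate rocket;
extern crate rocket_contrib;

use rocket_contrib::compression::Compression;

fn main() {
    // Hypothetical usage, assuming the fairing is exposed from
    // rocket_contrib behind the `compression` feature.
    rocket::ignite()
        .attach(Compression::fairing())
        .launch();
}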
This feature provides brotli and gzip compression with the following requirements:
Regards.