From d7d126a9e602b10f33a6d57aebae529b0948f9a2 Mon Sep 17 00:00:00 2001 From: Barry Pollard Date: Thu, 10 Dec 2020 23:37:10 +0000 Subject: [PATCH] Replace gzip with Gzip and brotli with Brotli (#1745) * Replace gzip with Gzip * Fix typo --- src/content/en/2019/caching.md | 44 ++++++++++----------- src/content/en/2019/cdn.md | 44 ++++++++++----------- src/content/en/2019/compression.md | 62 +++++++++++++++--------------- src/content/en/2019/http2.md | 2 +- src/content/en/2019/javascript.md | 8 ++-- src/content/en/2020/caching.md | 2 +- src/content/es/2019/javascript.md | 8 ++-- src/content/fr/2019/caching.md | 2 +- src/content/fr/2019/compression.md | 52 ++++++++++++------------- src/content/fr/2019/javascript.md | 8 ++-- src/content/ja/2019/caching.md | 4 +- src/content/ja/2019/cdn.md | 4 +- src/content/ja/2019/compression.md | 54 +++++++++++++------------- src/content/ja/2019/http2.md | 2 +- src/content/ja/2019/javascript.md | 8 ++-- src/content/pt/2019/http2.md | 2 +- src/content/pt/2019/javascript.md | 8 ++-- 17 files changed, 157 insertions(+), 157 deletions(-) diff --git a/src/content/en/2019/caching.md b/src/content/en/2019/caching.md index 48aec09e15a..bbb6762f4eb 100644 --- a/src/content/en/2019/caching.md +++ b/src/content/en/2019/caching.md @@ -36,7 +36,7 @@ Web architectures typically involve [multiple tiers of caching](https://blog.yoa * An end user's browser * A service worker cache in the user's browser -* A shared gateway +* A shared gateway * CDNs, which offer the ability to cache at the edge, close to end users * A caching proxy in front of the application, to reduce the backend workload * The application and database layers @@ -86,7 +86,7 @@ The example below contains an excerpt of a request/response header from HTTP Arc < ETag: "1566748830.0-3052-3932359948" ``` -The tool [RedBot.org](https://redbot.org/) allows you to input a URL and see a detailed explanation of how the response would be cached based on these headers. For example, [a test for the URL above](https://redbot.org/?uri=https%3A%2F%2Fhttparchive.org%2Fstatic%2Fjs%2Fmain.js) would output the following: +The tool [RedBot.org](https://redbot.org/) allows you to input a URL and see a detailed explanation of how the response would be cached based on these headers. For example, [a test for the URL above](https://redbot.org/?uri=https%3A%2F%2Fhttparchive.org%2Fstatic%2Fjs%2Fmain.js) would output the following: {{ figure_markup( image="ch16_fig1_redbot_example.jpg", @@ -98,7 +98,7 @@ The tool [RedBot.org](https://redbot.org/) allows you to input a URL and see a d ) }} -If no caching headers are present in a response, then the [client is permitted to heuristically cache the response](https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/). Most clients implement a variation of the RFC's suggested heuristic, which is 10% of the time since `Last-Modified`. However, some may cache the response indefinitely. So, it is important to set specific caching rules to ensure that you are in control of the cacheability. +If no caching headers are present in a response, then the [client is permitted to heuristically cache the response](https://paulcalvano.com/index.php/2018/03/14/http-heuristic-caching-missing-cache-control-and-expires-headers-explained/). Most clients implement a variation of the RFC's suggested heuristic, which is 10% of the time since `Last-Modified`. However, some may cache the response indefinitely. 
So, it is important to set specific caching rules to ensure that you are in control of the cacheability. 72% of responses are served with a `Cache-Control` header, and 56% of responses are served with an `Expires` header. However, 27% of responses did not use either header, and therefore are subject to heuristic caching. This is consistent across both desktop and mobile sites. @@ -113,7 +113,7 @@ If no caching headers are present in a response, then the [client is permitted t ## What type of content are we caching? -A cacheable resource is stored by the client for a period of time and available for reuse on a subsequent request. Across all HTTP requests, 80% of responses are considered cacheable, meaning that a cache is permitted to store them. Out of these, +A cacheable resource is stored by the client for a period of time and available for reuse on a subsequent request. Across all HTTP requests, 80% of responses are considered cacheable, meaning that a cache is permitted to store them. Out of these, * 6% of requests have a time to live (TTL) of 0 seconds, which immediately invalidates a cached entry. * 27% are cached heuristically because of a missing `Cache-Control` header. @@ -235,7 +235,7 @@ The table below details the cache TTL values for desktop requests by type. Most While most of the median TTLs are high, the lower percentiles highlight some of the missed caching opportunities. For example, the median TTL for images is 28 hours, however the 25th percentile is just one-two hours and the 10th percentile indicates that 10% of cacheable image content is cached for less than one hour. -By exploring the cacheability by content type in more detail in Figure 16.5 below, we can see that approximately half of all HTML responses are considered non-cacheable. Additionally, 16% of images and scripts are non-cacheable. +By exploring the cacheability by content type in more detail in Figure 16.5 below, we can see that approximately half of all HTML responses are considered non-cacheable. Additionally, 16% of images and scripts are non-cacheable. {{ figure_markup( image="fig5.png", @@ -269,7 +269,7 @@ HTTP/1.1 introduced the `Cache-Control` header, and most modern clients support * `must-revalidate` tells the client a cached entry must be validated with a conditional request prior to its use. * `private` indicates a response should only be cached by a browser, and not by an intermediary that would serve multiple clients. -53% of HTTP responses include a `Cache-Control` header with the `max-age` directive, and 54% include the Expires header. However, only 41% of these responses use both headers, which means that 13% of responses are caching solely based on the older `Expires` header. +53% of HTTP responses include a `Cache-Control` header with the `max-age` directive, and 54% include the Expires header. However, only 41% of these responses use both headers, which means that 13% of responses are caching solely based on the older `Expires` header. {{ figure_markup( image="fig7.png", @@ -342,7 +342,7 @@ The HTTP/1.1 [specification](https://tools.ietf.org/html/rfc7234#section-5.2.1)
{{ figure_link(caption="Cache-Control directives.") }}
-For example, `cache-control: public, max-age=43200` indicates that a cached entry should be stored for 43,200 seconds and it can be stored by all caches. +For example, `cache-control: public, max-age=43200` indicates that a cached entry should be stored for 43,200 seconds and it can be stored by all caches. {{ figure_markup( image="fig9.png", @@ -359,13 +359,13 @@ For example, `cache-control: public, max-age=43200` indicates that a cached entr Figure 16.9 above illustrates the top 15 `Cache-Control` directives in use on mobile websites. The results for desktop and mobile are very similar. There are a few interesting observations about the popularity of these cache directives: -* `max-age` is used by almost 75% of `Cache-Control` headers, and `no-store` is used by 18%. +* `max-age` is used by almost 75% of `Cache-Control` headers, and `no-store` is used by 18%. * `public` is rarely necessary since cached entries are assumed `public` unless `private` is specified. Approximately 38% of responses include `public`. * The `immutable` directive is relatively new, [introduced in 2017](https://code.facebook.com/posts/557147474482256/this-browser-tweak-saved-60-of-requests-to-facebook) and is [supported on Firefox and Safari](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#Browser_compatibility). Its usage has grown to 3.4%, and it is widely used in [Facebook and Google third-party responses](https://discuss.httparchive.org/t/cache-control-immutable-a-year-later/1195). -Another interesting set of directives to show up in this list are `pre-check` and `post-check`, which are used in 2.2% of `Cache-Control` response headers (approximately 7.8 million responses). This pair of headers was [introduced in Internet Explorer 5 to provide a background validation](https://blogs.msdn.microsoft.com/ieinternals/2009/07/20/internet-explorers-cache-control-extensions/) and was rarely implemented correctly by websites. 99.2% of responses using these headers had used the combination of `pre-check=0` and `post-check=0`. When both of these directives are set to 0, then both directives are ignored. So, it seems these directives were never used correctly! +Another interesting set of directives to show up in this list are `pre-check` and `post-check`, which are used in 2.2% of `Cache-Control` response headers (approximately 7.8 million responses). This pair of headers was [introduced in Internet Explorer 5 to provide a background validation](https://blogs.msdn.microsoft.com/ieinternals/2009/07/20/internet-explorers-cache-control-extensions/) and was rarely implemented correctly by websites. 99.2% of responses using these headers had used the combination of `pre-check=0` and `post-check=0`. When both of these directives are set to 0, then both directives are ignored. So, it seems these directives were never used correctly! -In the long tail, there are more than 1,500 erroneous directives in use across 0.28% of responses. These are ignored by clients, and include misspellings such as "nocache", "s-max-age", "smax-age", and "maxage". There are also numerous non-existent directives such as "max-stale", "proxy-public", "surrogate-control", etc. +In the long tail, there are more than 1,500 erroneous directives in use across 0.28% of responses. These are ignored by clients, and include misspellings such as "nocache", "s-max-age", "smax-age", and "maxage". There are also numerous non-existent directives such as "max-stale", "proxy-public", "surrogate-control", etc. 
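
To see how easily such directives slip through, here is a rough Python sketch (purely illustrative, not part of the chapter's methodology) that splits a `Cache-Control` value into directives and flags any name outside a known set. The directive list is abbreviated and the example header value is invented.

```
# Sketch: split a Cache-Control header into directives and flag unknown ones.
# The REGISTERED set below is abbreviated; see the IANA cache-directive
# registry for the full list. The example header value is invented.
REGISTERED = {
    "max-age", "s-maxage", "no-cache", "no-store", "no-transform",
    "must-revalidate", "proxy-revalidate", "public", "private",
    "immutable", "stale-while-revalidate", "stale-if-error",
    "pre-check", "post-check",  # legacy IE extensions discussed above
}

def parse_cache_control(value):
    directives = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name] = arg or None
    return directives

header = "public, max-age=43200, smax-age=600, nocache"
parsed = parse_cache_control(header)
unknown = [name for name in parsed if name not in REGISTERED]
print(parsed)   # {'public': None, 'max-age': '43200', 'smax-age': '600', 'nocache': None}
print(unknown)  # ['smax-age', 'nocache'] -- misspellings a client silently ignores
```
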
## `Cache-Control`: `no-store`, `no-cache` and `max-age=0` @@ -380,13 +380,13 @@ Functionally, `no-cache` and `max-age=0` are similar, since they both require re Over 3 million responses include the combination of `no-store`, `no-cache`, and `max-age=0`. Of these directives `no-store` takes precedence and the other directives are merely redundant -18% of responses include `no-store` and 16.6% of responses include both `no-store` and `no-cache`. Since `no-store` takes precedence, the resource is ultimately non-cacheable. +18% of responses include `no-store` and 16.6% of responses include both `no-store` and `no-cache`. Since `no-store` takes precedence, the resource is ultimately non-cacheable. The `max-age=0` directive is present on 1.1% of responses (more than four million responses) where `no-store` is not present. These resources will be cached in the browser but will require revalidation as they are immediately expired. ## How do cache TTLs compare to resource age? -So far we've talked about how web servers tell a client what is cacheable, and how long it has been cached for. When designing cache rules, it is also important to understand how old the content you are serving is. +So far we've talked about how web servers tell a client what is cacheable, and how long it has been cached for. When designing cache rules, it is also important to understand how old the content you are serving is. When you are selecting a cache TTL, ask yourself: "how often are you updating these assets?" and "what is their content sensitivity?". For example, if a hero image is going to be modified infrequently, then cache it with a very long TTL. If you expect a JavaScript resource to change frequently, then version it and cache it with a long TTL or cache it with a shorter TTL. @@ -413,7 +413,7 @@ By comparing a resources cacheability to its age, we can determine if the TTL is < Last-Modified: Sun, 25 Aug 2019 16:00:30 GMT < Cache-Control: public, max-age=43200 < Expires: Mon, 14 Oct 2019 07:36:57 GMT -< ETag: "1566748830.0-3052-3932359948" +< ETag: "1566748830.0-3052-3932359948" ``` Overall, 59% of resources served on the web have a cache TTL that is too short compared to its content age. Furthermore, the median delta between the TTL and age is 25 days. @@ -446,7 +446,7 @@ When we break this out by first vs third-party, we can also see that 70% of firs ## Validating freshness -The HTTP response headers used for validating the responses stored within a cache are `Last-Modified` and `ETag`. The `Last-Modified` header does exactly what its name implies and provides the time that the object was last modified. The `ETag` header provides a unique identifier for the content. +The HTTP response headers used for validating the responses stored within a cache are `Last-Modified` and `ETag`. The `Last-Modified` header does exactly what its name implies and provides the time that the object was last modified. The `ETag` header provides a unique identifier for the content. For example, the response below was last modified on 25 Aug 2019 and it has an `ETag` value of `"1566748830.0-3052-3932359948"` @@ -463,9 +463,9 @@ For example, the response below was last modified on 25 Aug 2019 and it has an ` < ETag: "1566748830.0-3052-3932359948" ``` -A client could send a conditional request to validate a cached entry by using the `Last-Modified` value in a request header named `If-Modified-Since`. 
Similarly, it could also validate the resource with an `If-None-Match` request header, which validates against the `ETag` value the client has for the resource in its cache. +A client could send a conditional request to validate a cached entry by using the `Last-Modified` value in a request header named `If-Modified-Since`. Similarly, it could also validate the resource with an `If-None-Match` request header, which validates against the `ETag` value the client has for the resource in its cache. -In the example below, the cache entry is still valid, and an `HTTP 304` was returned with no content. This saves the download of the resource itself. If the cache entry was no longer fresh, then the server would have responded with a `200` and the updated resource which would have to be downloaded again. +In the example below, the cache entry is still valid, and an `HTTP 304` was returned with no content. This saves the download of the resource itself. If the cache entry was no longer fresh, then the server would have responded with a `200` and the updated resource which would have to be downloaded again. ``` > GET /static/js/main.js HTTP/1.1 @@ -496,7 +496,7 @@ Overall, 65% of responses are served with a `Last-Modified` header, 42% are serv ## Validity of date strings -There are a few HTTP headers used to convey timestamps, and the format for these are very important. The `Date` response header indicates when the resource was served to a client. The `Last-Modified` response header indicates when a resource was last changed on the server. And the `Expires` header is used to indicate how long a resource is cacheable until (unless a `Cache-Control` header is present). +There are a few HTTP headers used to convey timestamps, and the format for these are very important. The `Date` response header indicates when the resource was served to a client. The `Last-Modified` response header indicates when a resource was last changed on the server. And the `Expires` header is used to indicate how long a resource is cacheable until (unless a `Cache-Control` header is present). All three of these HTTP headers use a date formatted string to represent timestamps. @@ -542,11 +542,11 @@ The largest source of invalid `Expires` headers is from assets served from a pop ## `Vary` header -One of the most important steps in caching is determining if the resource being requested is cached or not. While this may seem simple, many times the URL alone is not enough to determine this. For example, requests with the same URL could vary in what [compression](./compression) they used (gzip, brotli, etc.) or be modified and tailored for mobile visitors. +One of the most important steps in caching is determining if the resource being requested is cached or not. While this may seem simple, many times the URL alone is not enough to determine this. For example, requests with the same URL could vary in what [compression](./compression) they used (Gzip, Brotli, etc.) or be modified and tailored for mobile visitors. To solve this problem, clients give each cached resource a unique identifier (a cache key). By default, this cache key is simply the URL of the resource, but developers can add other elements (like compression method) by using the Vary header. -A Vary header instructs a client to add the value of one or more request header values to the cache key. The most common example of this is `Vary: Accept-Encoding`, which will result in different cached entries for `Accept-Encoding` request header values (i.e. `gzip`, `br`, `deflate`). 
+A Vary header instructs a client to add the value of one or more request header values to the cache key. The most common example of this is `Vary: Accept-Encoding`, which will result in different cached entries for `Accept-Encoding` request header values (i.e. `gzip`, `br`, `deflate`). Another common value is `Vary: Accept-Encoding, User-Agent`, which instructs the client to vary the cached entry by both the Accept-Encoding values and the `User-Agent` string. When dealing with shared proxies and CDNs, using values other than `Accept-Encoding` can be problematic as it dilutes the cache keys and can reduce the amount of traffic served from cache. @@ -581,11 +581,11 @@ When a response is cached, its entire headers are swapped into the cache as well ) }} -But what happens if you have a `Set-Cookie` on a response? According to [RFC 7234 Section 8](https://tools.ietf.org/html/rfc7234#section-8), the presence of a `Set-Cookie` response header does not inhibit caching. This means that a cached entry might contain a `Set-Cookie` if it was cached with one. The RFC goes on to recommend that you should configure appropriate `Cache-Control` headers to control how responses are cached. +But what happens if you have a `Set-Cookie` on a response? According to [RFC 7234 Section 8](https://tools.ietf.org/html/rfc7234#section-8), the presence of a `Set-Cookie` response header does not inhibit caching. This means that a cached entry might contain a `Set-Cookie` if it was cached with one. The RFC goes on to recommend that you should configure appropriate `Cache-Control` headers to control how responses are cached. One of the risks of caching responses with `Set-Cookie` is that the cookie values can be stored and served to subsequent requests. Depending on the cookie's purpose, this could have worrying results. For example, if a login cookie or a session cookie is present in a shared cache, then that cookie might be reused by another client. One way to avoid this is to use the `Cache-Control` `private` directive, which only permits the response to be cached by the client browser. -3% of cacheable responses contain a `Set-Cookie header`. Of those responses, only 18% use the `private` directive. The remaining 82% include 5.3 million HTTP responses that include a `Set-Cookie` which can be cached by public and private cache servers. +3% of cacheable responses contain a `Set-Cookie header`. Of those responses, only 18% use the `private` directive. The remaining 82% include 5.3 million HTTP responses that include a `Set-Cookie` which can be cached by public and private cache servers. {{ figure_markup( image="ch16_fig16_cacheable_responses_set_cookie.jpg", @@ -714,7 +714,7 @@ Lighthouse computes a score for each audit, ranging from 0% to 100%, and those s ) }} -Only 3.4% of sites scored a 100%, meaning that most sites can benefit from some cache optimizations. A vast majority of sites sore below 40%, with 38% scoring less than 10%. Based on this, there is a significant amount of caching opportunities on the web. +Only 3.4% of sites scored a 100%, meaning that most sites can benefit from some cache optimizations. A vast majority of sites sore below 40%, with 38% scoring less than 10%. Based on this, there is a significant amount of caching opportunities on the web. Lighthouse also indicates how many bytes could be saved on repeat views by enabling a longer cache policy. Of the sites that could benefit from additional caching, 82% of them can reduce their page weight by up to a whole Mb! 
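
Much of this analysis boils down to header arithmetic. As a hedged illustration (not the chapter's actual queries), the sketch below estimates a response's freshness lifetime using the precedence described earlier: `Cache-Control: max-age` first, then `Expires` minus `Date`, then the 10%-of-age heuristic. The header values are borrowed from the chapter's earlier example response; `no-store` and `no-cache` handling is omitted for brevity.

```
# Sketch: estimate a response's freshness lifetime (TTL) in seconds from its
# headers, using the precedence described in this chapter: max-age, then
# Expires - Date, then the heuristic of 10% of the time since Last-Modified.
# Ignores no-store/no-cache and s-maxage for brevity.
import re
from email.utils import parsedate_to_datetime

def freshness_lifetime(headers):
    cc = headers.get("cache-control", "")
    m = re.search(r"max-age=(\d+)", cc)
    if m:
        return int(m.group(1))
    date = headers.get("date")
    expires = headers.get("expires")
    if date and expires:
        return (parsedate_to_datetime(expires) - parsedate_to_datetime(date)).total_seconds()
    last_modified = headers.get("last-modified")
    if date and last_modified:
        age = (parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)).total_seconds()
        return 0.1 * age  # heuristic caching
    return 0  # nothing to go on: treat as immediately stale

# Header values taken from the example response earlier in the chapter
headers = {
    "date": "Mon, 14 Oct 2019 07:36:57 GMT",
    "last-modified": "Sun, 25 Aug 2019 16:00:30 GMT",
    "cache-control": "public, max-age=43200",
}
print(freshness_lifetime(headers))  # 43200 seconds, i.e. 12 hours
```
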
diff --git a/src/content/en/2019/cdn.md b/src/content/en/2019/cdn.md index 3e11d682cf9..1fcdacc1924 100644 --- a/src/content/en/2019/cdn.md +++ b/src/content/en/2019/cdn.md @@ -35,7 +35,7 @@ CDNs also help with TCP latency. The latency of TCP determines how long it takes Although CDNs are often thought of as just caches that store and serve static content close to the visitor, they are capable of so much more! CDNs aren't limited to just helping overcome the latency penalty, and increasingly they offer other features that help improve performance and security. -* Using a CDN to proxy dynamic content (base HTML page, API responses, etc.) can take advantage of both the reduced latency between the browser and the CDN's own network back to the origin. +* Using a CDN to proxy dynamic content (base HTML page, API responses, etc.) can take advantage of both the reduced latency between the browser and the CDN's own network back to the origin. * Some CDNs offer transformations that optimize pages so they download and render more quickly, or optimize images so they're the appropriate size (both dimensions and file size) for the device on which they're going to be viewed. * From a [security](./security) perspective, malicious traffic and bots can be filtered out by a CDN before the requests even reach the origin, and their wide customer base means CDNs can often see and react to new threats sooner. * The rise of [edge computing](https://en.wikipedia.org/wiki/Edge_computing) allows sites to run their own code close to their visitors, both improving performance and reducing the load on the origin. @@ -52,7 +52,7 @@ There are many limits to the testing [methodology](./methodology) used for the W * **Single geographic location**: Tests are run from [a single datacenter](https://httparchive.org/faq#how-is-the-data-gathered) and cannot test the geographic distribution of many CDN vendors. * **Cache effectiveness**: Each CDN uses proprietary technology and many, for security reasons, do not expose cache performance. * **Localization and internationalization**: Just like geographic distribution, the effects of language and geo-specific domains are also opaque to the testing. -* [CDN detection](https://github.com/WPO-Foundation/wptagent/blob/master/internal/optimization_checks.py#L51) is primarily done through DNS resolution and HTTP headers. Most CDNs use a DNS CNAME to map a user to an optimal datacenter. However, some CDNs use AnyCast IPs or direct A+AAAA responses from a delegated domain which hide the DNS chain. In other cases, websites use multiple CDNs to balance between vendors which is hidden from the single-request pass of [WebPageTest](./methodology#webpagetest). All of this limits the effectiveness in the measurements. +* [CDN detection](https://github.com/WPO-Foundation/wptagent/blob/master/internal/optimization_checks.py#L51) is primarily done through DNS resolution and HTTP headers. Most CDNs use a DNS CNAME to map a user to an optimal datacenter. However, some CDNs use AnyCast IPs or direct A+AAAA responses from a delegated domain which hide the DNS chain. In other cases, websites use multiple CDNs to balance between vendors which is hidden from the single-request pass of [WebPageTest](./methodology#webpagetest). All of this limits the effectiveness in the measurements. Most importantly, these results reflect a potential utilization but do not reflect actual impact. YouTube is more popular than "ShoesByColin" yet both will appear as equal value when comparing utilization. 
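
Since CDN detection here relies primarily on DNS resolution and HTTP headers, one rough way to reproduce the DNS half of that check is to inspect the canonical name a hostname resolves to. The Python sketch below uses only the standard library and requires network access; the hostname and the CDN pattern list are illustrative and are not the detection list used by wptagent.

```
# Sketch: the DNS half of CDN detection -- resolve a hostname and inspect the
# canonical name (CNAME chain target) for well-known CDN domains.
# The pattern list is illustrative, not the detection list used by wptagent.
import socket

CDN_PATTERNS = {
    "cloudfront.net": "Amazon CloudFront",
    "akamaiedge.net": "Akamai",
    "fastly.net": "Fastly",
    "cdn.cloudflare.net": "Cloudflare",
}

def guess_cdn(hostname):
    canonical, aliases, _addrs = socket.gethostbyname_ex(hostname)
    for name in [canonical, *aliases]:
        for pattern, cdn in CDN_PATTERNS.items():
            if name.endswith(pattern):
                return cdn
    return None  # AnyCast IPs or direct A/AAAA answers hide the chain

print(guess_cdn("www.example.com"))  # placeholder hostname; likely prints None
```
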
@@ -62,11 +62,11 @@ With this in mind, there are a few intentional statistics that were not measured ### Further stats -In future versions of the Web Almanac, we would expect to look more closely at the TLS and RTT management between CDN vendors. Of interest would the impact of OCSP stapling, differences in TLS Cipher performance. CWND (TCP congestion window) growth rate, and specifically the adoption of BBR v1, v2, and traditional TCP Cubic. +In future versions of the Web Almanac, we would expect to look more closely at the TLS and RTT management between CDN vendors. Of interest would the impact of OCSP stapling, differences in TLS Cipher performance. CWND (TCP congestion window) growth rate, and specifically the adoption of BBR v1, v2, and traditional TCP Cubic. ## CDN adoption and usage -For websites, a CDN can improve performance for the primary domain (`www.shoesbycolin.com`), sub-domains or sibling domains (`images.shoesbycolin.com` or `checkout.shoesbycolin.com`), and finally third parties (Google Analytics, etc.). Using a CDN for each of these use cases improves performance in different ways. +For websites, a CDN can improve performance for the primary domain (`www.shoesbycolin.com`), sub-domains or sibling domains (`images.shoesbycolin.com` or `checkout.shoesbycolin.com`), and finally third parties (Google Analytics, etc.). Using a CDN for each of these use cases improves performance in different ways. Historically, CDNs were used exclusively for static resources like [CSS](./css), [JavaScript](./javascript), and [images](./media). These resources would likely be versioned (include a unique number in the path) and cached long-term. In this way we should expect to see higher adoption of CDNs on sub-domains or sibling domains compared to the base HTML domains. The traditional design pattern would expect that `www.shoesbycolin.com` would serve HTML directly from a datacenter (or **origin**) while `static.shoesbycolin.com` would use a CDN. @@ -462,7 +462,7 @@ The composition of top CDN providers dramatically shifts for third-party resourc ## RTT and TLS management -CDNs can offer more than simple caching for website performance. Many CDNs also support a pass-through mode for dynamic or personalized content when an organization has a legal or other business requirement prohibiting the content from being cached. Utilizing a CDN's physical distribution enables increased performance for TCP RTT for end users. As [others have noted](https://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/), [reducing RTT is the most effective means to improve web page performance](https://hpbn.co/primer-on-latency-and-bandwidth/) compared to increasing bandwidth. +CDNs can offer more than simple caching for website performance. Many CDNs also support a pass-through mode for dynamic or personalized content when an organization has a legal or other business requirement prohibiting the content from being cached. Utilizing a CDN's physical distribution enables increased performance for TCP RTT for end users. As [others have noted](https://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/), [reducing RTT is the most effective means to improve web page performance](https://hpbn.co/primer-on-latency-and-bandwidth/) compared to increasing bandwidth. Using a CDN in this way can improve page performance in two ways: @@ -470,13 +470,13 @@ Using a CDN in this way can improve page performance in two ways: Reducing RTT has three immediate benefits. 
First, it improves the time for the user to receive data, because TCP+TLS connection time are RTT-bound. Secondly, this will improve the time it takes to grow the congestion window and utilize the full amount of bandwidth the user has available. Finally, it reduces the probability of packet loss. When the RTT is high, network interfaces will time-out requests and resend packets. This can result in double packets being delivered. -2. CDNs can utilize pre-warmed TCP connections to the back-end origin. Just as terminating the connection closer to the user will improve the time it takes to grow the congestion window, the CDN can relay the request to the origin on pre-established TCP connections that have already maximized congestion windows. In this way the origin can return the dynamic content in fewer TCP round trips and the content can be more effectively ready to be delivered to the waiting user. +2. CDNs can utilize pre-warmed TCP connections to the back-end origin. Just as terminating the connection closer to the user will improve the time it takes to grow the congestion window, the CDN can relay the request to the origin on pre-established TCP connections that have already maximized congestion windows. In this way the origin can return the dynamic content in fewer TCP round trips and the content can be more effectively ready to be delivered to the waiting user. ## TLS negotiation time: origin 3x slower than CDNs Since TLS negotiations require multiple TCP round trips before data can be sent from a server, simply improving the RTT can significantly improve the page performance. For example, looking at the base HTML page, the median TLS negotiation time for origin requests is 207 ms (for desktop WebPageTest). This alone accounts for 10% of a 2 second performance budget, and this is under ideal network conditions where there is no latency applied on the request. -In contrast, the median TLS negotiation for the majority of CDN providers is between 60 and 70 ms. Origin requests for HTML pages take almost 3x longer to complete TLS negotiation than those web pages that use a CDN. Even at the 90th percentile, this disparity perpetuates with origin TLS negotiation rates of 427 ms compared to most CDNs which complete under 140 ms! +In contrast, the median TLS negotiation for the majority of CDN providers is between 60 and 70 ms. Origin requests for HTML pages take almost 3x longer to complete TLS negotiation than those web pages that use a CDN. Even at the 90th percentile, this disparity perpetuates with origin TLS negotiation rates of 427 ms compared to most CDNs which complete under 140 ms!

A word of caution when interpreting these charts: it is important to focus on orders of magnitude when comparing vendors as there are many factors that impact the actual TLS negotiation performance. These tests were completed from a single datacenter under controlled conditions and do not reflect the variability of the internet and user experiences.

@@ -649,7 +649,7 @@ In contrast, the median TLS negotiation for the majority of CDN providers is bet
{{ figure_link(caption="HTML TLS connection time (ms).") }}
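
Because these figures come from a single test location, it can be useful to sample TLS negotiation time from your own vantage point. The sketch below is a rough, standard-library-only illustration that times the TCP connect and the TLS handshake separately for one connection; `www.example.com` is just a placeholder hostname, and results will vary with your RTT to the server.

```
# Sketch: time the TCP connect and TLS handshake phases separately for one
# connection, from your own vantage point. Results vary with RTT to the host.
import socket, ssl, time

def connection_timings(host, port=443):
    ctx = ssl.create_default_context()
    t0 = time.perf_counter()
    sock = socket.create_connection((host, port), timeout=10)
    t1 = time.perf_counter()                     # TCP connect done
    tls = ctx.wrap_socket(sock, server_hostname=host)
    t2 = time.perf_counter()                     # TLS handshake done
    version = tls.version()
    tls.close()
    return {"tcp_ms": (t1 - t0) * 1000, "tls_ms": (t2 - t1) * 1000, "tls_version": version}

print(connection_timings("www.example.com"))    # placeholder hostname
```
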
-For resource requests (including same-domain and third-party), the TLS negotiation time takes longer and the variance increases. This is expected because of network saturation and network congestion. By the time that a third-party connection is established (by way of a resource hint or a resource request) the browser is busy rendering and making other parallel requests. This creates contention on the network. Despite this disadvantage, there is still a clear advantage for third-party resources that utilize a CDN over using an origin solution. +For resource requests (including same-domain and third-party), the TLS negotiation time takes longer and the variance increases. This is expected because of network saturation and network congestion. By the time that a third-party connection is established (by way of a resource hint or a resource request) the browser is busy rendering and making other parallel requests. This creates contention on the network. Despite this disadvantage, there is still a clear advantage for third-party resources that utilize a CDN over using an origin solution. {{ figure_markup( image="resource_tls_negotiation_time.png", @@ -659,17 +659,17 @@ For resource requests (including same-domain and third-party), the TLS negotiati }} -TLS handshake performance is impacted by a number of factors. These include RTT, TLS record size, and TLS certificate size. While RTT has the biggest impact on the TLS handshake, the second largest driver for TLS performance is the TLS certificate size. +TLS handshake performance is impacted by a number of factors. These include RTT, TLS record size, and TLS certificate size. While RTT has the biggest impact on the TLS handshake, the second largest driver for TLS performance is the TLS certificate size. -During the first round trip of the [TLS handshake](https://hpbn.co/transport-layer-security-tls/#tls-handshake), the server attaches its certificate. This certificate is then verified by the client before proceeding. In this certificate exchange, the server might include the certificate chain by which it can be verified. After this certificate exchange, additional keys are established to encrypt the communication. However, the length and size of the certificate can negatively impact the TLS negotiation performance, and in some cases, crash client libraries. +During the first round trip of the [TLS handshake](https://hpbn.co/transport-layer-security-tls/#tls-handshake), the server attaches its certificate. This certificate is then verified by the client before proceeding. In this certificate exchange, the server might include the certificate chain by which it can be verified. After this certificate exchange, additional keys are established to encrypt the communication. However, the length and size of the certificate can negatively impact the TLS negotiation performance, and in some cases, crash client libraries. -The certificate exchange is at the foundation of the TLS handshake and is usually handled by isolated code paths so as to minimize the attack surface for exploits. Because of its low level nature, buffers are usually not dynamically allocated, but fixed. In this way, we cannot simply assume that the client can handle an unlimited-sized certificate. For example, OpenSSL CLI tools and Safari can successfully negotiate against [`https://10000-sans.badssl.com`](https://10000-sans.badssl.com). Yet, Chrome and Firefox fail because of the size of the certificate. 
+The certificate exchange is at the foundation of the TLS handshake and is usually handled by isolated code paths so as to minimize the attack surface for exploits. Because of its low level nature, buffers are usually not dynamically allocated, but fixed. In this way, we cannot simply assume that the client can handle an unlimited-sized certificate. For example, OpenSSL CLI tools and Safari can successfully negotiate against [`https://10000-sans.badssl.com`](https://10000-sans.badssl.com). Yet, Chrome and Firefox fail because of the size of the certificate. -While extreme sizes of certificates can cause failures, even sending moderately large certificates has a performance impact. A certificate can be valid for one or more hostnames which are are listed in the `Subject-Alternative-Name` (SAN). The more SANs, the larger the certificate. It is the processing of these SANs during verification that causes performance to degrade. To be clear, performance of certificate size is not about TCP overhead, rather it is about processing performance of the client. +While extreme sizes of certificates can cause failures, even sending moderately large certificates has a performance impact. A certificate can be valid for one or more hostnames which are are listed in the `Subject-Alternative-Name` (SAN). The more SANs, the larger the certificate. It is the processing of these SANs during verification that causes performance to degrade. To be clear, performance of certificate size is not about TCP overhead, rather it is about processing performance of the client. -Technically, TCP slow start can impact this negotiation but it is very improbable. TLS record length is limited to 16 KB, which fits into a typical initial congestion window of 10. While some ISPs might employ packet splicers, and other tools fragment congestion windows to artificially throttle bandwidth, this isn't something that a website owner can change or manipulate. +Technically, TCP slow start can impact this negotiation but it is very improbable. TLS record length is limited to 16 KB, which fits into a typical initial congestion window of 10. While some ISPs might employ packet splicers, and other tools fragment congestion windows to artificially throttle bandwidth, this isn't something that a website owner can change or manipulate. -Many CDNs, however, depend on shared TLS certificates and will list many customers in the SAN of a certificate. This is often necessary because of the scarcity of IPv4 addresses. Prior to the adoption of `Server-Name-Indicator` (SNI) by end users, the client would connect to a server, and only after inspecting the certificate, would the client hint which hostname the user user was looking for (using the `Host` header in HTTP). This results in a 1:1 association of an IP address and a certificate. If you are a CDN with many physical locations, each location may require a dedicated IP, further aggravating the exhaustion of IPv4 addresses. Therefore, the simplest and most efficient way for CDNs to offer TLS certificates for websites that still have users that don't support SNI is to offer a shared certificate. +Many CDNs, however, depend on shared TLS certificates and will list many customers in the SAN of a certificate. This is often necessary because of the scarcity of IPv4 addresses. 
Prior to the adoption of `Server-Name-Indicator` (SNI) by end users, the client would connect to a server, and only after inspecting the certificate, would the client hint which hostname the user user was looking for (using the `Host` header in HTTP). This results in a 1:1 association of an IP address and a certificate. If you are a CDN with many physical locations, each location may require a dedicated IP, further aggravating the exhaustion of IPv4 addresses. Therefore, the simplest and most efficient way for CDNs to offer TLS certificates for websites that still have users that don't support SNI is to offer a shared certificate. According to Akamai, the adoption of SNI is [still not 100% globally](https://datatracker.ietf.org/meeting/101/materials/slides-101-maprg-update-on-tls-sni-and-ipv6-client-adoption-00). Fortunately there has been a rapid shift in recent years. The biggest culprits are no longer Windows XP and Vista, but now Android apps, bots, and corporate applications. Even at 99% adoption, the remaining 1% of 3.5 billion users on the internet can create a very compelling motivation for website owners to require a non-SNI certificate. Put another way, a pure play website can enjoy a virtually 100% SNI adoption among standard web browsers. Yet, if the website is also used to support APIs or WebViews in apps, particularly Android apps, this distribution can drop rapidly. @@ -1025,7 +1025,7 @@ Most CDNs balance the need for shared certificates and performance. Most cap the ## TLS adoption -In addition to using a CDN for TLS and RTT performance, CDNs are often used to ensure patching and adoption of TLS ciphers and TLS versions. In general, the adoption of TLS on the main HTML page is much higher for websites that use a CDN. Over 76% of HTML pages are served with TLS compared to the 62% from origin-hosted pages. +In addition to using a CDN for TLS and RTT performance, CDNs are often used to ensure patching and adoption of TLS ciphers and TLS versions. In general, the adoption of TLS on the main HTML page is much higher for websites that use a CDN. Over 76% of HTML pages are served with TLS compared to the 62% from origin-hosted pages. {{ figure_markup( image="fig15.png", @@ -1053,11 +1053,11 @@ Each CDN offers different rates of adoption for both TLS and the relative cipher ) }} -Along with this general adoption of TLS, CDN use also sees higher adoption of emerging TLS versions like TLS 1.3. +Along with this general adoption of TLS, CDN use also sees higher adoption of emerging TLS versions like TLS 1.3. -In general, the use of a CDN is highly correlated with a more rapid adoption of stronger ciphers and stronger TLS versions compared to origin-hosted services where there is a higher usage of very old and compromised TLS versions like TLS 1.0. +In general, the use of a CDN is highly correlated with a more rapid adoption of stronger ciphers and stronger TLS versions compared to origin-hosted services where there is a higher usage of very old and compromised TLS versions like TLS 1.0. -

It is important to emphasize that Chrome used in the Web Almanac will bias to the latest TLS versions and ciphers offered by the host. Also, these web pages were crawled in July 2019 and reflect the adoption of websites that have enabled the newer versions.

+

It is important to emphasize that Chrome used in the Web Almanac will bias to the latest TLS versions and ciphers offered by the host. Also, these web pages were crawled in July 2019 and reflect the adoption of websites that have enabled the newer versions.

{{ figure_markup( image="fig18.png", @@ -1071,7 +1071,7 @@ More discussion of TLS versions and ciphers can be found in the [Security](./sec ## HTTP/2 adoption -Along with RTT management and improving TLS performance, CDNs also enable new standards like HTTP/2 and IPv6. While most CDNs offer support for HTTP/2 and many have signaled early support of the still-under-standards-development HTTP/3, adoption still depends on website owners to enable these new features. Despite the change-management overhead, the majority of the HTML served from CDNs has HTTP/2 enabled. +Along with RTT management and improving TLS performance, CDNs also enable new standards like HTTP/2 and IPv6. While most CDNs offer support for HTTP/2 and many have signaled early support of the still-under-standards-development HTTP/3, adoption still depends on website owners to enable these new features. Despite the change-management overhead, the majority of the HTML served from CDNs has HTTP/2 enabled. CDNs have over 70% adoption of HTTP/2, compared to the nearly 27% of origin pages. Similarly, sub-domain and third-party resources on CDNs see an even higher adoption of HTTP/2 at 90% or higher while third-party resources served from origin infrastructure only has 31% adoption. The performance gains and other features of HTTP/2 are further covered in the [HTTP/2](./http2) chapter. @@ -1491,7 +1491,7 @@ CDNs have over 70% adoption of HTTP/2, compared to the nearly 27% of origin page A website can control the caching behavior of browsers and CDNs with the use of different HTTP headers. The most common is the `Cache-Control` header which specifically determines how long something can be cached before returning to the origin to ensure it is up-to-date. -Another useful tool is the use of the `Vary` HTTP header. This header instructs both CDNs and browsers how to fragment a cache. The `Vary` header allows an origin to indicate that there are multiple representations of a resource, and the CDN should cache each variation separately. The most common example is [compression](./compression). Declaring a resource as `Vary: Accept-Encoding` allows the CDN to cache the same content, but in different forms like uncompressed, with gzip, or Brotli. Some CDNs even do this compression on the fly so as to keep only one copy available. This `Vary` header likewise also instructs the browser how to cache the content and when to request new content. +Another useful tool is the use of the `Vary` HTTP header. This header instructs both CDNs and browsers how to fragment a cache. The `Vary` header allows an origin to indicate that there are multiple representations of a resource, and the CDN should cache each variation separately. The most common example is [compression](./compression). Declaring a resource as `Vary: Accept-Encoding` allows the CDN to cache the same content, but in different forms like uncompressed, with Gzip, or Brotli. Some CDNs even do this compression on the fly so as to keep only one copy available. This `Vary` header likewise also instructs the browser how to cache the content and when to request new content. {{ figure_markup( image="use_of_vary_on_cdn.png", @@ -1507,7 +1507,7 @@ While the main use of `Vary` is to coordinate `Content-Encoding`, there are othe For HTML pages, the most common use of `Vary` is to signal that the content will change based on the `User-Agent`. 
This is short-hand to indicate that the website will return different content for desktops, phones, tablets, and link-unfurling engines (like Slack, iMessage, and Whatsapp). The use of `Vary: User-Agent` is also a vestige of the early mobile era, where content was split between "mDot" servers and "regular" servers in the back-end. While the adoption for responsive web has gained wide popularity, this `Vary` form remains. -In a similar way, `Vary: Cookie` usually indicates that content that will change based on the logged-in state of the user or other personalization. +In a similar way, `Vary: Cookie` usually indicates that content that will change based on the logged-in state of the user or other personalization. {{ figure_markup( image="use_of_vary.png", @@ -1519,7 +1519,7 @@ In a similar way, `Vary: Cookie` usually indicates that content that will change Resources, in contrast, don't use `Vary: Cookie` as much as the HTML resources. Instead these resources are more likely to adapt based on the `Accept`, `Origin`, or `Referer`. Most media, for example, will use `Vary: Accept` to indicate that an image could be a JPEG, WebP, JPEG 2000, or JPEG XR depending on the browser's offered `Accept` header. In a similar way, third-party shared resources signal that an XHR API will differ depending on which website it is embedded. This way, a call to an ad server API will return different content depending on the parent website that called the API. -The `Vary` header also contains evidence of CDN chains. These can be seen in `Vary` headers such as `Accept-Encoding, Accept-Encoding` or even `Accept-Encoding, Accept-Encoding, Accept-Encoding`. Further analysis of these chains and `Via` header entries might reveal interesting data, for example how many sites are proxying third-party tags. +The `Vary` header also contains evidence of CDN chains. These can be seen in `Vary` headers such as `Accept-Encoding, Accept-Encoding` or even `Accept-Encoding, Accept-Encoding, Accept-Encoding`. Further analysis of these chains and `Via` header entries might reveal interesting data, for example how many sites are proxying third-party tags. Many of the uses of the `Vary` are extraneous. With most browsers adopting double-key caching, the use of `Vary: Origin` is redundant. As is `Vary: Range` or `Vary: Host` or `Vary: *`. The wild and variable use of `Vary` is demonstrable proof that the internet is weird. diff --git a/src/content/en/2019/compression.md b/src/content/en/2019/compression.md index c05df2bdaa7..7faee3de7d5 100644 --- a/src/content/en/2019/compression.md +++ b/src/content/en/2019/compression.md @@ -15,7 +15,7 @@ featured_quote: HTTP compression is a technique that allows you to encode inform featured_stat_1: 38% featured_stat_label_1: HTTP responses using text-based compression featured_stat_2: 80% -featured_stat_label_2: Use of gzip compression +featured_stat_label_2: Use of Gzip compression featured_stat_3: 56% featured_stat_label_3: HTML responses not using compression --- @@ -24,19 +24,19 @@ featured_stat_label_3: HTML responses not using compression HTTP compression is a technique that allows you to encode information using fewer bits than the original representation. When used for delivering web content, it enables web servers to reduce the amount of data transmitted to clients. This increases the efficiency of the client's available bandwidth, reduces [page weight](./page-weight), and improves [web performance](./performance). 
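
To make "fewer bits" concrete, here is a small illustrative sketch (not from the chapter) using Python's standard library: a repetitive HTML-like string shrinks substantially under Gzip and decompresses back to the original bytes, which is what makes this kind of compression safe for text resources. The sample markup is invented.

```
# Sketch: lossless compression in action -- the compressed form is much
# smaller, and decompressing it restores the original bytes exactly.
import gzip

html = ("<ul>" + "".join(f"<li>item {i}</li>" for i in range(200)) + "</ul>").encode("utf-8")
compressed = gzip.compress(html, compresslevel=6)   # level 6: a common server default

print(len(html), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print(gzip.decompress(compressed) == html)          # True: nothing was lost
```
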
-Compression algorithms are often categorized as lossy or lossless: +Compression algorithms are often categorized as lossy or lossless: * When a lossy compression algorithm is used, the process is irreversible, and the original file cannot be restored via decompression. This is commonly used to compress media resources, such as image and video content where losing some data will not materially affect the resource. -* Lossless compression is a completely reversible process, and is commonly used to compress text based resources, such as [HTML](./markup), [JavaScript](./javascript), [CSS](./css), etc. +* Lossless compression is a completely reversible process, and is commonly used to compress text based resources, such as [HTML](./markup), [JavaScript](./javascript), [CSS](./css), etc. In this chapter, we are going to explore how text-based content is compressed on the web. Analysis of non-text-based content forms part of the [Media](./media) chapter. ## How HTTP compression works -When a client makes an HTTP request, it often includes an [`Accept-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding) header to advertise the compression algorithms it is capable of decoding. The server can then select from one of the advertised encodings it supports and serve a compressed response. The compressed response would include a [`Content-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding) header so that the client is aware of which compression was used. Additionally, a [`Content-Type`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type) header is often used to indicate the [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) of the resource being served. +When a client makes an HTTP request, it often includes an [`Accept-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding) header to advertise the compression algorithms it is capable of decoding. The server can then select from one of the advertised encodings it supports and serve a compressed response. The compressed response would include a [`Content-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding) header so that the client is aware of which compression was used. Additionally, a [`Content-Type`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type) header is often used to indicate the [MIME type](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types) of the resource being served. -In the example below, the client advertised support for gzip, brotli, and deflate compression. The server decided to return a gzip compressed response containing a `text/html` document. +In the example below, the client advertised support for Gzip, Brotli, and Deflate compression. The server decided to return a Gzip compressed response containing a `text/html` document. 
``` @@ -44,7 +44,7 @@ In the example below, the client advertised support for gzip, brotli, and deflat > Host: httparchive.org > Accept-Encoding: gzip, deflate, br - < HTTP/1.1 200 + < HTTP/1.1 200 < Content-type: text/html; charset=utf-8 < Content-encoding: gzip ``` @@ -55,11 +55,11 @@ The HTTP Archive contains measurements for 5.3 million web sites, and each site ## Compression algorithms -IANA maintains a [list of valid HTTP content encodings](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding) that can be used with the `Accept-Encoding` and `Content-Encoding` headers. These include gzip, deflate, br (brotli), as well as a few others. Brief descriptions of these algorithms are given below: +IANA maintains a [list of valid HTTP content encodings](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding) that can be used with the `Accept-Encoding` and `Content-Encoding` headers. These include `gzip`, `deflate`, `br` (Brotli), as well as a few others. Brief descriptions of these algorithms are given below: -* [Gzip](https://tools.ietf.org/html/rfc1952) uses the [LZ77](https://en.wikipedia.org/wiki/LZ77_and_LZ78#LZ77) and [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) compression techniques, and is older than the web itself. It was originally developed for the UNIX gzip program in 1992. An implementation for web delivery has existed since HTTP/1.1, and most web browsers and clients support it. -* [Deflate](https://tools.ietf.org/html/rfc1951) uses the same algorithm as gzip, just with a different container. Its use was not widely adopted for the web because of [compatibility issues](https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression) with some servers and browsers. -* [Brotli](https://tools.ietf.org/html/rfc7932) is a newer compression algorithm that was [invented by Google](https://github.com/google/brotli). It uses the combination of a modern variant of the LZ77 algorithm, Huffman coding, and second order context modeling. Compression via brotli is more computationally expensive compared to gzip, but the algorithm is able to reduce files by [15-25%](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf) more than gzip compression. Brotli was first used for compressing web content in 2015 and is [supported by all modern web browsers](https://caniuse.com/#feat=brotli). +* [Gzip](https://tools.ietf.org/html/rfc1952) uses the [LZ77](https://en.wikipedia.org/wiki/LZ77_and_LZ78#LZ77) and [Huffman coding](https://en.wikipedia.org/wiki/Huffman_coding) compression techniques, and is older than the web itself. It was originally developed for the UNIX `gzip` program in 1992. An implementation for web delivery has existed since HTTP/1.1, and most web browsers and clients support it. +* [Deflate](https://tools.ietf.org/html/rfc1951) uses the same algorithm as Gzip, just with a different container. Its use was not widely adopted for the web because of [compatibility issues](https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression) with some servers and browsers. +* [Brotli](https://tools.ietf.org/html/rfc7932) is a newer compression algorithm that was [invented by Google](https://github.com/google/brotli). It uses the combination of a modern variant of the LZ77 algorithm, Huffman coding, and second order context modeling. 
Compression via Brotli is more computationally expensive compared to Gzip, but the algorithm is able to reduce files by [15-25%](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf) more than Gzip compression. Brotli was first used for compressing web content in 2015 and is [supported by all modern web browsers](https://caniuse.com/#feat=brotli). Approximately 38% of HTTP responses are delivered with text-based compression. This may seem like a surprising statistic, but keep in mind that it is based on all HTTP requests in the dataset. Some content, such as images, will not benefit from these compression algorithms. The table below summarizes the percentage of requests served with each content encoding. @@ -88,21 +88,21 @@ Approximately 38% of HTTP responses are delivered with text-based compression. T 285,158,644 - gzip + gzip 29.66% 30.95% 122,789,094 143,549,122 - br + br 7.43% 7.55% 30,750,681 35,012,368 - deflate + deflate 0.02% 0.02% 68,802 @@ -116,28 +116,28 @@ Approximately 38% of HTTP responses are delivered with text-based compression. T 68,352 - identity + identity 0.000709% 0.000563% 2,935 2,611 - x-gzip + x-gzip 0.000193% 0.000179% 800 829 - compress + compress 0.000008% 0.000007% 33 32 - x-compress + x-compress 0.000002% 0.000006% 8 @@ -148,7 +148,7 @@ Approximately 38% of HTTP responses are delivered with text-based compression. T
{{ figure_link(caption="Adoption of compression algorithms.") }}
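
Adoption can also be spot-checked against a single server. The sketch below is only illustrative (it is not how the HTTP Archive gathers these figures): it sends different `Accept-Encoding` offers using Python's standard library and reports the `Content-Encoding` the server returns; the URL reuses the host from the earlier request example, and network access is required.

```
# Sketch: see which Content-Encoding a server picks for different
# Accept-Encoding offers. Not how the HTTP Archive data was gathered.
from urllib.request import Request, urlopen

def served_encoding(url, accept_encoding):
    req = Request(url, headers={"Accept-Encoding": accept_encoding})
    with urlopen(req, timeout=10) as resp:
        # Only the response headers are inspected; the body is not decoded here.
        return resp.headers.get("Content-Encoding", "identity")

url = "https://httparchive.org/"  # host reused from the earlier example
for offer in ("gzip", "br", "gzip, deflate, br"):
    print(offer, "->", served_encoding(url, offer))
```
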
-Of the resources that are served compressed, the majority are using either gzip (80%) or brotli (20%). The other compression algorithms are infrequently used. +Of the resources that are served compressed, the majority are using either Gzip (80%) or Brotli (20%). The other compression algorithms are infrequently used. {{ figure_markup( image="fig2.png", @@ -162,16 +162,16 @@ Additionally, there are 67k requests that return an invalid `Content-Encoding`, We can't determine the compression levels from any of the diagnostics collected by the HTTP Archive, but the best practice for compressing content is: -* At a minimum, enable gzip compression level 6 for text based assets. This provides a fair trade-off between computational cost and compression ratio and is the [default for many web servers](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/)—though [Nginx still defaults to the, often too low, level 1](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level). -* If you can support brotli and precompress resources, then compress to brotli level 11. This is more computationally expensive than gzip - so precompression is an absolute must to avoid delays. -* If you can support brotli and are unable to precompress, then compress to brotli level 5. This level will result in smaller payloads compared to gzip, with a similar computational overhead. +* At a minimum, enable Gzip compression level 6 for text based assets. This provides a fair trade-off between computational cost and compression ratio and is the [default for many web servers](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/)—though [Nginx still defaults to the, often too low, level 1](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level). +* If you can support Brotli and precompress resources, then compress to Brotli level 11. This is more computationally expensive than Gzip - so precompression is an absolute must to avoid delays. +* If you can support Brotli and are unable to precompress, then compress to Brotli level 5. This level will result in smaller payloads compared to Gzip, with a similar computational overhead. ## What types of content are we compressing? -Most text based resources (such as HTML, CSS, and JavaScript) can benefit from gzip or brotli compression. However, it's often not necessary to use these compression techniques on binary resources, such as images, video, and some web fonts because their file formats are already compressed. +Most text based resources (such as HTML, CSS, and JavaScript) can benefit from Gzip or Brotli compression. However, it's often not necessary to use these compression techniques on binary resources, such as images, video, and some web fonts because their file formats are already compressed. -In the graph below, the top 25 content types are displayed with box sizes representing the relative number of requests. The color of each box represents how many of these resources were served compressed. Most of the media content is shaded orange, which is expected since gzip and brotli would have little to no benefit for them. Most of the text content is shaded blue to indicate that they are being compressed. However, the light blue shading for some content types indicate that they are not compressed as consistently as the others. +In the graph below, the top 25 content types are displayed with box sizes representing the relative number of requests. 
The color of each box represents how many of these resources were served compressed. Most of the media content is shaded orange, which is expected since Gzip and Brotli would have little to no benefit for them. Most of the text content is shaded blue to indicate that they are being compressed. However, the light blue shading for some content types indicate that they are not compressed as consistently as the others. {{ figure_markup( image="fig3.png", @@ -255,13 +255,13 @@ The content types with the lowest compression rates include `application/json`, ) }} -Across all content types, gzip is the most popular compression algorithm. The newer brotli compression is used less frequently, and the content types where it appears most are `application/javascript`, `text/css` and `application/x-javascript`. This is likely due to CDNs that automatically apply brotli compression for traffic that passes through them. +Across all content types, Gzip is the most popular compression algorithm. The newer Brotli compression is used less frequently, and the content types where it appears most are `application/javascript`, `text/css` and `application/x-javascript`. This is likely due to CDNs that automatically apply Brotli compression for traffic that passes through them. ## First-party vs third-party compression -In the [Third Parties](./third-parties) chapter, we learned about third parties and their impact on performance. When we compare compression techniques between first and third parties, we can see that third-party content tends to be compressed more than first-party content. +In the [Third Parties](./third-parties) chapter, we learned about third parties and their impact on performance. When we compare compression techniques between first and third parties, we can see that third-party content tends to be compressed more than first-party content. -Additionally, the percentage of brotli compression is higher for third-party content. This is likely due to the number of resources served from the larger third parties that typically support brotli, such as Google and Facebook. +Additionally, the percentage of Brotli compression is higher for third-party content. This is likely due to the number of resources served from the larger third parties that typically support Brotli, such as Google and Facebook.
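Editorial aside, not part of the chapter or of this patch: the level guidance above (Gzip level 6 as a baseline, Brotli 5 for on-the-fly compression, Brotli 11 for precompressed assets) can be sanity-checked with a small Python sketch. It assumes the third-party `brotli` package is installed; the exact numbers will vary with the payload:

```
# Editorial sketch: compare output sizes at the recommended settings.
# Requires the third-party `brotli` package (pip install brotli); `gzip` is stdlib.
import gzip
import brotli

def compare(payload: bytes) -> None:
    gz6 = gzip.compress(payload, compresslevel=6)   # common web-server default
    br5 = brotli.compress(payload, quality=5)       # on-the-fly Brotli
    br11 = brotli.compress(payload, quality=11)     # precompression-only Brotli
    print(f"original : {len(payload):>7} bytes")
    print(f"gzip -6  : {len(gz6):>7} bytes")
    print(f"brotli 5 : {len(br5):>7} bytes")
    print(f"brotli 11: {len(br11):>7} bytes")

if __name__ == "__main__":
    # Repetitive text compresses well; real HTML/CSS/JS will behave differently.
    compare(b"<p>Almost before we knew it, we had left the ground.</p>\n" * 400)
```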
@@ -288,21 +288,21 @@ Additionally, the percentage of brotli compression is higher for third-party con - + - + - + @@ -363,8 +363,8 @@ Lighthouse also indicates how many bytes could be saved by enabling text-based c ## Conclusion -HTTP compression is a widely used and highly valuable feature for reducing the size of web content. Both gzip and brotli compression are the dominant algorithms used, and the amount of compressed content varies by content type. Tools like Lighthouse can help uncover opportunities to compress content. +HTTP compression is a widely used and highly valuable feature for reducing the size of web content. Both Gzip and Brotli compression are the dominant algorithms used, and the amount of compressed content varies by content type. Tools like Lighthouse can help uncover opportunities to compress content. While many sites are making good use of HTTP compression, there is still room for improvement, particularly for the `text/html` format that the web is built upon! Similarly, lesser-understood text formats like `font/ttf`, `application/json`, `text/xml`, `text/plain`, `image/svg+xml`, and `image/x-icon` may take extra configuration that many websites miss. -At a minimum, websites should use gzip compression for all text-based resources, since it is widely supported, easily implemented, and has a low processing overhead. Additional savings can be found with brotli compression, although compression levels should be chosen carefully based on whether a resource can be precompressed. +At a minimum, websites should use Gzip compression for all text-based resources, since it is widely supported, easily implemented, and has a low processing overhead. Additional savings can be found with Brotli compression, although compression levels should be chosen carefully based on whether a resource can be precompressed. diff --git a/src/content/en/2019/http2.md b/src/content/en/2019/http2.md index e32043991b1..e136bb9ca58 100644 --- a/src/content/en/2019/http2.md +++ b/src/content/en/2019/http2.md @@ -34,7 +34,7 @@ The protocol seemed simple, but it also came with limitations. Because HTTP was That in itself brings its own issues as TCP connections take time and resources to set up and get to full efficiency, especially when using HTTPS, which requires additional steps to set up the encryption. HTTP/1.1 improved this somewhat, allowing reuse of TCP connections for subsequent requests, but still did not solve the parallelization issue. -Despite HTTP being text-based, the reality is that it was rarely used to transport text, at least in its raw format. While it was true that HTTP headers were still text, the payloads themselves often were not. Text files like [HTML](./markup), [JS](./javascript), and [CSS](./css) are usually [compressed](./compression) for transport into a binary format using gzip, brotli, or similar. Non-text files like [images and videos](./media) are served in their own formats. The whole HTTP message is then often wrapped in HTTPS to encrypt the messages for [security](./security) reasons. +Despite HTTP being text-based, the reality is that it was rarely used to transport text, at least in its raw format. While it was true that HTTP headers were still text, the payloads themselves often were not. Text files like [HTML](./markup), [JS](./javascript), and [CSS](./css) are usually [compressed](./compression) for transport into a binary format using Gzip, Brotli, or similar. Non-text files like [images and videos](./media) are served in their own formats. 
The whole HTTP message is then often wrapped in HTTPS to encrypt the messages for [security](./security) reasons. So, the web had basically moved on from text-based transport a long time ago, but HTTP had not. One reason for this stagnation was because it was so difficult to introduce any breaking changes to such a ubiquitous protocol like HTTP (previous efforts had tried and failed). Many routers, firewalls, and other middleboxes understood HTTP and would react badly to major changes to it. Upgrading them all to support a new version was simply not possible. diff --git a/src/content/en/2019/javascript.md b/src/content/en/2019/javascript.md index a9f53c6554a..693ccb1ab9b 100644 --- a/src/content/en/2019/javascript.md +++ b/src/content/en/2019/javascript.md @@ -146,8 +146,8 @@ In the context of browser-server interactions, resource compression refers to co There are multiple text-compression algorithms, but only two are mostly used for the compression (and decompression) of HTTP network requests: -- [Gzip](https://www.gzip.org/) (gzip): The most widely used compression format for server and client interactions -- [Brotli](https://github.com/google/brotli) (br): A newer compression algorithm aiming to further improve compression ratios. [90% of browsers](https://caniuse.com/#feat=brotli) support Brotli encoding. +- [Gzip](https://www.gzip.org/) (`gzip`): The most widely used compression format for server and client interactions +- [Brotli](https://github.com/google/brotli) (`br`): A newer compression algorithm aiming to further improve compression ratios. [90% of browsers](https://caniuse.com/#feat=brotli) support Brotli encoding. Compressed scripts will always need to be uncompressed by the browser once transferred. This means its content remains the same and execution times are not optimized whatsoever. Resource compression, however, will always improve download times which also is one of the most expensive stages of JavaScript processing. Ensuring JavaScript files are compressed correctly can be one of the most significant factors in improving site performance. @@ -155,8 +155,8 @@ How many sites are compressing their JavaScript resources? {{ figure_markup( image="fig10.png", - caption="Percentage of sites compressing JavaScript resources with gzip or brotli.", - description="Bar chart showing 67%/65% of JavaScript resources are compressed with gzip on desktop and mobile respectively, and 15%/14% are compressed using Brotli.", + caption="Percentage of sites compressing JavaScript resources with Gzip or Brotli.", + description="Bar chart showing 67%/65% of JavaScript resources are compressed with Gzip on desktop and mobile respectively, and 15%/14% are compressed using Brotli.", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vTpzDb9HGbdVvin6YPTOmw11qBVGGysltxmH545fUfnqIThAq878F_b-KxUo65IuXaeFVSnlmJ5K1Dm/pubchart?oid=241928028&format=interactive" ) }} diff --git a/src/content/en/2020/caching.md b/src/content/en/2020/caching.md index 454165fe311..3a9e9616fdf 100644 --- a/src/content/en/2020/caching.md +++ b/src/content/en/2020/caching.md @@ -503,7 +503,7 @@ One large source of invalid `Expires` headers is from assets served from a popul ## The `Vary` header -We have discussed how a caching entity can determine whether a response object is cacheable, and for how long it can be cached. However, one of the most important steps the caching entity must take is determining if the resource being requested is already in its cache. 
While this may seem simple, many times the URL alone is not enough to determine this. For example, requests with the same URL could vary in what compression they used (gzip, brotli, etc.) or could be returned in different encodings (XML, JSON etc.). +We have discussed how a caching entity can determine whether a response object is cacheable, and for how long it can be cached. However, one of the most important steps the caching entity must take is determining if the resource being requested is already in its cache. While this may seem simple, many times the URL alone is not enough to determine this. For example, requests with the same URL could vary in what compression they used (Gzip, Brotli, etc.) or could be returned in different encodings (XML, JSON etc.). To solve this problem, when a caching entity caches an object, it gives the object a unique identifier (a cache key). When it needs to determine whether the object is in its cache, it checks for the existence of the object using the cache key as a lookup. By default, this cache key is simply the URL used to retrieve the object, but servers can tell the caching entity to include other 'attributes' of the response (such as compression method) in the cache key, by including the Vary response header, to ensure that the correct object is subsequently retrieved from cache - the `Vary` header identifies 'variants' of the object, based on factors other than the URL. diff --git a/src/content/es/2019/javascript.md b/src/content/es/2019/javascript.md index 07040f4d296..4897f095905 100644 --- a/src/content/es/2019/javascript.md +++ b/src/content/es/2019/javascript.md @@ -146,8 +146,8 @@ En el contexto de las interacciones navegador-servidor, la compresión de recurs Existen varios algoritmos de compresión de texto, pero solo dos se utilizan principalmente para la compresión (y descompresión) de solicitudes de red HTTP: -- [Gzip](https://www.gzip.org/) (gzip): El formato de compresión más utilizado para las interacciones de servidor y cliente. -- [Brotli](https://github.com/google/brotli) (br): Un algoritmo de compresión más nuevo que apunta a mejorar aún más las relaciones de compresión. [90% de los navegadores](https://caniuse.com/#feat=brotli) soportan la codificación Brotli. +- [Gzip](https://www.gzip.org/) (`gzip`): El formato de compresión más utilizado para las interacciones de servidor y cliente. +- [Brotli](https://github.com/google/brotli) (`br`): Un algoritmo de compresión más nuevo que apunta a mejorar aún más las relaciones de compresión. [90% de los navegadores](https://caniuse.com/#feat=brotli) soportan la codificación Brotli. Los scripts comprimidos siempre deberán ser descomprimidos por el navegador una vez transferidos. Esto significa que su contenido sigue siendo el mismo y los tiempos de ejecución no están optimizados en absoluto. Sin embargo, la compresión de recursos siempre mejorará los tiempos de descarga, que también es una de las etapas más caras del procesamiento de JavaScript. Asegurarse de que los archivos JavaScript se comprimen correctamente puede ser uno de los factores más importantes para mejorar el rendimiento del sitio. 
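Editorial aside, not part of either chapter or of this patch: a minimal sketch of the cache-key idea described in the caching excerpt further above, where the request headers named in `Vary` are folded into the lookup key. The key format and header handling here are illustrative, not any particular cache's implementation:

```
# Editorial sketch: build a cache key from the URL plus the request headers
# named in the response's Vary header. Names and separators are illustrative.
from typing import Mapping

def cache_key(url: str, vary: str, request_headers: Mapping[str, str]) -> str:
    key_parts = [url]
    for name in (h.strip().lower() for h in vary.split(",") if h.strip()):
        # A missing header still contributes an empty slot, so requests with and
        # without Accept-Encoding map to different cache entries.
        key_parts.append(f"{name}={request_headers.get(name, '').lower()}")
    return "|".join(key_parts)

# Same URL, different Accept-Encoding values -> two distinct cache entries.
print(cache_key("https://example.com/app.js", "Accept-Encoding",
                {"accept-encoding": "gzip, deflate, br"}))
print(cache_key("https://example.com/app.js", "Accept-Encoding",
                {"accept-encoding": "gzip"}))
```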
@@ -155,8 +155,8 @@ Los scripts comprimidos siempre deberán ser descomprimidos por el navegador una {{ figure_markup( image="fig10.png", - caption="Porcentaje de sitios que comprimen recursos de JavaScript con gzip o brotli.", - description="Gráfico de barras que muestra el 67% / 65% de los recursos de JavaScript se comprime con gzip en computadoras de escritorio y dispositivos móviles respectivamente, y el 15% / 14% se comprime con Brotli.", + caption="Porcentaje de sitios que comprimen recursos de JavaScript con Gzip o Brotli.", + description="Gráfico de barras que muestra el 67% / 65% de los recursos de JavaScript se comprime con Gzip en computadoras de escritorio y dispositivos móviles respectivamente, y el 15% / 14% se comprime con Brotli.", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vTpzDb9HGbdVvin6YPTOmw11qBVGGysltxmH545fUfnqIThAq878F_b-KxUo65IuXaeFVSnlmJ5K1Dm/pubchart?oid=241928028&format=interactive" ) }} diff --git a/src/content/fr/2019/caching.md b/src/content/fr/2019/caching.md index 2a7e95571ad..bc56431393e 100644 --- a/src/content/fr/2019/caching.md +++ b/src/content/fr/2019/caching.md @@ -542,7 +542,7 @@ La plus grande source d'en-têtes `Expires` invalides provient de ressources ser ## En-tête `Vary` -L'une des étapes les plus importantes de la mise en cache est de déterminer si la ressource demandée est mise en cache ou non. Bien que cela puisse paraître simple, il arrive souvent que l'URL seule ne suffise pas à le déterminer. Par exemple, les requêtes ayant la même URL peuvent varier en fonction de la [compression](./compression) utilisée (gzip, brotli, etc.) ou être modifiées et adaptées aux visiteurs mobiles. +L'une des étapes les plus importantes de la mise en cache est de déterminer si la ressource demandée est mise en cache ou non. Bien que cela puisse paraître simple, il arrive souvent que l'URL seule ne suffise pas à le déterminer. Par exemple, les requêtes ayant la même URL peuvent varier en fonction de la [compression](./compression) utilisée (Gzip, Brotli, etc.) ou être modifiées et adaptées aux visiteurs mobiles. Pour résoudre ce problème, les clients donnent à chaque ressource mise en cache un identifiant unique (une clé de cache). Par défaut, cette clé de cache est simplement l'URL de la ressource, mais les développeurs et développeuses peuvent ajouter d'autres éléments (comme la méthode de compression) en utilisant l'en-tête `Vary`. diff --git a/src/content/fr/2019/compression.md b/src/content/fr/2019/compression.md index 8137434dd52..bb9413bd625 100644 --- a/src/content/fr/2019/compression.md +++ b/src/content/fr/2019/compression.md @@ -15,7 +15,7 @@ featured_quote: La compression HTTP est une technique qui permet de coder des in featured_stat_1: 38 % featured_stat_label_1: Réponses HTTP avec compression de texte featured_stat_2: 80 % -featured_stat_label_2: Utilisent la compression gzip +featured_stat_label_2: Utilisent la compression Gzip featured_stat_3: 56 % featured_stat_label_3: Réponses HTML n'utilisant pas de compression --- @@ -36,7 +36,7 @@ Dans ce chapitre, nous allons analyser comment le contenu textuel est compressé Lorsqu’un client effectue une requête HTTP, celle-ci comprend souvent un en-tête [`Accept-Encoding`](https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/Accept-Encoding) pour communiquer les algorithmes qu’il est capable de décoder. Le serveur peut alors choisir parmi eux un encodage qu’il prend en charge et servir la réponse compressée. 
La réponse compressée comprendra un en-tête [`Content-Encoding`](https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/Content-Encoding) afin que le client sache quelle compression a été utilisée. En outre, l’en-tête [`Content-Type`](https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/Content-Type) est souvent utilisé pour indiquer le [type MIME](https://developer.mozilla.org/fr/docs/Web/HTTP/Basics_of_HTTP/MIME_types) de la ressource servie. -Dans l’exemple ci-dessous, le client indique supporter la compression gzip, brotli et deflate. Le serveur a décidé de renvoyer une réponse compressée avec gzip contenant un document `text/html`. +Dans l’exemple ci-dessous, le client indique supporter la compression Gzip, Brotli et deflate. Le serveur a décidé de renvoyer une réponse compressée avec Gzip contenant un document `text/html`. ``` @@ -55,11 +55,11 @@ HTTP Archive contient des mesures pour 5,3 millions de sites web, et chaque site ## Algorithmes de compression -L’IANA tient à jour une [liste des encodages de contenu HTTP valide](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding) qui peuvent être utilisés avec les en-têtes « Accept-Encoding » et « Content-Encoding ». On y retrouve notamment gzip, deflate, br (brotli), ainsi que de quelques autres. De brèves descriptions de ces algorithmes sont données ci-dessous : +L’IANA tient à jour une [liste des encodages de contenu HTTP valide](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding) qui peuvent être utilisés avec les en-têtes « Accept-Encoding » et « Content-Encoding ». On y retrouve notamment `gzip`, `deflate`, `br` (Brotli), ainsi que de quelques autres. De brèves descriptions de ces algorithmes sont données ci-dessous : -* [Gzip](https://tools.ietf.org/html/rfc1952) utilise les techniques de compression [LZ77](https://fr.wikipedia.org/wiki/LZ77_et_LZ78#LZ77) et [le codage de Huffman](https://fr.wikipedia.org/wiki/Codage_de_Huffman) qui sont plus ancienes que le web lui-même. Elles ont été développés à l’origine pour le programme gzip d’UNIX en 1992. Une implémentation pour la diffusion sur le web existe depuis HTTP/1.1, et la plupart des navigateurs et clients web la prennent en charge. -* [Deflate](https://tools.ietf.org/html/rfc1951) utilise le même algorithme que gzip, mais avec un conteneur différent. Son utilisation n’a pas été largement adoptée sur le web pour des [raisons de compatibilité](https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression) avec d’autres serveurs et navigateurs. -* [Brotli](https://tools.ietf.org/html/rfc7932) est un algorithme de compression plus récent qui a été [inventé par Google](https://github.com/google/brotli). Il utilise la combinaison d’une variante moderne de l’algorithme LZ77, le codage de Huffman et la modélisation du contexte du second ordre. La compression via brotli est plus coûteuse en termes de calcul par rapport à gzip, mais l’algorithme est capable de réduire les fichiers de [15-25 %](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf) de plus que la compression gzip. Brotli a été utilisé pour la première fois pour la compression de contenu web en 2015 et est [supporté par tous les navigateurs web modernes](https://caniuse.com/#feat=brotli). 
+* [Gzip](https://tools.ietf.org/html/rfc1952) utilise les techniques de compression [LZ77](https://fr.wikipedia.org/wiki/LZ77_et_LZ78#LZ77) et [le codage de Huffman](https://fr.wikipedia.org/wiki/Codage_de_Huffman) qui sont plus ancienes que le web lui-même. Elles ont été développés à l’origine pour le programme `gzip` d’UNIX en 1992. Une implémentation pour la diffusion sur le web existe depuis HTTP/1.1, et la plupart des navigateurs et clients web la prennent en charge. +* [Deflate](https://tools.ietf.org/html/rfc1951) utilise le même algorithme que Gzip, mais avec un conteneur différent. Son utilisation n’a pas été largement adoptée sur le web pour des [raisons de compatibilité](https://en.wikipedia.org/wiki/HTTP_compression#Problems_preventing_the_use_of_HTTP_compression) avec d’autres serveurs et navigateurs. +* [Brotli](https://tools.ietf.org/html/rfc7932) est un algorithme de compression plus récent qui a été [inventé par Google](https://github.com/google/brotli). Il utilise la combinaison d’une variante moderne de l’algorithme LZ77, le codage de Huffman et la modélisation du contexte du second ordre. La compression via Brotli est plus coûteuse en termes de calcul par rapport à Gzip, mais l’algorithme est capable de réduire les fichiers de [15-25 %](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf) de plus que la compression Gzip. Brotli a été utilisé pour la première fois pour la compression de contenu web en 2015 et est [supporté par tous les navigateurs web modernes](https://caniuse.com/#feat=brotli). Environ 38 % des réponses HTTP sont fournies avec de la compression de texte. Cette statistique peut sembler surprenante, mais n’oubliez pas qu’elle est basée sur toutes les requêtes HTTP de l’ensemble de données. Certains contenus, tels que les images, ne bénéficieront pas de ces algorithmes de compression. Le tableau ci-dessous résume le pourcentage de requêtes servies pour chaque type de compression. @@ -88,21 +88,21 @@ Environ 38 % des réponses HTTP sont fournies avec de la compression de tex - + - + - + @@ -116,28 +116,28 @@ Environ 38 % des réponses HTTP sont fournies avec de la compression de tex - + - + - + - + @@ -148,7 +148,7 @@ Environ 38 % des réponses HTTP sont fournies avec de la compression de tex
Illustration 1. Adoption des algorithmes de compression.
-Parmi les ressources qui sont servies compressées, la majorité utilise soit gzip (80 %), soit brotli (20 %). Les autres algorithmes de compression sont peu utilisés. +Parmi les ressources qui sont servies compressées, la majorité utilise soit Gzip (80 %), soit Brotli (20 %). Les autres algorithmes de compression sont peu utilisés. {{ figure_markup( image="fig2.png", @@ -162,16 +162,16 @@ De plus, il y a 67k requêtes qui renvoient un `Content-Encoding` invalide, tel Nous ne pouvons pas déterminer les niveaux de compression à partir des diagnostics recueillis par HTTP Archive, mais les bonnes pratiques pour compresser du contenu sont : -* Au minimum, activez le niveau 6 de compression gzip pour les ressources textuelles. Cela permet un bon compromis entre le coût de calcul et le taux de compression, c'est la [valeur par défaut pour de nombreux serveurs web](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/). Toutefois [Nginx utilise par défaut le niveau 1, c’est souvent trop bas](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level). -* Si vous pouvez supporter brotli et la précompression des ressources, alors compressez au niveau 11. Cette méthode est plus coûteuse en termes de calcul que gzip - la précompression est donc absolument nécessaire pour éviter les délais. -* Si vous pouvez supporter le brotli et que vous ne pouvez pas le précompresser, alors compressez au niveau 5. Ce niveau se traduira par un cout de compression plus petit que gzip, avec un coût de calcul similaire. +* Au minimum, activez le niveau 6 de compression Gzip pour les ressources textuelles. Cela permet un bon compromis entre le coût de calcul et le taux de compression, c'est la [valeur par défaut pour de nombreux serveurs web](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/). Toutefois [Nginx utilise par défaut le niveau 1, c’est souvent trop bas](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level). +* Si vous pouvez supporter Brotli et la précompression des ressources, alors compressez au niveau 11. Cette méthode est plus coûteuse en termes de calcul que Gzip - la précompression est donc absolument nécessaire pour éviter les délais. +* Si vous pouvez supporter le Brotli et que vous ne pouvez pas le précompresser, alors compressez au niveau 5. Ce niveau se traduira par un cout de compression plus petit que Gzip, avec un coût de calcul similaire. ## Quels types de contenus compressons-nous ? -La plupart des ressources textuelles (telles que HTML, CSS et JavaScript) peuvent bénéficier de la compression gzip ou brotli. Cependant, il n’est souvent pas nécessaire d’utiliser ces techniques sur des ressources binaires, telles que les images, les vidéos et certaines polices Web, leurs formats de fichiers sont déjà compressés. +La plupart des ressources textuelles (telles que HTML, CSS et JavaScript) peuvent bénéficier de la compression Gzip ou Brotli. Cependant, il n’est souvent pas nécessaire d’utiliser ces techniques sur des ressources binaires, telles que les images, les vidéos et certaines polices Web, leurs formats de fichiers sont déjà compressés. -Dans le graphique ci-dessous, les 25 premiers types de contenus sont affichés sous forme de boîtes dont les dimensions représentent le nombre de requêtes correspondantes. La couleur de chaque boîte représente le nombre de ces ressources qui ont été servies compressées. 
La plupart des contenus médias sont nuancés en orange, ce qui est normal puisque gzip et brotli ne leur apporteraient que peu ou pas d’avantages. La plupart des contenus textuels sont nuancés en bleu pour indiquer qu’ils sont compressés. Cependant, la teinte bleu clair de certains types de contenu indique qu’ils ne sont pas compressés de manière aussi cohérente que les autres. +Dans le graphique ci-dessous, les 25 premiers types de contenus sont affichés sous forme de boîtes dont les dimensions représentent le nombre de requêtes correspondantes. La couleur de chaque boîte représente le nombre de ces ressources qui ont été servies compressées. La plupart des contenus médias sont nuancés en orange, ce qui est normal puisque Gzip et Brotli ne leur apporteraient que peu ou pas d’avantages. La plupart des contenus textuels sont nuancés en bleu pour indiquer qu’ils sont compressés. Cependant, la teinte bleu clair de certains types de contenu indique qu’ils ne sont pas compressés de manière aussi cohérente que les autres. {{ figure_markup( image="fig3.png", @@ -255,13 +255,13 @@ Les types de contenus ayant les taux de compression les plus faibles sont `appli ) }} -Dans tous les types de contenus, gzip est l’algorithme de compression le plus populaire. La compression brotli, plus récente, est utilisée moins fréquemment, et les types de contenus où elle apparaît le plus sont `application/javascript`, `text/css` et `application/x-javascript`. Cela est probablement dû aux CDN qui appliquent automatiquement la compression brotli pour le trafic qui y transite. +Dans tous les types de contenus, Gzip est l’algorithme de compression le plus populaire. La compression Brotli, plus récente, est utilisée moins fréquemment, et les types de contenus où elle apparaît le plus sont `application/javascript`, `text/css` et `application/x-javascript`. Cela est probablement dû aux CDN qui appliquent automatiquement la compression Brotli pour le trafic qui y transite. ## Compression de première partie et tierce partie Dans le chapitre les [tierces parties](./third-parties), nous avons appris le rôle des tiers et leur impact sur les performances. Lorsque nous comparons les techniques de compression entre les premières et les tierces parties, nous pouvons constater que le contenu des tierces parties a tendance à être plus compressé que le contenu des premières parties. -En outre, le pourcentage de compression brotli est plus élevé pour les contenus tiers. Cela est probablement dû au nombre de ressources servies par les grands tiers qui soutiennent généralement le brotli, comme Google et Facebook. +En outre, le pourcentage de compression Brotli est plus élevé pour les contenus tiers. Cela est probablement dû au nombre de ressources servies par les grands tiers qui soutiennent généralement le Brotli, comme Google et Facebook.
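Editorial aside, not part of the chapter or of this patch: the `Accept-Encoding` / `Content-Encoding` negotiation described in the compression excerpts above can be sketched as follows. Only standard-library codecs are used, so Brotli is omitted; a real server would also honour `q=` preferences and emit `Vary: Accept-Encoding`:

```
# Editorial sketch: pick a Content-Encoding the client advertised. Standard
# library only, so Brotli is omitted; q-values are ignored for brevity.
import gzip
import zlib

def negotiate(accept_encoding: str, body: bytes) -> tuple:
    offered = {token.split(";")[0].strip().lower()
               for token in accept_encoding.split(",") if token.strip()}
    if "gzip" in offered:
        return "gzip", gzip.compress(body, compresslevel=6)
    if "deflate" in offered:
        return "deflate", zlib.compress(body, 6)
    return "identity", body  # nothing usable was advertised

encoding, payload = negotiate("gzip, deflate, br",
                              b"<html><body>bonjour</body></html>" * 50)
# A response built from this would then carry, e.g.:
#   Content-Encoding: gzip
#   Vary: Accept-Encoding
print(encoding, len(payload))
```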
[Diff residue: HTML table markup was stripped here, leaving only cell values. Recoverable rows —
en first-party vs. third-party table: (stray cell) 58.26%; gzip 29.33% / 30.20% / 30.87% / 31.22%; br 4.41% / 10.49% / 4.56% / 10.49%; deflate 0.02% / 0.01% / 0.02%;
fr adoption table: (stray cell) 285,158,644; gzip 29,66 % / 30,95 % / 122,789,094 / 143,549,122; br 7,43 % / 7,55 % / 30,750,681 / 35,012,368; deflate 0,02 % / 0,02 % / 68,802 / 68,352; identity 0,000709 % / 0,000563 % / 2,935 / 2,611; x-gzip 0,000193 % / 0,000179 % / 800 / 829; compress 0,000008 % / 0,000007 % / 33 / 32; x-compress 0,000002 % / 0,000006 % / 8.]
@@ -288,21 +288,21 @@ En outre, le pourcentage de compression brotli est plus élevé pour les contenu - + - + - + @@ -363,8 +363,8 @@ Lighthouse indique également combien d’octets pourraient être économisés e ## Conclusion -La compression HTTP est une fonctionnalité très utilisée et très précieuse pour réduire la taille des contenus web. La compression gzip et brotli sont les deux algorithmes les plus utilisés, et la quantité de contenu compressé varie selon le type de contenu. Des outils comme Lighthouse peuvent aider à découvrir des solutions pour comprimer le contenu. +La compression HTTP est une fonctionnalité très utilisée et très précieuse pour réduire la taille des contenus web. La compression Gzip et Brotli sont les deux algorithmes les plus utilisés, et la quantité de contenu compressé varie selon le type de contenu. Des outils comme Lighthouse peuvent aider à découvrir des solutions pour comprimer le contenu. Bien que de nombreux sites fassent bon usage de la compression HTTP, il y a encore des possibilités d’amélioration, en particulier pour le format `text/html` sur lequel le web est construit ! De même, les formats de texte moins bien compris comme `font/ttf`, `application/json`, `text/xml`, `text/plain`, `image/svg+xml`, et `image/x-icon` peuvent nécessiter une configuration supplémentaire qui manque à de nombreux sites web. -Au minimum, les sites web devraient utiliser la compression gzip pour toutes les ressources textuelles, car elle est largement prise en charge, facile à mettre en œuvre et a un faible coût de traitement. Des économies supplémentaires peuvent être réalisées grâce à la compression brotli, bien que les niveaux de compression doivent être choisis avec soin en fonction de la possibilité de précompression d’une ressource. +Au minimum, les sites web devraient utiliser la compression Gzip pour toutes les ressources textuelles, car elle est largement prise en charge, facile à mettre en œuvre et a un faible coût de traitement. Des économies supplémentaires peuvent être réalisées grâce à la compression Brotli, bien que les niveaux de compression doivent être choisis avec soin en fonction de la possibilité de précompression d’une ressource. diff --git a/src/content/fr/2019/javascript.md b/src/content/fr/2019/javascript.md index 2cc3fa16f78..1a5dd1e589f 100644 --- a/src/content/fr/2019/javascript.md +++ b/src/content/fr/2019/javascript.md @@ -146,8 +146,8 @@ Dans le contexte des interactions entre navigateur et serveur, la compression de Il existe de nombreux algorithmes de compression de texte, mais seuls deux sont principalement utilisés pour la compression (et la décompression) des requêtes sur le réseau HTTP : -- [Gzip](https://www.gzip.org/) (gzip) : le format de compression le plus utilisé pour les interactions entre serveurs et clients ; -- [Brotli](https://github.com/google/brotli) (br) : un algorithme de compression plus récent visant à améliorer encore les taux de compression. [90 % des navigateurs](https://caniuse.com/#feat=brotli) supportent la compression Brotli. +- [Gzip](https://www.gzip.org/) (`gzip`) : le format de compression le plus utilisé pour les interactions entre serveurs et clients ; +- [Brotli](https://github.com/google/brotli) (`br`) : un algorithme de compression plus récent visant à améliorer encore les taux de compression. [90 % des navigateurs](https://caniuse.com/#feat=brotli) supportent la compression Brotli. Les scripts compressés devront toujours être décompressés par le navigateur une fois transférés. 
Cela signifie que son contenu reste le même et que les temps d’exécution ne sont pas du tout optimisés. Cependant, la compression des ressources améliorera toujours leur temps de téléchargement, qui est également l’une des étapes les plus coûteuses du traitement JavaScript. S’assurer que les fichiers JavaScript sont correctement compressés peut constituer un des principaux facteurs d’amélioration des performances pour un site web. @@ -155,8 +155,8 @@ Combien de sites compressent leurs ressources JavaScript ? {{ figure_markup( image="fig10.png", - caption="Pourcentage de sites compressant des ressources JavaScript avec gzip ou brotli.", - description="Diagramme à barres montrant que 67 % / 65 % des ressources JavaScript sont compressées avec gzip sur les ordinateurs de bureau et les mobiles respectivement, et 15 % / 14 % sont compressées en utilisant Brotli.", + caption="Pourcentage de sites compressant des ressources JavaScript avec Gzip ou Brotli.", + description="Diagramme à barres montrant que 67 % / 65 % des ressources JavaScript sont compressées avec Gzip sur les ordinateurs de bureau et les mobiles respectivement, et 15 % / 14 % sont compressées en utilisant Brotli.", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vTpzDb9HGbdVvin6YPTOmw11qBVGGysltxmH545fUfnqIThAq878F_b-KxUo65IuXaeFVSnlmJ5K1Dm/pubchart?oid=241928028&format=interactive" ) }} diff --git a/src/content/ja/2019/caching.md b/src/content/ja/2019/caching.md index a211bd065ec..abd1605ffbd 100644 --- a/src/content/ja/2019/caching.md +++ b/src/content/ja/2019/caching.md @@ -413,7 +413,7 @@ HTTP/1.1[仕様](https://tools.ietf.org/html/rfc7234#section-5.2.1)には、`Cac < Last-Modified: Sun, 25 Aug 2019 16:00:30 GMT < Cache-Control: public, max-age=43200 < Expires: Mon, 14 Oct 2019 07:36:57 GMT -< ETag: "1566748830.0-3052-3932359948" +< ETag: "1566748830.0-3052-3932359948" ``` 全体的に、Webで提供されるリソースの59%のキャッシュTTLは、コンテンツの年齢に比べて短すぎます。さらに、TTLと経過時間のデルタの中央値は25日です。 @@ -542,7 +542,7 @@ HTTP/1.1[仕様](https://tools.ietf.org/html/rfc7234#section-5.2.1)には、`Cac ## ヘッダーを変更 -キャッシングで最も重要な手順の1つは、要求されているリソースがキャッシュされているかどうかを判断することです。これは単純に見えるかもしれませんが、多くの場合、URLだけではこれを判断するには不十分です。たとえば同じURLのリクエストは、使用する[圧縮](./compression)(gzip、brotliなど)が異なる場合や、モバイルの訪問者に合わせて変更および調整できます。 +キャッシングで最も重要な手順の1つは、要求されているリソースがキャッシュされているかどうかを判断することです。これは単純に見えるかもしれませんが、多くの場合、URLだけではこれを判断するには不十分です。たとえば同じURLのリクエストは、使用する[圧縮](./compression)(Gzip、Brotliなど)が異なる場合や、モバイルの訪問者に合わせて変更および調整できます。 この問題を解決するために、クライアントはキャッシュされた各リソースに一意の識別子(キャッシュキー)を与えます。デフォルトでは、このキャッシュキーは単にリソースのURLですが、開発者はVaryヘッダーを使用して他の要素(圧縮方法など)を追加できます。 diff --git a/src/content/ja/2019/cdn.md b/src/content/ja/2019/cdn.md index b65f98cbd2d..e8a753e8949 100644 --- a/src/content/ja/2019/cdn.md +++ b/src/content/ja/2019/cdn.md @@ -1057,7 +1057,7 @@ TLSおよびRTTのパフォーマンスにCDNを使用することに加えて 一般にCDNの使用は、TLS1.0のような非常に古くて侵害されたTLSバージョンの使用率が高いoriginホストサービスと比較して、強力な暗号およびTLSバージョンの迅速な採用と高い相関があります。 -

-Web Almanacで使用されるChromeは、ホストが提供する最新のTLSバージョンと暗号にバイアスをかけることを強調することが重要です。また、これらのWebページは2019年7月にクロールされ、新しいバージョンを有効にしたWebサイトの採用を反映しています。
+Web Almanacで使用されるChromeは、ホストが提供する最新のTLSバージョンと暗号にバイアスをかけることを強調することが重要です。また、これらのWebページは2019年7月にクロールされ、新しいバージョンを有効にしたWebサイトの採用を反映しています。
{{ figure_markup( image="fig18.png", @@ -1491,7 +1491,7 @@ CDNのHTTP/2の採用率は70%を超えていますが、originページはほ Webサイトは、さまざまなHTTPヘッダーを使用して、ブラウザーとCDNのキャッシュ動作を制御できます。 最も一般的なのは、最新のものであることを保証するためにoriginへ戻る前に何かをキャッシュできる期間を具体的に決定する `Cache-Control`ヘッダーです。 -別の便利なツールは、`Vary` HTTPヘッダーの使用です。このヘッダーは、キャッシュをフラグメント化する方法をCDNとブラウザーの両方に指示します。`Vary`ヘッダーにより、originはリソースの表現が複数あることを示すことができ、CDNは各バリエーションを個別にキャッシュする必要があります。最も一般的な例は[圧縮](./compression)です。リソースを`Vary:Accept-Encoding`を使用すると、CDNは同じコンテンツを、非圧縮、gzip、Brotliなどの異なる形式でキャッシュできます。一部のCDNでは、使用可能なコピーを1つだけ保持するために、この圧縮を急いで実行します。同様に、この`Vary`ヘッダーは、コンテンツをキャッシュする方法と新しいコンテンツを要求するタイミングをブラウザーに指示します。 +別の便利なツールは、`Vary` HTTPヘッダーの使用です。このヘッダーは、キャッシュをフラグメント化する方法をCDNとブラウザーの両方に指示します。`Vary`ヘッダーにより、originはリソースの表現が複数あることを示すことができ、CDNは各バリエーションを個別にキャッシュする必要があります。最も一般的な例は[圧縮](./compression)です。リソースを`Vary:Accept-Encoding`を使用すると、CDNは同じコンテンツを、非圧縮、Gzip、Brotliなどの異なる形式でキャッシュできます。一部のCDNでは、使用可能なコピーを1つだけ保持するために、この圧縮を急いで実行します。同様に、この`Vary`ヘッダーは、コンテンツをキャッシュする方法と新しいコンテンツを要求するタイミングをブラウザーに指示します。 {{ figure_markup( image="use_of_vary_on_cdn.png", diff --git a/src/content/ja/2019/compression.md b/src/content/ja/2019/compression.md index b6cd8b5ab4e..091c5a84068 100644 --- a/src/content/ja/2019/compression.md +++ b/src/content/ja/2019/compression.md @@ -15,7 +15,7 @@ featured_quote: HTTP圧縮とは、元の表現よりも少ないビット数で featured_stat_1: 38% featured_stat_label_1: テキストベースの圧縮を使用したHTTPレスポンス featured_stat_2: 80% -featured_stat_label_2: gzip圧縮の使用 +featured_stat_label_2: Gzip圧縮の使用 featured_stat_3: 56% featured_stat_label_3: 圧縮を使用していないHTMLレスポンス --- @@ -36,7 +36,7 @@ HTTP圧縮は、元の表現よりも少ないビットを使用して情報を クライアントがHTTPリクエストを作成する場合、多くの場合、デコード可能な圧縮アルゴリズムを示す[`Accept-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Encoding)ヘッダーが含まれます。サーバーは、示されたエンコードのいずれかを選択してサポートし、圧縮されたレスポンスを提供できます。圧縮されたレスポンスには[`Content-Encoding`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding)ヘッダーが含まれるため、クライアントはどの圧縮が使用されたかを認識できます。また、提供されるリソースの[MIME](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/MIME_types)タイプを示すために、[`Content-Type`](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type)ヘッダーがよく使用されます。 -以下の例では、クライアントはgzip、brotli、およびdeflate圧縮のサポートを示してます。サーバーは、`text/html`ドキュメントを含むgzip圧縮された応答を返すことにしました。 +以下の例では、クライアントはGzip、Brotli、およびDeflate圧縮のサポートを示してます。サーバーは、`text/html`ドキュメントを含むGzip圧縮された応答を返すことにしました。 ``` @@ -44,7 +44,7 @@ HTTP圧縮は、元の表現よりも少ないビットを使用して情報を > Host: httparchive.org > Accept-Encoding: gzip, deflate, br - < HTTP/1.1 200 + < HTTP/1.1 200 < Content-type: text/html; charset=utf-8 < Content-encoding: gzip ``` @@ -55,11 +55,11 @@ HTTP Archiveには、530万のWebサイトの測定値が含まれており、 ## 圧縮アルゴリズム -IANAは、`Accept-Encoding`および`Content-Encoding`ヘッダーで使用できる有効な[HTTPコンテンツエンコーディングのリスト](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding)を保持しています。これらには、gzip、deflate、br(brotli)などが含まれます。これらのアルゴリズムの簡単な説明を以下に示します。 +IANAは、`Accept-Encoding`および`Content-Encoding`ヘッダーで使用できる有効な[HTTPコンテンツエンコーディングのリスト](https://www.iana.org/assignments/http-parameters/http-parameters.xml#content-coding)を保持しています。これらには、`gzip`、`deflate`、`br`(Brotli)などが含まれます。これらのアルゴリズムの簡単な説明を以下に示します。 -* [Gzip](https://tools.ietf.org/html/rfc1952)は、[LZ77](https://ja.wikipedia.org/wiki/LZ77)および[ハフマンコーディング](https://ja.wikipedia.org/wiki/%E3%83%8F%E3%83%95%E3%83%9E%E3%83%B3%E7%AC%A6%E5%8F%B7)圧縮技術を使用しており、Web自体よりも古い。もともと1992年にUNIX gzipプログラム用として開発されました。HTTP/ 1.1以降、Web配信の実装が存在し、ほとんどのブラウザーとクライアントがそれをサポートしています。 -* [Deflate](https://tools.ietf.org/html/rfc1951)はgzipと同じアルゴリズムを使用しますが、コンテナは異なります。一部のサーバーおよびブラウザとの互換性の問題のため、Webでの使用は広く採用されていません。 -* 
[Brotli](https://tools.ietf.org/html/rfc7932)は、[Googleが発明](https://github.com/google/brotli)した新しい圧縮アルゴリズムです。 LZ77アルゴリズムの最新のバリアント、ハフマンコーディング、および2次コンテキストモデリングの組み合わせを使用します。 Brotliを介した圧縮はgzipと比較して計算コストが高くなりますが、アルゴリズムはgzip圧縮よりもファイルを[15〜25%](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf)削減できます。 Brotliは2015年にWebコンテンツの圧縮に初めて使用され、[すべての最新のWebブラウザーでサポートされています](https://caniuse.com/#feat=brotli)。 +* [Gzip](https://tools.ietf.org/html/rfc1952)は、[LZ77](https://ja.wikipedia.org/wiki/LZ77)および[ハフマンコーディング](https://ja.wikipedia.org/wiki/%E3%83%8F%E3%83%95%E3%83%9E%E3%83%B3%E7%AC%A6%E5%8F%B7)圧縮技術を使用しており、Web自体よりも古い。もともと1992年にUNIX `gzip`プログラム用として開発されました。HTTP/ 1.1以降、Web配信の実装が存在し、ほとんどのブラウザーとクライアントがそれをサポートしています。 +* [Deflate](https://tools.ietf.org/html/rfc1951)はGzipと同じアルゴリズムを使用しますが、コンテナは異なります。一部のサーバーおよびブラウザとの互換性の問題のため、Webでの使用は広く採用されていません。 +* [Brotli](https://tools.ietf.org/html/rfc7932)は、[Googleが発明](https://github.com/google/brotli)した新しい圧縮アルゴリズムです。 LZ77アルゴリズムの最新のバリアント、ハフマンコーディング、および2次コンテキストモデリングの組み合わせを使用します。 Brotliを介した圧縮はGzipと比較して計算コストが高くなりますが、アルゴリズムはGzip圧縮よりもファイルを[15〜25%](https://cran.r-project.org/web/packages/brotli/vignettes/brotli-2015-09-22.pdf)削減できます。 Brotliは2015年にWebコンテンツの圧縮に初めて使用され、[すべての最新のWebブラウザーでサポートされています](https://caniuse.com/#feat=brotli)。 HTTPレスポンスの約38%はテキストベースの圧縮で配信されます。これは驚くべき統計のように思えるかもしれませんが、データセット内のすべてのHTTP要求に基づいていることに留意してください。画像などの一部のコンテンツは、これらの圧縮アルゴリズムの恩恵を受けません。次の表は、各コンテンツエンコーディングで処理されるリクエストの割合をまとめたものです。 @@ -88,21 +88,21 @@ HTTPレスポンスの約38%はテキストベースの圧縮で配信され - + - + - + @@ -116,28 +116,28 @@ HTTPレスポンスの約38%はテキストベースの圧縮で配信され - + - + - + - + @@ -148,7 +148,7 @@ HTTPレスポンスの約38%はテキストベースの圧縮で配信され
{{ figure_link(caption="圧縮アルゴリズムの採用。") }}
-圧縮されて提供されるリソースの大半は、gzip(80%)またはbrotli(20%)のいずれかを使用しています。他の圧縮アルゴリズムはあまり使用されません。 +圧縮されて提供されるリソースの大半は、Gzip(80%)またはBrotli(20%)のいずれかを使用しています。他の圧縮アルゴリズムはあまり使用されません。 {{ figure_markup( image="fig2.png", @@ -162,16 +162,16 @@ HTTPレスポンスの約38%はテキストベースの圧縮で配信され HTTP Archiveによって収集された診断から圧縮レベルを判断することはできませんが、コンテンツを圧縮するためのベストプラクティスは次のとおりです。 -* 少なくとも、テキストベースのアセットに対してgzip圧縮レベル6を有効にします。これは、計算コストと圧縮率の間の公平なトレードオフを提供し、[多くのWebサーバーのデフォルト](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/)にもかかわらず、[Nginxは依然として低すぎることが多いレベル1のままです](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level)。 -* brotliおよびprecompressリソースをサポートできる場合は、brotliレベル11に圧縮します。これはgzipよりも計算コストが高くなります。したがって、遅延を避けるためには、事前圧縮が絶対に必要です。 -* brotliをサポートでき、事前圧縮できない場合は、brotliレベル5に圧縮します。このレベルでは、gzipと比較してペイロードが小さくなり、同様の計算オーバーヘッドが発生します。 +* 少なくとも、テキストベースのアセットに対してGzip圧縮レベル6を有効にします。これは、計算コストと圧縮率の間の公平なトレードオフを提供し、[多くのWebサーバーのデフォルト](https://paulcalvano.com/index.php/2018/07/25/brotli-compression-how-much-will-it-reduce-your-content/)にもかかわらず、[Nginxは依然として低すぎることが多いレベル1のままです](http://nginx.org/en/docs/http/ngx_http_gzip_module.html#gzip_comp_level)。 +* Brotliおよびprecompressリソースをサポートできる場合は、Brotliレベル11に圧縮します。これはGzipよりも計算コストが高くなります。したがって、遅延を避けるためには、事前圧縮が絶対に必要です。 +* Brotliをサポートでき、事前圧縮できない場合は、Brotliレベル5に圧縮します。このレベルでは、Gzipと比較してペイロードが小さくなり、同様の計算オーバーヘッドが発生します。 ## どの種類のコンテンツを圧縮していますか? -ほとんどのテキストベースのリソース(HTML、CSS、JavaScriptなど)は、gzipまたはbrotli圧縮の恩恵を受けることができます。ただし、多くの場合、これらの圧縮技術をバイナリリソースで使用する必要はありません。画像、ビデオ、一部のWebフォントなどが既に圧縮されているため。 +ほとんどのテキストベースのリソース(HTML、CSS、JavaScriptなど)は、GzipまたはBrotli圧縮の恩恵を受けることができます。ただし、多くの場合、これらの圧縮技術をバイナリリソースで使用する必要はありません。画像、ビデオ、一部のWebフォントなどが既に圧縮されているため。 -次のグラフでは、上位25のコンテンツタイプが、リクエストの相対数を表すボックスサイズで表示されています。各ボックスの色は、これらのリソースのうちどれだけ圧縮されて提供されたかを表します。ほとんどのメディアコンテンツはオレンジ色で網掛けされていますが、これはgzipとbrotliにはほとんどまたはまったく利点がないためです。テキストコンテンツのほとんどは、それらが圧縮されていることを示すために青色で網掛けされています。ただし、一部のコンテンツタイプの水色の網掛けは、他のコンテンツタイプほど一貫して圧縮されていないことを示しています。 +次のグラフでは、上位25のコンテンツタイプが、リクエストの相対数を表すボックスサイズで表示されています。各ボックスの色は、これらのリソースのうちどれだけ圧縮されて提供されたかを表します。ほとんどのメディアコンテンツはオレンジ色で網掛けされていますが、これはGzipとBrotliにはほとんどまたはまったく利点がないためです。テキストコンテンツのほとんどは、それらが圧縮されていることを示すために青色で網掛けされています。ただし、一部のコンテンツタイプの水色の網掛けは、他のコンテンツタイプほど一貫して圧縮されていないことを示しています。 {{ figure_markup( image="fig3.png", @@ -255,13 +255,13 @@ HTTP Archiveによって収集された診断から圧縮レベルを判断す ) }} -すべてのコンテンツタイプで、gzipは最も一般的な圧縮アルゴリズムです。新しいbrotli圧縮はあまり頻繁に使用されず、最も多く表示されるコンテンツタイプは`application/javascript`、`text/css`、`application/x-javascript`です。これは、CDNが通過するトラフィックにbrotli圧縮を自動的に適用することの原因である可能性があります。 +すべてのコンテンツタイプで、Gzipは最も一般的な圧縮アルゴリズムです。新しいBrotli圧縮はあまり頻繁に使用されず、最も多く表示されるコンテンツタイプは`application/javascript`、`text/css`、`application/x-javascript`です。これは、CDNが通過するトラフィックにBrotli圧縮を自動的に適用することの原因である可能性があります。 ## ファーストパーティとサードパーティの圧縮 [サードパーティ](./third-parties)の章では、サードパーティとパフォーマンスへの影響について学びました。ファーストパーティとサードパーティの圧縮技術を比較すると、サードパーティのコンテンツはファーストパーティのコンテンツよりも圧縮される傾向であることがわかります。 -さらに、サードパーティのコンテンツの場合、brotli圧縮の割合が高くなります。これは、GoogleやFacebookなど、通常brotliをサポートする大規模なサードパーティから提供されるリソースの数が原因である可能性と考えられます。 +さらに、サードパーティのコンテンツの場合、Brotli圧縮の割合が高くなります。これは、GoogleやFacebookなど、通常Brotliをサポートする大規模なサードパーティから提供されるリソースの数が原因である可能性と考えられます。
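Editorial aside, not part of the chapter or of this patch: a hedged sketch of the precompression step that makes Brotli level 11 practical, namely compressing static text assets once at build time. The `dist` directory and suffix list are illustrative, and the third-party `brotli` package is assumed:

```
# Editorial sketch: precompress static text assets once at build time so the
# server can serve the .br/.gz variant directly. The `dist` directory and the
# suffix list are illustrative; requires the third-party `brotli` package.
import gzip
import pathlib

import brotli

ASSET_DIR = pathlib.Path("dist")
TEXT_SUFFIXES = {".html", ".css", ".js", ".svg", ".json", ".xml", ".txt"}

for path in ASSET_DIR.rglob("*"):
    if not path.is_file() or path.suffix.lower() not in TEXT_SUFFIXES:
        continue
    data = path.read_bytes()
    # Brotli level 11 is slow, which is acceptable here: it runs once per build.
    (path.parent / (path.name + ".br")).write_bytes(brotli.compress(data, quality=11))
    # Keep a Gzip fallback for clients that do not advertise br support.
    (path.parent / (path.name + ".gz")).write_bytes(gzip.compress(data, compresslevel=9))
```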
[Diff residue: HTML table markup was stripped here, leaving only cell values. Recoverable rows —
fr first-party vs. third-party table: (stray cell) 58.26 %; gzip 29.33 % / 30.20 % / 30.87 % / 31.22 %; br 4.41 % / 10.49 % / 4.56 % / 10.49 %; deflate 0.02 % / 0.01 % / 0.02 %;
ja adoption table: (stray cell) 285,158,644; gzip 29.66% / 30.95% / 122,789,094 / 143,549,122; br 7.43% / 7.55% / 30,750,681 / 35,012,368; deflate 0.02% / 0.02% / 68,802 / 68,352; identity 0.000709% / 0.000563% / 2,935 / 2,611; x-gzip 0.000193% / 0.000179% / 800 / 829; compress 0.000008% / 0.000007% / 33 / 32; x-compress 0.000002% / 0.000006% / 8.]
@@ -288,21 +288,21 @@ HTTP Archiveによって収集された診断から圧縮レベルを判断す - + - + - + @@ -363,8 +363,8 @@ Lighthouseは、テキストベースの圧縮を有効にすることで、保 ## 結論 -HTTP圧縮は、Webコンテンツのサイズを削減するために広く使用されている非常に貴重な機能です。 gzipとbrotliの両方の圧縮が使用される主要なアルゴリズムであり、圧縮されたコンテンツの量はコンテンツの種類によって異なります。 Lighthouseなどのツールは、コンテンツを圧縮する機会を発見するのに役立ちます。 +HTTP圧縮は、Webコンテンツのサイズを削減するために広く使用されている非常に貴重な機能です。 GzipとBrotliの両方の圧縮が使用される主要なアルゴリズムであり、圧縮されたコンテンツの量はコンテンツの種類によって異なります。 Lighthouseなどのツールは、コンテンツを圧縮する機会を発見するのに役立ちます。 多くのサイトがHTTP圧縮をうまく利用していますが、特にWebが構築されている`text/html`形式については、まだ改善の余地があります! 同様に、`font/ttf`、`application/json`、`text/xml`、`text/plain`、`image/svg+xml`、`image/x-icon`のようなあまり理解されていないテキスト形式は、多くのWebサイトで見落とされる余分な構成を取る場合があります。 -Webサイトは広くサポートされており、簡単に実装で処理のオーバーヘッドが低いため、少なくともすべてのテキストベースのリソースにgzip圧縮を使用する必要があります。 brotli圧縮を使用するとさらに節約できますが、リソースを事前に圧縮できるかどうかに基づいて圧縮レベルを慎重に選択する必要があります。 +Webサイトは広くサポートされており、簡単に実装で処理のオーバーヘッドが低いため、少なくともすべてのテキストベースのリソースにGzip圧縮を使用する必要があります。 Brotli圧縮を使用するとさらに節約できますが、リソースを事前に圧縮できるかどうかに基づいて圧縮レベルを慎重に選択する必要があります。 diff --git a/src/content/ja/2019/http2.md b/src/content/ja/2019/http2.md index 0d3c78f96d6..73bcafa4cb7 100644 --- a/src/content/ja/2019/http2.md +++ b/src/content/ja/2019/http2.md @@ -34,7 +34,7 @@ HTTP/2は、ほぼ20年ぶりになるWebのメイン送信プロトコルの初 特に暗号化を設定するための追加の手順を必要とするHTTPSを使用する場合、TCP接続は設定と完全な効率を得るのに時間とリソースを要するため、それ自体に問題が生じます。 HTTP/1.1はこれを幾分改善し、後続のリクエストでTCP接続を再利用できるようにしましたが、それでも並列化の問題は解決しませんでした。 -HTTPはテキストベースですが、実際、少なくとも生の形式でテキストを転送するために使用されることはほとんどありませんでした。 HTTPヘッダーがテキストのままであることは事実でしたが、ペイロード自体しばしばそうではありませんでした。 [HTML](./markup)、[JS](./javascript)、[CSS](./css)などのテキストファイルは通常、gzip、brotliなどを使用してバイナリ形式に転送するため[圧縮](./compression)されます。[画像や動画](./media) などの非テキストファイルは、独自の形式で提供されます。その後、[セキュリティ](./security)上の理由からメッセージ全体を暗号化するために、HTTPメッセージ全体がHTTPSでラップされることがよくあります。 +HTTPはテキストベースですが、実際、少なくとも生の形式でテキストを転送するために使用されることはほとんどありませんでした。 HTTPヘッダーがテキストのままであることは事実でしたが、ペイロード自体しばしばそうではありませんでした。 [HTML](./markup)、[JS](./javascript)、[CSS](./css)などのテキストファイルは通常、Gzip、Brotliなどを使用してバイナリ形式に転送するため[圧縮](./compression)されます。[画像や動画](./media) などの非テキストファイルは、独自の形式で提供されます。その後、[セキュリティ](./security)上の理由からメッセージ全体を暗号化するために、HTTPメッセージ全体がHTTPSでラップされることがよくあります。 そのため、Webは基本的に長い間テキストベースの転送から移行していましたが、HTTPは違いました。この停滞の1つの理由は、HTTPのようなユビキタスプロトコルに重大な変更を導入することが非常に困難だったためです(以前努力しましたが、失敗しました)。多くのルーター、ファイアウォール、およびその他のミドルボックスはHTTPを理解しており、HTTPへの大きな変更に対して過剰に反応します。それらをすべてアップグレードして新しいバージョンをサポートすることは、単に不可能でした。 diff --git a/src/content/ja/2019/javascript.md b/src/content/ja/2019/javascript.md index d8d46b92a64..5290073ba70 100644 --- a/src/content/ja/2019/javascript.md +++ b/src/content/ja/2019/javascript.md @@ -146,8 +146,8 @@ Webページで使用されているJavaScriptの量を分析しようとする テキスト圧縮アルゴリズムは複数ありますが、HTTPネットワークリクエストの圧縮(および解凍)に使われることが多いのはこの2つだけです。 -- [Gzip](https://www.gzip.org/) (gzip): サーバーとクライアントの相互作用のために最も広く使われている圧縮フォーマット。 -- [Brotli](https://github.com/google/brotli) (br): 圧縮率のさらなる向上を目指した新しい圧縮アルゴリズム。[90%のブラウザ](https://caniuse.com/#feat=brotli)がBrotliエンコーディングをサポートしています。 +- [Gzip](https://www.gzip.org/) (`gzip`): サーバーとクライアントの相互作用のために最も広く使われている圧縮フォーマット。 +- [Brotli](https://github.com/google/brotli) (`br`): 圧縮率のさらなる向上を目指した新しい圧縮アルゴリズム。[90%のブラウザ](https://caniuse.com/#feat=brotli)がBrotliエンコーディングをサポートしています。 圧縮されたスクリプトは、一度転送されるとブラウザによって常に解凍される必要があります。これは、コンテンツの内容が変わらないことを意味し、実行時間が最適化されないことを意味します。しかし、リソース圧縮は常にダウンロード時間を改善しますが、これはJavaScriptの処理で最もコストのかかる段階の1つでもあります。JavaScriptファイルが正しく圧縮されていることを確認することは、サイトのパフォーマンスを向上させるための最も重要な要因の1つとなります。 @@ -155,8 +155,8 @@ JavaScriptのリソースを圧縮しているサイトはどれくらいある {{ figure_markup( image="fig10.png", - caption="JavaScript リソースをgzipまたはbrotliで圧縮しているサイトの割合。", - 
description="バーチャートを見ると、デスクトップとモバイルでそれぞれJavaScriptリソースの67%/65%がgzipで圧縮されており、15%/14%がBrotliで圧縮されていることがわかります。", + caption="JavaScript リソースをGzipまたはBrotliで圧縮しているサイトの割合。", + description="バーチャートを見ると、デスクトップとモバイルでそれぞれJavaScriptリソースの67%/65%がGzipで圧縮されており、15%/14%がBrotliで圧縮されていることがわかります。", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vTpzDb9HGbdVvin6YPTOmw11qBVGGysltxmH545fUfnqIThAq878F_b-KxUo65IuXaeFVSnlmJ5K1Dm/pubchart?oid=241928028&format=interactive" ) }} diff --git a/src/content/pt/2019/http2.md b/src/content/pt/2019/http2.md index d48f7478717..59ae0e1fcbc 100644 --- a/src/content/pt/2019/http2.md +++ b/src/content/pt/2019/http2.md @@ -34,7 +34,7 @@ O protocolo parecia simples, mas também apresentava limitações. Como o HTTP e Isso por si só já trás seus próprios problemas considerando que as conexões TCP demandam tempo e recursos para serem estabelecidas e obter eficiência total, especialmente ao usar HTTPS, que requer etapas adicionais para configurar a criptografia. O HTTP/1.1 melhorou isso em alguma medida, permitindo a reutilização de conexões TCP para requisições subsequentes, mas ainda não resolveu a dificuldade em paralelização. -Apesar do HTTP ser baseado em texto, a realidade é que ele raramente era usado para transportar texto, ao menos em seu formato puro. Embora fosse verdade que os cabeçalhos ainda eram texto, os payloads em si frequentemente não eram. Arquivos de texto como [HTML](./markup), [JS](./javascript) e [CSS](./css) costumam ser [compactados](./compression) para transporte em formato binário usando gzip, brotli ou similar. Arquivos não textuais como [imagens e vídeos](./media) são distribuidos em seus próprio formatos. A mensagem HTTP completa é então costumeiramente encapsulada em HTTPS para criptografar as mensagens por razões de [segurança](./security). +Apesar do HTTP ser baseado em texto, a realidade é que ele raramente era usado para transportar texto, ao menos em seu formato puro. Embora fosse verdade que os cabeçalhos ainda eram texto, os payloads em si frequentemente não eram. Arquivos de texto como [HTML](./markup), [JS](./javascript) e [CSS](./css) costumam ser [compactados](./compression) para transporte em formato binário usando Gzip, Brotli ou similar. Arquivos não textuais como [imagens e vídeos](./media) são distribuidos em seus próprio formatos. A mensagem HTTP completa é então costumeiramente encapsulada em HTTPS para criptografar as mensagens por razões de [segurança](./security). Portanto, a web tinha basicamente movido de um transporte baseado em texto há muito tempo, mas o HTTP não. Uma razão para essa estagnação foi porque era muito difícil introduzir qualquer alteração significativa em um protocolo tão onipresente como o HTTP (esforços anteriores haviam tentado e falhado). Muitos roteadores, firewalls e outros dispositivos de rede entendiam o HTTP e reagiriam mal a grandes mudanças maiores. Atualizar todos eles para suportar uma nova versão simplesmente não era impossível. 
diff --git a/src/content/pt/2019/javascript.md b/src/content/pt/2019/javascript.md index dbca2a135af..467b9d89e65 100644 --- a/src/content/pt/2019/javascript.md +++ b/src/content/pt/2019/javascript.md @@ -146,8 +146,8 @@ No contexto das interações navegador-servidor, a compactação de recursos se Existem vários algoritmos de compactação de texto, mas apenas dois são usados ​​principalmente para compactação (e descompressão) de solicitações de rede HTTP: -- [Gzip](https://www.gzip.org/) (gzip): O formato de compactação mais amplamente usado para interações de servidor e cliente. -- [Brotli](https://github.com/google/brotli) (br): Um algoritmo de compressão mais recente que visa melhorar ainda mais as taxas de compressão. [90% dos navegadores](https://caniuse.com/#feat=brotli) eles suportam a codificação Brotli. +- [Gzip](https://www.gzip.org/) (`gzip`): O formato de compactação mais amplamente usado para interações de servidor e cliente. +- [Brotli](https://github.com/google/brotli) (`br`): Um algoritmo de compressão mais recente que visa melhorar ainda mais as taxas de compressão. [90% dos navegadores](https://caniuse.com/#feat=brotli) eles suportam a codificação Brotli. Scripts compactados devem sempre ser descompactados pelo navegador depois de transferidos. Isso significa que seu conteúdo permanece o mesmo e os tempos de execução não são otimizados de forma alguma. No entanto, a compactação de recursos sempre melhorará os tempos de download, que também é um dos estágios mais caros do processamento de JavaScript. Garantir que os arquivos JavaScript sejam compactados corretamente pode ser um dos fatores mais importantes para melhorar o desempenho do site. @@ -155,8 +155,8 @@ Quantos sites estão compactando seus recursos JavaScript? {{ figure_markup( image="/static/images/2019/javascript/fig10.png", - caption="Porcentagem de sites que compactam recursos JavaScript com gzip ou brotli.", - description="Gráfico de barras mostrando 67% / 65% dos recursos JavaScript são compactados com gzip em desktops e dispositivos móveis, respectivamente, e 15% / 14% são compactados com Brotli.", + caption="Porcentagem de sites que compactam recursos JavaScript com Gzip ou Brotli.", + description="Gráfico de barras mostrando 67% / 65% dos recursos JavaScript são compactados com Gzip em desktops e dispositivos móveis, respectivamente, e 15% / 14% são compactados com Brotli.", chart_url="https://docs.google.com/spreadsheets/d/e/2PACX-1vTpzDb9HGbdVvin6YPTOmw11qBVGGysltxmH545fUfnqIThAq878F_b-KxUo65IuXaeFVSnlmJ5K1Dm/pubchart?oid=241928028&format=interactive" ) }}
[Diff residue: HTML table markup was stripped here, leaving only cell values from the ja first-party vs. third-party table — (stray cell) 58.26%; gzip 29.33% / 30.20% / 30.87% / 31.22%; br 4.41% / 10.49% / 4.56% / 10.49%; a deflate row fragment follows.]
deflatedeflate 0.02% 0.01% 0.02%