
Commit

unfortunate edit made statement incorrect (#3950)
dwsmart authored Dec 3, 2024
1 parent 29609bf commit ea02f9f
Showing 1 changed file with 1 addition and 1 deletion.
src/content/en/2024/seo.md: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ The vast majority of `robots.txt` files—97.82% of mobile crawls and 97.80% of

According to RFC 9309 standards, crawlers should limit the size of `robots.txt` files they look at, and the parsing limit must be at least <a hreflang="en" href="https://www.rfc-editor.org/rfc/rfc9309.html#name-limit">500 kiB</a>. A `robots.txt` file under that size should be fully parsed. Google, for example, [enforces the max limit at 500 kiB](https://developers.google.com/search/docs/crawling-indexing/robots/robots_txt#file-format). Only a tiny number of sites (just 0.06%) had `robots.txt` files over this limit. Directives found beyond that limit are ignored by the search engine.

- Interestingly, 1.59% of mobile crawls and 1.66% of desktop crawls returned a 0-sized `robots.txt` file. This is likely a configuration issue. Since it is not documented by the RFC 9309 specification or support documentation for popular search engine crawlers, it is unclear how this would be handled. If a site returns an empty response for `robots.txt`, a sensible approach would be to return a `robots.txt` file with appropriate rules or, if one wishes to restrict crawling, return a [`404 status code`](https://developer.mozilla.org/docs/Web/HTTP/Status/404) for the URL.
+ Interestingly, 1.59% of mobile crawls and 1.66% of desktop crawls returned a 0-sized `robots.txt` file. This is likely a configuration issue. Since it is not documented by the RFC 9309 specification or support documentation for popular search engine crawlers, it is unclear how this would be handled. If a site returns an empty response for `robots.txt`, a sensible approach would be to return a `robots.txt` file with appropriate rules or, if one does not wish to restrict crawling, return a [`404 status code`](https://developer.mozilla.org/docs/Web/HTTP/Status/404) for the URL.

#### `robots.txt` user agent usage

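To illustrate the parsing limit the diff's context paragraph describes: RFC 9309 requires a parser to handle at least 500 kiB of `robots.txt`, and directives beyond the limit a crawler enforces may be ignored. Below is a minimal sketch using only Python's standard library; the function name is an illustrative assumption, not any search engine's actual implementation.

```python
from urllib import request, robotparser

# RFC 9309: a parser must handle at least 500 kiB of robots.txt;
# directives beyond the limit it enforces may be ignored.
MAX_ROBOTS_BYTES = 500 * 1024

def fetch_and_parse_robots(robots_url):
    """Fetch robots.txt and parse only the first 500 kiB (hypothetical helper)."""
    with request.urlopen(robots_url) as resp:
        raw = resp.read(MAX_ROBOTS_BYTES)  # read at most 500 kiB
    parser = robotparser.RobotFileParser()
    parser.parse(raw.decode("utf-8", errors="replace").splitlines())
    return parser

# Usage: rp = fetch_and_parse_robots("https://example.com/robots.txt")
#        rp.can_fetch("examplebot", "https://example.com/some-page")
```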

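And a companion sketch for the empty-response case the changed line discusses: a `404` means no `robots.txt` exists, which RFC 9309 treats as unrestricted crawling, while a `200` with a zero-byte body is unspecified. The helper name and returned labels here are hypothetical.

```python
import urllib.error
import urllib.request

def classify_robots_response(robots_url):
    """Hypothetical helper labelling the cases discussed in the changed line."""
    try:
        with urllib.request.urlopen(robots_url) as resp:
            body = resp.read(500 * 1024)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            # No robots.txt at all: RFC 9309 treats crawling as unrestricted.
            return "404: no rules, crawling allowed"
        raise
    if not body.strip():
        # 200 with a zero-byte body: not covered by RFC 9309, so handling
        # is crawler-specific; a parser given no lines simply has no rules.
        return "empty 200: behavior unspecified"
    return "non-empty robots.txt: parse its directives"
```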
