This repository has been archived by the owner on Aug 10, 2021. It is now read-only.
Ross Fairbanks edited this page Jan 5, 2016 · 1 revision

Elasticrawl Wiki

List of Crawls

This page lists the web crawls released by Common Crawl that are compatible with elasticrawl.

Common Crawl announce new crawls on their blog.

| Crawl Name | Month | Web Pages | Segments |
| --- | --- | --- | --- |
| CC-MAIN-2015-48 | November 2015 | ~1.82 billion | 100 |
| CC-MAIN-2015-40 | September 2015 | ~1.32 billion | 99 |
| CC-MAIN-2015-35 | August 2015 | ~1.84 billion | 100 |
| CC-MAIN-2015-32 | July 2015 | ~1.81 billion | 99 |
| CC-MAIN-2015-27 | June 2015 | ~1.67 billion | 100 |
| CC-MAIN-2015-22 | May 2015 | ~2.05 billion | 124 |
| CC-MAIN-2015-18 | April 2015 | ~2.11 billion | 188 |
| CC-MAIN-2015-14 | March 2015 | ~1.64 billion | 100 |
| CC-MAIN-2015-11 | February 2015 | ~1.9 billion | 100 |
| CC-MAIN-2015-06 | January 2015 | ~1.82 billion | 98 |
| CC-MAIN-2014-52 | December 2014 | ~2.08 billion | 314 |
| CC-MAIN-2014-49 | November 2014 | ~1.95 billion | 136 |
| CC-MAIN-2014-35 | August 2014 | ~2.8 billion | 111 |
| CC-MAIN-2014-23 | July 2014 | ~3.6 billion | 253 |
| CC-MAIN-2014-15 | April 2014 | ~2.3 billion | 70 |
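
The crawl names above all follow the shape `CC-MAIN-<year>-<number>`, where the trailing number appears to be a week of the year. As a minimal sketch (not part of elasticrawl), the names can be validated and sorted in Python. The helper `parse_crawl_name` and the pattern `CRAWL_NAME` are hypothetical names introduced for illustration, and reading the last field as a week number is an assumption; the Month column does not always line up with it exactly.

```python
import re

# Assumed pattern: "CC-MAIN-" followed by a four-digit year and a
# two-digit number that is taken here to be a week of the year.
CRAWL_NAME = re.compile(r"CC-MAIN-(\d{4})-(\d{2})")


def parse_crawl_name(name):
    """Return (year, week) for a crawl name, or raise ValueError."""
    match = CRAWL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised crawl name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# Sort a few names from the list newest first, using (year, week) as the key.
names = ["CC-MAIN-2014-15", "CC-MAIN-2015-48", "CC-MAIN-2015-06"]
newest_first = sorted(names, key=parse_crawl_name, reverse=True)
print(newest_first)
```

Checking the name before passing it to a tool catches typos such as a missing `CC-MAIN-` prefix early, before any cluster resources are requested.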