diff --git a/docs/experiments-msmarco-doc.md b/docs/experiments-msmarco-doc.md
index a5784b1775..6a3b2563b3 100644
--- a/docs/experiments-msmarco-doc.md
+++ b/docs/experiments-msmarco-doc.md
@@ -23,10 +23,9 @@ There's no need to uncompress the file, as Anserini can directly index gzipped f
 Build the index with the following command:
 
 ```
-nohup sh target/appassembler/bin/IndexCollection -collection CleanTrecCollection \
- -generator DefaultLuceneDocumentGenerator -threads 1 -input collections/msmarco-doc \
- -index indexes/msmarco-doc/lucene-index.msmarco-doc.pos+docvectors+rawdocs \
- -storePositions -storeDocvectors -storeRaw >& logs/log.msmarco-doc.pos+docvectors+rawdocs &
+sh target/appassembler/bin/IndexCollection -threads 1 -collection CleanTrecCollection \
+ -generator DefaultLuceneDocumentGenerator -input collections/msmarco-doc \
+ -index indexes/msmarco-doc/lucene-index-msmarco -storePositions -storeDocvectors -storeRaw
 ```
 
 On a modern desktop with an SSD, indexing takes around 40 minutes.
@@ -40,7 +39,7 @@ The dev queries are already stored in our repo:
 
 ```
 target/appassembler/bin/SearchCollection -topicreader TsvInt \
- -index indexes/msmarco-doc/lucene-index.msmarco-doc.pos+docvectors+rawdocs \
+ -index indexes/msmarco-doc/lucene-index-msmarco \
  -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \
  -output runs/run.msmarco-doc.dev.bm25.txt -bm25
 ```
@@ -93,7 +92,7 @@ To perform a run with these parameters, issue the following command:
 
 ```
 target/appassembler/bin/SearchCollection -topicreader TsvString \
- -index indexes/msmarco-doc/lucene-index.msmarco-doc.pos+docvectors+rawdocs \
+ -index indexes/msmarco-doc/lucene-index-msmarco \
  -topics src/main/resources/topics-and-qrels/topics.msmarco-doc.dev.txt \
  -output runs/run.msmarco-doc.dev.bm25.tuned.txt -bm25 -bm25.k1 3.44 -bm25.b 0.87
 ```