Cleanup of regression script #1179

Merged: 6 commits, May 11, 2020
10 changes: 5 additions & 5 deletions docs/regressions-backgroundlinking18.md
@@ -10,8 +10,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection -input /path/to/backgroundlinking18 \
- -index lucene-index.backgroundlinking18.pos+docvectors+rawdocs -generator WashingtonPostGenerator -threads 1 \
- -storePositions -storeDocvectors -storeRaw >& log.backgroundlinking18.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.core18.pos+docvectors+raw -generator WashingtonPostGenerator -threads 1 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.backgroundlinking18.pos+docvectors+rawdocs &
```

The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/`
@@ -29,15 +29,15 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking18.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 -output run.backgroundlinking18.bm25.topics.backgroundlinking18.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking18.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 -output run.backgroundlinking18.bm25+rm3.topics.backgroundlinking18.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking18.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking18.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 -output run.backgroundlinking18.bm25+rm3+df.topics.backgroundlinking18.txt &
```
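
The updated commands in this file write the index under `indexes/` and the log under `logs/`. A minimal sketch of preparing those directories first, assuming the commands are launched from the repository root. The diff does not say whether the tools create these directories themselves, but the shell redirect into `logs/` fails if that directory is missing:

```shell
# Assumption: run from the directory where the indexing and retrieval
# commands are launched (the repository root in the docs above).
# -p makes this a no-op when the directories already exist.
mkdir -p indexes logs
```

Running this once before the indexing command is harmless even if the directories are already there.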
10 changes: 5 additions & 5 deletions docs/regressions-backgroundlinking19.md
@@ -10,8 +10,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection WashingtonPostCollection -input /path/to/backgroundlinking19 \
- -index lucene-index.backgroundlinking19.pos+docvectors+rawdocs -generator WashingtonPostGenerator -threads 1 \
- -storePositions -storeDocvectors -storeRaw >& log.backgroundlinking19.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.core18.pos+docvectors+raw -generator WashingtonPostGenerator -threads 1 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.backgroundlinking19.pos+docvectors+rawdocs &
```

The directory `/path/to/core18/` should be the root directory of the [TREC Washington Post Corpus](https://trec.nist.gov/data/wapost/), i.e., `ls /path/to/core18/`
@@ -29,15 +29,15 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking19.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -hits 100 -output run.backgroundlinking19.bm25.topics.backgroundlinking19.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking19.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.k 100 -bm25 -rm3 -hits 100 -output run.backgroundlinking19.bm25+rm3.topics.backgroundlinking19.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.backgroundlinking19.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core18.pos+docvectors+raw \
-topicreader BackgroundLinking -topics src/main/resources/topics-and-qrels/topics.backgroundlinking19.txt \
-backgroundlinking -backgroundlinking.datefilter -backgroundlinking.k 100 -bm25 -rm3 -hits 100 -output run.backgroundlinking19.bm25+rm3+df.topics.backgroundlinking19.txt &
```
16 changes: 8 additions & 8 deletions docs/regressions-car17v1.5.md
@@ -10,8 +10,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection CarCollection -input /path/to/car17v1.5 \
- -index lucene-index.car17v1.5.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator -threads 1 \
- -storePositions -storeDocvectors -storeRaw >& log.car17v1.5.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.car17v1.5.pos+docvectors+raw -generator DefaultLuceneDocumentGenerator -threads 1 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.car17v1.5.pos+docvectors+rawdocs &
```

The directory `/path/to/car17v1.5` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v1.5), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
@@ -30,27 +30,27 @@ Specifically, this is the section-level passage retrieval task with automatic gr
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-bm25 -output run.car17v1.5.bm25.topics.car17v1.5.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-bm25 -rm3 -output run.car17v1.5.bm25+rm3.topics.car17v1.5.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-bm25 -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v1.5.bm25+ax.topics.car17v1.5.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-qld -output run.car17v1.5.ql.topics.car17v1.5.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-qld -rm3 -output run.car17v1.5.ql+rm3.topics.car17v1.5.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v1.5.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v1.5.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v1.5.benchmarkY1test.txt \
-qld -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v1.5.ql+ax.topics.car17v1.5.benchmarkY1test.txt &
```
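
Most index-path changes in this file follow one mechanical pattern: the `-index` argument gains an `indexes/` prefix and the `+rawdocs` suffix becomes `+raw`. A rough sketch of that rewrite with `sed`, offered only as an illustration. The helper name `rewrite_index_path` is hypothetical and is not part of the PR. The pattern deliberately leaves log file names alone, since they keep `+rawdocs` in the diff, and it does not cover the background linking docs, where the index name itself changes to `core18`:

```shell
# Hypothetical helper (not from the PR): rewrite an old-style -index argument
# to the new convention. Reads command text on stdin, writes it to stdout.
rewrite_index_path() {
  sed -E 's#-index lucene-index\.([^ ]+)\.pos\+docvectors\+rawdocs#-index indexes/lucene-index.\1.pos+docvectors+raw#'
}

# Example input taken from the old side of the diff above.
echo '-index lucene-index.car17v1.5.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator' \
  | rewrite_index_path
```

Anchoring the pattern on `-index` is what keeps the `>& logs/log...+rawdocs` redirect untouched.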
16 changes: 8 additions & 8 deletions docs/regressions-car17v2.0-doc2query.md
@@ -16,8 +16,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection JsonCollection -input /path/to/car17v2.0-doc2query \
- -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator -threads 30 \
- -storePositions -storeDocvectors -storeRaw >& log.car17v2.0-doc2query.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw -generator DefaultLuceneDocumentGenerator -threads 30 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.car17v2.0-doc2query.pos+docvectors+rawdocs &
```

The directory `/path/to/car17v2.0-doc2query` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0) that has been augmented with the doc2query expansions, i.e., `collection_jsonl_expanded_topk10/` as described in [this page](experiments-doc2query.md).
@@ -36,27 +36,27 @@ Specifically, this is the section-level passage retrieval task with automatic gr
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -output run.car17v2.0-doc2query.bm25.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -rm3 -output run.car17v2.0-doc2query.bm25+rm3.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v2.0-doc2query.bm25+ax.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -output run.car17v2.0-doc2query.ql.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -rm3 -output run.car17v2.0-doc2query.ql+rm3.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0-doc2query.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0-doc2query.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v2.0-doc2query.ql+ax.topics.car17v2.0.benchmarkY1test.txt &
```
16 changes: 8 additions & 8 deletions docs/regressions-car17v2.0.md
@@ -10,8 +10,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection CarCollection -input /path/to/car17v2.0 \
- -index lucene-index.car17v2.0.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator -threads 1 \
- -storePositions -storeDocvectors -storeRaw >& log.car17v2.0.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.car17v2.0.pos+docvectors+raw -generator DefaultLuceneDocumentGenerator -threads 1 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.car17v2.0.pos+docvectors+rawdocs &
```

The directory `/path/to/car17v2.0` should be the root directory of Complex Answer Retrieval (CAR) paragraph corpus (v2.0), which can be downloaded [here](http://trec-car.cs.unh.edu/datareleases/).
@@ -30,27 +30,27 @@ Specifically, this is the section-level passage retrieval task with automatic gr
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -output run.car17v2.0.bm25.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -rm3 -output run.car17v2.0.bm25+rm3.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-bm25 -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v2.0.bm25+ax.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -output run.car17v2.0.ql.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -rm3 -output run.car17v2.0.ql+rm3.topics.car17v2.0.benchmarkY1test.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.car17v2.0.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.car17v2.0.pos+docvectors+raw \
-topicreader Car -topics src/main/resources/topics-and-qrels/topics.car17v2.0.benchmarkY1test.txt \
-qld -axiom -axiom.deterministic -rerankCutoff 20 -output run.car17v2.0.ql+ax.topics.car17v2.0.benchmarkY1test.txt &
```
6 changes: 3 additions & 3 deletions docs/regressions-clef06-fr.md
@@ -12,8 +12,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection JsonCollection -input /path/to/clef06-fr \
- -index lucene-index.clef06-fr.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator -threads 16 \
- -storePositions -storeDocvectors -storeRaw -language fr >& log.clef06-fr.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.clef06-fr.pos+docvectors+raw -generator DefaultLuceneDocumentGenerator -threads 16 \
+ -storePositions -storeDocvectors -storeRaw -language fr >& logs/log.clef06-fr.pos+docvectors+rawdocs &
```

The collection comprises news articles from ATS (SDA) and Le Monde totaling 177,452 documents.
@@ -32,7 +32,7 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.clef06-fr.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.clef06-fr.pos+docvectors+raw \
-topicreader TsvString -topics src/main/resources/topics-and-qrels/topics.clef06fr.mono.fr.txt \
-language fr -bm25 -output run.clef06-fr.bm25.topics.clef06fr.mono.fr.txt &
```
16 changes: 8 additions & 8 deletions docs/regressions-core17.md
@@ -10,8 +10,8 @@ Typical indexing command:

```
nohup sh target/appassembler/bin/IndexCollection -collection NewYorkTimesCollection -input /path/to/core17 \
- -index lucene-index.core17.pos+docvectors+rawdocs -generator DefaultLuceneDocumentGenerator -threads 16 \
- -storePositions -storeDocvectors -storeRaw >& log.core17.pos+docvectors+rawdocs &
+ -index indexes/lucene-index.core17.pos+docvectors+raw -generator DefaultLuceneDocumentGenerator -threads 16 \
+ -storePositions -storeDocvectors -storeRaw >& logs/log.core17.pos+docvectors+rawdocs &
```

The directory `/path/to/nyt_corpus/` should be the root directory of the [New York Times Annotated Corpus](https://catalog.ldc.upenn.edu/LDC2008T19), i.e., `ls /path/to/nyt_corpus/`
@@ -29,27 +29,27 @@ Topics and qrels are stored in [`src/main/resources/topics-and-qrels/`](../src/m
After indexing has completed, you should be able to perform retrieval as follows:

```
- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-bm25 -output run.core17.bm25.topics.core17.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-bm25 -rm3 -output run.core17.bm25+rm3.topics.core17.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-bm25 -axiom -axiom.deterministic -rerankCutoff 20 -output run.core17.bm25+ax.topics.core17.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-qld -output run.core17.ql.topics.core17.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-qld -rm3 -output run.core17.ql+rm3.topics.core17.txt &

- nohup target/appassembler/bin/SearchCollection -index lucene-index.core17.pos+docvectors+rawdocs \
+ nohup target/appassembler/bin/SearchCollection -index indexes/lucene-index.core17.pos+docvectors+raw \
-topicreader Trec -topics src/main/resources/topics-and-qrels/topics.core17.txt \
-qld -axiom -axiom.deterministic -rerankCutoff 20 -output run.core17.ql+ax.topics.core17.txt &
```
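
The retrieval commands above run in the background with `nohup ... &`, so nothing signals when the run files are ready. A small sketch of a check that each of the six core17 run files named in the `-output` flags exists and is non-empty. The function name `check_core17_runs` and the optional directory argument are assumptions for illustration, not part of the regression script:

```shell
# Hypothetical helper (not from the PR): count core17 run files that are
# missing or empty. File names are taken from the -output flags above.
# Usage: check_core17_runs [directory]   (defaults to the current directory)
check_core17_runs() {
  dir="${1:-.}"
  missing=0
  for model in bm25 bm25+rm3 bm25+ax ql ql+rm3 ql+ax; do
    f="$dir/run.core17.$model.topics.core17.txt"
    # -s is true only if the file exists and has a size greater than zero.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f"
      missing=$((missing + 1))
    fi
  done
  # Exit status is the number of missing files (0 means all present).
  return "$missing"
}

check_core17_runs . || echo "some core17 runs are missing or not written yet"
```

A non-empty file does not prove that a run has finished writing, so this is only a quick sanity check before evaluation.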