Commit
Update of TCT-ColBERTv2 docs for MS MARCO V2 (#736)
lintool authored Aug 14, 2021
1 parent b3676b1 commit 9b4ec11
Showing 5 changed files with 71 additions and 70 deletions.
15 changes: 8 additions & 7 deletions README.md
@@ -387,18 +387,19 @@ With Pyserini, it's easy to [reproduce](docs/reproducibility.md) runs on a numbe

  + [Reproducing runs directly from the Python package](docs/pypi-reproduction.md)
  + [Reproducing Robust04 baselines for ad hoc retrieval](docs/experiments-robust04.md)
- + [Reproducing the BM25 baseline for MS MARCO Passage Ranking](docs/experiments-msmarco-passage.md)
- + [Reproducing the BM25 baseline for MS MARCO Document Ranking](docs/experiments-msmarco-doc.md)
- + [Reproducing the multi-field BM25 baseline for MS MARCO Document Ranking from Elasticsearch](docs/experiments-elastic.md)
+ + [Reproducing the BM25 baseline for MS MARCO (V1) Passage Ranking](docs/experiments-msmarco-passage.md)
+ + [Reproducing the BM25 baseline for MS MARCO (V1) Document Ranking](docs/experiments-msmarco-doc.md)
+ + [Reproducing the multi-field BM25 baseline for MS MARCO (V1) Document Ranking from Elasticsearch](docs/experiments-elastic.md)
  + [Reproducing BM25 baselines on the MS MARCO (V2) Collections](docs/experiments-msmarco-v2.md)
- + [Reproducing DeepImpact experiments for MS MARCO Passage Ranking](docs/experiments-deepimpact.md)
- + [Reproducing uniCOIL experiments for MS MARCO Passage Ranking](docs/experiments-unicoil.md)
+ + [Reproducing DeepImpact experiments for MS MARCO (V1) Passage Ranking](docs/experiments-deepimpact.md)
+ + [Reproducing uniCOIL experiments for MS MARCO (V1) Passage Ranking](docs/experiments-unicoil.md)
  + [Reproducing uniCOIL experiments on the MS MARCO (V2) Collections](docs/experiments-msmarco-v2-unicoil.md)

### Dense Retrieval

- + [Reproducing TCT-ColBERTv2 experiments](docs/experiments-tct_colbert-v2.md)
- + [Reproducing TCT-ColBERTv1 experiments](docs/experiments-tct_colbert.md)
+ + [Reproducing TCT-ColBERTv1 experiments on the MS MARCO (V1) Collections](docs/experiments-tct_colbert.md)
+ + [Reproducing TCT-ColBERTv2 experiments on the MS MARCO (V1) Collections](docs/experiments-tct_colbert-v2.md)
+ + [Reproducing TCT-ColBERTv2 experiments on the MS MARCO (V2) Collections](docs/experiments-msmarco-v2-tct_colbert-v2.md)
  + [Reproducing DPR experiments](docs/experiments-dpr.md)
  + [Reproducing ANCE experiments](docs/experiments-ance.md)
  + [Reproducing DistilBERT KD experiments](docs/experiments-distilbert_kd.md)
96 changes: 49 additions & 47 deletions docs/experiments-msmarco-v2-tct_colbert-v2.md
@@ -1,91 +1,93 @@
-# Pyserini: Baseline for MS MARCO V2: TCT-ColBERT-V2
+# Pyserini: TCT-ColBERTv2 for MS MARCO (V2) Collections

-This guide provides instructions to reproduce the family of TCT-ColBERT-V2 dense retrieval models described in the following paper:
+This guide provides instructions to reproduce experiments using TCT-ColBERTv2 dense retrieval models on the MS MARCO (V2) collections.
+The model is described in the following paper:

-> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_2021_RepL4NLP.pdf) _RepL4NLP 2021_.
+> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://aclanthology.org/2021.repl4nlp-1.17/) _Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)_, pages 163-173, August 2021.
+
+At present, all indexes are referenced as absolute paths on our Waterloo machine `orca`, so these results are not broadly reproducible.
+We are working on figuring out ways to distribute the indexes.

## Data Prep

-<!-- # Anserini: Guide to Working with the MS MARCO V2 Collections -->
-
-<!-- This guide presents information for working with V2 of the MS MARCO passage and document test collections. -->
-
-If you're having issues downloading the collection via `wget`, try using [AzCopy](https://docs.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10).
-
-1. We use [augmented passage collection](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#passage-collection-augmented) and [segmented document collection](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#document-collection-segmented)
-2. Currently, the prebuilt index is on our Waterloo machine `orca`.
-3. We only encode `title`, `headings`, and `passage` (or `segment`) for passage (or document) collections.
+For the TREC 2021 Deep Learning Track, we applied our TCT-ColBERTv2 model trained on MS MARCO (V1) in a zero-shot manner.
+Specifically, we applied inference over the MS MARCO V2 [passage corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#passage-collection) and [segmented document corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#document-collection-segmented) to obtain the dense vectors.

Let's prepare our environment variables:

```bash
-export PSG_INDEX="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-passage-v2-augmented"
-export DOC_INDEX="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-doc-v2-segmented"
-export ENCODER="castorini/tct_colbert-v2-hnp-msmarco"
+export PASSAGE_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-passage-v2-augmented"
+export DOC_INDEX0="/store/scratch/indexes/trec2021/faiss-flat.tct_colbert-v2-hnp.0shot.msmarco-doc-v2-segmented"
+export ENCODER0="castorini/tct_colbert-v2-hnp-msmarco"
```

-## MS MARCO Passage V2
+## Passage V2

Dense retrieval with TCT-ColBERT-V2, brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
-                            --index ${PSG_INDEX} \
-                            --encoder ${ENCODER} \
+                            --index ${PASSAGE_INDEX0} \
+                            --encoder ${ENCODER0} \
                             --batch-size 144 \
                             --threads 36 \
-                            --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.top1k.dev1.trec \
+                            --output runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```

-To evaluate:
-
-We use the official TREC evaluation tool `trec_eval` to compute metrics.
-
-> Note: There are duplicated passages in msmarco v2, the following results will be different from using `--output-format msmarco` with `pyserini.eval.convert_msmarco_run_to_trec_run` because of tie breaking.
+To evaluate using `trec_eval`:

```bash
-$ python -m pyserini.eval.trec_eval -c -m recall.10,100,1000 -m map -m recip_rank collections/passv2_dev_qrels.uniq.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.top1k.dev1.trec
+$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec
Results:
-map all 0.1472
-recip_rank all 0.1483
-recall_10 all 0.2743
-recall_100 all 0.5873
-recall_1000 all 0.8321
+map all 0.1461
+recip_rank all 0.1473
+
+$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/passv2_dev_qrels.tsv runs/run.msmarco-passage-v2-augmented.tct_colbert-v2-hnp.0shot.dev1.trec
+Results:
+recall_100 all 0.5873
+recall_1000 all 0.8321
```

+We evaluate MAP and MRR at a cutoff of 100 hits to match the official evaluation metrics.
+However, we measure recall at both 100 and 1000 hits; the latter is a common setting for reranking.
+
+Because there are duplicate passages in MS MARCO V2 collections, score differences might be observed due to tie-breaking effects.
+For example, if we output in MS MARCO format `--output-format msmarco` and then convert to TREC format with `pyserini.eval.convert_msmarco_run_to_trec_run`, the scores will be different.

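The conversion pipeline that the note above refers to looks roughly like this (a sketch; the intermediate run file name is illustrative, not one used elsewhere in this guide):

```bash
# Output in MS MARCO format, then convert to TREC format; tie-breaking
# during the conversion is what produces the small score differences.
$ python -m pyserini.dsearch --topics collections/passv2_dev_queries.tsv \
                             --index ${PASSAGE_INDEX0} \
                             --encoder ${ENCODER0} \
                             --output runs/run.passage-v2.dev1.txt \
                             --output-format msmarco
$ python -m pyserini.eval.convert_msmarco_run_to_trec_run \
    --input runs/run.passage-v2.dev1.txt \
    --output runs/run.passage-v2.dev1.trec
```
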
-## MS MARCO Document V2
+## Document V2

Dense retrieval with TCT-ColBERT-V2, brute-force index:

```bash
$ python -m pyserini.dsearch --topics collections/docv2_dev_queries.tsv \
-                            --index ${DOC_INDEX} \
-                            --encoder ${ENCODER} \
+                            --index ${DOC_INDEX0} \
+                            --encoder ${ENCODER0} \
                             --batch-size 144 \
                             --threads 36 \
-                            --hits 1000 \
-                            --max-passage-hits 100 \
+                            --hits 10000 \
+                            --max-passage-hits 1000 \
                             --max-passage \
-                            --output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.maxp.top100.dev1.trec \
+                            --output runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec \
--output-format trec
```
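
A note on the retrieval options above (our reading of the flags; `python -m pyserini.dsearch --help` is authoritative): `--hits 10000` retrieves up to 10000 *segments* per query, `--max-passage` collapses segment-level hits into document-level results by scoring each document with its best segment, and `--max-passage-hits 1000` caps the number of documents returned after aggregation.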

-To evaluate:
-
-We use the official TREC evaluation tool `trec_eval` to compute metrics.
+To evaluate using `trec_eval`:

```bash
-$ python -m pyserini.eval.trec_eval -c -m recall.10,100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.maxp.top100.dev1.trec
+$ python -m pyserini.eval.trec_eval -c -M 100 -m map -m recip_rank collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec
Results:
-map all 0.2440
-recip_rank all 0.2464
-recall_10 all 0.4784
-recall_100 all 0.7873
+map all 0.2440
+recip_rank all 0.2464
+
+$ python -m pyserini.eval.trec_eval -c -m recall.100,1000 collections/docv2_dev_qrels.tsv runs/run.msmarco-document-v2-segmented.tct_colbert-v2-hnp.0shot.dev1.trec
+Results:
+recall_100 all 0.7873
+recall_1000 all 0.9161
```

+We evaluate MAP and MRR at a cutoff of 100 hits to match the official evaluation metrics.
+However, we measure recall at both 100 and 1000 hits; the latter is a common setting for reranking.
+
+The same comment about duplicate passages and score ties applies here as well.

## Reproduction Log[*](reproducibility.md)

12 changes: 6 additions & 6 deletions docs/experiments-msmarco-v2-unicoil.md
@@ -15,7 +15,7 @@ Thus, we applied uniCOIL without expansions in a zero-shot manner using the mode

Specifically, we applied inference over the MS MARCO V2 [passage corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#passage-collection) and [segmented document corpus](https://github.com/castorini/anserini/blob/master/docs/experiments-msmarco-v2.md#document-collection-segmented) to obtain the term weights.

-### Passage V2 Corpus
+### Passage V2

Sparse retrieval with uniCOIL:

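The actual invocation and its results are collapsed in this view, but as a rough sketch of what sparse retrieval with uniCOIL looks like (the index path placeholder, the encoder name, and the exact flags are assumptions patterned after Pyserini's uniCOIL documentation, not taken from this commit):

```bash
# Sketch only: ${PASSAGE_INDEX} stands in for a uniCOIL impact index over the
# augmented V2 passage corpus; --impact switches scoring to the precomputed
# term weights, and on-the-fly query encoding via --encoder is assumed.
$ python -m pyserini.search --topics collections/passv2_dev_queries.tsv \
                            --index ${PASSAGE_INDEX} \
                            --encoder castorini/unicoil-noexp-msmarco-passage \
                            --impact \
                            --hits 1000 \
                            --output runs/run.msmarco-passage-v2.unicoil-noexp.0shot.dev1.trec
```
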
@@ -48,7 +48,7 @@ recall_1000 all 0.7013
Note that we evaluate MAP and MRR at a cutoff of 100 hits to match the official evaluation metrics.
However, we measure recall at both 100 and 1000 hits; the latter is a common setting for reranking.

-### Document V2 Corpus
+### Document V2

Sparse retrieval with uniCOIL:

@@ -82,15 +82,15 @@ recall_100 all 0.7190
recall_1000 all 0.8813
```

-Note that we evaluate MAP and MRR at a cutoff of 100 hits to match the official evaluation metrics.
+We evaluate MAP and MRR at a cutoff of 100 hits to match the official evaluation metrics.
However, we measure recall at both 100 and 1000 hits; the latter is a common setting for reranking.

## Zero-Shot uniCOIL + Dense Retrieval Hybrid

-Note that there are duplicate passages in MS MARCO V2 collections, so score differences might be observed due to tie-breaking effects.
+Because there are duplicate passages in MS MARCO V2 collections, score differences might be observed due to tie-breaking effects.
For example, if we output in MS MARCO format `--output-format msmarco` and then convert to TREC format with `pyserini.eval.convert_msmarco_run_to_trec_run`, the scores will be different.

-### Passage V2 Corpus
+### Passage V2

Dense-sparse hybrid retrieval (uniCOIL zero-shot + TCT_ColBERT_v2 zero-shot):

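The full command is collapsed in this view. A rough sketch of a dense-sparse hybrid with `pyserini.hsearch` follows (the sparse index placeholder, the `--alpha` fusion weight, and whether the sparse leg accepts `--impact`/`--encoder` here are all assumptions; the dense-side variables are the ones defined in the TCT-ColBERTv2 guide above):

```bash
# Sketch only: fuse TCT-ColBERTv2 dense scores with uniCOIL sparse scores.
# ${SPARSE_INDEX} and --alpha 0.5 are illustrative placeholders.
$ python -m pyserini.hsearch dense  --index ${PASSAGE_INDEX0} \
                                    --encoder ${ENCODER0} \
                             sparse --index ${SPARSE_INDEX} \
                                    --encoder castorini/unicoil-noexp-msmarco-passage \
                                    --impact \
                             fusion --alpha 0.5 \
                             run    --topics collections/passv2_dev_queries.tsv \
                                    --output runs/run.hybrid.dev1.trec \
                                    --output-format trec
```
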
@@ -148,7 +148,7 @@ recall_100 all 0.6701
recall_1000 all 0.8748
```

-### Document V2 Corpus
+### Document V2

Dense-sparse hybrid retrieval (uniCOIL zero-shot + TCT_ColBERT_v2 zero-shot):

12 changes: 5 additions & 7 deletions docs/experiments-tct_colbert-v2.md
@@ -1,8 +1,8 @@
-# Pyserini: Reproducing TCT-ColBERT-V2 Results
+# Pyserini: TCT-ColBERTv2 for MS MARCO (V1) Collections

This guide provides instructions to reproduce the family of TCT-ColBERT-V2 dense retrieval models described in the following paper:

-> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_2021_RepL4NLP.pdf) _RepL4NLP 2021_.
+> Sheng-Chieh Lin, Jheng-Hong Yang, and Jimmy Lin. [In-Batch Negatives for Knowledge Distillation with Tightly-Coupled Teachers for Dense Retrieval.](https://aclanthology.org/2021.repl4nlp-1.17/) _Proceedings of the 6th Workshop on Representation Learning for NLP (RepL4NLP-2021)_, pages 163-173, August 2021.
Since dense retrieval depends on neural networks, Pyserini requires a more complex set of dependencies to use this feature.
See [package installation notes](../README.md#package-installation) for more details.
@@ -25,7 +25,7 @@ Summary of results (figures from the paper are in parentheses):

The slight differences between the reproduced scores and those reported in the paper can be attributed to TensorFlow implementations in the published paper vs. PyTorch implementations here in this reproduction guide.

-## TCT_ColBERT-V2
+### TCT_ColBERT-V2

Dense retrieval with TCT-ColBERT, brute-force index:

@@ -61,7 +61,7 @@ map all 0.3509
recall_1000 all 0.9670
```

-## TCT_ColBERT-V2-HN
+### TCT_ColBERT-V2-HN

```bash
$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
@@ -88,7 +88,7 @@ map all 0.3608
recall_1000 all 0.9708
```

-## TCT_ColBERT-V2-HN+
+### TCT_ColBERT-V2-HN+

```bash
$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
@@ -119,7 +119,6 @@ To perform on-the-fly query encoding with our [pretrained encoder model](https:/
Query encoding will run on the CPU by default.
To perform query encoding on the GPU, use the option `--device cuda:0`.
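
For example (a sketch; the prebuilt index name is assumed from Pyserini's naming conventions rather than taken from this commit):

```bash
# On-the-fly query encoding on the first GPU; drop --device to stay on the CPU.
$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
                             --index msmarco-passage-tct_colbert-v2-hnp-bf \
                             --encoder castorini/tct_colbert-v2-hnp-msmarco \
                             --device cuda:0 \
                             --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.tsv
```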
### Hybrid Dense-Sparse Retrieval with TCT_ColBERT-V2-HN+

Hybrid retrieval with dense-sparse representations (without document expansion):
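
The full invocation is collapsed in this view. A rough sketch with `pyserini.hsearch`, patterned after the TCT-ColBERT (V1) hybrid instructions (index names and the `--alpha` fusion weight are illustrative assumptions, not values from this commit):

```bash
# Sketch only: fuse TCT_ColBERT-V2-HN+ dense scores with BM25 sparse scores.
$ python -m pyserini.hsearch dense  --index msmarco-passage-tct_colbert-v2-hnp-bf \
                                    --encoder castorini/tct_colbert-v2-hnp-msmarco \
                             sparse --index msmarco-passage \
                             fusion --alpha 0.1 \
                             run    --topics msmarco-passage-dev-subset \
                                    --output runs/run.msmarco-passage.tct_colbert-v2-hnp.bf.bm25.tsv \
                                    --batch-size 36 --threads 12
```
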
@@ -295,7 +294,6 @@ ndcg_cut_10 all 0.6592
```
## Reproduction Log[*](reproducibility.md)

+ Results reproduced by [@lintool](https://github.com/lintool) on 2021-07-01 (commit [`b1576a2`](https://github.com/castorini/pyserini/commit/b1576a2c3e899349be12e897f92f3ad75ec82d6f))
6 changes: 3 additions & 3 deletions docs/experiments-tct_colbert.md
@@ -1,4 +1,4 @@
-# Pyserini: Reproducing TCT-ColBERT Results
+# Pyserini: TCT-ColBERT for MS MARCO (V1) Collections

This guide provides instructions to reproduce the TCT-ColBERT dense retrieval model described in the following paper:

@@ -23,7 +23,7 @@ Summary of results:
| TCT-ColBERT (brute-force index) + BoW BM25 | 0.3529 | 0.3594 | 0.9698 |
| TCT-ColBERT (brute-force index) + BM25 w/ doc2query-T5 | 0.3647 | 0.3711 | 0.9751 |

-## Dense Retrieval
+### Dense Retrieval

Dense retrieval with TCT-ColBERT, brute-force index:

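The command itself is collapsed in this view; a sketch of the dense retrieval invocation (the prebuilt index and encoder names follow Pyserini's TCT-ColBERT conventions and are assumptions here):

```bash
# Sketch only: brute-force (flat) dense index with on-the-fly query encoding.
$ python -m pyserini.dsearch --topics msmarco-passage-dev-subset \
                             --index msmarco-passage-tct_colbert-bf \
                             --encoder castorini/tct_colbert-msmarco \
                             --batch-size 36 \
                             --threads 12 \
                             --output runs/run.msmarco-passage.tct_colbert.bf.tsv
```
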
@@ -91,7 +91,7 @@ recall_1000 all 0.9618
Follow the same instructions above to perform on-the-fly query encoding.
The caveat about minor differences in score applies here as well.

-## Hybrid Dense-Sparse Retrieval
+### Hybrid Dense-Sparse Retrieval

Hybrid retrieval with dense-sparse representations (without document expansion):
- dense retrieval with TCT-ColBERT, brute force index.
