This page contains instructions for running monoT5 on the Robust04 collection using GPUs.
To learn more about monoT5, please read "Document Ranking with a Pretrained Sequence-to-Sequence Model" (Nogueira et al., 2020).
Note: Robust04 uses the TREC Disks 4 & 5 corpora, which are only available after filling out and signing a release form from NIST. Therefore, only proceed with this documentation if you already have the corpus.
We will focus on monoT5-base, since larger monoT5 models are difficult to run without a TPU.
Prior to running this, we suggest looking at our first-stage BM25 ranking instructions.
Using monoT5, we rerank the BM25 run file, which contains ~1,000 documents per query.
MonoT5 is a pointwise reranker. This means that each document is scored independently using T5.
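To make "pointwise" concrete, here is a minimal sketch of how a monoT5-style score can be computed, following the description in the monoT5 paper. The model reads the query and a single document and is asked to produce "true" or "false". The relevance score is the probability of "true" after a softmax restricted to those two logits. The function toy_logits is a hypothetical stand-in for the T5 forward pass, and none of these names are pygaggle's API.

```python
import math
from typing import Callable, List, Tuple

def relevance_score(true_logit: float, false_logit: float) -> float:
    """Probability of "true" under a softmax over only the "true"/"false" logits."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    t = math.exp(true_logit - m)
    f = math.exp(false_logit - m)
    return t / (t + f)

def rerank(query: str,
           candidates: List[Tuple[str, str]],
           logits_fn: Callable[[str, str], Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Pointwise reranking: score each (query, document) pair on its own, then sort."""
    scored = [(docid, relevance_score(*logits_fn(query, text))) for docid, text in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical stand-in for T5 that prefers documents mentioning "oil".
def toy_logits(query: str, text: str) -> Tuple[float, float]:
    return (2.0, 0.0) if "oil" in text else (0.0, 2.0)

ranking = rerank("oil spills",
                 [("D1", "Stock prices fell."), ("D2", "An oil tanker leaked.")],
                 toy_logits)
print([docid for docid, _ in ranking])  # ['D2', 'D1']
```

Because no document's score depends on any other candidate, the ~1,000 candidates per query can be scored in any order or in arbitrary batches.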
Note that we do not train monoT5 on Robust04. Hence, the results are zero-shot.
We store all the files in the data/robust04 directory:
export DATA_DIR=data/robust04
mkdir -p ${DATA_DIR}
We download the query, qrels and corpus files. The run file was generated during the BM25 stage and contains ~1,000 documents per query.
You can change the number of candidate texts by setting the -hits parameter when following Anserini's BM25 ranking instructions.
In short, the files are:
- topics.robust04.txt: 250 queries (also called "topics") from Robust04.
- qrels.robust04.txt: 311,410 relevance judgments, each pairing a query id with a judged document id.
- trec_disks_4_and_5_concat.txt: TREC Disks 4 & 5 documents (528,164) concatenated into a single text file.
- run.robust04.bm25.txt: 242,339 pairs of queries and documents retrieved with Anserini's BM25 (up to 1,000 hits per query).
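Once the files are in place, one quick way to check them against the counts listed above is to count lines per query id. This sketch assumes the standard whitespace-separated TREC formats (run files: qid Q0 docid rank score tag; qrels: qid iteration docid relevance), in which the query id is always the first column. summarize_trec_file is a hypothetical helper, not part of pygaggle or Anserini.

```python
from collections import Counter
from typing import Iterable, Tuple

def summarize_trec_file(lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (query-doc pairs, distinct queries, mean pairs per query).

    Works for both TREC run files and qrels files, because the query id
    is the first whitespace-separated column in both formats.
    """
    per_query = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            per_query[fields[0]] += 1
    pairs = sum(per_query.values())
    queries = len(per_query)
    return pairs, queries, (pairs / queries if queries else 0.0)
```

For example, passing an open handle to ${DATA_DIR}/run.robust04.bm25.txt should give a pair count to compare with 242,339, and the mean shows how close the run is to 1,000 hits per query.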
Let's start:
cd ${DATA_DIR}
cd ../../
If you have not generated it yet, you can download the run file (1,000 hits):
As a sanity check, we can evaluate the first-stage (BM25) retrieved documents using the trec_eval tool:
tools/eval/trec_eval.9.0.4/trec_eval -m map -m ndcg_cut.20 ${DATA_DIR}/qrels.robust04.txt ${DATA_DIR}/run.robust04.bm25.txt
The output should be:
map all 0.2531
ndcg_cut_20 all 0.4240
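For reference, the two reported metrics can be sketched as follows. This is an illustrative re-implementation under common definitions: AP counts any positive grade as relevant, and nDCG@20 uses the relevance grade as the gain with a log2(rank + 1) discount. Scores are averaged over queries that appear in both the run and the qrels. trec_eval orders documents by the score column and applies its own tie-breaking and options, so use trec_eval, not this sketch, for reported numbers.

```python
import math
from typing import Callable, Dict, List

def average_precision(ranking: List[str], judgments: Dict[str, int]) -> float:
    """AP for one query; any grade > 0 counts as relevant."""
    num_relevant = sum(1 for grade in judgments.values() if grade > 0)
    if num_relevant == 0:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, docid in enumerate(ranking, start=1):
        if judgments.get(docid, 0) > 0:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / num_relevant

def ndcg_at_k(ranking: List[str], judgments: Dict[str, int], k: int = 20) -> float:
    """nDCG@k with the (non-negative) grade as gain and a log2(rank + 1) discount."""
    def dcg(gains: List[int]) -> float:
        # i is 0-based, so log2(i + 2) == log2(rank + 1)
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))
    actual = dcg([max(judgments.get(docid, 0), 0) for docid in ranking])
    ideal = dcg(sorted((g for g in judgments.values() if g > 0), reverse=True))
    return actual / ideal if ideal > 0 else 0.0

def mean_metric(run: Dict[str, List[str]],
                qrels: Dict[str, Dict[str, int]],
                metric: Callable[[List[str], Dict[str, int]], float]) -> float:
    """Average a per-query metric over queries present in both run and qrels."""
    qids = [qid for qid in run if qid in qrels]
    return sum(metric(run[qid], qrels[qid]) for qid in qids) / len(qids) if qids else 0.0
```

Here run maps each query id to its ranked document ids and qrels maps each query id to a {docid: grade} dictionary. Both structures are assumptions made for the sketch.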
We use the script below to prepare the query-doc pairs in the monoT5 input format and then rerank them using a monoT5-base model available in pygaggle:
python ./pygaggle/run/ \
--queries=${DATA_DIR}/topics.robust04.txt \
--run=${DATA_DIR}/run.robust04.bm25.txt \
--corpus=${DATA_DIR}/trec_disks_4_and_5_concat.txt \
--output_monot5=${DATA_DIR}/monot5_results.txt \
>> ${DATA_DIR}/output.log 2>&1
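The pairing step can be illustrated with the sketch below. Candidates are read from the BM25 run in rank order, and each one is combined with its query using the "Query: ... Document: ... Relevant:" input template described in the monoT5 paper. The topics and corpus dictionaries are assumptions that stand in for however the script parses topics.robust04.txt and trec_disks_4_and_5_concat.txt. The real script may also preprocess document text (the spaCy note at the end of this page suggests it does some), so treat this only as an illustration, not as the script's logic.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def load_run(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group a TREC run ("qid Q0 docid rank score tag") into qid -> docids in rank order."""
    ranked = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        qid, docid, rank = fields[0], fields[2], int(fields[3])
        ranked[qid].append((rank, docid))
    return {qid: [docid for _, docid in sorted(pairs)] for qid, pairs in ranked.items()}

def monot5_pairs(run: Dict[str, List[str]],
                 topics: Dict[str, str],
                 corpus: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Build (qid, docid, model input) triples for every candidate in the run."""
    pairs = []
    for qid, docids in run.items():
        for docid in docids:
            text = f"Query: {topics[qid]} Document: {corpus[docid]} Relevant:"
            pairs.append((qid, docid, text))
    return pairs
```

With ~1,000 candidates for each of the 250 queries, this yields roughly a quarter of a million model inputs, which helps explain the long GPU runtime.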
You might want to run this process in the background using screen to make sure it does not get killed.
Using an NVIDIA Tesla T4, it takes approximately 16 hours to rerank ~1,000 candidate texts per query with monoT5-base. The script writes its rankings to the file given by --output_monot5 (here, ${DATA_DIR}/monot5_results.txt).
Each line in the output follows the trec_eval run format:
f'{query_id} Q0 {docid} {rank + 1} {1 / (rank + 1)} T5\n'
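A minimal sketch of producing lines in this format, assuming the script holds a per-query dictionary of monoT5 scores (format_run is a hypothetical helper). Documents are sorted by monoT5 score, and the score column is filled with 1 / (rank + 1) rather than the raw monoT5 score. Since trec_eval orders documents by the score column, these reciprocal-rank values preserve the monoT5 ordering.

```python
from typing import Dict, List

def format_run(scores: Dict[str, Dict[str, float]]) -> List[str]:
    """Turn qid -> {docid: monoT5 score} into TREC run lines, best document first."""
    lines = []
    for query_id, doc_scores in scores.items():
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
        for rank, docid in enumerate(ranked):  # rank is 0-based here
            lines.append(f'{query_id} Q0 {docid} {rank + 1} {1 / (rank + 1)} T5\n')
    return lines

print(format_run({"301": {"D1": 0.2, "D2": 0.9}}))
# ['301 Q0 D2 1 1.0 T5\n', '301 Q0 D1 2 0.5 T5\n']
```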
After reranking is done, we can evaluate the reranked results using the trec_eval tool:
tools/eval/trec_eval.9.0.4/trec_eval -m map -m ndcg_cut.20 ${DATA_DIR}/qrels.robust04.txt ${DATA_DIR}/monot5_results.txt
For monoT5-base, the output should be:
map all 0.3489
ndcg_cut_20 all 0.5578
Note: These results are slightly higher than those obtained with TPUs, probably because we used a more recent version of spaCy (3.0.6 instead of 2.2.4).
If you were able to replicate these results, please submit a PR adding an entry to the replication log, and mention in it any differences you noticed.