From 55055a081864e780914d12f9459e47c46e07c220 Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Wed, 8 Jul 2020 12:51:06 -0400
Subject: [PATCH 1/7] Update Replication Log

---
 docs/experiments-msmarco-document.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/experiments-msmarco-document.md b/docs/experiments-msmarco-document.md
index b6c82125..fc9628a9 100644
--- a/docs/experiments-msmarco-document.md
+++ b/docs/experiments-msmarco-document.md
@@ -115,3 +115,5 @@ If you were able to replicate these results, please submit a PR adding to the re
 
 
 ## Replication Log
+
++ Results replicated by [@HangCui0510](https://github.com/HangCui0510) on 2020-05-29 (commit [`f2e078e`](https://github.com/HangCui0510/pygaggle/commit/f2e078e47c87156925a9151632753be861ec403d)) (Tesla P100)

From fffc10315be7f7d3fcbb76b2902c21ba3908c4e7 Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Wed, 8 Jul 2020 12:52:48 -0400
Subject: [PATCH 2/7] Update Replication Log

---
 docs/experiments-msmarco-document.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/experiments-msmarco-document.md b/docs/experiments-msmarco-document.md
index fc9628a9..f5444d77 100644
--- a/docs/experiments-msmarco-document.md
+++ b/docs/experiments-msmarco-document.md
@@ -116,4 +116,4 @@ If you were able to replicate these results, please submit a PR adding to the re
 
 ## Replication Log
 
-+ Results replicated by [@HangCui0510](https://github.com/HangCui0510) on 2020-05-29 (commit [`f2e078e`](https://github.com/HangCui0510/pygaggle/commit/f2e078e47c87156925a9151632753be861ec403d)) (Tesla P100)
++ Results replicated by [@HangCui0510](https://github.com/HangCui0510) on 2020-07-08 (commit [`f2e078e`](https://github.com/HangCui0510/pygaggle/commit/f2e078e47c87156925a9151632753be861ec403d)) (Tesla P100)

From 314bebfcc0ae8b18f57e92a1a1c3e50d469b37ff Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Wed, 8 Jul 2020 12:57:18 -0400
Subject: [PATCH 3/7] Update requirements

---
 requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index e97e587a..1c76c3a1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -8,6 +8,6 @@ scipy>=1.4
 spacy==2.2.4
 tensorboard>=2.1.0
 tensorflow>=2.2.0rc1
-tokenizers>=0.7
+tokenizers==0.7
 tqdm==4.45.0
-transformers>=2.9.0
+transformers==2.10.0

From 472f59260014a9e9d3398752df4a4131cff97550 Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Thu, 9 Jul 2020 05:08:01 -0400
Subject: [PATCH 4/7] Create CovidQA doc

---
 docs/CovidQA.md | 79 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 79 insertions(+)
 create mode 100644 docs/CovidQA.md

diff --git a/docs/CovidQA.md b/docs/CovidQA.md
new file mode 100644
index 00000000..a8ae8796
--- /dev/null
+++ b/docs/CovidQA.md
@@ -0,0 +1,79 @@
+# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking)
+
+This page contains instructions for running various neural reranking baselines on the CovidQA ranking task. 
+
+Note 1: Run the following commands from the root of this repo.
+Note 2: Make sure that you have access to a GPU.
+Note 3: PyGaggle must be installed from source, with the [anserini-eval](https://github.com/castorini/anserini-eval) submodule pulled.
+To do this, first clone the repository recursively:
+
+```
+git clone --recursive https://github.com/castorini/pygaggle.git
+```
+
+Then install PyGaggle using:
+
+```
+pip install pygaggle/
+```
+
+## Re-Ranking with Random
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.0
+recall@3        0.0199546485260771
+recall@50       0.3247165532879819
+recall@1000     1.0
+mrr     0.03999734528458418
+mrr@10  0.020888672929489253
+```
+
+## Re-Ranking with BM25
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.14685314685314685
+recall@3        0.2199546485260771
+recall@50       0.6582766439909296
+recall@1000     0.6820861678004534
+mrr     0.24651188194041115
+mrr@10  0.2267060792570997
+```
+
+It takes about 10 seconds to re-rank this subset on CovidQA using a P100.
+
+## Re-Ranking with monoT5-Base
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.2789115646258503
+recall@3        0.41854551344347257
+recall@50       0.92555879494655
+recall@1000     1.0
+mrr     0.417982565405279
+mrr@10  0.4045405463772811
+```
+
+It takes about 17 minutes to re-rank this subset on CovidQA using a P100.
\ No newline at end of file

From 55a9bbb48139145d336a6a1fac8e63b4c7f1ec4b Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Thu, 9 Jul 2020 05:09:33 -0400
Subject: [PATCH 5/7] Fix CovidQA doc title

---
 docs/CovidQA.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/CovidQA.md b/docs/CovidQA.md
index a8ae8796..862b5f29 100644
--- a/docs/CovidQA.md
+++ b/docs/CovidQA.md
@@ -1,4 +1,4 @@
-# PyGaggle: Neural Ranking Baselines on [MS MARCO Passage Retrieval](https://github.com/microsoft/MSMARCO-Passage-Ranking)
+# PyGaggle: Neural Ranking Baselines on CovidQA
 
 This page contains instructions for running various neural reranking baselines on the CovidQA ranking task. 
 

From 998211b4f4b73aa2dc2f032ebef01241880d6c0a Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Thu, 9 Jul 2020 19:35:39 -0400
Subject: [PATCH 6/7] Create CovidQA Docs

---
 docs/CovidQA.md | 87 ++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 79 insertions(+), 8 deletions(-)
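
For anyone replicating the numbers in this doc, a minimal sketch of a checker is given here. It assumes the evaluator prints one `metric value` pair per line, as in the output blocks of docs/CovidQA.md; `parse_metrics`, `mismatches`, and the `1e-4` tolerance are hypothetical helpers chosen for illustration and are not part of PyGaggle.

```python
import math

# Expected monoT5 metrics for the NL questions, copied from docs/CovidQA.md.
EXPECTED_T5_NL = {
    "precision@1": 0.2789115646258503,
    "recall@3": 0.41854551344347257,
    "recall@50": 0.92555879494655,
    "recall@1000": 1.0,
    "mrr": 0.417982565405279,
    "mrr@10": 0.4045405463772811,
}


def parse_metrics(output: str) -> dict:
    """Parse lines of the form '<metric> <value>' into a dict of floats."""
    metrics = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything that is not a pair
        name, value = parts
        try:
            metrics[name] = float(value)
        except ValueError:
            continue  # skip pairs whose second field is not a number
    return metrics


def mismatches(actual: dict, expected: dict, tol: float = 1e-4) -> list:
    """Return the metrics that are missing or differ by more than tol."""
    return [
        name
        for name, want in expected.items()
        if name not in actual
        or not math.isclose(actual[name], want, abs_tol=tol)
    ]
```

Paste the evaluator's output into `parse_metrics` and pass the result to `mismatches` together with `EXPECTED_T5_NL`; an empty list means the run agrees with the documented values within the assumed tolerance.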

diff --git a/docs/CovidQA.md b/docs/CovidQA.md
index 862b5f29..e3672dac 100644
--- a/docs/CovidQA.md
+++ b/docs/CovidQA.md
@@ -19,6 +19,8 @@ pip install pygaggle/
 
 ## Re-Ranking with Random
 
+NL Question:
+
 ```
 python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
                                                     --dataset data/kaggle-lit-review-0.2.json \
@@ -32,12 +34,34 @@ precision@1     0.0
 recall@3        0.0199546485260771
 recall@50       0.3247165532879819
 recall@1000     1.0
-mrr     0.03999734528458418
-mrr@10  0.020888672929489253
+mrr             0.03999734528458418
+mrr@10          0.020888672929489253
+```
+
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
+                                                    --split kq \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.0
+recall@3        0.0199546485260771
+recall@50       0.3247165532879819
+recall@1000     1.0
+mrr             0.03999734528458418
+mrr@10          0.020888672929489253
 ```
 
 ## Re-Ranking with BM25
 
+NL Question:
+
 ```
 python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
                                                     --dataset data/kaggle-lit-review-0.2.json \
@@ -51,13 +75,35 @@ precision@1     0.14685314685314685
 recall@3        0.2199546485260771
 recall@50       0.6582766439909296
 recall@1000     0.6820861678004534
-mrr     0.24651188194041115
-mrr@10  0.2267060792570997
+mrr             0.24651188194041115
+mrr@10          0.2267060792570997
+```
+
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
+                                                    --split kq \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.14685314685314685
+recall@3        0.22675736961451243
+recall@50       0.6650793650793649
+recall@1000     0.6888888888888888
+mrr             0.249090910278702
+mrr@10          0.22846344887161213
 ```
 
 It takes about 10 seconds to re-rank this subset on CovidQA using a P100.
 
-## Re-Ranking with monoT5-Base
+## Re-Ranking with monoT5
+
+NL Question:
 
 ```
 python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
@@ -72,8 +118,33 @@ precision@1     0.2789115646258503
 recall@3        0.41854551344347257
 recall@50       0.92555879494655
 recall@1000     1.0
-mrr     0.417982565405279
-mrr@10  0.4045405463772811
+mrr             0.417982565405279
+mrr@10          0.4045405463772811
 ```
 
-It takes about 17 minutes to re-rank this subset on CovidQA using a P100.
\ No newline at end of file
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
+                                                    --split kq \
+                                                    --dataset data/kaggle-lit-review-0.2.json \
+                                                    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1     0.24489795918367346
+recall@3        0.38566569484936825
+recall@50       0.9231778425655977
+recall@1000     1.0
+mrr             0.37988285486956513
+mrr@10          0.3671336788683727
+```
+
+It takes about 17 minutes to re-rank this subset on CovidQA using a P100.
+
+If you were able to replicate these results, please submit a PR adding to the replication log!
+
+
+## Replication Log
\ No newline at end of file

From abb49bbd990f4a6e192e50c12b27d4b9e0477f6d Mon Sep 17 00:00:00 2001
From: HangCui0510 <h29cui@uwaterloo.ca>
Date: Mon, 13 Jul 2020 15:08:38 -0400
Subject: [PATCH 7/7] Match file names, delete CovidQA from Readme

---
 README.md                                   | 111 ++------------------
 docs/{CovidQA.md => experiments-CovidQA.md} |   2 +-
 2 files changed, 10 insertions(+), 103 deletions(-)
 rename docs/{CovidQA.md => experiments-CovidQA.md} (98%)

diff --git a/README.md b/README.md
index d5fb3f9c..35a0dde4 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,15 @@ Currently, this repo contains implementations of the rerankers for [CovidQA](htt
 
 0. Install [Anserini](https://github.com/castorini/anserini).
 
+## Additional Instructions
+
+0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`
+
+0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.
+
+0. For pip, do `pip install -r requirements.txt`
+    * If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
+
 
 # A simple reranking example
 The code below exemplifies how to score two documents for a given query using a T5 reranker from [Document Ranking with a Pretrained
@@ -56,105 +65,3 @@ scores = [result.score for result in reranker.rerank(query, documents)]
 # scores = [-0.1782158613204956, -0.36637523770332336]
 ```
 
-# Evaluations
-
-## Additional Instructions
-
-0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`
-
-0. Make you sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.
-
-0. For pip, do `pip install -r requirements.txt`
-    * If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
-
-
-## Running rerankers on CovidQA
-
-For a full list of mostly self-explanatory environment variables, see [this file](https://github.com/castorini/pygaggle/blob/master/pygaggle/settings.py#L7).
-
-BM25 uses the CPU. If you don't have a GPU for the transformer models, pass `--device cpu` (PyTorch device string format) to the script.
-
-*Note: Run the following evaluations at root of this repo.*
-
-### Unsupervised Methods
-
-**BM25**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
-```
-
-**BERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
-```
-
-**SciBERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
-```
-
-**BioBERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
-```
-
-### Supervised Methods
-
-**T5 (fine-tuned on MS MARCO)**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
-```
-
-**BioBERT (fine-tuned on SQuAD v1.1)**:
-
-0. `mkdir biobert-squad && cd biobert-squad`
-
-0. Download the weights, vocab, and config from the [BioBERT repository](https://github.com/dmis-lab/bioasq-biobert) to `biobert-squad`.
-
-0. Untar the model and rename some files in `biobert-squad`:
-
-```bash
-tar -xvzf BERT-pubmed-1000000-SQuAD.tar.gz
-mv bert_config.json config.json
-for filename in model.ckpt*; do
-    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
-done
-```
-
-0. Evaluate the model:
-
-```bash
-cd .. # go to root of this of repo
-python -um pygaggle.run.evaluate_kaggle_highlighter --method qa_transformer --model-name <folder path>
-```
-
-**BioBERT (fine-tuned on MS MARCO)**:
-
-0. Download the weights, vocab, and config from our Google Storage bucket. This requires an installation of [gsutil](https://cloud.google.com/storage/docs/gsutil_install?hl=ru).
-
-```bash
-mkdir biobert-marco && cd biobert-marco
-gsutil cp "gs://neuralresearcher_data/doc2query/experiments/exp374/model.ckpt-100000*" .
-gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/bert_config.json config.json
-gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/vocab.txt .
-```
-
-0. Rename the files:
-
-```bash
-for filename in model.ckpt*; do
-    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
-done
-```
-
-0. Evaluate the model:
-
-```bash
-cd .. # go to root of this repo
-python -um pygaggle.run.evaluate_kaggle_highlighter --method seq_class_transformer --model-name <folder path>
-```
diff --git a/docs/CovidQA.md b/docs/experiments-CovidQA.md
similarity index 98%
rename from docs/CovidQA.md
rename to docs/experiments-CovidQA.md
index e3672dac..d7bc6c03 100644
--- a/docs/CovidQA.md
+++ b/docs/experiments-CovidQA.md
@@ -99,7 +99,7 @@ mrr             0.249090910278702
 mrr@10          0.22846344887161213
 ```
 
-It takes about 10 seconds to re-rank this subset on CovidQA using a P100.
+It takes about 10 seconds to re-rank this subset on CovidQA.
 
 ## Re-Ranking with monoT5