From ed9674056fe5c51c88a28169a33122a4b87fcbed Mon Sep 17 00:00:00 2001
From: HangCui0510 <64120158+HangCui0510@users.noreply.github.com>
Date: Mon, 13 Jul 2020 19:22:20 -0400
Subject: [PATCH] Create CovidQA Doc (#56)

* Update Replication Log

* Update Replication Log

* Update requirements

* Create CovidQA doc

* Create CovidQA doc

* Create CovidQA Docs

* Match file names, delete CovidQA from Readme
---
 README.md                   | 111 +++-----------------
 docs/experiments-CovidQA.md | 150 ++++++++++++++++++++++++++++++++++++
 2 files changed, 159 insertions(+), 102 deletions(-)
 create mode 100644 docs/experiments-CovidQA.md

diff --git a/README.md b/README.md
index d5fb3f9c..35a0dde4 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,15 @@ Currently, this repo contains implementations of the rerankers for [CovidQA](htt
 
 0. Install [Anserini](https://github.com/castorini/anserini).
 
+## Additional Instructions
+
+0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`
+
+0. Make sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.
+
+0. For pip, do `pip install -r requirements.txt`
+ * If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
+
 # A simple reranking example
 
 The code below exemplifies how to score two documents for a given query using a T5 reranker from [Document Ranking with a Pretrained
@@ -56,105 +65,3 @@ scores = [result.score for result in reranker.rerank(query, documents)]
 # scores = [-0.1782158613204956, -0.36637523770332336]
 ```
 
-# Evaluations
-
-## Additional Instructions
-
-0. Clone the repo with `git clone --recursive https://github.com/castorini/pygaggle.git`
-
-0. Make you sure you have an installation of [Python 3.6+](https://www.python.org/downloads/). All `python` commands below refer to this.
-
-0. For pip, do `pip install -r requirements.txt`
- * If you prefer Anaconda, use `conda env create -f environment.yml && conda activate pygaggle`.
-
-
-## Running rerankers on CovidQA
-
-For a full list of mostly self-explanatory environment variables, see [this file](https://github.com/castorini/pygaggle/blob/master/pygaggle/settings.py#L7).
-
-BM25 uses the CPU. If you don't have a GPU for the transformer models, pass `--device cpu` (PyTorch device string format) to the script.
-
-*Note: Run the following evaluations at root of this repo.*
-
-### Unsupervised Methods
-
-**BM25**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25
-```
-
-**BERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name bert-base-cased
-```
-
-**SciBERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name allenai/scibert_scivocab_cased
-```
-
-**BioBERT**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method transformer --model-name biobert
-```
-
-### Supervised Methods
-
-**T5 (fine-tuned on MS MARCO)**:
-
-```bash
-python -um pygaggle.run.evaluate_kaggle_highlighter --method t5
-```
-
-**BioBERT (fine-tuned on SQuAD v1.1)**:
-
-0. `mkdir biobert-squad && cd biobert-squad`
-
-0. Download the weights, vocab, and config from the [BioBERT repository](https://github.com/dmis-lab/bioasq-biobert) to `biobert-squad`.
-
-0. Untar the model and rename some files in `biobert-squad`:
-
-```bash
-tar -xvzf BERT-pubmed-1000000-SQuAD.tar.gz
-mv bert_config.json config.json
-for filename in model.ckpt*; do
-    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
-done
-```
-
-0. Evaluate the model:
-
-```bash
-cd .. # go to root of this of repo
-python -um pygaggle.run.evaluate_kaggle_highlighter --method qa_transformer --model-name
-```
-
-**BioBERT (fine-tuned on MS MARCO)**:
-
-0. Download the weights, vocab, and config from our Google Storage bucket. This requires an installation of [gsutil](https://cloud.google.com/storage/docs/gsutil_install?hl=ru).
-
-```bash
-mkdir biobert-marco && cd biobert-marco
-gsutil cp "gs://neuralresearcher_data/doc2query/experiments/exp374/model.ckpt-100000*" .
-gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/bert_config.json config.json
-gsutil cp gs://neuralresearcher_data/biobert_models/biobert_v1.1_pubmed/vocab.txt .
-```
-
-0. Rename the files:
-
-```bash
-for filename in model.ckpt*; do
-    mv $filename $(python -c "import re; print(re.sub(r'ckpt-\\d+', 'ckpt', '$filename'))");
-done
-```
-
-0. Evaluate the model:
-
-```bash
-cd .. # go to root of this repo
-python -um pygaggle.run.evaluate_kaggle_highlighter --method seq_class_transformer --model-name
-```
diff --git a/docs/experiments-CovidQA.md b/docs/experiments-CovidQA.md
new file mode 100644
index 00000000..d7bc6c03
--- /dev/null
+++ b/docs/experiments-CovidQA.md
@@ -0,0 +1,150 @@
+# PyGaggle: Neural Ranking Baselines on CovidQA
+
+This page contains instructions for running various neural reranking baselines on the CovidQA ranking task.
+
+Note 1: Run the following instructions at root of this repo.
+Note 2: Make sure that you have access to a GPU.
+Note 3: Installation must be done from source; make sure the [anserini-eval](https://github.com/castorini/anserini-eval) submodule is pulled.
+To do this, first clone the repository recursively.
+
+```
+git clone --recursive https://github.com/castorini/pygaggle.git
+```
+
+Then install PyGaggle using:
+
+```
+pip install pygaggle/
+```
+
+## Re-Ranking with Random
+
+NL Question:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.0
+recall@3 0.0199546485260771
+recall@50 0.3247165532879819
+recall@1000 1.0
+mrr 0.03999734528458418
+mrr@10 0.020888672929489253
+```
+
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method random \
+    --split kq \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.0
+recall@3 0.0199546485260771
+recall@50 0.3247165532879819
+recall@1000 1.0
+mrr 0.03999734528458418
+mrr@10 0.020888672929489253
+```
+
+## Re-Ranking with BM25
+
+NL Question:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.14685314685314685
+recall@3 0.2199546485260771
+recall@50 0.6582766439909296
+recall@1000 0.6820861678004534
+mrr 0.24651188194041115
+mrr@10 0.2267060792570997
+```
+
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method bm25 \
+    --split kq \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.14685314685314685
+recall@3 0.22675736961451243
+recall@50 0.6650793650793649
+recall@1000 0.6888888888888888
+mrr 0.249090910278702
+mrr@10 0.22846344887161213
+```
+
+It takes about 10 seconds to re-rank this subset on CovidQA.
+
+## Re-Ranking with monoT5
+
+NL Question:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.2789115646258503
+recall@3 0.41854551344347257
+recall@50 0.92555879494655
+recall@1000 1.0
+mrr 0.417982565405279
+mrr@10 0.4045405463772811
+```
+
+Keyword Query:
+
+```
+python -um pygaggle.run.evaluate_kaggle_highlighter --method t5 \
+    --split kq \
+    --dataset data/kaggle-lit-review-0.2.json \
+    --index-dir indexes/lucene-index-cord19-paragraph-2020-05-12
+```
+
+The following output will be visible after it has finished:
+
+```
+precision@1 0.24489795918367346
+recall@3 0.38566569484936825
+recall@50 0.9231778425655977
+recall@1000 1.0
+mrr 0.37988285486956513
+mrr@10 0.3671336788683727
+```
+
+It takes about 17 minutes to re-rank this subset on CovidQA using a P100.
+
+If you were able to replicate these results, please submit a PR adding to the replication log!
+
+
+## Replication Log
\ No newline at end of file
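The evaluation tables in the new doc report `mrr@10`, i.e. mean reciprocal rank truncated at depth 10. A minimal sketch of that computation (illustrative only, not part of the patch or the pygaggle API; the function name is hypothetical):

```python
def mrr_at_k(relevance_lists, k=10):
    """Mean reciprocal rank truncated at depth k.

    `relevance_lists` holds one list per query; each entry is True if
    the document ranked at that position is relevant.
    """
    total = 0.0
    for ranked in relevance_lists:
        for rank, is_relevant in enumerate(ranked[:k], start=1):
            if is_relevant:
                total += 1.0 / rank  # reciprocal rank of first relevant hit
                break
    # queries with no relevant document in the top k contribute 0
    return total / len(relevance_lists)

# Two queries: first relevant document at rank 2 and rank 4.
print(mrr_at_k([[False, True, False], [False, False, False, True]]))
# 0.375  ((1/2 + 1/4) / 2)
```

The truncation at depth 10 explains why `mrr@10` is always at most `mrr`: a query whose first relevant document falls below rank 10 still contributes to `mrr` but adds nothing at depth 10.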