diff --git a/docs/experiments-msmarco-passage.md b/docs/experiments-msmarco-passage.md
index 5a3e72266b..1ad1ce260a 100644
--- a/docs/experiments-msmarco-passage.md
+++ b/docs/experiments-msmarco-passage.md
@@ -14,6 +14,17 @@ If you're a Waterloo student traversing the [onboarding path](https://github.com
+ Be able to evaluate the retrieved results above.
+ Understand the MRR metric.
+What's Anserini?
+Well, it's the repo that you're in right now.
+Anserini is a toolkit (in Java) for reproducible information retrieval research built on the [Lucene search library](https://lucene.apache.org/).
+The Lucene search library provides components of the popular [Elasticsearch](https://www.elastic.co/) platform.
+
+Think of it this way: Lucene provides a "kit of parts".
+Elasticsearch provides an "assembly of parts" targeted at production search applications, with a REST-centric API.
+Anserini provides an alternative way of composing the same core components together, targeted at information retrieval researchers.
+By building on Lucene, Anserini aims to bridge the gap between academic information retrieval research and the practice of building real-world search applications.
+That is, most things done with Anserini can be "translated" into Elasticsearch quite easily.
+
## Data Prep
In this guide, we're just going through the mechanical steps of data prep.
@@ -263,8 +274,9 @@ We can find the MRR@10 for `qid` 1048585 above:
```bash
$ tools/eval/trec_eval.9.0.4/trec_eval -q -c -M 10 -m recip_rank \
- collections/msmarco-passage/qrels.dev.small.trec \
- runs/run.msmarco-passage.dev.small.trec | grep 1048585
+ collections/msmarco-passage/qrels.dev.small.trec \
+ runs/run.msmarco-passage.dev.small.trec | grep 1048585
+
recip_rank 1048585 1.0000
```
@@ -280,6 +292,8 @@ In short, it's complicated.
At this time, look back through the learning outcomes again and make sure you're good.
As a next step in the onboarding path, you basically [do the same thing again in Python with Pyserini](https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md) (as opposed to Java with Anserini here).
+Before you move on, however, add an entry in the "Reproduction Log" at the bottom of this page, following the same format: use `yyyy-mm-dd`, make sure you're using a commit id that's on the main trunk of Anserini, and use its 7-character hexadecimal prefix for the link anchor text.
+
## BM25 Tuning
This section is **not** part of the onboarding path, so feel free to skip.
diff --git a/docs/start-here.md b/docs/start-here.md
index a8a8dd121a..8af9480e57 100644
--- a/docs/start-here.md
+++ b/docs/start-here.md
@@ -20,10 +20,11 @@ What's the problem we're trying to solve?
This is the definition I typically give:
-> Given an information need expressed as a query _q_, the text ranking task is to return a ranked list of _k_ texts {_d1_, _d2_ ... _dk_} from an arbitrarily large but finite collection
+> Given an information need expressed as a query _q_, the text retrieval task is to return a ranked list of _k_ texts {_d1_, _d2_ ... _dk_} from an arbitrarily large but finite collection
of texts _C_ = {_di_} that maximizes a metric of interest, for example, nDCG, AP, etc.
-This problem has been given various names, e.g., the search problem, the information retrieval problem, the text ranking problem, etc.
+This problem has been given various names, e.g., the search problem, the information retrieval problem, the text ranking problem, the top-_k_ document retrieval problem, etc.
+In most contexts, "ranking" and "retrieval" are used interchangeably.
Basically, this is what _search_ (i.e., information retrieval) is all about.
Let's try to unpack the definition a bit.
@@ -276,5 +277,7 @@ By now you should be able to connect the concepts we introduced to how they mani
From here, you're now ready to proceed to try and reproduce the [BM25 Baselines for MS MARCO Passage Ranking
](experiments-msmarco-passage.md).
+Before you move on, however, add an entry in the "Reproduction Log" at the bottom of this page, following the same format: use `yyyy-mm-dd`, make sure you're using a commit id that's on the main trunk of Anserini, and use its 7-character hexadecimal prefix for the link anchor text.
+
## Reproduction Log[*](reproducibility.md)