Update docs (#2172)
+ Update README
+ Update onboarding docs
+ Add deprecation message to Solrini and Elastirini
lintool authored Aug 27, 2023
1 parent af0219f commit b64a412
Showing 6 changed files with 28 additions and 21 deletions.
33 changes: 12 additions & 21 deletions README.md
@@ -13,12 +13,6 @@ Among other goals, our effort aims to be [the opposite of this](http://phdcomics
Anserini grew out of [a reproducibility study of various open-source retrieval engines in 2016](https://cs.uwaterloo.ca/~jimmylin/publications/Lin_etal_ECIR2016.pdf) (Lin et al., ECIR 2016).
See [Yang et al. (SIGIR 2017)](https://dl.acm.org/authorize?N47337) and [Yang et al. (JDIQ 2018)](https://dl.acm.org/citation.cfm?doid=3289400.3239571) for overviews.

**NOTE**: Anserini was upgraded to Lucene 9.3 at commit [`272565`](https://github.com/castorini/anserini/commit/27256551e958f39495b04e89ef55de9d27f33414) (8/2/2022): this upgrade created backward compatibility issues, see [#1952](https://github.com/castorini/anserini/issues/1952).
Anserini will automatically detect Lucene 8 indexes and disable consistent tie-breaking to avoid runtime errors.
However, Lucene 9 code running on Lucene 8 indexes may give slightly different results than Lucene 8 code running on Lucene 8 indexes.
Lucene 8 code will _not_ run on Lucene 9 indexes.
Pyserini has also been upgraded and similar issues apply: Lucene 9 code running on Lucene 8 indexes may give slightly different results than Lucene 8 code running on Lucene 8 indexes.

## 🎬 Getting Started

Many Anserini features are exposed in the [Pyserini](http://pyserini.io/) Python interface.
@@ -285,9 +279,8 @@ For the most part, manual copying and pasting of commands into a shell is requir
+ [Indexing AI2's COVID-19 Open Research Dataset](docs/experiments-cord19.md)
+ [Baselines for the TREC-COVID Challenge](docs/experiments-covid.md)
+ [Baselines for the TREC-COVID Challenge using doc2query](docs/experiments-covid-doc2query.md)
+ [Ingesting AI2's COVID-19 Open Research Dataset into Solr and Elasticsearch](docs/experiments-cord19-extras.md)

### Other Experiments
### Other Experiments and Features

+ [Working with the 20 Newsgroups Dataset](docs/experiments-20newsgroups.md)
+ [Guide to BM25 baselines for the FEVER Fact Verification Task](docs/experiments-fever.md)
@@ -297,13 +290,7 @@ For the most part, manual copying and pasting of commands into a shell is requir
+ Runbooks for TREC 2018: [[Anserini group](docs/runbook-trec2018-anserini.md)] [[h2oloo group](docs/runbook-trec2018-h2oloo.md)]
+ Runbook for [ECIR 2019 paper on axiomatic semantic term matching](docs/runbook-ecir2019-axiomatic.md)
+ Runbook for [ECIR 2019 paper on cross-collection relevance feedback](docs/runbook-ecir2019-ccrf.md)

### Other Features

+ Use Anserini in Python via [Pyserini](http://pyserini.io/)
+ Anserini integrates with SolrCloud via [Solrini](docs/solrini.md)
+ Anserini integrates with Elasticsearch via [Elastirini](docs/elastirini.md)
+ Anserini supports [approximate nearest-neighbor search](docs/approximate-nearestneighbor.md) on arbitrary dense vectors with Lucene
+ Support for [approximate nearest-neighbor search](docs/approximate-nearestneighbor.md) on dense vectors with inverted indexes

## 🙋 How Can I Contribute?

@@ -316,10 +303,14 @@ In turn, you'll be recognized as a [contributor](https://github.com/castorini/an

Beyond that, there are always [open issues](https://github.com/castorini/anserini/issues) we would appreciate help on!

## ℹ️ Release History
## 📜️ Release History

+ v0.21.0: March 31, 2023 [[Release Notes](docs/release-notes/release-notes-v0.21.0.md)]
+ v0.20.0: January 20, 2023 [[Release Notes](docs/release-notes/release-notes-v0.20.0.md)]

<details>
<summary>older... (and historic notes)</summary>

+ v0.16.2: December 12, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.2.md)]
+ v0.16.1: November 2, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.1.md)]
+ v0.16.0: October 23, 2022 [[Release Notes](docs/release-notes/release-notes-v0.16.0.md)]
@@ -329,10 +320,6 @@ Beyond that, there are always [open issues](https://github.com/castorini/anserin
+ v0.14.2: March 24, 2022 [[Release Notes](docs/release-notes/release-notes-v0.14.2.md)]
+ v0.14.1: February 27, 2022 [[Release Notes](docs/release-notes/release-notes-v0.14.1.md)]
+ v0.14.0: January 10, 2022 [[Release Notes](docs/release-notes/release-notes-v0.14.0.md)]

<details>
<summary>older... (and historic notes)</summary>

+ v0.13.5: November 2, 2021 [[Release Notes](docs/release-notes/release-notes-v0.13.5.md)]
+ v0.13.4: October 22, 2021 [[Release Notes](docs/release-notes/release-notes-v0.13.4.md)]
+ v0.13.3: August 22, 2021 [[Release Notes](docs/release-notes/release-notes-v0.13.3.md)]
@@ -361,9 +348,13 @@ Beyond that, there are always [open issues](https://github.com/castorini/anserin
+ v0.2.0: September 10, 2018 [[Release Notes](docs/release-notes/release-notes-v0.2.0.md)]
+ v0.1.0: July 4, 2018 [[Release Notes](docs/release-notes/release-notes-v0.1.0.md)]

## Historical Notes
## 📜️ Historical Notes

+ Anserini was upgraded to Lucene 9.3 at commit [`272565`](https://github.com/castorini/anserini/commit/27256551e958f39495b04e89ef55de9d27f33414) (8/2/2022): this upgrade created backward compatibility issues, see [#1952](https://github.com/castorini/anserini/issues/1952).
Anserini will automatically detect Lucene 8 indexes and disable consistent tie-breaking to avoid runtime errors.
However, Lucene 9 code running on Lucene 8 indexes may give slightly different results than Lucene 8 code running on Lucene 8 indexes.
Lucene 8 code will _not_ run on Lucene 9 indexes.
Pyserini has also been upgraded and similar issues apply: Lucene 9 code running on Lucene 8 indexes may give slightly different results than Lucene 8 code running on Lucene 8 indexes.
+ Anserini was upgraded to Java 11 at commit [`17b702d`](https://github.com/castorini/anserini/commit/17b702d9c3c0971e04eb8386ab83bf2fb2630714) (7/11/2019) from Java 8.
Maven 3.3+ is also required.
+ Anserini was upgraded to Lucene 8.0 as of commit [`75e36f9`](https://github.com/castorini/anserini/commit/75e36f97f7037d1ceb20fa9c91582eac5e974131) (6/12/2019); prior to that, the toolkit uses Lucene 7.6.
3 changes: 3 additions & 0 deletions docs/elastirini.md
@@ -1,5 +1,8 @@
# Elastirini: Anserini Integration with Elasticsearch

⁉️ **Important Note:** As part of Anserini's upgrade to Lucene 9, support for Elastirini was removed at [`272565`](https://github.com/castorini/anserini/commit/27256551e958f39495b04e89ef55de9d27f33414) on August 2, 2022.
The features documented below are no longer available, and this guide is retained only for historical reasons.

Anserini provides code for indexing into an [ELK stack](https://www.elastic.co/what-is/elk-stack) (i.e., the stack built on top of Elasticsearch), thus providing interoperable support for existing test collections.
This is the same idea discussed in the following paper:

3 changes: 3 additions & 0 deletions docs/experiments-cord19-extras.md
@@ -1,5 +1,8 @@
# Ingesting CORD-19 into Solr and Elasticsearch

⁉️ **Important Note:** As part of Anserini's upgrade to Lucene 9, support for Solrini and Elastirini was removed at [`272565`](https://github.com/castorini/anserini/commit/27256551e958f39495b04e89ef55de9d27f33414) on August 2, 2022.
The features documented below are no longer available, and this guide is retained only for historical reasons.

This document describes how to ingest the [COVID-19 Open Research Dataset (CORD-19)](https://pages.semanticscholar.org/coronavirus-research) from the [Allen Institute for AI](https://allenai.org/) into Solr and Elasticsearch.
If you want to build or download Lucene indexes for CORD-19, see [this guide](experiments-cord19.md).

3 changes: 3 additions & 0 deletions docs/experiments-msmarco-passage.md
@@ -300,6 +300,9 @@ At this time, look back through the learning outcomes again and make sure you're
As a next step in the onboarding path, you basically [do the same thing again in Python with Pyserini](https://github.com/castorini/pyserini/blob/master/docs/experiments-msmarco-passage.md) (as opposed to Java with Anserini here).

Before you move on, however, add an entry in the "Reproduction Log" at the bottom of this page, following the same format: use `yyyy-mm-dd`, make sure you're using a commit id that's on the main trunk of Anserini, and use its 7-hexadecimal prefix for the link anchor text.
In the description of your pull request, please provide some details on your setup (e.g., operating system, environment, and configuration).
In addition, provide some indication of success (e.g., everything worked) or document any issues you encountered.
If you think this guide can be improved in any way (e.g., you caught a typo or think a clarification is warranted), feel free to include it in the pull request.
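
The log-entry format described above (a `yyyy-mm-dd` date plus a commit link whose anchor text is the 7-hexadecimal prefix) can be sketched in Python. This is only an illustration: `format_log_entry` is a hypothetical helper and not part of Anserini, and the entry wording is an assumption modeled on existing reproduction-log lines.

```python
from datetime import date


def format_log_entry(username: str, commit_sha: str, when: date) -> str:
    """Build one reproduction-log line (hypothetical helper, not an Anserini API)."""
    sha = commit_sha.strip().lower()
    # Require the full 40-character commit id so the link target is unambiguous.
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError("expected a full 40-character hexadecimal commit id")
    prefix = sha[:7]  # 7-hexadecimal prefix used as the link anchor text
    return (
        f"+ Results reproduced by [@{username}](https://github.com/{username}) "
        f"on {when.isoformat()} "  # isoformat() yields yyyy-mm-dd
        f"(commit [`{prefix}`](https://github.com/castorini/anserini/commit/{sha}))"
    )


entry = format_log_entry(
    "sahel-sh", "0e759fd3b9161a24f66c56e07f73f16eaf1490c6", date(2023, 7, 21)
)
print(entry)
```

The helper does not check that the commit is on the main trunk of Anserini; that still has to be verified by hand.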

## BM25 Tuning

3 changes: 3 additions & 0 deletions docs/solrini.md
@@ -1,5 +1,8 @@
# Solrini: Anserini Integration with Solr

⁉️ **Important Note:** As part of Anserini's upgrade to Lucene 9, support for Solrini was removed at [`272565`](https://github.com/castorini/anserini/commit/27256551e958f39495b04e89ef55de9d27f33414) on August 2, 2022.
The features documented below are no longer available, and this guide is retained only for historical reasons.

This page documents code for reproducing results from the following paper:

> Ryan Clancy, Toke Eskildsen, Nick Ruest, and Jimmy Lin. [Solr Integration in the Anserini Information Retrieval Toolkit.](https://cs.uwaterloo.ca/~jimmylin/publications/Clancy_etal_SIGIR2019a.pdf) _Proceedings of the 42nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2019)_, July 2019, Paris, France.
4 changes: 4 additions & 0 deletions docs/start-here.md
@@ -275,8 +275,12 @@ From here, you're now ready to proceed to try and reproduce the [BM25 Baselines
](experiments-msmarco-passage.md).
Before you move on, however, add an entry in the "Reproduction Log" at the bottom of this page, following the same format: use `yyyy-mm-dd`, make sure you're using a commit id that's on the main trunk of Anserini, and use its 7-hexadecimal prefix for the link anchor text.
In the description of your pull request, please provide some details on your setup (e.g., operating system, environment, and configuration).
In addition, provide some indication of success (e.g., everything worked) or document any issues you encountered.
If you think this guide can be improved in any way (e.g., you caught a typo or think a clarification is warranted), feel free to include it in the pull request.
## Reproduction Log[*](reproducibility.md)
+ Results reproduced by [@sahel-sh](https://github.com/sahel-sh) on 2023-07-21 (commit [`0e759fd`](https://github.com/castorini/anserini/commit/0e759fd3b9161a24f66c56e07f73f16eaf1490c6))
+ Results reproduced by [@Mofetoluwa](https://github.com/Mofetoluwa) on 2023-08-03 (commit [`7314128`](https://github.com/castorini/anserini/commit/73141282b62979e189ac3c87d9a902064f34a1c5))
+ Results reproduced by [@yilinjz](https://github.com/yilinjz) on 2023-08-23 (commit [`862bd27`](https://github.com/castorini/anserini/commit/862bd27d5c1400763e11424a7d44dcbf4cf48c17))
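
The entries above share a regular shape, so a new line could be sanity-checked before it goes into a pull request. The sketch below is an assumption inferred from those entries rather than an official check: `LOG_ENTRY` and `check_entry` are hypothetical names, and the pattern only covers the fields visible in the log.

```python
import re
from datetime import date

# Pattern inferred from the existing log lines (an assumption, not a project rule).
LOG_ENTRY = re.compile(
    r"^\+ Results reproduced by \[@(?P<user>[A-Za-z0-9-]+)\]"
    r"\(https://github\.com/(?P=user)\) "
    r"on (?P<date>\d{4}-\d{2}-\d{2}) "
    r"\(commit \[`(?P<prefix>[0-9a-f]{7})`\]"
    r"\(https://github\.com/castorini/anserini/commit/(?P<sha>[0-9a-f]{40})\)\)$"
)


def check_entry(line: str) -> bool:
    """Return True if the line looks like a well-formed reproduction-log entry."""
    m = LOG_ENTRY.match(line.strip())
    if m is None:
        return False
    try:
        date.fromisoformat(m["date"])  # rejects impossible dates such as 2023-13-40
    except ValueError:
        return False
    # The anchor text must be the 7-hexadecimal prefix of the linked commit id.
    return m["sha"].startswith(m["prefix"])


sample = (
    "+ Results reproduced by [@Mofetoluwa](https://github.com/Mofetoluwa) "
    "on 2023-08-03 (commit [`7314128`]"
    "(https://github.com/castorini/anserini/commit/73141282b62979e189ac3c87d9a902064f34a1c5))"
)
print(check_entry(sample))
```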
