Document Store test refactoring #3449

Merged 23 commits on Oct 31, 2022
212 changes: 92 additions & 120 deletions .github/workflows/tests.yml
@@ -91,17 +91,22 @@ jobs:
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

unit-tests:
name: Unit / ${{ matrix.os }}
name: Unit / ${{ matrix.topic }} / ${{ matrix.os }}
needs:
- mypy
- pylint
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macos-latest]
os:
- ubuntu-latest
- windows-latest
- macos-latest
topic:
- document_stores
runs-on: ${{ matrix.os }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3

- name: Setup Python
uses: ./.github/actions/python_cache/
@@ -110,14 +115,94 @@ jobs:
run: pip install .[all]

- name: Run
run: pytest -m "unit" test/
run: pytest -m "unit" test/${{ matrix.topic }}
Member commented:

Aha, so this value needs to match the test directory name. Perhaps we should add a short comment saying so?
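
The constraint raised in this comment (each `matrix.topic` value is appended to `test/` to form the pytest path) could also be checked mechanically. The following is only a sketch: `missing_topic_dirs` is a hypothetical helper that does not exist in the repository, and the `test` root and topic list are taken from the workflow above.

```python
from pathlib import Path


def missing_topic_dirs(topics, test_root):
    """Return the topics that have no matching directory under test_root.

    Hypothetical helper, not part of the repository: it only illustrates
    that every matrix.topic value must name a directory below test/.
    """
    root = Path(test_root)
    return [topic for topic in topics if not (root / topic).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    print(missing_topic_dirs(["document_stores"], "test"))
```

An empty list means every topic maps to an existing directory; any name in the result would make `pytest -m "unit" test/<topic>` fail to find its path.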


- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

integration-tests-elasticsearch:
name: Integration / Elasticsearch / ${{ matrix.os }}
needs:
- unit-tests
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
services:
elasticsearch:
image: elasticsearch:7.17.6
env:
discovery.type: "single-node"
ES_JAVA_OPTS: "-Xms128m -Xmx256m"
ports:
- 9200:9200
# env:
# ELASTICSEARCH_HOST: "elasticsearch"
steps:
- uses: actions/checkout@v3

- name: Setup Python
uses: ./.github/actions/python_cache/

- name: Install Haystack
run: pip install -U .[docstores]

- name: Run tests
run: |
pytest -x -m "document_store and integration" test/document_stores/test_elasticsearch.py

- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

integration-tests-opensearch:
name: Integration / Opensearch / ${{ matrix.os }}
needs:
- unit-tests
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
services:
opensearch:
image: opensearchproject/opensearch:1.3.5
env:
discovery.type: "single-node"
ES_JAVA_OPTS: "-Xms128m -Xmx256m"
ports:
- 9200:9200
# env:
# OPENSEARCH_HOST: "opensearch"
steps:
- uses: actions/checkout@v3

- name: Setup Python
uses: ./.github/actions/python_cache/

- name: Install Haystack
run: pip install -U .[docstores]

- name: Run tests
run: |
pytest -x -m "document_store and integration" test/document_stores/test_opensearch.py

- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

#
# TODO: the following steps need to be revisited
#

unit-tests-linux:
needs:
- mypy
@@ -215,117 +300,6 @@ jobs:
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

elasticsearch-tests-linux:
needs:
- mypy
- pylint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Setup Elasticsearch
run: |
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx256m" elasticsearch:7.9.2

# TODO Let's try to remove this one from the unit tests
- name: Install pdftotext
run: wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin

- name: Setup Python
uses: ./.github/actions/python_cache/

- name: Install Haystack
run: pip install .

- name: Run tests
env:
TOKENIZERS_PARALLELISM: 'false'
run: |
pytest ${{ env.PYTEST_PARAMS }} -m "elasticsearch and not integration" test/document_stores/ --document_store_type=elasticsearch

- name: Dump docker logs on failure
if: failure()
uses: jwalton/gh-docker-logs@v1

- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

elasticsearch-tests-windows:
needs:
- mypy
- pylint
runs-on: windows-latest
if: contains(github.event.pull_request.labels.*.name, 'topic:windows') || !github.event.pull_request.draft

steps:
- uses: actions/checkout@v2

- name: Install dependencies
run: |
choco install --no-progress xpdf-utils
choco install --no-progress openjdk --version=11.0.2.01
refreshenv
choco install --no-progress elasticsearch --version=7.9.2
refreshenv
Get-Service elasticsearch-service-x64 | Start-Service

- name: Setup Python
uses: ./.github/actions/python_cache/
with:
prefix: windows

- name: Run tests
env:
TOKENIZERS_PARALLELISM: 'false'
run: |
pytest ${{ env.PYTEST_PARAMS }} -m "elasticsearch and not integration" test/document_stores/ ${{ env.SUITES_EXCLUDED_FROM_WINDOWS }} --document_store_type=elasticsearch

- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

opensearch-tests-linux:
needs:
- mypy
- pylint
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2

- name: Setup Opensearch
run: |
docker run -d -p 9201:9200 -p 9600:9600 -e "discovery.type=single-node" opensearchproject/opensearch:1.3.5

# TODO Let's try to remove this one from the unit tests
- name: Install pdftotext
run: wget --no-check-certificate https://dl.xpdfreader.com/xpdf-tools-linux-4.04.tar.gz && tar -xvf xpdf-tools-linux-4.04.tar.gz && sudo cp xpdf-tools-linux-4.04/bin64/pdftotext /usr/local/bin

- name: Setup Python
uses: ./.github/actions/python_cache/

- name: Install Haystack
run: pip install .

- name: Run tests
env:
TOKENIZERS_PARALLELISM: 'false'
run: |
pytest ${{ env.PYTEST_PARAMS }} -m "opensearch and not integration" test/document_stores/test_document_store.py --document_store_type=opensearch

- name: Dump docker logs on failure
if: failure()
uses: jwalton/gh-docker-logs@v1

- uses: act10ns/slack@v1
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'

faiss-tests-linux:
needs:
@@ -654,7 +628,6 @@ jobs:
integration-tests-linux:
needs:
- unit-tests-linux
- elasticsearch-tests-linux

timeout-minutes: 60
strategy:
@@ -689,7 +662,6 @@ jobs:
run: |
python -c "from transformers import AutoModel;[AutoModel.from_pretrained(model_name) for model_name in ['vblagoje/bart_lfqa','yjernite/bart_eli5', 'vblagoje/dpr-ctx_encoder-single-lfqa-wiki', 'vblagoje/dpr-question_encoder-single-lfqa-wiki', 'facebook/dpr-question_encoder-single-nq-base', 'facebook/dpr-ctx_encoder-single-nq-base', 'elastic/distilbert-base-cased-finetuned-conll03-english']]"


- name: Run Elasticsearch
run: |
docker run -d -p 9200:9200 -e "discovery.type=single-node" -e "ES_JAVA_OPTS=-Xms128m -Xmx256m" elasticsearch:7.9.2
@@ -736,8 +708,9 @@ jobs:
- name: Run tests
env:
TOKENIZERS_PARALLELISM: 'false' # Avoid logspam by tokenizers
# we add "and not document_store" to exclude the tests that were ported to the new strategy
run: |
pytest ${{ env.PYTEST_PARAMS }} -m "integration" test/${{ matrix.folder }}
pytest ${{ env.PYTEST_PARAMS }} -m "integration and not document_store" test/${{ matrix.folder }}

- name: Dump docker logs on failure
if: failure()
@@ -752,7 +725,6 @@ jobs:
integration-tests-windows:
needs:
- unit-tests-windows
- elasticsearch-tests-windows
runs-on: windows-latest
if: contains(github.event.pull_request.labels.*.name, 'topic:windows') || !github.event.pull_request.draft

@@ -798,4 +770,4 @@ jobs:
with:
status: ${{ job.status }}
channel: '#haystack'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'
if: failure() && github.repository_owner == 'deepset-ai' && github.ref == 'refs/heads/main'
2 changes: 1 addition & 1 deletion conftest.py
@@ -2,7 +2,7 @@ def pytest_addoption(parser):
parser.addoption(
"--document_store_type",
action="store",
default="elasticsearch, faiss, sql, memory, milvus1, milvus, weaviate, pinecone, opensearch",
default="elasticsearch, faiss, sql, memory, milvus1, milvus, weaviate, pinecone",
)


1 change: 1 addition & 0 deletions pyproject.toml
@@ -347,6 +347,7 @@ markers = [
"milvus: requires a Milvus 2 setup",
"milvus1: requires a Milvus 1 container",
"opensearch",
"document_store",
]
log_cli = true

25 changes: 1 addition & 24 deletions test/conftest.py
@@ -152,7 +152,6 @@ def pytest_collection_modifyitems(config, items):
"pinecone": [pytest.mark.pinecone],
# FIXME GraphDB can't be treated as a regular docstore, it fails most of their tests
"graphdb": [pytest.mark.integration],
"opensearch": [pytest.mark.opensearch],
}
for item in items:
for name, markers in name_to_markers.items():
@@ -196,17 +195,7 @@ def infer_required_doc_store(item, keywords):
# 2. if the test name contains the docstore name, we use that
# 3. use an arbitrary one by calling set.pop()
required_doc_store = None
all_doc_stores = {
"elasticsearch",
"faiss",
"sql",
"memory",
"milvus1",
"milvus",
"weaviate",
"pinecone",
"opensearch",
}
all_doc_stores = {"elasticsearch", "faiss", "sql", "memory", "milvus1", "milvus", "weaviate", "pinecone"}
Member commented:

So once we add test_memory.py, "memory" will be removed from these existing "references"? And we'll do that one by one for each document store, @masci?

Contributor Author replied:

@vblagoje Correct, until we reach the point where we can remove test_document_store.py entirely.
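
To illustrate what removing a name from `all_doc_stores` means for the lookup, here is a simplified sketch of the set intersection used by `infer_required_doc_store`. The function name `docstore_markers` and the keyword lists are hypothetical; the real helper in `test/conftest.py` does more than this.

```python
# Simplified sketch, not the actual helper from test/conftest.py.
ALL_DOC_STORES = {"elasticsearch", "faiss", "sql", "memory", "milvus1", "milvus", "weaviate", "pinecone"}


def docstore_markers(keywords):
    """Return the keywords of a test item that name a known document store."""
    return set(keywords).intersection(ALL_DOC_STORES)


# "opensearch" is no longer in the set, so this lookup ignores it.
print(sorted(docstore_markers(["opensearch", "integration"])))
# Document stores still in the set are picked up as before.
print(sorted(docstore_markers(["elasticsearch", "faiss", "integration"])))
```

Under this reading, each document store that gets its own test module simply drops out of the set, and the legacy inference stops considering it.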

docstore_markers = set(keywords).intersection(all_doc_stores)
if len(docstore_markers) > 1:
# if parameterized infer the docstore from the parameter
@@ -1099,18 +1088,6 @@ def get_document_store(
knn_engine="faiss",
)

elif document_store_type == "opensearch":
document_store = OpenSearchDocumentStore(
index=index,
return_embedding=True,
embedding_dim=embedding_dim,
embedding_field=embedding_field,
similarity=similarity,
recreate_index=recreate_index,
port=9201,
knn_engine="nmslib",
)

Member commented:

We have test_opensearch.py and are removing OpenSearchDocumentStore from get_document_store, yet we added test_elasticsearch.py without removing ElasticsearchDocumentStore from get_document_store. I'm a bit confused here. What's the reason for the difference?

Contributor Author replied:

Elasticsearch is still used by some tests in test_document_store.py that are triggered through the complex markers matrix, while Opensearch had much smaller test coverage there. As a result, the Opensearch tests are now fully implemented in the new module test_opensearch.py, whereas Elasticsearch needs more iterations before all of its tests are moved out of test_document_store.py into test_elasticsearch.py. That can't be done without touching the other Document Stores, so I plan to do it later.
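
The split between ported and legacy tests relies on the two `-m` expressions in the workflow: `"document_store and integration"` for the new jobs and `"integration and not document_store"` for the legacy one. The sketch below mimics that boolean selection in plain Python. The test names and their marker sets are made up for illustration, and pytest's real expression evaluation is not used here.

```python
# Hypothetical test IDs and marker sets, for illustration only.
TESTS = {
    "test_opensearch.py::test_ported": {"document_store", "integration"},
    "test_document_store.py::test_legacy": {"elasticsearch", "integration"},
    "test_elasticsearch.py::test_unit_only": {"document_store", "unit"},
}


def new_job_selects(markers):
    """Mirrors the expression: document_store and integration."""
    return "document_store" in markers and "integration" in markers


def legacy_job_selects(markers):
    """Mirrors the expression: integration and not document_store."""
    return "integration" in markers and "document_store" not in markers


for name, markers in TESTS.items():
    print(name, "new:", new_job_selects(markers), "legacy:", legacy_job_selects(markers))
```

Because the two expressions are mutually exclusive, a test carrying the `document_store` marker is never picked up by the legacy integration job, which is what lets tests move between modules one document store at a time.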

else:
raise Exception(f"No document store fixture for '{document_store_type}'")
