Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Speed up reader tests #3476

Merged
merged 16 commits into from
Oct 26, 2022
Merged

feat: Speed up reader tests #3476

merged 16 commits into from
Oct 26, 2022

Conversation

sjrl
Copy link
Contributor

@sjrl sjrl commented Oct 25, 2022

Related Issues

  • fixes

Proposed Changes:

Changed the reader fixture in conftest.py to use a smaller reader (500mb to ~165mb) model that still produces sensible results. This will help to speed up tests and slightly reduce the disk space usage for downloading models.

How did you test it?

Updated the existing unit tests to work with the new smaller reader. The only real changes needed to happen when comparing metrics of the reader performance. They changed slightly which is to be expected.

Notes for the reviewer

Checklist

@sjrl sjrl changed the base branch from main to 2364-rething-tutorial-14 October 25, 2022 19:30
@sjrl sjrl changed the base branch from 2364-rething-tutorial-14 to main October 25, 2022 19:30
@sjrl sjrl force-pushed the sjrl-tests-speed branch from d0a6c50 to f0f17c3 Compare October 25, 2022 19:31
@sjrl sjrl marked this pull request as ready for review October 25, 2022 20:12
@sjrl sjrl requested a review from a team as a code owner October 25, 2022 20:12
@sjrl sjrl requested review from bogdankostic and removed request for a team October 25, 2022 20:12
Copy link
Contributor

@bogdankostic bogdankostic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left one question out of interest.

assert metrics_top_2["Retriever"]["mrr"] == 0.5
assert metrics_top_2["Retriever"]["map"] == 0.5
assert metrics_top_2["Retriever"]["recall_multi_hit"] == 0.5
assert metrics_top_2["Retriever"]["recall_single_hit"] == 0.5
assert metrics_top_2["Retriever"]["precision"] == 0.1
assert metrics_top_2["Retriever"]["ndcg"] == 0.5

metrics_top_3 = eval_result.calculate_metrics(simulated_top_k_reader=3, document_scope="document_id")
metrics_top_5 = eval_result.calculate_metrics(simulated_top_k_reader=5, document_scope="document_id")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason why you changed simulated_top_k_reader here to 5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The metrics only changed and became better when changing it to 5 here. Keeping it at 3 would have required the metrics to be changed to match the ones when using the simulated_top_k_reader of 1 and 2.

@sjrl sjrl merged commit 59857cb into main Oct 26, 2022
@sjrl sjrl deleted the sjrl-tests-speed branch October 26, 2022 17:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants