Api optimization #33

Merged
merged 26 commits into from
Jun 12, 2024

Conversation

seankim658
Member

Switched from the indirect method, which used an intermediary results collection, to pushing the computation onto the database with the MongoDB aggregation framework. This significantly sped up large retrievals from disk.
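For illustration, a minimal sketch of what a single-pass aggregation pipeline for a paginated text search could look like. This is not the code from the PR: the function name `build_text_search_pipeline`, the `offset`/`limit` parameters, and the choice of stages are assumptions, and the sketch presumes a text index already exists on the collection.

```python
from typing import Any


def build_text_search_pipeline(
    term: str, offset: int = 0, limit: int = 20
) -> list[dict[str, Any]]:
    """Build an aggregation pipeline that searches, sorts, and pages in the database.

    Hypothetical helper: the stage layout is an assumption, not the PR's pipeline.
    """
    if offset < 0 or limit <= 0:
        raise ValueError("offset must be >= 0 and limit must be > 0")
    return [
        # $text is only allowed in a $match that is the first pipeline stage.
        {"$match": {"$text": {"$search": term}}},
        # Most relevant first; _id breaks ties so pages stay stable.
        {"$sort": {"score": {"$meta": "textScore"}, "_id": 1}},
        {"$skip": offset},
        {"$limit": limit},
        # Drop the ObjectId so each document is plain JSON.
        {"$project": {"_id": 0}},
    ]


pipeline = build_text_search_pipeline("cancer", offset=40, limit=20)
```

With PyMongo, the pipeline would be run as `collection.aggregate(pipeline)`, so the match, sort, and paging all happen server-side and only the requested page crosses the wire.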

Previously, a text search with a generic term such as "cancer" required significant batching, and a cold-start retrieval from disk could take 5-7+ seconds. Implemented a naive LRU caching solution that took repeat batch retrievals from averaging ~2 seconds to ~0.02 seconds. However, this did not fix the root cause: the cache is still cold on the initial text query. Moving the processing into the database and removing the two-step ID list process brought large cold-start retrievals down significantly, from averaging ~35 seconds to ~5 seconds.
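A rough sketch of the kind of naive LRU caching described above, using Python's standard `functools.lru_cache`. The names `fetch_batch_from_db` and `get_batch`, the batch layout, and the cache size are all hypothetical; the real database call is replaced by a stand-in that only counts how often it runs.

```python
from functools import lru_cache

# Counts how often the (simulated) slow path is taken.
CALLS = {"count": 0}


def fetch_batch_from_db(term: str, batch_index: int, batch_size: int) -> tuple[str, ...]:
    """Hypothetical stand-in for a slow disk-backed database read."""
    CALLS["count"] += 1
    start = batch_index * batch_size
    return tuple(f"{term}-{i}" for i in range(start, start + batch_size))


@lru_cache(maxsize=128)
def get_batch(term: str, batch_index: int, batch_size: int = 10) -> tuple[str, ...]:
    """Return one batch, serving repeat requests from memory.

    Returns a tuple so callers cannot mutate the cached value.
    """
    return fetch_batch_from_db(term, batch_index, batch_size)


first = get_batch("cancer", 0)   # cold: goes to the "database"
second = get_batch("cancer", 0)  # warm: served from the cache
```

The sketch also shows the limitation noted above: the first call for any given key still pays the full retrieval cost, so caching alone cannot help a cold start.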

As a further optimization, the wildcard text index was replaced, since it was not ideal for a data model with a significant amount of nesting and many fields. Creating an internal field, all_text, that concatenates the values of every string field made it possible to text-index all_text directly and drop the wildcard text index. This brought vague queries down to ~2.5 seconds.
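One way the all_text field could be built is by walking each document recursively and joining every string value it contains. The sketch below is an assumption about the approach, not the PR's implementation: the helper names `iter_strings` and `with_all_text`, the space separator, and the sample document are all made up for illustration.

```python
from typing import Any, Iterator


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every non-empty string found in a nested dict/list structure."""
    if isinstance(value, str):
        stripped = value.strip()
        if stripped:
            yield stripped
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_strings(child)
    elif isinstance(value, (list, tuple)):
        for child in value:
            yield from iter_strings(child)
    # Numbers, booleans, None, etc. are skipped: only string fields are indexed.


def with_all_text(doc: dict[str, Any]) -> dict[str, Any]:
    """Return a shallow copy of doc with a concatenated all_text field."""
    # Ignore any existing top-level all_text so re-running does not duplicate text.
    source = {k: v for k, v in doc.items() if k != "all_text"}
    enriched = dict(source)
    enriched["all_text"] = " ".join(iter_strings(source))
    return enriched


# Hypothetical sample document, not taken from the project's data model.
sample = {
    "id": "x1",
    "name": "A",
    "nested": {"desc": "b c", "n": 3, "tags": ["d", ""]},
}
result = with_all_text(sample)
```

With PyMongo, the index on the new field could then be created with `collection.create_index([("all_text", "text")])`. A collection can hold only one text index, so the wildcard text index has to be dropped first.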

seankim658 added the documentation and enhancement labels on Jun 12, 2024
seankim658 self-assigned this on Jun 12, 2024
seankim658 merged commit 847c29a into main on Jun 12, 2024
seankim658 linked an issue on Jun 12, 2024 that may be closed by this pull request
seankim658 deleted the api-optimization branch on June 12, 2024 03:58
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

Update documentation
1 participant