Api optimization #33

seankim658 · 2024-06-12T03:57:04Z

Switched from an indirect approach, which staged results in an intermediary collection, to the MongoDB aggregation framework, which moves the computation onto the database itself. This significantly improved large retrievals from disk.
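For illustration only, a single aggregation pipeline can replace the two-step "collect IDs, then fetch" flow by matching, sorting, paging, and counting in one round trip. This is a hypothetical sketch, not the PR's actual pipeline: the function `build_search_pipeline` and the field names `entity_type`, `id`, and `all_text` are assumptions. It only builds the pipeline as plain dicts, so it runs without a database.

```python
from typing import Any


def build_search_pipeline(term: str, page: int = 1, page_size: int = 50) -> list[dict[str, Any]]:
    """Build a hypothetical paged full-text search pipeline.

    Field names (entity_type, id, all_text) are illustrative assumptions.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    skip = (page - 1) * page_size
    return [
        # MongoDB requires a $text query to appear in the first $match stage.
        {"$match": {"$text": {"$search": term}}},
        # Deterministic ordering so paging is stable across requests.
        {"$sort": {"entity_type": 1, "id": 1}},
        # $facet returns the requested page and the total count together,
        # instead of first writing matching IDs to a separate collection.
        {
            "$facet": {
                "results": [
                    {"$skip": skip},
                    {"$limit": page_size},
                    # Exclude _id and the (assumed) large concatenated search field.
                    {"$project": {"_id": 0, "all_text": 0}},
                ],
                "total_count": [{"$count": "count"}],
            }
        },
    ]
```

With PyMongo, `collection.aggregate(build_search_pipeline("cancer"))` would yield a single document with `results` and `total_count` keys; `total_count` is an empty list when nothing matches, because `$count` emits no document for empty input.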

Previously, a text search for a generic term such as "cancer" required heavy batching, and cold-start retrievals from disk could take 5-7+ seconds. A naive LRU cache brought repeat batch retrievals down from an average of ~2 seconds to ~0.02 seconds, but it did not fix the root cause: the initial text query still paid the full cold-start cost, which a cache cannot help with. Moving the processing onto the database and removing the two-step list-ID process brought large cold-start retrievals down from an average of ~35 seconds to ~5 seconds.
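A minimal sketch of a naive LRU cache for batch retrievals, assuming batches are keyed by `(query, batch_index)`. The class `LRUCache`, the helper `fetch_batch`, and the `load` callback are hypothetical names, not the PR's code. It also shows why caching leaves the cold start untouched: the first request for a key always falls through to the loader.

```python
from collections import OrderedDict
from collections.abc import Callable, Hashable
from typing import Any, Optional


class LRUCache:
    """Minimal least-recently-used cache (hypothetical sketch)."""

    def __init__(self, capacity: int = 128) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        # Returns None on a miss, so None itself cannot be cached meaningfully.
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            # Evict the least recently used entry (front of the ordering).
            self._data.popitem(last=False)


def fetch_batch(cache: LRUCache, query: str, batch: int,
                load: Callable[[str, int], Any]) -> Any:
    """Serve a batch from cache, falling back to the (slow) loader on a miss."""
    key = (query, batch)
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = load(query, batch)  # cold start: always hits the database
    cache.put(key, value)
    return value
```

Python's built-in `functools.lru_cache` decorator provides the same eviction policy for plain function calls; an explicit class like this one is only needed when the cache must be inspected or keyed manually.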

As a further optimization, the wildcard text index was dropped because it was a poor fit for a data model with deep nesting and many fields. Instead, a new internal field, all_text, concatenates the values of every string field into a single string, so the text index can target all_text directly. This brought vague queries down to ~2.5 seconds.
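A hypothetical sketch of how an all_text value could be derived from a nested document: walk every dict and list, collect string values depth-first, and join them with spaces. The functions `collect_strings` and `build_all_text` are illustrative, and skipping `_id` and `all_text` itself is an assumption made so the script can be rerun safely.

```python
from typing import Any


def collect_strings(value: Any) -> list[str]:
    """Recursively collect every string value in a nested document, depth-first."""
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        out: list[str] = []
        for key, child in value.items():
            # Skip _id and the derived field itself (wherever they appear)
            # so rerunning the script does not fold old all_text into new.
            if key in ("_id", "all_text"):
                continue
            out.extend(collect_strings(child))
        return out
    if isinstance(value, (list, tuple)):
        out = []
        for child in value:
            out.extend(collect_strings(child))
        return out
    # Numbers, booleans, None, dates, etc. are not included in the text field.
    return []


def build_all_text(doc: dict[str, Any]) -> str:
    """Concatenate all non-empty string values into one searchable string."""
    return " ".join(s.strip() for s in collect_strings(doc) if s.strip())
```

With PyMongo, the result could be written back with `update_one(..., {"$set": {"all_text": ...}})` and indexed with `collection.create_index([("all_text", pymongo.TEXT)])`. Because MongoDB allows at most one text index per collection, the existing wildcard text index (`{"$**": "text"}`) has to be dropped, for example with `collection.drop_index(...)`, before the new one can be created.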

seankim658 added 26 commits June 11, 2024 12:44

switch to aggregation framework

8cdfded

fix query path retrieval

12a2166

fix role count value retrieval

24fd4d4

fix total_count key typo

addc3af

add missing entity type for full search data model

3b19335

fix entity type and ID sorting

3d34da2

add aggregation pipeline explain logging

2752e6a

add pipeline logging

c5d308f

more logging

73bde24

fix explain logging

f9eb612

logging formatting

be32a64

additional log formatting

e882a06

more log format tweaking

099dc86

increment available filter orders

474788d

add misc script to create concat field

483355a

formatting

ee352e0

add log checkpoint

79d9643

add None check

c4b1800

index all_text field

bd5fa1f

fix assessed entity type filter

0c40514

add all_text to projection object

af61397

update documentation

882ebfe

update TOC

d484f2a

add all_text to projection and fix docstring

a4af617

add all_text to projection stage

b9e5a2e

add server specification

bca7472

seankim658 added the documentation and enhancement labels Jun 12, 2024

seankim658 self-assigned this Jun 12, 2024

seankim658 merged commit 847c29a into main Jun 12, 2024

seankim658 linked an issue Jun 12, 2024 that may be closed by this pull request: Update documentation #28 (Closed)

seankim658 deleted the api-optimization branch June 12, 2024 03:58
