This one-line fix improves performance by 99% on large databases :) It is
also an object lesson in the law of unintended consequences, so here's a
summary to use as a test case.
Some background:
- the timeseries interface allows users to specify a set of keys that they are querying
- the underlying mongodb implementation splits the keys between
`timeseries_db`, which stores raw data, and `analysis_timeseries_db`, which
stores processed data
- these were originally in the same database, so when we split them, we
queried both collections and merged the results to make sure that we didn't
inadvertently lose any data
- in response to a query, instead of returning the full results in memory,
mongodb returns a cursor that you can iterate over
- to avoid reading the entire result set into memory every time, we chained
the results returned from the two queries (e-mission@5367a01)
```
return itertools.chain(orig_ts_db_result, analysis_ts_db_result)
```
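As a rough illustration of why chaining keeps memory use low, here is a minimal sketch that uses generators as stand-ins for the two cursors. `fake_cursor` and `fetch_log` are hypothetical names introduced only for this example; they are not part of the e-mission code or of pymongo.

```python
import itertools

def fake_cursor(name, n, log):
    """Hypothetical stand-in for a cursor: yields documents one at a time."""
    for i in range(n):
        log.append((name, i))  # record the moment each document is fetched
        yield {"source": name, "idx": i}

fetch_log = []
orig_ts_db_result = fake_cursor("timeseries_db", 3, fetch_log)
analysis_ts_db_result = fake_cursor("analysis_timeseries_db", 2, fetch_log)

# Chaining does not fetch anything by itself
combined = itertools.chain(orig_ts_db_result, analysis_ts_db_result)

# Only the first document is fetched when we ask for it
first = next(combined)
```

After `next(combined)`, only one document has been pulled from the first stand-in cursor, and the second has not been touched at all.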
So far so good. But then we ran into a series of bugs, and each fix built on
the previous one.
1. If the entries are only in one database, the other database is queried with
an empty array for the key, which returns all values
(e-mission/e-mission-docs#168)
- so we added a check: if no keys are queried, we return an empty
iterator that can be chained (e-mission@b7f835a)
1. But then, the empty iterator is not a cursor, so we can't display counts
returned from each database (e-mission#599)
- we fixed this by performing an invalid query so that we got an empty cursor (e-mission@14aa503)
- This is purely a nice to have, and the PR even says that the changes to
enable it can be reverted if needed.
- But the changes were correct, and the tests passed, so we retained them.
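The two bugs above can be sketched with a toy, in-memory model. This is only an illustration under stated assumptions: `build_query`, `find`, and `guarded_find` are hypothetical helpers, the document keys are made up, and the assumption that the key clause is added only for a non-empty list is one plausible way an empty key list could end up matching everything.

```python
import itertools

def build_query(key_list):
    """Build a filter; ASSUMPTION: the key clause is skipped for an empty list."""
    query = {}
    if key_list:
        query["metadata.key"] = {"$in": key_list}
    return query

def find(docs, query):
    """Toy matcher: supports only {} and {"metadata.key": {"$in": [...]}}."""
    if not query:
        return iter(docs)  # an empty filter matches every document
    wanted = set(query["metadata.key"]["$in"])
    return (d for d in docs if d["metadata"]["key"] in wanted)

def guarded_find(docs, key_list):
    """First fix: skip the query and return an empty, chainable iterator."""
    if len(key_list) == 0:
        return iter([])
    return find(docs, build_query(key_list))

# Made-up documents in the "other" database
analysis_docs = [{"metadata": {"key": "analysis/example"}},
                 {"metadata": {"key": "analysis/example"}}]

# Bug 1: no keys for this database, yet everything comes back
unguarded = list(find(analysis_docs, build_query([])))

# Fix 1 works for chaining...
guarded = guarded_find(analysis_docs, [])
# ...but Bug 2: a plain iterator has no cursor-style count
has_count = hasattr(guarded, "count")

raw_result = iter([{"metadata": {"key": "raw/example"}}])
chained = list(itertools.chain(raw_result, guarded))
```

Here `unguarded` contains both documents, while the guarded version chains cleanly but cannot report a count, which is what motivated returning a real (empty) cursor instead.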
However, the `INVALID_QUERY` that we used was `{"1": "2"}`, and we do not have an
index in the database on the key "1". So as the database size grew, mongodb was
taking 45 seconds to iterate over every record and determine that there were no "1"s.
Switching from "1" to "metadata.key", which *is* indexed, dramatically improves
performance from 2 mins 6 secs to 150 ms.
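To make the difference concrete, here is a toy model of an indexed versus an unindexed equality lookup. It is a simplified sketch, not how mongodb's query planner actually works: `ToyCollection` and `get_path` are hypothetical, the documents are made up, and the value `"invalid"` in the new query is an assumption chosen for illustration.

```python
def get_path(doc, dotted):
    """Follow a dotted path like "metadata.key"; return None if it is missing."""
    cur = doc
    for part in dotted.split("."):
        if not isinstance(cur, dict) or part not in cur:
            return None
        cur = cur[part]
    return cur

class ToyCollection:
    """In-memory list of documents with a single-field index (illustration only)."""

    def __init__(self, docs, indexed_field):
        self.docs = docs
        self.indexed_field = indexed_field
        self.index = {}
        for pos, doc in enumerate(docs):
            self.index.setdefault(get_path(doc, indexed_field), []).append(pos)

    def find(self, query):
        """Single-field equality only. Returns (matches, documents_examined)."""
        (field, value), = query.items()
        if field == self.indexed_field:
            # Index lookup: only matching documents are ever touched
            positions = self.index.get(value, [])
            return [self.docs[p] for p in positions], len(positions)
        # No index on this field: every document must be examined
        matches = [d for d in self.docs if get_path(d, field) == value]
        return matches, len(self.docs)

docs = [{"metadata": {"key": "raw/example"}, "data": {"i": i}}
        for i in range(10000)]
coll = ToyCollection(docs, "metadata.key")

old_matches, old_examined = coll.find({"1": "2"})
new_matches, new_examined = coll.find({"metadata.key": "invalid"})  # value is assumed
```

Both queries return nothing, but the first examines all 10000 documents while the second examines none, so the cost of the unindexed version grows with the size of the collection.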
e-mission/e-mission-docs#261 (comment)
to
e-mission/e-mission-docs#261 (comment)