
fix: inefficient form metadata query leading to excessive memory usage #5763

Merged 1 commit into develop on Feb 14, 2023

Conversation

@tshuli (Contributor) commented on Feb 14, 2023

Problem

Solution

  • For a load-test example, see forms 63e9b7e27e26fb0012795ca8 (200k small responses) and 63e9d46e6d6d6f0012d14402 (20k large responses of >700KB each) on staging (log in using the form team email; key in 1pw).
  • It turns out that response download is already efficient. The current implementation returns a cursor that iterates over the matching documents, and MongoDB appears to implement this in a non-blocking fashion: in testing, multiple downloads of large forms proceeded concurrently without a spike in disk utilisation.
  • Instead, the high disk utilisation was caused by an inefficient query in the /metadata endpoint, which returns response metadata for storage mode submissions. This endpoint is hit once the admin navigates to the responses tab.
    • In submission.server.model.ts, the code below retrieves all submission documents for the form and stores them all in memory (or on disk, because of allowDiskUse(true)) just to count the number of responses. This led to excessive disk utilisation and a slow query (see NOTE below).
    • To fix this, the count has been replaced with a simpler countDocuments query.
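
The point above about cursor-based downloads staying efficient can be pictured with an async iterator. This is a minimal in-memory sketch, not the actual FormSG code: the names `fakeCursor` and `streamDownload` are illustrative, and a real MongoDB cursor fetches documents in server-side batches rather than from an array.

```typescript
// In-memory sketch of why a cursor keeps memory flat: each iteration
// yields one document, so memory use is independent of result size.
// (A real MongoDB cursor pulls documents in server-side batches.)
async function* fakeCursor<T>(source: T[]): AsyncGenerator<T> {
  for (const doc of source) {
    yield doc
  }
}

async function streamDownload(): Promise<number> {
  const responses = Array.from({ length: 1000 }, (_, i) => ({ _id: i }))
  let written = 0
  // Only the current document is held here, never the full result set.
  for await (const _doc of fakeCursor(responses)) {
    written += 1 // stand-in for writing one response to the download
  }
  return written
}
```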

Existing code

```ts
.facet({
  pageResults: [
    { $skip: numToSkip },
    { $limit: pageSize },
    { $project: { _id: 1, created: 1 } },
  ],
  allResults: [
    { $group: { _id: null, count: { $sum: 1 } } },
    { $project: { _id: 0 } }, // NOTE: means project ALL fields except _id
  ],
})
// prevents out-of-memory for large search results (max 100MB).
// NOTE: since the above query required excessive memory, results were
// written to disk instead, which led to high disk utilisation.
.allowDiskUse(true)
.then((result: MetadataAggregateResult[]) => {
  const [{ pageResults, allResults }] = result
  const [numResults] = allResults
  const count = numResults?.count ?? 0
```
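
The fix swaps the `allResults` facet for a dedicated count. The sketch below shows the pattern with hypothetical names and a tiny in-memory stand-in for the mongoose model, so it runs without a database; the real code issues `countDocuments` and a paged query against MongoDB instead.

```typescript
// Sketch of the replacement pattern (hypothetical names). An in-memory
// stand-in plays the role of the mongoose Submission model.
interface SubmissionDoc {
  _id: string
  form: string
  created: string
}

const docs: SubmissionDoc[] = [
  { _id: 'a', form: 'f1', created: '2023-02-01' },
  { _id: 'b', form: 'f1', created: '2023-02-02' },
  { _id: 'c', form: 'f1', created: '2023-02-03' },
  { _id: 'd', form: 'f2', created: '2023-02-01' },
]

const SubmissionModel = {
  // countDocuments: MongoDB counts server-side; no documents are buffered.
  countDocuments: async (q: { form: string }): Promise<number> =>
    docs.filter((d) => d.form === q.form).length,
  // paged find with skip/limit: only one page of documents is returned.
  findPage: async (q: { form: string }, skip: number, limit: number) =>
    docs.filter((d) => d.form === q.form).slice(skip, skip + limit),
}

async function getMetadata(formId: string, page: number, pageSize: number) {
  const numToSkip = (page - 1) * pageSize
  // Two cheap queries replace the single $facet aggregation: the count
  // no longer requires materialising every matching document.
  const count = await SubmissionModel.countDocuments({ form: formId })
  const pageResults = await SubmissionModel.findPage(
    { form: formId },
    numToSkip,
    pageSize,
  )
  return {
    count,
    metadata: pageResults.map(({ _id, created }) => ({ _id, created })),
  }
}
```

Separating the count from the page fetch means neither query has to hold the full result set, which is what drove the disk spill in the `$facet` version.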

Improvements:

Before & After Screenshots

BEFORE:

  • Timeout (504) on the metadata endpoint for forms with many submissions

[screenshot: timeout]

  • 100% disk utilisation on the DB

[screenshot: diskutil]

AFTER:

  • Metadata endpoint resolves in 200+ ms

[screenshot: fasteresolve]

  • No impact on DB disk utilisation

[screenshot: lowdiskutil]

Tests

  • Create a storage mode form and submit 20 responses. Check that metadata and pagination show correctly on the results tab.

@tshuli tshuli temporarily deployed to staging-al2 February 14, 2023 02:24 — with GitHub Actions Inactive
@tshuli tshuli temporarily deployed to staging-al2 February 14, 2023 02:26 — with GitHub Actions Inactive
@tshuli tshuli force-pushed the fix/metadata-query branch from bd228b8 to 1528c65 Compare February 14, 2023 02:56
@tshuli tshuli temporarily deployed to staging-al2 February 14, 2023 02:56 — with GitHub Actions Inactive
@mergify mergify bot mentioned this pull request Feb 14, 2023
@tshuli tshuli changed the base branch from develop to release-al2 February 14, 2023 03:23
@tshuli tshuli changed the base branch from release-al2 to develop February 14, 2023 03:24
@timotheeg (Contributor) left a comment:
👍 Great work!

@tshuli tshuli merged commit 38abbc9 into develop Feb 14, 2023
@tshuli tshuli deleted the fix/metadata-query branch February 14, 2023 06:46
@justynoh justynoh mentioned this pull request Feb 15, 2023
Development

Successfully merging this pull request may close these issues.

[8] Database read time spikes due to response downloads