Investigate: sort responses by date (oldest to newest) before download #1721

Closed
r00dgirl opened this issue Apr 25, 2021 · 7 comments · Fixed by #2028
Labels: discuss (to be discussed), P3

Comments

@r00dgirl
Contributor

r00dgirl commented Apr 25, 2021

Sorting by date is the natural order in which responses are shown in most, if not all, form managers, and many users are confused that this isn't our default. Our CSV download should sort by date by default instead of requiring form admins to sort as an additional step. This makes it easier to copy and paste just the newest responses into another spreadsheet that the admin may already be maintaining, editing, or analysing.

r00dgirl added the P2 (planned for next 1-2 months) label Apr 25, 2021
@karrui
Contributor

karrui commented Apr 28, 2021

May not be possible; we write to the CSV file directly from the database cursor, and it is most likely not feasible to store and sort the data in memory as the response size is unbounded.

Running sort on the submissions collection cursor will likely also slow down the query drastically and increase memory usage (needs to be verified though).

Someone can feel free to chime in and correct me if I'm wrong!

@r00dgirl
Contributor Author

@liangyuanruo think you've worked a lot with this part of the code too, any thoughts? @mantariksh @tshuli too

r00dgirl changed the title from "Sort responses by date (oldest to newest) before download" to "Investigate: sort responses by date (oldest to newest) before download" May 26, 2021
@mantariksh
Contributor

Agree with @karrui's assessment, unless we implement some kind of external merge sort, the algorithm databases use to sort large result sets with limited memory.
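
For anyone unfamiliar with the idea, here is a minimal sketch of the two phases of an external merge sort. It is only an illustration, not code from this repo: `ResponseRow`, `makeSortedRuns` and `mergeRuns` are hypothetical names, and the sorted runs are kept in memory here, whereas a real implementation would spill each run to disk or some other storage and read back only the head of each run while merging.

```typescript
// Hypothetical record shape; only `created` matters for ordering.
interface ResponseRow {
  id: string;
  created: string; // ISO 8601 timestamp
}

const byCreated = (a: ResponseRow, b: ResponseRow): number =>
  Date.parse(a.created) - Date.parse(b.created);

// Phase 1: cut the input into runs of at most `runSize` rows and sort each run.
// A real external sort would write each sorted run out to storage here.
function makeSortedRuns(rows: ResponseRow[], runSize: number): ResponseRow[][] {
  const runs: ResponseRow[][] = [];
  for (let i = 0; i < rows.length; i += runSize) {
    runs.push(rows.slice(i, i + runSize).sort(byCreated));
  }
  return runs;
}

// Phase 2: k-way merge. Repeatedly emit the oldest row among the run heads,
// so only one row per run has to be resident at any time.
function mergeRuns(
  runs: ResponseRow[][],
  emit: (row: ResponseRow) => void
): void {
  const cursors: number[] = runs.map(() => 0);
  for (;;) {
    let best = -1;
    for (let r = 0; r < runs.length; r++) {
      if (cursors[r] >= runs[r].length) continue; // this run is exhausted
      if (
        best === -1 ||
        byCreated(runs[r][cursors[r]], runs[best][cursors[best]]) < 0
      ) {
        best = r;
      }
    }
    if (best === -1) return; // every run is exhausted
    emit(runs[best][cursors[best]]);
    cursors[best]++;
  }
}
```

In this sketch `emit` stands in for whatever writes a row to the CSV, so the merged output never has to be held as one array.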

mantariksh added the discuss (to be discussed) label May 27, 2021
@mantariksh
Contributor

I tried looking for external sort npm packages, but they all seem to have <20 weekly downloads. Not sure if this problem is important enough to warrant using one of these or implementing our own:
https://www.npmjs.com/package/external-sort
https://www.npmjs.com/package/external-sorting
https://www.npmjs.com/package/exmes

r00dgirl added the P3 label and removed the P2 (planned for next 1-2 months) label May 27, 2021
@liangyuanruo
Contributor

I think discussing the performance of the database is not helpful, because result ordering is no longer preserved due to concurrency during the decryption step in the worker pool. Any sort will need to be performed client-side (which imposes limits on system resources), on a separate worker thread.

The simplest implementation that I can think of right now is to introduce another message-passing mechanism and use another Web Worker to sort the aggregated data right before the download step. I think we can try to get speedups with e.g. insertion sort if the natural order from the DB produces an almost-sorted array, but one still needs to contend with memory limits in the browser.
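
To illustrate the almost-sorted case, here is a rough sketch of an in-place insertion sort keyed on a creation timestamp. The `created` field (epoch milliseconds) and the function name are assumptions for the example, not the actual shape of the decrypted records. Insertion sort does work proportional to the number of out-of-order pairs, so it is close to linear when only a few rows arrive out of order and quadratic in the worst case.

```typescript
// Sorts `rows` in place, oldest first, and returns the same array.
// `created` is assumed to be an epoch-millisecond timestamp (hypothetical field).
function insertionSortByCreated<T extends { created: number }>(rows: T[]): T[] {
  for (let i = 1; i < rows.length; i++) {
    const current = rows[i];
    let j = i - 1;
    // Shift newer rows one slot to the right until `current` fits.
    // The strict `>` keeps rows with equal timestamps in arrival order.
    while (j >= 0 && rows[j].created > current.created) {
      rows[j + 1] = rows[j];
      j--;
    }
    rows[j + 1] = current;
  }
  return rows;
}
```

If the order coming out of the worker pool turns out to be far from sorted, this degrades badly, so it would be worth measuring before committing to it.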

@mantariksh
Contributor

> I think discussing the performance of the database is not helpful

Sorry if I wasn't clear - when I said "used in databases", I was just providing context for other uses of the same algorithm. I was proposing that we use this algorithm client-side to overcome the memory limitations of sorting a large number of responses.

@liangyuanruo
Contributor

liangyuanruo commented May 27, 2021

> I was proposing that we use this algorithm client-side to overcome the memory limitations of sorting a large number of responses.

Ah I see... maybe see how the regular in-place Array.prototype.sort() performs first? I'd expect memory usage to temporarily double when data has to be passed between the main thread and the web worker.
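
As a baseline to measure against, something like the sketch below would do. `DecryptedResponse` and `sortOldestFirst` are hypothetical names, and `created` is assumed to be an ISO 8601 string. `Array.prototype.sort()` reorders the array it is called on rather than returning a copy, and has been required to be stable since ES2019.

```typescript
// Hypothetical shape of one decrypted row; field names are assumptions.
interface DecryptedResponse {
  submissionId: string;
  created: string; // ISO 8601 timestamp
}

// Sorts in place, oldest to newest, and returns the same array reference.
function sortOldestFirst(rows: DecryptedResponse[]): DecryptedResponse[] {
  return rows.sort(
    (a, b) => Date.parse(a.created) - Date.parse(b.created)
  );
}
```

The sort itself allocates no second copy of the rows; any doubling would come from copying the data across the worker boundary, which this sketch does not cover.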
