-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Reporting] Fix performance of CSV generation #120309
[Reporting] Fix performance of CSV generation #120309
Conversation
Pinging @elastic/kibana-reporting-services (Team:Reporting Services) |
Pinging @elastic/kibana-app-services (Team:AppServicesUx) |
x-pack/plugins/reporting/server/export_types/csv_searchsource/generate_csv/generate_csv.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @tsullivan 🍻
Approving to unblock, but I think we should consider adding the suggested perf code change too.
|
||
const asyncGenerateRow = async (dataTableRow: Record<string, any>): Promise<string> => { | ||
return new Promise((resolve) => { | ||
setImmediate(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a comment explaining why we are using setImmediate
here? Perhaps a link to the to nodejs documentation too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x-pack/plugins/reporting/server/export_types/csv_searchsource/generate_csv/generate_csv.ts
Outdated
Show resolved
Hide resolved
💚 Build Succeeded
Metrics [docs]
History
To update your PR or re-run it, just comment with: |
* 1. Partition the synchronous process with fewer partitions, by using | ||
* the loop counter to call setImmediate only every N amount of rows. | ||
* Testing is required to see what the best N value for most data will | ||
* be. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a feasible option. Filed #120426
* [Reporting] Fix performance of CSV generation * use a for loop with 1 operation instead of 3 chained maps * do without the callback * update comment
* [Reporting] Fix performance of CSV generation * use a for loop with 1 operation instead of 3 chained maps * do without the callback * update comment
* [Reporting] Fix performance of CSV generation * use a for loop with 1 operation instead of 3 chained maps * do without the callback * update comment
💔 Backport failed
Successful backport PRs will be merged automatically after passing CI. To backport manually run: |
* [Reporting] Fix performance of CSV generation * use a for loop with 1 operation instead of 3 chained maps * do without the callback * update comment Co-authored-by: Tim Sullivan <[email protected]>
* [Reporting] Fix performance of CSV generation * use a for loop with 1 operation instead of 3 chained maps * do without the callback * update comment
Summary
Closes #120275
Problem: CSV Generation is a CPU-intensive process. It was generating rows in a way that would not allow NodeJS to give control to other pending events: it was blocking the event loop.
Solution: This PR changes the implementation to use the "partitioning" strategy to yield control back to the event loop after every CSV row is generated.
Reference: https://nodejs.org/en/docs/guides/dont-block-the-event-loop/#complex-calculations-without-blocking-the-event-loop
Testing
metricbeat*
data that contains 806k hitsCPU utilization and event loop delay gets very high when generating a large CSV export, but Kibana is still able to process other events.