[Reporting] Add logging of CPU usage and memory consumption by Chromium in the reporting. #99109
Pinging @elastic/kibana-app-services (Team:AppServices)
}

function getCpuUsage(start: NormalizedMetrics, end: NormalizedMetrics) {
  return (end.ProcessTime - start.ProcessTime) / (end.Timestamp - start.Timestamp) / cpus().length;
What is the difference between `x.ProcessTime` and `x.Timestamp`?
`ProcessTime` is the amount of time for which a CPU was used in the current process (Wikipedia). It is the sum of the time from all the CPU cores that have been used.
`Timestamp` is just the current time.
When the process has spent part of the interval idle, `ProcessTime` will be less than the actual time the process was running for. When the process uses multiple cores, `ProcessTime` can be more than the actual duration.
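To make the two cases concrete, here is a minimal sketch of the calculation from the diff above. The `NormalizedMetrics` shape is inferred from the function signature, the unit (seconds) is an assumption, and the core count is passed as a parameter instead of being read from `cpus().length` so the example stays self-contained. The zero-interval guard is an addition for this sketch only.

```typescript
// Shape inferred from the diff under review; the unit (seconds) is an assumption.
interface NormalizedMetrics {
  ProcessTime: number; // CPU time used by the process, summed over all cores
  Timestamp: number; // clock reading taken when the sample was collected
}

// `cores` is an explicit parameter here; the PR reads it from `cpus().length`.
function getCpuUsage(start: NormalizedMetrics, end: NormalizedMetrics, cores: number): number {
  const elapsed = end.Timestamp - start.Timestamp;
  // Guard added for this sketch: avoid dividing by zero on a degenerate interval.
  if (elapsed <= 0 || cores <= 0) {
    return 0;
  }
  return (end.ProcessTime - start.ProcessTime) / elapsed / cores;
}

// Mostly idle: 1s of CPU time over 4s of wall time on 4 cores.
const idle = getCpuUsage({ ProcessTime: 0, Timestamp: 0 }, { ProcessTime: 1, Timestamp: 4 }, 4);

// Multicore: 6s of CPU time over 2s of wall time on 4 cores.
// ProcessTime grows faster than Timestamp, yet the result stays within 0..1.
const busy = getCpuUsage({ ProcessTime: 2, Timestamp: 10 }, { ProcessTime: 8, Timestamp: 12 }, 4);

console.log(idle, busy);
```

Dividing by the core count is what keeps the multicore case from exceeding 1.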
import type { Metrics as PuppeteerMetrics } from 'puppeteer';
import { cpus } from 'os';

declare module 'puppeteer' {
Was this added because the code returns a different object than is defined by the TS type?
In the Puppeteer type definitions, the `send` method returns `object`, which is too generic and doesn't provide any type safety. The declaration here doesn't change the original signature but adds another overload for that method.
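The overload idea can be sketched without Puppeteer itself. In the example below, `Session`, `MetricsResponse`, `fakeSession`, and `readMetric` are hypothetical stand-ins, not the library's types; the second `interface Session` block plays the role of the `declare module 'puppeteer'` augmentation and relies on TypeScript's declaration merging to add a narrower overload next to the generic one.

```typescript
interface MetricEntry {
  name: string;
  value: number;
}

interface MetricsResponse {
  metrics: MetricEntry[];
}

// Stand-in for the library's session type: the generic signature returns `object`.
interface Session {
  send(method: string, params?: object): Promise<object>;
}

// A second declaration of the same interface merges with the first one and
// contributes an extra overload; the original signature is left untouched.
interface Session {
  send(method: 'Performance.getMetrics'): Promise<MetricsResponse>;
}

// Fake session for the sketch; a real one would talk to the browser.
const fakeSession: Session = {
  send(method: string): Promise<MetricsResponse> {
    if (method === 'Performance.getMetrics') {
      return Promise.resolve({
        metrics: [
          { name: 'Timestamp', value: 12.5 },
          { name: 'ProcessTime', value: 0.25 },
        ],
      });
    }
    return Promise.resolve({ metrics: [] });
  },
};

async function readMetric(session: Session, name: string): Promise<number | undefined> {
  // The literal argument selects the narrow overload, so `metrics` is typed.
  const { metrics } = await session.send('Performance.getMetrics');
  for (const metric of metrics) {
    if (metric.name === name) {
      return metric.value;
    }
  }
  return undefined;
}
```

Calls with any other method name still resolve to the generic signature and keep returning `object`.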
endMetrics
);

apm.setCustomContext({ cpu, memory });
I suspect you'll want to perform range queries, aggregate, etc on this so you'll want to use labels instead - custom context is not indexed while labels are
@graphaelli But labels are equivalent to tags. Do you think it's logically correct to use labels here? It's something we can use aggregation for (like CPU cores), but here we are dealing with metrics.
The data we pass is fully random and distinct. Please correct me if I am wrong, but indexing this will increase the index size. Besides that, I can still filter transactions by the custom context fields.
@dokmic we do want to perform aggregations on the data, which means it has to be indexed. The custom context is not indexed, and we can see the raw data, but in order to get charts of the data, it needs to be indexed. It will increase the index size very negligibly since it's numeric.
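A rough sketch of what switching from custom context to labels could look like. `LabelSink`, `BrowserMetrics`, `reportMetrics`, and `RecordingSink` are hypothetical names introduced for illustration, and the label keys `cpu` and `memory` are assumptions. The optional third `stringify` argument mirrors what I believe the Node.js agent's `setLabel` accepts for keeping values numeric; please verify that against the agent documentation for the version in use.

```typescript
// Hypothetical minimal surface; an APM transaction is assumed to offer a compatible `setLabel`.
interface LabelSink {
  setLabel(name: string, value: string | number | boolean, stringify?: boolean): unknown;
}

interface BrowserMetrics {
  cpu: number; // fraction in the 0..1 range
  memory: number; // bytes
}

function reportMetrics(sink: LabelSink | null | undefined, metrics: BrowserMetrics): void {
  if (!sink) {
    return; // no active transaction, nothing to label
  }
  // Assumption: passing `false` asks the agent not to stringify the value,
  // so that range queries and aggregations can treat it as a number.
  sink.setLabel('cpu', metrics.cpu, false);
  sink.setLabel('memory', metrics.memory, false);
}

// In-memory sink used only to demonstrate the call pattern.
class RecordingSink implements LabelSink {
  readonly labels: Record<string, string | number | boolean> = {};

  setLabel(name: string, value: string | number | boolean): boolean {
    this.labels[name] = value;
    return true;
  }
}

const sink = new RecordingSink();
reportMetrics(sink, { cpu: 0.12, memory: 52428800 });
console.log(sink.labels);
```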
@dokmic I expect this will change after [...]
@dokmic I expected that the changes in this PR would populate [...]
I started a local apm-server and pointed my Kibana dev server to it by adding these lines to the [...]
Then I started the dev server using [...]
I ran a few test PNG reports using Sample ECommerce data. In the APM app, I see the transactions that should have the [...]
The details of those transactions don't have custom Memory or CPU metrics. I also set up a custom index pattern for [...]
It doesn't seem like this is related to labels / custom context. I'm not sure if [...]
@tsullivan The custom context might be empty when sampling probability is less than 1. Could you please make sure that [...]
@dokmic let's also add a metric for the byte size of the PDF that was generated:
--- a/x-pack/plugins/reporting/server/export_types/printable_pdf/lib/generate_pdf.ts
+++ b/x-pack/plugins/reporting/server/export_types/printable_pdf/lib/generate_pdf.ts
@@ -89,7 +89,11 @@ export async function generatePdfObservableFactory(reporting: ReportingCore) {
tracker.startGetBuffer();
logger.debug(`Generating PDF Buffer...`);
buffer = await pdfOutput.getBuffer();
- logger.debug(`PDF buffer byte length: ${buffer?.byteLength || 0}`);
+ {
+ const byteLength = buffer?.byteLength || 0;
+ logger.debug(`PDF buffer byte length: ${byteLength}`);
+ tracker.setByteLength(byteLength);
+ }
tracker.endGetBuffer();
} catch (err) {
logger.error(`Could not generate the PDF buffer!`);
--- a/x-pack/plugins/reporting/server/export_types/printable_pdf/lib/tracker.ts
+++ b/x-pack/plugins/reporting/server/export_types/printable_pdf/lib/tracker.ts
@@ -8,6 +8,7 @@
import apm from 'elastic-apm-node';
interface PdfTracker {
+ setByteLength: (byteLength: number) => void;
startLayout: () => void;
endLayout: () => void;
startScreenshots: () => void;
@@ -77,6 +78,9 @@ export function getTracker(): PdfTracker {
endGetBuffer() {
if (apmGetBuffer) apmGetBuffer.end();
},
+ setByteLength(byteLength: number) {
+ apmTrans?.setLabel('byte_length', byteLength);
+ },
end() {
if (apmTrans) apmTrans.end();
},
And also something similar for PNG.
LGTM
The main goal is to see memory consumption while generating a PDF report, and this PR achieves it.
I see a few issues that are lower priority:
- On my workstation, our `ProcessTime` metric is consistently `-0.01`. The overall calculated CPU utilization is `0`. I'm not sure if that means we need to revisit the CPU calculation.
- I see the ByteSize, CPU and Memory labels on reports that go through `generate_pdf`. For PNG reports, which go through `generate_png`, I only see ByteSize, not CPU and Memory. This should eventually be fixed, but I am OK with having this PR merged without that fix.
…um (elastic#99109)
* Add logging of CPU usage by chromium
* Add logging of memory consumption by chromium
* Add PDF report byte length logging
* Add PNG report byte length logging
# Conflicts:
# x-pack/plugins/reporting/server/browsers/chromium/driver_factory/index.ts
Summary
Add logging of CPU usage and memory consumption by Chromium in the reporting.
Memory consumption is taken as the JavaScript heap size at the end of the screenshot transaction. That value tracks the process memory consumption proportionally and can therefore be used as historical data. Unfortunately, there is no other reliable way to get the metric because the Memory domain is an experimental feature.
CPU usage is evaluated by the following formula: Process Time / Actual Time / Number of Virtual Cores. Process time and actual time are both taken as differences between the start and the end of the measurement, so their ratio gives the average CPU usage over that range. `ProcessTime` can be higher than the actual time on multicore systems, so it has to be divided by the number of virtual cores to get a relative value in the 0..1 range. This approach takes constant time, `O(1)`, and is more accurate than a polling mechanism. There is another potential way to get the CPU usage, but it is not available in the current Chromium build anyway.

Those metrics are logged as a percentage and in megabytes, respectively. At the same time, they are exposed to APM as a decimal and a number of bytes so that they can be formatted in Kibana itself.
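The split between log formatting and raw APM values might look roughly like this. The helper names, the rounding rules, the 1024-based megabyte conversion, and the log message wording are all assumptions made for this sketch; the PR may format the values differently.

```typescript
// Log-side formatting: a 0..1 fraction becomes a whole percentage.
function formatCpuForLog(cpu: number): string {
  return `${Math.round(cpu * 100)}%`;
}

// Log-side formatting: bytes become megabytes, rounded up (1 MB = 1024 * 1024 bytes here).
function formatMemoryForLog(bytes: number): string {
  return `${Math.ceil(bytes / 1024 / 1024)}MB`;
}

const cpu = 0.75; // average usage over the transaction, in the 0..1 range
const memory = 50 * 1024 * 1024; // JavaScript heap size in bytes

// Human-readable values go to the log...
const logLine = `Chromium consumed CPU ${formatCpuForLog(cpu)} Memory ${formatMemoryForLog(memory)}`;

// ...while APM receives the raw numbers, leaving formatting to Kibana.
const apmPayload = { cpu, memory };

console.log(logLine, apmPayload);
```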
The metrics are sent to APM in the custom context and will be available in the `transaction.custom.cpu` and `transaction.custom.memory` fields. Please note that the custom context might be empty when the sampling probability is less than 1.

Resolves #79793.