High-level metrics to improve web developer ergonomics #24

Closed
anssiko opened this issue Jan 14, 2022 · 12 comments

Comments

@anssiko
Member

anssiko commented Jan 14, 2022

I had a chat with @kenchris to propose we revisit both the existing use cases and new use cases that have recently emerged (e.g. #14) to understand whether the current cpuSpeed and cpuUtilization metrics are still the best fit.

I think there's an opportunity to make the API even more ergonomic for web developers who are not experts in computing performance and tuning, and not familiar with related concepts.

I'd like us to assess whether the current use cases could be served with an API that instead of (or in addition to) the current cpuSpeed and cpuUtilization numerical pair would expose a finite set of human-readable compute pressure states that have semantics attached to them.

What I'm interested in exploring is to see if we could raise the level of abstraction (bonus: more privacy-preserving, future-proofing) and make the underlying low-level metrics implementation details. The low-level metrics are harder to explain to web developers and might evolve and in some cases become misleading. I suspect they could be more easily misinterpreted as well.

In this proposal, the low-level metrics to high-level metrics mapping would become an implementation detail, and implementations could also take into consideration other factors that may influence the compute pressure state such as device form factor, thermal budget, and so on when making the decision.

Here's a strawman proposal, plugging into the existing API for illustrative purposes:

enum ComputePressureState { "nominal", "fair", "serious", "critical" };

dictionary ComputePressureEntry {
  ComputePressureState state = "fair";
};

Thoughts?

Related, I think this blog post https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html arguing that the CPU utilization metric is misleading should be reviewed. It triggered quite a long discussion among developers (also on HN), so there are probably good nuggets of information hidden in the comments as well. Leaving it here for interested folks to digest.

[Edit: The state names in the strawman proposal were tweaked a bit. The names should be considered as placeholders to illustrate the idea. These names are subject to change based on feedback received.]
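
To make the strawman concrete, here is a minimal TypeScript sketch of how a web app might act on such states. The `ComputePressureState` union mirrors the placeholder enum above; `VideoSettings`, `settingsFor`, and the specific resolutions are hypothetical and only illustrate the idea, they are not part of the proposal.

```typescript
// Mirrors the strawman enum; the state names are placeholders.
type ComputePressureState = "nominal" | "fair" | "serious" | "critical";

// Hypothetical app-level settings, used only for illustration.
interface VideoSettings {
  height: number;          // vertical resolution in pixels
  backgroundBlur: boolean; // an optional, CPU-hungry effect
}

// Map each state to app behaviour without consulting any low-level metric.
function settingsFor(state: ComputePressureState): VideoSettings {
  switch (state) {
    case "nominal":
    case "fair":
      return { height: 720, backgroundBlur: true };
    case "serious":
      return { height: 480, backgroundBlur: false };
    case "critical":
      return { height: 240, backgroundBlur: false };
  }
}
```

The app logic never sees a number such as a utilization percentage, so the meaning of each state can be refined by the implementation without changing this code.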

@kenchris
Contributor

kenchris commented Jan 17, 2022

I dislike these names though :-)

Low indicates that there will be a high, so I think something like "nominal" makes more sense.

Normal also doesn't really make sense: what is a normal load? I think "fair" is great because it says the system is in use but the load is fair and sustainable.

On the other hand, moving from fair directly to critical seems a bit abrupt, so we would need a level before that, for when it's not critical yet but getting there, like "serious".

nominal, fair (sustainable?), serious (significant), critical

@anssiko
Member Author

anssiko commented Jan 17, 2022

Naming aside (🚲😁), I think working out this state machine and the conditions for transitioning from one state to another would be a helpful exercise to figure out whether this proposal has merit. That should shed light on what the semantics of each state should be, and how many states there should be.

I'll document some of my additional thoughts below for discussion.

My wish list for this proposed high-level API based on what I think web developers expect:

  • Hard to misuse, does not allow hard-coding expectations into web apps that don't stand the test of time
  • Makes it possible to progressively enhance existing web apps without domain expertise
  • Is self documenting, uses design, terminology and concepts familiar to web developers
  • States map closely to real-world app business logic needs and expectations, no redundant states ("when in doubt leave it out")
  • Transitions from one state to another are signalled, whenever possible, proactively so the web apps have time to adapt, especially important when transitioning toward higher pressure

From a privacy perspective (with a chair hat on):

  • Adheres to data minimization, expose only as much information as is required to satisfy the key use cases
  • Enables modern privacy protections without compromising functionality (e.g. a privacy-first implementation can choose to disclose only a subset of the states, or a single state (say "fair" or "bikeshed") and web apps won't break, this combined with knobs such as adjustable reporting frequency)

For future-proofing (a browser implementer's hat on):

  • Allows implementation to evolve and improve with new hardware, form factors, OSes, without breaking existing content (e.g. does cpuSpeed >= 0.5 hard-coded in a web app now tested with system X survive the test of time when the same web app is run on system Y 5 years from now? Would e.g. "fair" leave more room for implementation to evolve?)
  • Extensible design, because we know we may want to add more things later...
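
Two of the wishes above, a privacy-first implementation that discloses only a subset of the states and an extensible design, suggest that app code should not assume a fixed set of states. A minimal sketch, assuming the placeholder state names from the strawman; `shouldReduceWork` is a hypothetical helper, and treating an unknown state as "no change" is a choice made for this sketch only.

```typescript
// Returns true when the app should shed optional work.
// Accepts any string so that a state added later does not break the app.
function shouldReduceWork(state: string): boolean {
  switch (state) {
    case "serious":
    case "critical":
      return true;
    case "nominal":
    case "fair":
      return false;
    default:
      // Unknown or future state: keep current behaviour rather than guess.
      return false;
  }
}
```

With this shape, an implementation that only ever reports "fair" leaves the app running at its default quality instead of breaking it.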

@kenchris
Contributor

In theory we should not spec what exact values map into these states, as that can differ per hardware platform and for many other reasons. For example, a platform might become critical due to thermals without being under heavy CPU load (judging by clock speed and utilization), and clock speed boosts can work quite differently depending on whether the device is connected to mains power (AC) or running off battery (DC).

I also think it would be great if silicon vendors could be innovative in this area on their platforms.
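
As an illustration of why the mapping is best left to the implementation, here is a sketch of an implementation-side function in which thermal state overrides CPU utilization. Everything here is an assumption for illustration: `PlatformSignals`, `deriveState`, and the thresholds are hypothetical and do not describe how any browser computes the state.

```typescript
type ComputePressureState = "nominal" | "fair" | "serious" | "critical";

// Hypothetical signals an implementation might have access to.
interface PlatformSignals {
  utilization: number;         // 0..1, averaged across cores
  thermallyThrottled: boolean; // platform reports thermal throttling
}

// Illustrative mapping only; the thresholds are invented for this sketch.
function deriveState(s: PlatformSignals): ComputePressureState {
  // Thermal throttling dominates: the system can be under pressure
  // even when utilization looks low.
  if (s.thermallyThrottled) return "critical";
  if (s.utilization >= 0.9) return "serious";
  if (s.utilization >= 0.6) return "fair";
  return "nominal";
}
```

A lightly loaded but throttled system, `deriveState({ utilization: 0.2, thermallyThrottled: true })`, is reported as "critical" here, which a utilization-only metric would miss.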

@anssiko
Member Author

anssiko commented Jan 18, 2022

Based on my initial assessment the spec should not normatively define a mapping from high-level states to any low-level metrics (such as speed or utilization value ranges) but leave that to the implementation. Otherwise, the high-level abstraction would get anchored into low-level metrics that can be misleading.

The abstraction should be defined such that it can be layered atop existing low-level metrics such as instructions per cycle (IPC) or its multiplicative inverse, cycles per instruction (CPI). It should also be possible for an implementation to make use of methods that better account for performance bottlenecks, such as top-down microarchitecture analysis.

For a concrete example, it'd be up to the implementation to interpret what IPC < 1.0 or IPC > 1.0 means in terms of compute pressure states. The former is likely memory-bound, the latter instruction-bound. If memory-bound, different software tuning strategies apply than in an instruction-bound scenario. This suggests the spec should perhaps have informative content for implementers around the low-level metrics and their interpretation.
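
The IPC/CPI relationship and the rule of thumb above can be written down as a short sketch. `CounterSample`, `ipc`, `cpi`, and `classify` are hypothetical names; how the counters are obtained is out of scope, and the case IPC = 1.0, which the rule of thumb does not cover, is reported as inconclusive by assumption.

```typescript
// Hypothetical sample of hardware counters over some interval.
interface CounterSample {
  instructions: number; // instructions retired
  cycles: number;       // CPU cycles elapsed
}

// Instructions per cycle.
function ipc(s: CounterSample): number {
  return s.instructions / s.cycles;
}

// Cycles per instruction, the multiplicative inverse of IPC.
function cpi(s: CounterSample): number {
  return s.cycles / s.instructions;
}

type Bottleneck = "likely memory-bound" | "likely instruction-bound" | "inconclusive";

// Rule of thumb from the discussion: IPC < 1.0 suggests memory-bound,
// IPC > 1.0 suggests instruction-bound.
function classify(s: CounterSample): Bottleneck {
  const value = ipc(s);
  if (value < 1.0) return "likely memory-bound";
  if (value > 1.0) return "likely instruction-bound";
  return "inconclusive";
}
```

For a sample with 500 instructions over 1000 cycles, IPC is 0.5 and CPI is 2, which this sketch classifies as likely memory-bound.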

To summarize my thinking:

  • Web developers using the high-level API should not need to understand the intricacies of low-level metrics
  • Implementers would benefit from informative guidance on how to interpret well-established low-level metrics

Most importantly, I believe this layering would follow the priority of constituencies principle.

@fideltian

fideltian commented Jan 20, 2022

Hi Anssi & Kenneth,

I think it makes sense if the CPU manufacturer could tell the application the value of CPU pressure. Currently, we just adapt according to CPU usage. For example:
>85%: we might downgrade the resolution to protect audio.
60%-85%: we might upgrade the resolution.

So if we consider only usage (leaving aside the other factors you mentioned, such as cooling, battery, memory, etc.), we think there could be more buckets beyond 50%, such as (just a concept example):
normal: <55%
fair: 55%-70%
serious: 70%-85%
warning: 85%-95%
critical: >95%
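
The concept buckets above can be sketched as a simple lookup. `bucketFor` is a hypothetical name. The listed ranges share their endpoints (70% and 85% each appear in two ranges), so this sketch assumes each lower bound is inclusive; since critical is given as >95%, exactly 95% falls in "warning".

```typescript
type UsageBucket = "normal" | "fair" | "serious" | "warning" | "critical";

// Map a CPU usage percentage (0..100) to the concept buckets.
// Boundary handling is an assumption: lower bounds are inclusive.
function bucketFor(percent: number): UsageBucket {
  if (percent < 55) return "normal";
  if (percent < 70) return "fair";
  if (percent < 85) return "serious";
  if (percent <= 95) return "warning";
  return "critical";
}
```

Hard-coded thresholds like these are exactly what the proposal would move out of app code and into the implementation.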

@anssiko
Member Author

anssiko commented Jan 20, 2022

@fideltian thanks for the discussion and this feedback from the Zoom PWA perspective!

Hearing that this high-level metrics proposal is getting support, we're now investigating how many states would strike the right balance between the needs of web developers and privacy. I feel 4-5 would be a good starting point, but this will be clarified as the work progresses.

Some additional considerations:

  • I think more states is not always better. We aim to introduce states that we can clearly explain to web developers, especially the expected UX impact. We also want to ensure these states are mappable to web application logic that drives the key use cases.

  • We want to review what can be reliably implemented across platforms. I think it is crucial that the API is able to inform the web developer ahead of time when the pressure trend is rising.
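
One way to read "the pressure trend is rising" in terms of discrete states is to compare consecutive reports. The sketch below does that on the app side; `ORDER` and `trend` are hypothetical, and the ordering of the placeholder states from lowest to highest pressure is assumed from the strawman.

```typescript
// Placeholder states, assumed to be ordered from lowest to highest pressure.
const ORDER = ["nominal", "fair", "serious", "critical"] as const;
type ComputePressureState = typeof ORDER[number];

type Trend = "rising" | "falling" | "steady";

// Compare two consecutive state reports.
function trend(previous: ComputePressureState, next: ComputePressureState): Trend {
  const delta = ORDER.indexOf(next) - ORDER.indexOf(previous);
  if (delta > 0) return "rising";
  if (delta < 0) return "falling";
  return "steady";
}
```

An app could start shedding optional work as soon as `trend` reports "rising", rather than waiting for "critical".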

We'll keep on refining this proposal and will loop you in for review. Thanks for your contributions!

@anssiko
Member Author

anssiko commented Jan 26, 2022

WebKit recently restored navigator.hardwareConcurrency (Bug 233381, ships in Safari TP 138), and based on comments in that WebKit bug there would be a preference for a higher-level API instead (if there were one). When this proposal is more fully baked, I think it’d be good to reach out to WebKit friends for review, and make a connection to that WK bug for context.

@kenchris
Contributor

The specification and explainer have been updated with this new approach.

@anssiko
Member Author

anssiko commented Feb 15, 2022

@kenchris thanks for updating the spec. When I opened this issue, I honestly did not expect my proposal to be turned into spec prose this fast! But given that consensus emerged quickly and the proposal resonated with folks, including implementers, moving fast was appropriate.

Before you close this issue, I suggest you spend some time to update https://wicg.github.io/compute-pressure/#security-and-privacy-considerations -- currently it contains references to the old deprecated API.

Security and privacy considerations are very important for new work that is expected to advance to standardization. This new approach brings substantial improvements in these areas in addition to developer ergonomics improvements and design that is future-proof. My recommendation is to be explicit about these improvements, because some implementers may have reviewed the old API and have formed an opinion based on the old design. Concerns raised earlier have been addressed by the new API design but that may not be obvious to people who are not following this work closely.

@anssiko
Member Author

anssiko commented Feb 17, 2022

@kenchris thanks for #51 -- this is very helpful for reviewers.

I suggest you reference https://github.com/WICG/compute-pressure/blob/main/security-privacy-self-assessment.md from https://wicg.github.io/compute-pressure/#security-and-privacy-considerations and reword this:

Exposing hardware related events related to low level details such as exact CPU utilization or clock speed increases the risk of harming the user's privacy.

To minimize this risk, only the absolute minimal amount of information needed to support the use-cases is exposed.

Proposal:

To mitigate this risk, no such low level details are exposed.

@kenchris
Contributor

This has been done, I think we can close this now

@anssiko
Member Author

anssiko commented Feb 17, 2022

Thanks!

@anssiko anssiko closed this as completed Feb 17, 2022
aarongable pushed a commit to chromium/chromium that referenced this issue Jul 19, 2022
CPU frequency statistics are not a good metric. A CPU can
have multiple boost modes depending on which core is running
and depending on whether on AC or DC power. We cannot deduce
that a CPU is stressed just because it is running at a high
frequency. Also it might be running at a lower frequency due
to power settings or being on DC power and then the system
can easily be under pressure at this lower frequency. (GitHub
issue: w3c/compute-pressure#24,
Spec change: w3c/compute-pressure@e3da844).

We remove CPU frequency information from the implementation
in this patch and use only CPU utilization temporarily. We
will switch to PressureState in the future according to the
newest spec (https://wicg.github.io/compute-pressure/#pressure-states)
and we are working on a robust algorithm to calculate
PressureState.

Bug: 1339205
Change-Id: I15a17fd1eefeeefaddc5f3df5e2a98b04cac4368
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3725753
Reviewed-by: Matthew Denton <[email protected]>
Commit-Queue: Wei4 Wang <[email protected]>
Reviewed-by: Raphael Kubo Da Costa <[email protected]>
Reviewed-by: Reilly Grant <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1025486}
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Jul 19, 2022
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Jul 21, 2022
…nformation., a=testonly

Automatic update from web-platform-tests
[ComputePressure] Remove CPU frequency information.

CPU frequency statistics are not a good metric. A CPU can
have multiple boost modes depending on which core is running
and depending on whether on AC or DC power. We cannot deduct
that a CPU is stressed just because it is running at a high
frequency. Also it might be running at a lower frequency due
to power settings or being on DC power and then the system
can easily be under pressure at this lower frequency. (GitHub
issue: w3c/compute-pressure#24,
Spec change: w3c/compute-pressure@e3da844).

We remove CPU frequency information from the implementation
in this patch and use only CPU utilization temporarily. We
will switch to PressureState in the future according to the
newest spec (https://wicg.github.io/compute-pressure/#pressure-states)
and we are working on a robust algorithm to calculate
PressureState.

Bug: 1339205
Change-Id: I15a17fd1eefeeefaddc5f3df5e2a98b04cac4368
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/3725753
Reviewed-by: Matthew Denton <[email protected]>
Commit-Queue: Wei4 Wang <[email protected]>
Reviewed-by: Raphael Kubo Da Costa <[email protected]>
Reviewed-by: Reilly Grant <[email protected]>
Cr-Commit-Position: refs/heads/main@{#1025486}

--

wpt-commits: b9afc4f837c1ebaade07b0d04ecc40fc3b28f21d
wpt-pr: 34890
jamienicol pushed a commit to jamienicol/gecko that referenced this issue Jul 27, 2022
mjfroman pushed a commit to mjfroman/moz-libwebrtc-third-party that referenced this issue Oct 14, 2022
NOKEYCHECK=True
GitOrigin-RevId: 55b9c5aa2a3c05b4ac31cd19fdc6bb6132e69be9