-
Added to this: here are some actual measurements from a 48-core machine we have:
As we can see, the power draw at idle is greatly reduced and strongly non-linear. However, this is not what we wanted to look at. If the model estimate were spot on for the bare-metal machine, but we then virtualized it and assigned ourselves one core, we would guess 214.62 W and divide by 48, which equals 4.47 W, if we use the method proposed here of setting an operating point.
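To make the arithmetic concrete, here is a tiny sketch of that naive linear split (only the 214.62 W and the 48 cores come from the measurements above; everything else is illustrative):

```python
# Naive linear split of a bare-metal estimate across cores, as described
# above. This ignores the strongly non-linear idle behaviour seen in the
# measurements, which is exactly the problem being discussed.
TOTAL_CORES = 48
bare_metal_estimate_watts = 214.62  # model estimate for the whole machine

per_core_watts = bare_metal_estimate_watts / TOTAL_CORES
print(f"{per_core_watts:.2f} W attributed to our single core")  # -> 4.47 W
```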
-
Another idea on this topic: if you assume the host machine is typically loaded at 50%, you are likely to be more correct on average, given that cloud vendors typically operate in this region. But if you do not assume that and instead model the whole spectrum, as this model currently does, then you are also incentivized more to use less, and on average you will likely see the same result as in the 50% case, because sometimes you actually are on a lightly loaded machine and sometimes you are not. The penalty for high CPU usage is bigger, though.

In general the question arises whether you want more reproducible results from this model (which is better for benchmarking and quantifying your own improvements to the code) or results that are closer to what your code would actually consume in the cloud. Both cases are valid and maybe both should be an option ... however, I believe this distinction is quite complex to understand for beginners, and it might be better to be opinionated ...?
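To illustrate the trade-off, a hedged sketch contrasting the two options; `power_curve()` is a made-up stand-in for the model's estimate, not the actual model:

```python
# Contrast of the two accounting options discussed above. The power curve
# is invented for illustration; only its non-linear shape matters here.
def power_curve(host_util: float) -> float:
    """Made-up non-linear host power curve, in watts."""
    return 80 + 180 * host_util ** 0.7  # idle floor + sub-linear active part

VHOST_RATIO = 1 / 48  # our share of a hypothetical 48-core host

def fixed_host_load(vm_util: float) -> float:
    # Option A: always assume the host runs at 50% utilization.
    # Reproducible, but our own CPU usage no longer changes the result.
    return power_curve(0.5) * VHOST_RATIO

def vm_driven(vm_util: float) -> float:
    # Option B (the whole-spectrum approach): treat our utilization as the
    # host's, i.e. assume nobody else is on the machine. Bigger penalty
    # for high CPU usage, bigger reward for reducing it.
    return power_curve(vm_util) * VHOST_RATIO

for u in (0.1, 0.5, 0.9):
    print(f"vm_util={u:.0%}: A={fixed_host_load(u):.2f} W, B={vm_driven(u):.2f} W")
```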
-
Also, the Kepler project has published some of their approaches to splitting an absolute full-machine power signal across the VMs/tenants of the system. Here is the explanation in a blog article: cncf/tag-env-sustainability#435 And here is a paper on finding a VM's power without knowing every other tenant's energy data individually: https://www.computer.org/csdl/proceedings-article/mascots/2023/10387542/1TKR5vzaWXe Although the paper provides an approach to get the information we are discussing here, it only works if performance counters are accessible, which is not the case in cloud environments.
-
At the moment the model uses the `vhost-ratio` parameter to split the energy in a virtualized system. An example: a VM that is assigned 1 of a host's 40 cores would get a `vhost-ratio` parameter of `1/40`.
This is the mechanism we account for at the moment when the model is in a virtualized system. It implicitly assumes that no one else is using the machine.
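A minimal sketch of this mechanism as I understand it, with a hypothetical `estimate_machine_power()` standing in for the model's bare-metal estimate (the real parameter handling in the model will differ):

```python
def estimate_machine_power(utilization: float) -> float:
    """Hypothetical stand-in for the bare-metal model, in watts."""
    return 100 + 250 * utilization  # made-up curve for illustration

def vm_power(vm_utilization: float, vhost_ratio: float) -> float:
    # Current mechanism: estimate the whole machine as if our VM's
    # utilization were the machine's utilization, then take our share.
    return estimate_machine_power(vm_utilization) * vhost_ratio

# One core out of 40, VM at 30% load:
print(f"{vm_power(0.3, 1/40):.2f} W")
```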
However, that assumption is most likely not the case. The machine as a whole is most likely at a 20-70% utilization, as is normal with cloud vendors.
The picture from VMware shows non-hyperscaler data centers; hyperscalers, however, report a higher utilization.
I propose setting a new variable, `bare-metal-utilization`, and then rather using the `vhost-ratio` as a factor to shift that a little. An example: set `bare-metal-utilization` to `0.5`, which means 50%.
The downside of that approach is that it adds yet another assumption. However, assuming that no one else is on the machine, as we did before, is most likely wrong.
On a very-high-core machine the resulting values will then change only in very small quantities, which is probably closer to reality, but also incentivizes users less to reduce CPU consumption, as the effects are smaller. A sketch of what I mean follows below.
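A hedged sketch of my reading of the proposal; the function name, the additive shift, and the clamping are illustrative assumptions, not an implementation:

```python
def estimate_machine_power(utilization: float) -> float:
    """Same made-up stand-in for the bare-metal model as above, in watts."""
    return 100 + 250 * utilization

def proposed_vm_power(vm_utilization: float,
                      vhost_ratio: float,
                      bare_metal_utilization: float = 0.5) -> float:
    # Start from the assumed host operating point and let the VM shift it
    # only by its own share of the host (the vhost-ratio).
    host_util = min(bare_metal_utilization + vm_utilization * vhost_ratio, 1.0)
    return estimate_machine_power(host_util) * vhost_ratio

# On a 48-core host our VM can shift the operating point by at most
# 1/48 (about 2 percentage points), so the result reacts only weakly to
# our own CPU usage, as noted above.
print(f"{proposed_vm_power(0.9, 1/48):.2f} W vs {proposed_vm_power(0.1, 1/48):.2f} W")
```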
This is an idea and I would like to discuss it, especially if there are logical errors in it ...