-
Added to this: here are some actual measurements from a 48-core machine we have:
As we can see, the power draw at idle is greatly reduced and strongly non-linear. However, this is not what we wanted to look at. If the model estimate were spot on for the bare-metal machine, but we then virtualized it and assigned ourselves one core, we would guess 214.62 W and divide by 48, which equals 4.47 W, if we use the method proposed here of setting an operating point.
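To make the arithmetic concrete, here is a tiny sketch of that naive linear split (only the 214.62 W and the 48 cores come from the measurements above; everything else is illustrative):

```python
# Naive linear split of a bare-metal estimate across cores, as described
# above. This ignores the strongly non-linear idle behaviour seen in the
# measurements, which is exactly the problem being discussed.
TOTAL_CORES = 48
bare_metal_estimate_watts = 214.62  # model estimate for the whole machine

per_core_watts = bare_metal_estimate_watts / TOTAL_CORES
print(f"{per_core_watts:.2f} W attributed to our single core")  # -> 4.47 W
```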
-
Another idea on this topic: if you assume the host machine is typically loaded at 50%, you are likely to be more correct on average, given that cloud vendors typically operate in this region. But if you do not assume that and instead model the whole spectrum, as this model currently does, then you are also incentivized more to use less, and on average you will likely see the same result as in the 50% case, because sometimes you actually are on a lightly loaded machine and sometimes you are not. The penalty for high CPU usage is bigger, though.

In general the question arises whether you want more reproducible results from this model (which is better for benchmarking and quantifying your own improvements to the code) or results that are closer to what your code would actually consume in the cloud. Both cases are valid and maybe both should be an option ... however, I believe this distinction is quite complex to understand for beginners, and it might be better to be opinionated ...?
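To illustrate the trade-off, a hedged sketch contrasting the two options; `power_curve()` is a made-up stand-in for the model's estimate, not the actual model:

```python
# Contrast of the two accounting options discussed above. The power curve
# is invented for illustration; only its non-linear shape matters here.
def power_curve(host_util: float) -> float:
    """Made-up non-linear host power curve, in watts."""
    return 80 + 180 * host_util ** 0.7  # idle floor + sub-linear active part

VHOST_RATIO = 1 / 48  # our share of a hypothetical 48-core host

def fixed_host_load(vm_util: float) -> float:
    # Option A: always assume the host runs at 50% utilization.
    # Reproducible, but our own CPU usage no longer changes the result.
    return power_curve(0.5) * VHOST_RATIO

def vm_driven(vm_util: float) -> float:
    # Option B (the whole-spectrum approach): treat our utilization as the
    # host's, i.e. assume nobody else is on the machine. Bigger penalty
    # for high CPU usage, bigger reward for reducing it.
    return power_curve(vm_util) * VHOST_RATIO

for u in (0.1, 0.5, 0.9):
    print(f"vm_util={u:.0%}: A={fixed_host_load(u):.2f} W, B={vm_driven(u):.2f} W")
```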
-
Also, the Kepler project has published some of their approaches to splitting an absolute full-machine power signal across the VMs/tenants of the system. Here is the explanation in a blog article: cncf/tag-env-sustainability#435 And here is a paper on finding a VM's power without knowing every other tenant's energy data individually: https://www.computer.org/csdl/proceedings-article/mascots/2023/10387542/1TKR5vzaWXe Although the paper provides an approach to get the information we are discussing here, it only works if performance counters are accessible, which is not the case in cloud environments.
-
At the moment the model uses the `vhost-ratio` parameter to split the energy in a virtualized system. An example: a VM that is assigned 1 of a host's 40 cores would get a `vhost-ratio` parameter of `1/40`.
This is the mechanism we account for at the moment when the model is in a virtualized system. It implicitly assumes that no one else is using the machine.
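A minimal sketch of this mechanism as I understand it, with a hypothetical `estimate_machine_power()` standing in for the model's bare-metal estimate (the real parameter handling in the model will differ):

```python
def estimate_machine_power(utilization: float) -> float:
    """Hypothetical stand-in for the bare-metal model, in watts."""
    return 100 + 250 * utilization  # made-up curve for illustration

def vm_power(vm_utilization: float, vhost_ratio: float) -> float:
    # Current mechanism: estimate the whole machine as if our VM's
    # utilization were the machine's utilization, then take our share.
    return estimate_machine_power(vm_utilization) * vhost_ratio

# One core out of 40, VM at 30% load:
print(f"{vm_power(0.3, 1/40):.2f} W")
```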
However, that assumption is most likely not the case. The machine as a whole is most likely at a 20-70% utilization, as is normal with cloud vendors.
The picture from VMware shows non-hyperscaler data centers; hyperscalers, however, report a higher utilization.
I propose setting a new variable, `bare-metal-utilization`, and then rather using the `vhost-ratio` as a factor to shift that a little. An example: set `bare-metal-utilization` to `0.5`, which means 50%.
The downside of that approach is that it adds yet another assumption. However, assuming that no one else is on the machine, as we did before, is most likely wrong.
On a very-high-core machine the resulting values will then change only in very small quantities, which is probably closer to reality, but also incentivizes users less to reduce CPU consumption, as the effects are smaller. A sketch of what I mean follows below.
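A hedged sketch of my reading of the proposal; the function name, the additive shift, and the clamping are illustrative assumptions, not an implementation:

```python
def estimate_machine_power(utilization: float) -> float:
    """Same made-up stand-in for the bare-metal model as above, in watts."""
    return 100 + 250 * utilization

def proposed_vm_power(vm_utilization: float,
                      vhost_ratio: float,
                      bare_metal_utilization: float = 0.5) -> float:
    # Start from the assumed host operating point and let the VM shift it
    # only by its own share of the host (the vhost-ratio).
    host_util = min(bare_metal_utilization + vm_utilization * vhost_ratio, 1.0)
    return estimate_machine_power(host_util) * vhost_ratio

# On a 48-core host our VM can shift the operating point by at most
# 1/48 (about 2 percentage points), so the result reacts only weakly to
# our own CPU usage, as noted above.
print(f"{proposed_vm_power(0.9, 1/48):.2f} W vs {proposed_vm_power(0.1, 1/48):.2f} W")
```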
This is an idea and I would like to discuss it, especially if there are logical errors in it ...