Replies: 5 comments
-
@ArneTR Welcome to the discussion! A quick tour of the estimators: there are two kinds, ratio-based and model-based, i.e. linear regression or external models that run in the sidecar container. The decision of which estimator to use at runtime can be found in this line. You are spot on about the ratio-based approach: energy is attributed to processes based on their metrics. CPU instructions is the default metric, but it is tunable in the config files. For instance, one can use CPU time as the metric to split by; that would be the Scaphandre model. But I am not convinced that CPU time is the right metric. As in Equation 1 of this research paper (which I also mentioned in the OSS NA talk), CPU time only explains part of the energy usage; the nature of the CPU instructions accounts for the delta between different types of workloads when their CPU time is the same. Recommendations, ideas, and feedback are always welcome!
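To make the ratio-based idea concrete, here is a minimal sketch of attributing a measured node-level energy value to processes in proportion to a chosen per-process metric (e.g. CPU instructions retired). The function name and shapes are illustrative only, not Kepler's actual code:

```go
package main

import "fmt"

// splitByRatio distributes a measured node-level energy value across
// processes in proportion to a chosen per-process metric. Swapping the
// metric (instructions vs. CPU time) changes the attribution, not the
// mechanism.
func splitByRatio(nodeEnergy float64, metricPerProcess map[string]float64) map[string]float64 {
	var total float64
	for _, v := range metricPerProcess {
		total += v
	}
	energies := make(map[string]float64, len(metricPerProcess))
	if total == 0 {
		return energies
	}
	for proc, v := range metricPerProcess {
		energies[proc] = nodeEnergy * v / total
	}
	return energies
}

func main() {
	// 100 J measured for the node; process A retired 3x the instructions of B,
	// so it receives 3/4 of the energy.
	out := splitByRatio(100, map[string]float64{"A": 300, "B": 100})
	fmt.Printf("A=%.1f J, B=%.1f J\n", out["A"], out["B"])
}
```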
-
Hi @ArneTR, thank you for your message and interest in the project.
The CPU's dynamic energy consumption is mainly determined by the operations it performs at a given voltage and temperature. The workload can cause the CPU to execute fewer or more instructions in a cycle (or time interval). Therefore, the number of instructions executed gives a more accurate measure of CPU utilization, although it is not the most precise metric because each instruction may consist of a different number of micro-operations. However, the number of CPU instructions is often more accessible than the number of micro-operations since not all CPU architectures expose the hardware counters for micro-operations.
We're currently enhancing the power model to take into account other metrics beyond just CPU instructions, such as cache utilization. However, simply using a ratio-based approach to combine these metrics is not enough, as we need to determine which metrics consume more energy (and that might vary across architectures). Therefore, we're exploring a regression approach that assigns weights to each metric, allowing us to combine them effectively. This is a work in progress and will be available soon.
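The regression idea described above can be sketched as a weighted linear combination: each metric gets a coefficient reflecting how much energy it costs on a given architecture. The struct, metric names, and weight values below are all illustrative assumptions; Kepler's actual coefficients would be fitted by its model server from measured power samples:

```go
package main

import "fmt"

// metrics holds hypothetical per-interval resource counters for one process.
type metrics struct {
	cpuInstructions float64
	cacheMisses     float64
}

// linearPower estimates power as an intercept (static share) plus a
// weighted sum of the metrics, in the spirit of the regression approach:
// the weights encode the relative energy cost of each activity.
func linearPower(m metrics, wInstr, wCache, intercept float64) float64 {
	return intercept + wInstr*m.cpuInstructions + wCache*m.cacheMisses
}

func main() {
	// Illustrative weights; a real model would fit these against
	// measured node power (e.g. RAPL readings).
	p := linearPower(metrics{cpuInstructions: 1e9, cacheMisses: 2e6}, 2e-9, 5e-7, 0.5)
	fmt.Printf("estimated power: %.2f W\n", p)
}
```

With fitted weights, a cache-heavy and an instruction-heavy workload with the same CPU time can receive different energy estimates, which is exactly what a pure ratio on one metric cannot express.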
-
Thanks for the detailed answers. I fully agree that CPU time is a problematic proxy for splitting the energy, because the value has limited absolute meaning nowadays with features like DVFS, and because waiting time in Linux is often mis-attributed due to interconnect congestion. However, I think that using CPU instructions alone may also be problematic, as it does not account for HALT instructions in spinlocks or similar.
The paper you linked seems a bit old to me ... < 2005. This means that Intel Turbo Boost was not even present in CPUs, among many other features. So an instruction becomes more costly if the frequency is high.
The most comprehensive and semi-up-to-date work I have seen so far are the papers from Norbert Schmitt and Vincent Weaver (https://ieeexplore.ieee.org/author/37086242658 / https://ieeexplore.ieee.org/author/37400530100). A good detailed dive, however, is in a doctoral thesis from the University of Würzburg: https://opus.bibliothek.uni-wuerzburg.de/files/17847/vonKistowski_Joakim_Dissertation.pdf He is not specifically splitting the energy for the CPU component as such, but the model proposed for the total energy surprisingly results in fewer PMUs giving a better result (Table 15.9).
We ourselves are currently investigating building multiple models in parallel (CPU time and CPU instructions retired).
-
@ArneTR All good discussions! Kepler loves models :D CPU time, instructions, frequency, and architecture are all available to build an ML model (linear or not) through the model server, and the model can be consumed through the Kepler estimator. It is an open architecture for researchers too.
-
Hey @marceloamaral, just wanted to follow up. Are the papers you mentioned in this discussion, which you were writing at the time, now public somewhere to share? I would be very interested!
-
Hey Kepler Team, hey Kepler community,
first off: thank you all for the great work on this tool. I am pretty stoked that such a cutting-edge tool for power measurement in Kubernetes exists.
I am currently trying to understand the mechanism by which Kepler splits the energy and attributes it to processes.
I found the ratio.go file, which does the splitting in line 87 (kepler/pkg/model/estimator/local/ratio.go, line 87 at commit 2af9a20).
To my understanding, you are splitting by only one value per metric. For CPU this is config.CoreUsageMetric, which is defined in config.go as CPUInstruction (kepler/pkg/config/config.go, line 74 at commit 8cca218).
Is that correct?
In a talk (https://www.youtube.com/live/xzfTU_Wa7rU?feature=share&t=570) @rootfs mentioned that your decisions for accounting and modeling are based on research papers. I found no links to the papers though. Can you share somewhere which papers your work is based on and why you use this splitting technique exactly?
Why for instance is it not CPU-Cycles, or CPU-Time?
Another popular open-source tool, Scaphandre (https://github.com/hubblo-org/scaphandre), for instance splits by CPU time.