Replies: 5 comments
-
@ArneTR Welcome to the discussion! A quick tour of the estimators: there are two kinds, ratio-based and model-based, i.e. linear regression or external models that run in the sidecar container. The decision of which estimator to use at runtime can be found in this line. You are spot on about the ratio-based approach: energy is attributed to processes based on their metrics. CPU instructions is the default metric, but it is tunable in the config files. For instance, one can use CPU time as the metric to split by; that would be the Scaphandre model. But I am not convinced that CPU time is the right metric. As in Equation 1 of this research paper (which I also mentioned in the OSS NA talk), CPU time only explains part of the energy usage; the nature of the CPU instructions accounts for the delta between different types of workloads when their CPU time is the same. Recommendations, ideas, and feedback are always welcome!
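To make the ratio-based idea concrete, here is a minimal sketch of attributing a measured node-level energy value to processes in proportion to a chosen per-process metric (e.g. CPU instructions retired). The function name and shapes are illustrative only, not Kepler's actual code:

```go
package main

import "fmt"

// splitByRatio distributes a measured node-level energy value across
// processes in proportion to a chosen per-process metric. Swapping the
// metric (instructions vs. CPU time) changes the attribution, not the
// mechanism.
func splitByRatio(nodeEnergy float64, metricPerProcess map[string]float64) map[string]float64 {
	var total float64
	for _, v := range metricPerProcess {
		total += v
	}
	energies := make(map[string]float64, len(metricPerProcess))
	if total == 0 {
		return energies
	}
	for proc, v := range metricPerProcess {
		energies[proc] = nodeEnergy * v / total
	}
	return energies
}

func main() {
	// 100 J measured for the node; process A retired 3x the instructions of B,
	// so it receives 3/4 of the energy.
	out := splitByRatio(100, map[string]float64{"A": 300, "B": 100})
	fmt.Printf("A=%.1f J, B=%.1f J\n", out["A"], out["B"])
}
```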
-
Hi @ArneTR, thank you for your message and interest in the project.
The CPU's dynamic energy consumption is mainly determined by the operations it performs at a given voltage and temperature. The workload can cause the CPU to execute fewer or more instructions in a cycle (or time interval). Therefore, the number of instructions executed gives a more accurate measure of CPU utilization, although it is not the most precise metric because each instruction may consist of a different number of micro-operations. However, the number of CPU instructions is often more accessible than the number of micro-operations since not all CPU architectures expose the hardware counters for micro-operations.
We're currently enhancing the power model to take into account other metrics beyond just CPU instructions, such as cache utilization. However, simply using a ratio-based approach to combine these metrics is not enough, as we need to determine which metrics consume more energy (and that might vary across architectures). Therefore, we're exploring a regression approach that assigns weights to each metric, allowing us to combine them effectively. This is a work in progress and will be available soon.
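The regression idea described above can be sketched as a weighted linear combination: each metric gets a coefficient reflecting how much energy it costs on a given architecture. The struct, metric names, and weight values below are all illustrative assumptions; Kepler's actual coefficients would be fitted by its model server from measured power samples:

```go
package main

import "fmt"

// metrics holds hypothetical per-interval resource counters for one process.
type metrics struct {
	cpuInstructions float64
	cacheMisses     float64
}

// linearPower estimates power as an intercept (static share) plus a
// weighted sum of the metrics, in the spirit of the regression approach:
// the weights encode the relative energy cost of each activity.
func linearPower(m metrics, wInstr, wCache, intercept float64) float64 {
	return intercept + wInstr*m.cpuInstructions + wCache*m.cacheMisses
}

func main() {
	// Illustrative weights; a real model would fit these against
	// measured node power (e.g. RAPL readings).
	p := linearPower(metrics{cpuInstructions: 1e9, cacheMisses: 2e6}, 2e-9, 5e-7, 0.5)
	fmt.Printf("estimated power: %.2f W\n", p)
}
```

With fitted weights, a cache-heavy and an instruction-heavy workload with the same CPU time can receive different energy estimates, which is exactly what a pure ratio on one metric cannot express.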
-
Thanks for the detailed answers. I fully agree that CPU time is a problematic proxy for splitting the energy, because the value has limited absolute meaning nowadays with features like DVFS, and because waiting time in Linux is often mis-attributed due to interconnect congestion. However, I think that using CPU instructions alone may also be problematic, as it does not account for HALT instructions in spinlocks or similar.
The paper you linked seems a bit old to me ... < 2005. This means that Intel Turbo Boost was not even present in CPUs, among many other features. So an instruction becomes more costly if the frequency is high.
The most comprehensive and semi-up-to-date work I have seen so far are the papers from Norbert Schmitt and Vincent Weaver (https://ieeexplore.ieee.org/author/37086242658 / https://ieeexplore.ieee.org/author/37400530100). A good detailed dive, however, is in a doctoral thesis from the University of Würzburg: https://opus.bibliothek.uni-wuerzburg.de/files/17847/vonKistowski_Joakim_Dissertation.pdf He is not specifically splitting the energy for the CPU component as such, but the model proposed for the total energy surprisingly results in fewer PMUs giving a better result (Table 15.9).
We ourselves are currently investigating building multiple models in parallel (CPU time and CPU instructions retired).
-
@ArneTR All good discussions! Kepler loves models :D CPU time, instructions, frequency, and architecture are all available to build an ML model (linear or not) through the model server, and the model can be consumed through the Kepler estimator. It is an open architecture for researchers too.
-
Hey @marceloamaral, just wanted to follow up. Are the papers you mentioned in this discussion, which you were writing at the time, now public somewhere to share? I would be very interested!
-
Hey Kepler Team, hey Kepler community,
first off: thank you all for the great work on this tool. I am pretty stoked that such a cutting-edge tool for power measurement in Kubernetes exists.
I am currently trying to understand the mechanism by which Kepler splits the energy and attributes it to processes.
I found the ratio.go file, which does the splitting in line 87 (kepler/pkg/model/estimator/local/ratio.go, line 87 at commit 2af9a20).
To my understanding, you are splitting by only one value per metric. For CPU this is config.CoreUsageMetric, which is defined in config.go as CPUInstruction (kepler/pkg/config/config.go, line 74 at commit 8cca218).
Is that correct?
In a talk (https://www.youtube.com/live/xzfTU_Wa7rU?feature=share&t=570) @rootfs mentioned that your decisions for accounting and modeling are based on research papers. I found no links to the papers though. Can you share somewhere which papers your work is based on and why you use this splitting technique exactly?
Why for instance is it not CPU-Cycles, or CPU-Time?
Another popular open-source tool, Scaphandre (https://github.com/hubblo-org/scaphandre), for instance splits by CPU time.