Measure energy consumption for each individual container / reduce overhead of request generators #562

davidkopp · 2023-11-27T20:58:17Z

GMT can capture the CPU & memory energy consumption for the entire system via RAPL. However, there is currently no way to get the energy consumption of individual containers. This means that the resulting energy value may include the consumption of containers that are only needed to execute the test but are not relevant in production. For example, in test scenarios with Puppeteer or other request/load generators, these tools produce overhead that should ideally be filtered out.
In this measurement of the Wordpress Minimal Sample, for example, Puppeteer briefly uses up to 30 % of the CPU during the run. So I assume Puppeteer accounts for a relevant share of the energy consumption that should ideally be filtered out. Or is it intentional that the consumption of Puppeteer (representing a user) is part of the standard usage scenario?

I assume it is intentional that GMT does not include a feature for attributing the total energy consumption to individual containers.
What is the reason behind it?

RAPL container MSR (draft in the documentation repository):

In order to split the energy usage down to the container level, we use IC splitting.
This is preferable to time splitting, as outlined for instance in this source: https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html

Is there already something implemented with IC splitting, or is this an idea for the future?
Does IC stand for "Instructions per Cycle"?

In general, the question for me is how to do energy measurements with GMT without the overhead of a request generator. Is this possible, or should the goal rather be to reduce the energy consumption of the request generator? I'm currently testing with JMeter to execute a test plan that represents a standard usage scenario. I'm not sure yet whether JMeter is too heavy for use with GMT.

ArneTR · 2023-11-28T07:30:47Z · Maintainer

Hey David,

thanks for bringing this up. This is actually a long-running discussion in many of the current energy-tooling repositories, such as Scaphandre and Kepler: which splitting method to use, and whether to split at all.

The philosophy of the GMT is: the usage scenario should contain all the components needed to reflect the actual use case of the software. This means that if a client / trigger is involved, the trigger should reflect the real case.
We often use a Puppeteer container running a browser, as this is the actual scenario in which a web app is used.

However, I understand the wish to have an isolated view of the components and only add them up later on.

A further problem in all of this is the idle case: idle energy technically must be added to the cost, because without underlying hardware there is no working software. The idle time always contributes a base offset.

I hope you see where this discussion is going. Should JMeter / Puppeteer then not actually run on a separate machine, with separate idle energy that also has to be added up? Is it even fair to have everything on one machine, as we currently do?

Having broadened the picture a bit, I would like to add some information on the technical implementation, in case one wants to go the route of splitting the energy on a single machine.

We did a deep dive on energy splitting and how it is currently done in Scaphandre vs. the Kepler approach here: https://www.green-coding.berlin/case-studies/cpu-utilization-usefulness/

The gist is: the two approaches (by CPU % and by IPC) deliver very different results. Depending on the edge case and the CPU configuration, the results can differ drastically.
The most sensible route seems to be IPC splitting for CPU energy and page-fault splitting for DRAM energy.
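To make the difference between time splitting and instruction-based splitting concrete, here is a minimal Python sketch. It assumes that per-container counters for one measurement interval (CPU time, retired instructions, page faults) and the CPU and DRAM energy for the same interval are already available; the `ContainerSample` type, the `attribute` function and all numbers are hypothetical. CPU energy is weighted by retired instruction count, which is one plausible reading of "IPC/IC splitting", and this is not necessarily how GMT, Scaphandre or Kepler compute their shares.

```python
from dataclasses import dataclass


@dataclass
class ContainerSample:
    """Hypothetical per-container counters collected over one measurement interval."""
    name: str
    cpu_time_s: float   # CPU time used by the container (basis for time splitting)
    instructions: int   # retired instructions (basis for instruction-based splitting)
    page_faults: int    # page faults (basis for DRAM splitting)


def split_proportionally(total_j: float, weights: dict[str, float]) -> dict[str, float]:
    """Distribute total_j across keys in proportion to their weights."""
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        # No activity recorded: nothing can be attributed.
        return {name: 0.0 for name in weights}
    return {name: total_j * w / weight_sum for name, w in weights.items()}


def attribute(cpu_energy_j: float, dram_energy_j: float, samples: list[ContainerSample]):
    """Return (CPU split by time, CPU split by instructions, DRAM split by page faults)."""
    cpu_by_time = split_proportionally(cpu_energy_j, {s.name: s.cpu_time_s for s in samples})
    cpu_by_instr = split_proportionally(cpu_energy_j, {s.name: s.instructions for s in samples})
    dram_by_faults = split_proportionally(dram_energy_j, {s.name: s.page_faults for s in samples})
    return cpu_by_time, cpu_by_instr, dram_by_faults


if __name__ == "__main__":
    samples = [
        ContainerSample("wordpress", cpu_time_s=2.0, instructions=8_000_000_000, page_faults=1_000),
        ContainerSample("puppeteer", cpu_time_s=2.0, instructions=2_000_000_000, page_faults=3_000),
    ]
    by_time, by_instr, dram = attribute(100.0, 10.0, samples)
    print(by_time)   # {'wordpress': 50.0, 'puppeteer': 50.0}
    print(by_instr)  # {'wordpress': 80.0, 'puppeteer': 20.0}
    print(dram)      # {'wordpress': 2.5, 'puppeteer': 7.5}
```

With these made-up numbers both containers use the same CPU time, so time splitting assigns each half of the 100 J CPU energy, while instruction-based splitting assigns 80 J to the application and 20 J to Puppeteer. This is the kind of divergence between the two methods described above.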

If you want to split the energy of the whole machine, you also have to incorporate metrics for all the hardware buses, hard disk, GPU, PSU etc., which to my knowledge is, at least in parts, an unsolved problem.

My current opinion is that splitting CPU and DRAM energy is probably more helpful than misleading, and that the split should be based on a direct energy metric for these components, such as RAPL.
Splitting the PSU energy, I guess, is more misleading and will leave you with Cloud Carbon Footprint-level accuracy: you get a number, but have no idea how good it really is.
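As a rough illustration of what a direct RAPL energy metric looks like on Linux, the sketch below reads the cumulative energy counter that the kernel's powercap driver exposes in sysfs and computes the energy for an interval. The zone path is an assumption (zone numbering and the DRAM subzone differ per machine), reading `energy_uj` typically requires root on recent kernels, and this is not necessarily how GMT's own RAPL metric providers read the counters.

```python
import time
from pathlib import Path

# Package-0 RAPL zone in the Linux powercap sysfs interface (assumed path).
# Zone numbering and the DRAM subzone location differ between machines;
# each zone's "name" file identifies it.
RAPL_ZONE = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(path: Path) -> int:
    """Read a microjoule counter file from sysfs."""
    return int(path.read_text().strip())


def energy_delta_uj(start_uj: int, end_uj: int, max_range_uj: int) -> int:
    """Energy between two readings, assuming the counter wrapped at most once."""
    if end_uj >= start_uj:
        return end_uj - start_uj
    # The counter passed max_range_uj and restarted from zero.
    return (max_range_uj - start_uj) + end_uj


def measure_joules(duration_s: float, zone: Path = RAPL_ZONE) -> float:
    """Sample a RAPL zone's energy counter before and after sleeping for duration_s."""
    max_range = read_uj(zone / "max_energy_range_uj")
    start = read_uj(zone / "energy_uj")
    time.sleep(duration_s)
    end = read_uj(zone / "energy_uj")
    return energy_delta_uj(start, end, max_range) / 1_000_000
```

On a machine with an Intel RAPL powercap zone, `measure_joules(1.0)` returns the package energy in joules consumed during a one-second window. The wraparound check matters for longer intervals, because the counter restarts after reaching `max_energy_range_uj`.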

Implementation-wise, this would best be done with eBPF, as the metrics can be collected with little overhead, depending on your granularity.

We will probably address this feature ourselves in 2024, but we are very happy to accept a PR or provide some help if you are keen on implementing it.

I also wrote this lengthy answer in the hope that more people will join the discussion and share insights on how to address, for instance, the PSU energy.
Last but not least, I am very interested in your opinion on this, specifically whether you think it is more sensible to have a philosophy about how people should measure and to nudge them toward it through the framework, or to simply provide everything that is technically possible, even if it is not fully clear whether the chosen splitting factor is the best one.

ArneTR · 2024-05-30T14:31:53Z · Maintainer

Some work on this has now been done here: #795

A kernel plugin that might provide this functionality natively on Linux is being worked on here: https://github.com/orgs/green-kernel/discussions
