-
The challenge in collecting metrics from different modules lies in synchronization. Even though the metrics are exported to Prometheus, Prometheus establishes timestamps based on its internal time during metric collection, and these timestamps are not linked to the time on the node when the metric was collected. Is there any issue with collecting eBPF metrics?
-
Thanks for your comments, @marceloamaral. Let me give more details and background for this idea.

The motivation behind this thread is real-time and low-latency workloads, e.g. some 5G RAN workloads and DPDK/VPP/SPDK workloads. Those workloads cannot afford the overhead of eBPF, and they may have no need to calculate resource utilization based on eBPF. The co-existence of RT and non-RT workloads in the same cluster may not be a valid use case, but if Kepler could provide a configurable interface to integrate those RT workloads' metrics, that would make it much easier to deploy such workloads together with Kepler and OpenShift. There have been other threads with similar discussions in our community; one recent case is the ticket raised last week. Co-deploying Telegraf and Kepler can be an option, but the compatibility risk of running them side by side cannot be ignored and needs careful validation and detailed documentation.

Regarding metrics collection methodology, Kepler currently supports both Golang routine calls (such as RAPL) and RESTful API calls (such as Redfish). I agree with you that an API call is a flexible and extensible approach for Kepler, but it requires code changes on the workload's side, which is sometimes not acceptable from the customer's point of view since it can be regarded as a somewhat intrusive design. Redfish is an ideal example: it depends on DMTF's efforts to implement and standardize those REST APIs for IPMI raw metrics.

Taking Telegraf as an example again, I think its input plugin design is beneficial for Kepler: the metrics source can be Linux sysfs, a Unix domain socket, the output file of a specific library/tool run, etc. The Kepler framework code would not need to care about the collection logic; that logic is offloaded to the plugin itself, and Kepler focuses on processing, aggregating, and outputting the input metrics. All of this can be made configurable. Here are some example configurations.

Taking some low-latency workloads as an example, so far we have found at least two major issues:
As a workaround, Kepler can gather server/CPU/package-level metrics and estimate the power for the isolated cores as total CPU power minus the power estimate for the non-isolated cores, where the latter depends on specific plugins to retrieve.
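A minimal Go sketch of that workaround, just to make the arithmetic concrete. The function and variable names are hypothetical, and both inputs are assumed to come from elsewhere (the package reading from RAPL/Redfish, the non-isolated estimate from a workload-specific plugin):

```go
package main

import "fmt"

// isolatedCorePower estimates the power drawn by the isolated (RT) cores as
// the package-level reading minus the modeled power of the non-isolated cores.
func isolatedCorePower(packageWatts, nonIsolatedWatts float64) float64 {
	p := packageWatts - nonIsolatedWatts
	if p < 0 {
		return 0 // clamp: the non-isolated model may overshoot the package reading
	}
	return p
}

func main() {
	// Hypothetical numbers, for illustration only.
	pkg := 85.0         // total CPU package power, e.g. from RAPL
	nonIsolated := 52.5 // modeled power of the non-isolated cores
	fmt.Printf("isolated cores ~ %.1f W\n", isolatedCorePower(pkg, nonIsolated))
}
```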
-
@jiere I agree that we can improve the Kepler architecture. In fact, @rootfs initiated a discussion to modify the Kepler architecture in #1088. The main idea is to separate one daemon that collects metrics from another that estimates power consumption. @jiere, could you please provide an example of the metrics you are considering? We need system-level metrics that are applicable to all applications. Metrics specific to a particular workload, which don't correlate with other applications, may not be suitable for estimating energy consumption.
Which code changes in the workload are you referring to? While we currently use individual files and environment variables for configuration (to disable metrics and to configure Redfish), I agree that consolidating all configuration into one file, as you pointed out with the Telegraf file, could be more efficient.
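As a rough illustration of what such a consolidated file could map to on the Kepler side (not Kepler's actual configuration schema; every field name here is hypothetical), a single config might unmarshal into something like:

```go
package config

// PluginConfig describes one metrics source in a hypothetical consolidated
// config file, loosely mirroring Telegraf's per-input sections.
type PluginConfig struct {
	Name     string            `yaml:"name"`     // e.g. "rapl", "redfish", "dpdk-telemetry"
	Type     string            `yaml:"type"`     // "sysfs", "unix_socket", "file", "http"
	Endpoint string            `yaml:"endpoint"` // path or URL of the source
	Interval string            `yaml:"interval"` // collection interval, e.g. "10s"
	Labels   map[string]string `yaml:"labels"`   // extra labels attached to exported metrics
	Disabled bool              `yaml:"disabled"`
}

// KeplerConfig is a hypothetical top-level structure that would replace the
// current mix of individual files and environment variables.
type KeplerConfig struct {
	EnableEBPF bool           `yaml:"enableEBPF"`
	Redfish    *PluginConfig  `yaml:"redfish,omitempty"`
	Inputs     []PluginConfig `yaml:"inputs"`
}
```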
Although the DPDK runtime uses 100% of a CPU core, I do not expect the workload itself to use 100% when idle.
As far as I understand, Telegraf seems to collect cgroup metrics. We decided to remove cgroup metric collection from the default deployment due to its high overhead. Therefore, we may need a study to determine which overhead is higher on RT kernels: collecting cgroup metrics or eBPF metrics. Note that Kepler currently cannot disable eBPF metrics; that would require a big code change. Kepler discovers new processes via the eBPF code, so we would need to figure out another way to discover new processes running on the system before eBPF metric collection could be disabled.
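One possible alternative to eBPF-based discovery, sketched here only to illustrate the kind of logic that would be needed (periodic /proc scanning; real integration with Kepler's collector would of course look different and has its own overhead to measure):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

// scanProcPIDs returns the set of numeric entries under /proc, i.e. the PIDs
// currently visible on the node.
func scanProcPIDs() (map[int]struct{}, error) {
	entries, err := os.ReadDir("/proc")
	if err != nil {
		return nil, err
	}
	pids := make(map[int]struct{})
	for _, e := range entries {
		if pid, err := strconv.Atoi(e.Name()); err == nil {
			pids[pid] = struct{}{}
		}
	}
	return pids, nil
}

func main() {
	known, _ := scanProcPIDs()
	for range time.Tick(5 * time.Second) {
		current, err := scanProcPIDs()
		if err != nil {
			continue
		}
		for pid := range current {
			if _, seen := known[pid]; !seen {
				fmt.Println("new process discovered:", pid)
			}
		}
		known = current
	}
}
```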
-
In the current Kepler architecture, power-related metrics are collected either through Golang method calls (e.g. RAPL/MSR/hwmon) or REST API calls (e.g. Redfish), and container-level energy consumption estimation depends on eBPF, which is always enabled globally.
Workloads have varied characteristics: some are real-time sensitive (e.g. 5G workloads), some are low-latency specific (e.g. edge computing network workloads), some have no directly related power meter (e.g. some accelerators or network workloads), etc. How could Kepler support metrics collection specific to such workloads?
One possible solution is to behave as Telegraf does: let workload owners/ISVs provide their own metrics plugins, while the framework supports various input plugins, applies configurable processing to those metrics, and outputs them to metrics consumers such as Prometheus.
Such a pluggable framework may need some architectural design changes in Kepler.
When such a pluggable framework is available, workload owners/ISVs may also bring their own power modeling methodologies based on the metrics they collect through Kepler, and we would then need to figure out a more flexible and scalable way to handle model training.
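A rough Go sketch of what such a pluggable input interface could look like. All names and shapes here are hypothetical, not an actual Kepler API; the point is simply that the framework only knows how to call the plugin, not how the plugin reads its source:

```go
package plugin

import "time"

// Sample is one metric reading produced by an input plugin.
type Sample struct {
	Name      string
	Value     float64
	Labels    map[string]string
	Timestamp time.Time // taken on the node, so it stays tied to node time
}

// Input is implemented by workload-/ISV-provided plugins. The framework only
// calls Gather on the configured interval; how the plugin reads sysfs, a Unix
// domain socket, or a tool's output file is entirely up to the plugin.
type Input interface {
	Name() string
	Gather() ([]Sample, error)
}

// Registry maps plugin names (as referenced in the config file) to factories,
// loosely mirroring how Telegraf registers its input plugins.
var Registry = map[string]func() Input{}

func Register(name string, factory func() Input) {
	Registry[name] = factory
}
```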