Providing Caliper Metrics to Prometheus #1353

Open
davidkel opened this issue May 24, 2022 · 1 comment
Labels: component/core, enhancement, epic

davidkel commented May 24, 2022

This doesn't cover the caliper capability that can extract prometheus data into a report as defined by the benchmark configuration file; however, that report may also be able to include the caliper data sent to prometheus.

There are two ways to get caliper data to prometheus, and both are configured through a benchmark file. There is a scrape method, which requires the following info:

- metricPath: override for the metrics path to be scraped (default /metrics).
- scrapePort: override for the port to be used when configuring the scrape server (default 3000).
- processMetricCollectInterval: time interval for default metrics collection, enabled when present
- defaultLabels: object of key:value pairs to augment the default labels applied to the exposed metrics during collection.
- histogramBuckets: override for the histogram buckets to be used for collection of caliper_tx_e2e_latency
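
For illustration, here is a hedged sketch of what the scrape configuration might look like in a benchmark file. The `monitors.transaction` nesting, the `prometheus` module name and the shape of `histogramBuckets` are assumptions here, not confirmed keys:

```yaml
# Hypothetical benchmark-file snippet for the scrape method (key names/nesting assumed)
monitors:
  transaction:
  - module: prometheus            # assumed module name for the scrape observer
    options:
      metricPath: /caliper-metrics        # override the default /metrics path
      scrapePort: 3100                    # override the default port 3000
      processMetricCollectInterval: 100   # enables default process metrics collection
      defaultLabels:                      # extra labels added to every exposed metric
        benchmark: my-benchmark
        run: run-1
      histogramBuckets: [0.5, 1, 2, 5, 10]  # buckets for caliper_tx_e2e_latency (shape assumed)
```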

as well as a push method, which requires a prometheus push gateway server and the following config:

- pushInterval: push interval in milliseconds
- pushUrl: URL for the Prometheus Push Gateway
- processMetricCollectInterval: time interval for default metrics collection, enabled when present
- defaultLabels: object of key:value pairs to augment the default labels applied to the exposed metrics during collection.
- histogramBuckets: override for the histogram buckets to be used for collection of caliper_tx_e2e_latency
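
Similarly, a hedged sketch of the push configuration; again the module name (`prometheus-push`) and nesting are assumptions, and the push gateway URL is a placeholder:

```yaml
# Hypothetical benchmark-file snippet for the push method (key names/nesting assumed)
monitors:
  transaction:
  - module: prometheus-push       # assumed module name for the push observer
    options:
      pushInterval: 5000                              # push every 5 seconds
      pushUrl: "http://pushgateway.example.com:9091"  # placeholder Push Gateway URL
      processMetricCollectInterval: 100
      defaultLabels:
        benchmark: my-benchmark
      histogramBuckets: [0.5, 1, 2, 5, 10]
```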

These are configured at the worker level, which causes some issues:

  1. If we use the scrape method and we have, say, 10 workers launched via a forked process, then each worker listens on its own unique port based on the scrapePort in the config (also 3000 is not a great default port). That means we have to configure prometheus to scrape from 10 sources, making changing the number of workers (or running a different benchmark with a different worker count) arduous, as it requires you to change the prometheus configuration (see the configuration sketch after this list). However it may be possible to use prometheus service discovery to help (it supports Azure VMs, EC2 instances and docker for example, but nothing I can see yet for general VMs).
  2. If we use non-forked workers then there is a problem: if we have multiple workers on a VM, how do we ensure ports don't clash? You could have a unique benchmark file per worker specifying a different scrape port, or you could run each worker in a docker container and remap the same internal port to a different exposed port, for example, but on the whole it's a bit horrible.
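
To make point 1 concrete, a Prometheus configuration for 10 forked workers has to enumerate every worker port by hand (hostname and ports below are placeholders) and must be edited whenever the worker count changes:

```yaml
# prometheus.yml excerpt: one static target per worker
scrape_configs:
  - job_name: caliper-workers
    metrics_path: /metrics
    static_configs:
      - targets:
          - worker-host:3000   # worker 0
          - worker-host:3001   # worker 1
          - worker-host:3002   # worker 2
          # ... and so on, one entry per worker up to worker-host:3009
```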

It makes more sense to have a single scrape port made available from the manager process. It would be good to expose the individual worker stats as well as the combined stats as viewed and output by the default manager observer. This would basically remove the need for the push gateway and make the caliper manager effectively take on the role of the push gateway.
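
With a single scrape endpoint on the manager, the Prometheus side would collapse to one static target regardless of the worker count (hostname and port are placeholders):

```yaml
# prometheus.yml excerpt: a single manager endpoint, independent of worker count
scrape_configs:
  - job_name: caliper-manager
    static_configs:
      - targets: ['manager-host:3000']
```

Individual worker stats could presumably still be distinguished via a per-worker label on the metrics the manager exposes, alongside the combined view.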

This would also be a great way to graph caliper's take on how it is loading the SUT. The question is: should this be part of the benchmark file configuration? Personally I don't think so, but I think it's currently there for convenience for the scrape and push methods, whereas really they are worker configuration details.

My proposal would be to:

  1. keep the push gateway mechanism for a worker
  2. introduce a scrape mechanism at the manager
  3. find a recipe that can make scraping directly from workers a viable option in multiple environments, e.g. native VMs, K8s etc. (see the service discovery sketch after this list)
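
For item 3, standard Prometheus service discovery might be part of the recipe in some environments. As a hedged sketch, in Kubernetes the workers could be discovered by a pod label rather than a static target list (the label name and value here are assumptions):

```yaml
# prometheus.yml excerpt: discover caliper worker pods via Kubernetes service discovery
scrape_configs:
  - job_name: caliper-workers-k8s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: caliper-worker      # keep only pods labelled app=caliper-worker (label assumed)
        action: keep
```

As noted above, nothing equivalent appears to exist for plain VMs, so a manager-side endpoint or file-based discovery would still be needed there.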
davidkel commented:

When we move the scrape mechanism from worker to manager we will lose the ability to scrape system metrics for individual workers, which the prometheus client currently provides as it exposes them directly via prometheus. We would still want to capture the same information and forward it back to the manager to collate and scrape, so maybe #1043 can help with capturing those metrics so we can forward them back.
