Caliper report does not show all Txns submitted when using Prometheus monitor #826

MarcoMazzoni · 2020-05-06T08:47:07Z

Caliper report won't print out ALL the transactions submitted by the Workers when using Prometheus monitor in a distributed deployment: this issue has been experienced when testing a permissioned Ethereum network, but it can be an issue with every SUT.

Context

The reason why the Caliper report won't print out ALL the transactions submitted by the Workers when using Prometheus is the following:

When the CaliperLocalClient (Worker) executes doTest(), at the beginning of the function it sets the txUpdateTime, which can be configured in caliper.yaml (otherwise the default is 1000ms)
The txUpdateTime is the frequency the CaliperLocalClient calls the txUpdate() function, which will push transactions statistics on the Prometheus push gateway
Inside doTest(), the Client will call either runFixedNumber() or runDuration(): these functions execute the workloads (submit txns) and when all the Promises resolve, they set the endTime variable
When these functions completed, some clean up is performed inside doTest() and clearUpdateInter(txUpdateInter) executes the last txUpdate() call
The endTime variable is sent along with the results to the Master, where it is returned to the RoundOrchestrator
The RoundOrchestrator now wants to build the report, and calls await this.report.processPrometheusTPSResults({start, end}, roundConfig, index), where end is the endTime recorded above: through this function, queries are issued to Prometheus in order to retrieve the Caliper txn metrics at time = endTime
The problem is that endTime is an instant in time PRIOR to the last txUpdate() performed by the Worker, thus it's gonna retrieve the metrics about all transactions except the last bunch of them, that have been pushed by the CaliperLocalClient in the last txUpdate().

Now, the higher is the update frequency from worker to Prometheus (set by the user through txupdatetime in caliper.yaml), the lower will be the probability of running into this issue.
But in the majority of cases this "inconsistency" can happen.

Example of Faulty Execution and Expected Behavior

To make a concrete and trivial example, let's assume a Worker has to submit 20 txns at a fixed rate of 5 txns per second, and let's assume a txUpdateTime (frequency of pushes to Prometheus) of 3 sec:

START-TIME
[0 sec] --> push_to_Prometheus(Submitted: -, Succ: -, Failed: -)
[1 sec] --> 5 txn are submitted
[2 sec] --> 5 txn are submitted
[3 sec] --> 5 txn are submitted && push_to_Prometheus(Submitted: 15, Succ: 15, Failed: 0)
[4 sec] --> 5 txn are submitted
END-TIME
[5-sec] --> clearUpdateInterval() and make last push_to_Prometheus(Submitted: 20, Succ: 20, Failed: 0)

When all Workers are done with this process, the Master is going to choose the earliest endTime and the latest startTime among them, but this does not affect the occurrency of the issue.
As you can see, when the Master is gonna query Prometheus to retrieve metrics at END-TIME, it is going to receive just 15 txns submitted, missing the last batch of 5 txns.

The Expected Behavior is that 20 submitted txns are returned in the Caliper report.

Possible Fix

This can be solved by trivially tell the RoundOrchestrator to make a query at endTime + txUpdateTime: in this way, we can be sure that at this point in time the last push to Prometheus has been performed, and we have all the 20 txns in the report. More in detail the fix can be implemented as follows:

Change the round-orchestrator.js by:
1a. Adding in the constructor the txUpdateTime property, by reading the corresponding user parameter in caliper.yaml, like this: this.txUpdateTime = Config.get(Config.keys.TxUpdateTime, 1000);
1b. Setting the new endTime for issuing queries, before calling await this.report.processPrometheusTPSResults() (lines 191-198) , like this:

    // Build the report
    // - TPS
     let idx;
     if (this.monitorOrchestrator.hasMonitor('prometheus')) {
           let query = {start: start, end: end + this.txUpdateTime};
           idx = await this.report.processPrometheusTPSResults(query, roundConfig, index);
      } else {
           idx = await this.report.processLocalTPSResults(results, roundConfig);
      }

Change the finally block of doTest(test) method in caliper-local-client.js not to make a transaction reset too early (just comment out the this.txReset()). The reasons to do this are:

If we call txReset() right after the test finished, we push to Prometheus 'caliper_txn_success' = 0, 'caliper_txn_failure' = 0 and 'caliper_txn_pending' = 0, therefore every query to the monitor after this moment is returning 0 values: not good because the query time can be after the txReset time.
The txReset() is already called in beforeTest(msg), thus calling it again after the test is unnecessary, because every beforeTest() is performed prior to each corresponding doTest(), therefore the txns metrics are clean when the doTest() is performed.

Steps to Reproduce

Deploy a benchmark using Docker with the following services:

mqtt broker
caliper master
2 (or more) caliper workers
prom/prometheus
prom/pushgateway
prom/node-exporter
google/cadvisor

Run a simple benchmark against your SUT by setting the benchconfig.yaml to execute a round with a txNumber of your choice and fixed rate control. Set the observer interval and monitor interval as you want.
Set the txupdatetime in caliper.yaml to be greater than the monitor interval. Anyways you are free to try whatever value.

Your Environment

Node v10.13.0
npm 6.4.1
OS: Ubuntu 18.04

The text was updated successfully, but these errors were encountered:

aklenik added the enhancement New feature or request label May 6, 2020

Gnnng mentioned this issue Jun 30, 2020

Force use local report when prometheus monitor is used advanca/caliper#1

Merged

nklincoln mentioned this issue Aug 3, 2020

Add txObservers and make use of TXN Stats Collector object #943

Merged

nklincoln closed this as completed in #943 Aug 12, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caliper report does not show all Txns submitted when using Prometheus monitor #826

Caliper report does not show all Txns submitted when using Prometheus monitor #826

MarcoMazzoni commented May 6, 2020

Caliper report does not show all Txns submitted when using Prometheus monitor #826

Caliper report does not show all Txns submitted when using Prometheus monitor #826

Comments

MarcoMazzoni commented May 6, 2020

Context

Example of Faulty Execution and Expected Behavior

Possible Fix

Steps to Reproduce

Your Environment