Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caliper report does not show all Txns submitted when using Prometheus monitor #826

Closed
MarcoMazzoni opened this issue May 6, 2020 · 0 comments · Fixed by #943
Closed
Labels
enhancement New feature or request

Comments

@MarcoMazzoni
Copy link

Caliper report won't print out ALL the transactions submitted by the Workers when using Prometheus monitor in a distributed deployment: this issue has been experienced when testing a permissioned Ethereum network, but it can be an issue with every SUT.

Context

The reason why the Caliper report won't print out ALL the transactions submitted by the Workers when using Prometheus is the following:

  1. When the CaliperLocalClient (Worker) executes doTest(), at the beginning of the function it sets the txUpdateTime, which can be configured in caliper.yaml (otherwise the default is 1000ms)
  2. The txUpdateTime is the frequency the CaliperLocalClient calls the txUpdate() function, which will push transactions statistics on the Prometheus push gateway
  3. Inside doTest(), the Client will call either runFixedNumber() or runDuration(): these functions execute the workloads (submit txns) and when all the Promises resolve, they set the endTime variable
  4. When these functions completed, some clean up is performed inside doTest() and clearUpdateInter(txUpdateInter) executes the last txUpdate() call
  5. The endTime variable is sent along with the results to the Master, where it is returned to the RoundOrchestrator
  6. The RoundOrchestrator now wants to build the report, and calls await this.report.processPrometheusTPSResults({start, end}, roundConfig, index), where end is the endTime recorded above: through this function, queries are issued to Prometheus in order to retrieve the Caliper txn metrics at time = endTime
  7. The problem is that endTime is an instant in time PRIOR to the last txUpdate() performed by the Worker, thus it's gonna retrieve the metrics about all transactions except the last bunch of them, that have been pushed by the CaliperLocalClient in the last txUpdate().

Now, the higher is the update frequency from worker to Prometheus (set by the user through txupdatetime in caliper.yaml), the lower will be the probability of running into this issue.
But in the majority of cases this "inconsistency" can happen.

Example of Faulty Execution and Expected Behavior

To make a concrete and trivial example, let's assume a Worker has to submit 20 txns at a fixed rate of 5 txns per second, and let's assume a txUpdateTime (frequency of pushes to Prometheus) of 3 sec:

START-TIME
[0 sec] --> push_to_Prometheus(Submitted: -, Succ: -, Failed: -)
[1 sec] --> 5 txn are submitted
[2 sec] --> 5 txn are submitted
[3 sec] --> 5 txn are submitted && push_to_Prometheus(Submitted: 15, Succ: 15, Failed: 0)
[4 sec] --> 5 txn are submitted
END-TIME
[5-sec] --> clearUpdateInterval() and make last push_to_Prometheus(Submitted: 20, Succ: 20, Failed: 0)

When all Workers are done with this process, the Master is going to choose the earliest endTime and the latest startTime among them, but this does not affect the occurrency of the issue.
As you can see, when the Master is gonna query Prometheus to retrieve metrics at END-TIME, it is going to receive just 15 txns submitted, missing the last batch of 5 txns.

The Expected Behavior is that 20 submitted txns are returned in the Caliper report.

Possible Fix

This can be solved by trivially tell the RoundOrchestrator to make a query at endTime + txUpdateTime: in this way, we can be sure that at this point in time the last push to Prometheus has been performed, and we have all the 20 txns in the report. More in detail the fix can be implemented as follows:

  1. Change the round-orchestrator.js by:
    1a. Adding in the constructor the txUpdateTime property, by reading the corresponding user parameter in caliper.yaml, like this: this.txUpdateTime = Config.get(Config.keys.TxUpdateTime, 1000);
    1b. Setting the new endTime for issuing queries, before calling await this.report.processPrometheusTPSResults() (lines 191-198) , like this:
    // Build the report
    // - TPS
     let idx;
     if (this.monitorOrchestrator.hasMonitor('prometheus')) {
           let query = {start: start, end: end + this.txUpdateTime};
           idx = await this.report.processPrometheusTPSResults(query, roundConfig, index);
      } else {
           idx = await this.report.processLocalTPSResults(results, roundConfig);
      }
  1. Change the finally block of doTest(test) method in caliper-local-client.js not to make a transaction reset too early (just comment out the this.txReset()). The reasons to do this are:
  • If we call txReset() right after the test finished, we push to Prometheus 'caliper_txn_success' = 0, 'caliper_txn_failure' = 0 and 'caliper_txn_pending' = 0, therefore every query to the monitor after this moment is returning 0 values: not good because the query time can be after the txReset time.
  • The txReset() is already called in beforeTest(msg), thus calling it again after the test is unnecessary, because every beforeTest() is performed prior to each corresponding doTest(), therefore the txns metrics are clean when the doTest() is performed.

Steps to Reproduce

  1. Deploy a benchmark using Docker with the following services:
  • mqtt broker
  • caliper master
  • 2 (or more) caliper workers
  • prom/prometheus
  • prom/pushgateway
  • prom/node-exporter
  • google/cadvisor
  1. Run a simple benchmark against your SUT by setting the benchconfig.yaml to execute a round with a txNumber of your choice and fixed rate control. Set the observer interval and monitor interval as you want.

  2. Set the txupdatetime in caliper.yaml to be greater than the monitor interval. Anyways you are free to try whatever value.

Your Environment

  • Node v10.13.0
  • npm 6.4.1
  • OS: Ubuntu 18.04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants