The Caliper report does not include ALL the transactions submitted by the Workers when the Prometheus monitor is used in a distributed deployment. This issue was experienced while testing a permissioned Ethereum network, but it can affect every SUT.
Context
The reason why the Caliper report does not include ALL the transactions submitted by the Workers when using Prometheus is the following:
When the CaliperLocalClient (Worker) executes doTest(), it sets txUpdateTime at the beginning of the function; this value can be configured in caliper.yaml (otherwise the default is 1000 ms)
txUpdateTime is the frequency at which the CaliperLocalClient calls the txUpdate() function, which pushes transaction statistics to the Prometheus push gateway
Inside doTest(), the Client calls either runFixedNumber() or runDuration(): these functions execute the workload (submit txns) and, when all the Promises resolve, they set the endTime variable
When these functions complete, some clean-up is performed inside doTest() and clearUpdateInter(txUpdateInter) executes the last txUpdate() call
The endTime variable is sent along with the results to the Master, where it is passed on to the RoundOrchestrator
The RoundOrchestrator then builds the report and calls await this.report.processPrometheusTPSResults({start, end}, roundConfig, index), where end is the endTime recorded above: through this function, queries are issued to Prometheus to retrieve the Caliper txn metrics at time = endTime
The problem is that endTime is an instant in time PRIOR to the last txUpdate() performed by the Worker, so the query retrieves the metrics for all transactions except the last batch of them, which is only pushed by the CaliperLocalClient in that final txUpdate().
Now, the higher the update frequency from Worker to Prometheus (set by the user through txupdatetime in caliper.yaml), the lower the probability of running into this issue; but in the majority of cases this "inconsistency" can still happen.
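To illustrate the timing involved, here is a minimal, self-contained Node.js sketch that mimics the interplay between the update interval and endTime; the helper names are made up for illustration and are not the actual CaliperLocalClient code:

// Stats are pushed every txUpdateTime ms, but endTime is recorded as soon as
// the workload Promises resolve, i.e. possibly BEFORE the final push that
// happens during clean-up.
const txUpdateTime = 3000;             // stand-in for the txupdatetime setting
let submitted = 0;

const pushToGateway = () => {          // stand-in for txUpdate()
    console.log(`pushed Submitted=${submitted} at ${Date.now()}`);
};

async function doTest() {
    const txUpdateInter = setInterval(pushToGateway, txUpdateTime);
    for (let i = 0; i < 20; i++) {     // stand-in for runFixedNumber(): 20 txns at 5 tx/s
        submitted++;
        await new Promise(resolve => setTimeout(resolve, 200));
    }
    const endTime = Date.now();        // recorded here, before the final push
    clearInterval(txUpdateInter);      // stand-in for clearUpdateInter(txUpdateInter)
    pushToGateway();                   // the last push happens AFTER endTime
    console.log(`a Prometheus query at endTime=${endTime} misses this final batch`);
}

doTest();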
Example of Faulty Execution and Expected Behavior
For a concrete and trivial example, let's assume a Worker has to submit 20 txns at a fixed rate of 5 txns per second, with a txUpdateTime (frequency of pushes to Prometheus) of 3 sec:
START-TIME
[0 sec] --> push_to_Prometheus(Submitted: -, Succ: -, Failed: -)
[1 sec] --> 5 txn are submitted
[2 sec] --> 5 txn are submitted
[3 sec] --> 5 txn are submitted && push_to_Prometheus(Submitted: 15, Succ: 15, Failed: 0)
[4 sec] --> 5 txn are submitted
END-TIME
[5 sec] --> clearUpdateInter() and last push_to_Prometheus(Submitted: 20, Succ: 20, Failed: 0)
When all Workers are done with this process, the Master chooses the earliest endTime and the latest startTime among them, but this does not affect the occurrence of the issue.
As you can see, when the Master queries Prometheus to retrieve the metrics at END-TIME, it receives only 15 submitted txns, missing the last batch of 5 txns.
The Expected Behavior is that 20 submitted txns are returned in the Caliper report.
Possible Fix
This can be solved simply by telling the RoundOrchestrator to make its query at endTime + txUpdateTime: this way we can be sure that the last push to Prometheus has already been performed by that point in time, and all 20 txns appear in the report. In more detail, the fix can be implemented as follows:
Change round-orchestrator.js by:
1a. Adding the txUpdateTime property in the constructor, reading the corresponding user parameter from caliper.yaml, like this: this.txUpdateTime = Config.get(Config.keys.TxUpdateTime, 1000);
1b. Setting the new end time for issuing queries before calling await this.report.processPrometheusTPSResults() (lines 191-198), like this:
// Build the report
// - TPS
let idx;
if (this.monitorOrchestrator.hasMonitor('prometheus')) {
    // Extend the query window by txUpdateTime so that the metrics pushed by the
    // Workers' final txUpdate() are included in the Prometheus results
    let query = {start: start, end: end + this.txUpdateTime};
    idx = await this.report.processPrometheusTPSResults(query, roundConfig, index);
} else {
    idx = await this.report.processLocalTPSResults(results, roundConfig);
}
Change the finally block of the doTest(test) method in caliper-local-client.js so that it does not reset the transaction counters too early (just comment out the this.txReset() call; a sketch of the change follows below). The reasons to do this are:
If we call txReset() right after the test has finished, we push 'caliper_txn_success' = 0, 'caliper_txn_failure' = 0 and 'caliper_txn_pending' = 0 to Prometheus, so every query to the monitor after this moment returns 0 values: this is a problem because the query time can fall after the txReset time.
txReset() is already called in beforeTest(msg), so calling it again after the test is unnecessary: every beforeTest() runs before its corresponding doTest(), therefore the txn metrics are already clean when doTest() executes.
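For reference, a minimal sketch of how change 2 could look in the finally block; only the lines discussed above are shown, and any other clean-up performed by the real method is assumed to stay exactly as it is:

} finally {
    clearUpdateInter(txUpdateInter); // performs the last txUpdate() push to the gateway
    // this.txReset();               // removed: it would push zeroed caliper_txn_* metrics
    //                               // before the Master queries Prometheus; beforeTest(msg)
    //                               // already resets the counters before the next round
}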
Steps to Reproduce
Deploy a benchmark using Docker with the following services:
mqtt broker
caliper master
2 (or more) caliper workers
prom/prometheus
prom/pushgateway
prom/node-exporter
google/cadvisor
Run a simple benchmark against your SUT by setting the benchconfig.yaml to execute a round with a txNumber of your choice and fixed rate control. Set the observer interval and monitor interval as you want.
Set txupdatetime in caliper.yaml to be greater than the monitor interval; in any case, you are free to try any value.
Your Environment
Node v10.13.0
npm 6.4.1
OS: Ubuntu 18.04