Add an aggregate metric for the theoretical write capacity #28
The new metric emits a sample of the number of logs per second boltdb could store if every log write operation looked like the one currently being measured, that is, if each batch contained the same number of logs and each txn Commit took the same amount of time. While no two operations will be identical, averaging the emitted sample/summary should give a good picture of what Consul could handle given the current shape of its write operations.
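As a minimal sketch of that computation (not the exact diff in this PR; the metric key and helper name are assumptions, using a go-metrics style sink):

```go
package boltdbmetrics

import (
	"time"

	metrics "github.com/armon/go-metrics"
)

// emitWriteCapacity records the theoretical logs-per-second rate implied by
// one batched write: the rate boltdb could sustain if every batch held the
// same number of logs and every txn Commit took the same amount of time.
func emitWriteCapacity(logsInBatch int, commitElapsed time.Duration) {
	if commitElapsed <= 0 {
		return // guard against division by zero on unmeasured commits
	}
	logsPerSecond := float64(logsInBatch) / commitElapsed.Seconds()
	metrics.AddSample([]string{"raft", "boltdb", "writeCapacity"}, float32(logsPerSecond))
}
```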
It is expected that this value will fluctuate with changes in the size of the logs flowing through Consul and in how many logs get batched into one storage operation.
If someone wanted to monitor this, I think they would want to know when the actual write rate exceeds 75% of this metric's value. That could be due to an increased number of writes, or to a degradation in disk performance that causes similar writes to slow down. Regardless of the cause, getting close to the limit, or seeing a drastic change in the metric, could be indicative of another issue that requires investigation.
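A monitoring check along these lines (purely illustrative; both inputs would come from your metrics pipeline) would fire on either cause, since both show up as the ratio climbing:

```go
// nearWriteCapacity is a hypothetical alert condition: true when the
// observed write rate exceeds 75% of the averaged writeCapacity samples.
func nearWriteCapacity(actualWritesPerSec, avgWriteCapacity float64) bool {
	return actualWritesPerSec > 0.75*avgWriteCapacity
}
```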