Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mas i1801 2iqueryperf #1802

Merged
merged 14 commits into from
Nov 11, 2021
Merged

Mas i1801 2iqueryperf #1802

merged 14 commits into from
Nov 11, 2021

Conversation

martinsumner
Copy link
Contributor

Adds stats to secondary index queries. Adds to riak stats for each vnode fold, and adds an optional log to the riak_kv_index_fsm.

The optional log can be enabled by setting riak_kv.log_index_fsm to true (or enabling this via riak.conf).

Dependencies are updated to included the revised riak_core_coverage_plan:initiate_plan/5 function which offers significant performance advantages for larger ring_sizes.

@martinsumner
Copy link
Contributor Author

basho/riak_test#1359

Rather than just option to log, stats will be updated as with get_fsm and put_fsm.

Stats include response time histogram, but also a histogram of result counts.

This should be useful for identifying when there are issues related to very large result sets being returned, and establish any non-linear relationship between query time and result counts.
the monitoring of these times has switched to riak_core

e.g.

worker_af4_pool_queuetime_mean"
worker_af4_pool_queuetime_100
worker_af4_pool_worktime_mean
worker_af4_pool_worktime_100

worker_vnode_pool_queuetime_mean
worker_vnode_pool_queuetime_100
worker_vnode_pool_worktime_mean
worker_vnode_pool_worktime_100
Previously only possible to set this via advanced.config.

The default has been reduced, as with a large pool, it is possible that context switching between a vast number of concurrent folds could be slower overall than simply queueing some folds to allow others to complete.

Even on volume tests with 1% of transactions using 2i queries, a pool size of 4 is never exhausted.
rebar.config Outdated
@@ -47,16 +47,16 @@
]}.

{deps, [
{riak_core, {git, "https://github.com/basho/riak_core.git", {tag, "riak_kv-3.0.8"}}},
{riak_core, {git, "https://github.com/basho/riak_core.git", {branch, "mas-i1801-monitorworkerq"}}},
Copy link
Contributor

@ThomasArts ThomasArts Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

work needed before merge (I mean, don't forget to reset to right branches)

end,
TotalResults = length(LastResults) + ResultsSent,
DownTheWire =
case TotalResults > MaxResults of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need TotalResults. The function lists:sublist takes care of this, taking only the MaxResult - ResultSent is sufficient, making the code substantially easier:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case TotalResults > MaxResults of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaxResults, it turns out can be the atom all as well as an integer(), and this relies on the fact any integer() > atom() is false.

So if we just have lists:sublist we get badarith.

But it is a bit worse, as it is also possible to construct a query whereby MaxResults is undefined, but a page-sort query is still requested. undefined isn't expected by dialyzer.

Going to make the case of all vs integer() more specific, and ensure that not integer() is always all

false ->
LastResults
end,
TotalResults = length(LastResults) + ResultsSent,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
TotalResults = length(LastResults) + ResultsSent,

TotalResults = length(LastResults) + ResultsSent,
DownTheWire =
case TotalResults > MaxResults of
true ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
true ->

DownTheWire =
case TotalResults > MaxResults of
true ->
lists:sublist(LastResults, MaxResults - ResultsSent);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lists:sublist(LastResults, MaxResults - ResultsSent);
lists:sublist(LastResults, MaxResults - ResultsSent),

case TotalResults > MaxResults of
true ->
lists:sublist(LastResults, MaxResults - ResultsSent);
false ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
false ->

true ->
lists:sublist(LastResults, MaxResults - ResultsSent);
false ->
LastResults
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LastResults

lists:sublist(LastResults, MaxResults - ResultsSent);
false ->
LastResults
end,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
end,

@@ -263,6 +263,11 @@ do_update({put_fsm_time, Bucket, Microsecs, Stages, PerBucket, CRDTMod}) ->
ok = create_or_update([P, ?APP, node, puts, Type, time], Microsecs, histogram),
ok = do_stages([P, ?APP, node, puts, Type, time], Stages),
do_put_bucket(PerBucket, {Bucket, Microsecs, Stages, Type});
do_update({index_fsm_time, Microsecs, ResultCount}) ->
P = ?PFX,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, but I like 3 copies of ?PFX in arguments below better... Renaming a macro makes it not as easy to see that a macro is used.

In line 272 below the macro is also used, so more in line with that code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The convention in the module is that if ?PFX is required more than once in a function, it is called then assigned rather than repeatedly called. It might be a performance thing (this a very active piece of code, ignoring excess calls to get_env), or perhaps a consistency thing (so that the prefix can't change across a set of updates.

Without knowing why this is the convention, I don't wish to change it.

@martinsumner
Copy link
Contributor Author

Full list of stats added by this PR:

index_fsm_complete

  • this increments every time a 2i query complete (there is already a counter for every time one is started)

index_fsm_results_mean
index_fsm_results_median
index_fsm_results_95
index_fsm_results_99
index_fsm_results_100

  • a histogram of the numbers of results seen in 2i queries on this node

index_fsm_time_mean
index_fsm_time_median,
index_fsm_time_95,
index_fsm_time_99
index_fsm_time_100

  • a histogram of the time taken for 2i queries on this node

worker_vnode_pool_queuetime_mean"
worker_vnode_pool_queuetime_100
worker_vnode_pool_worktime_mean
worker_vnode_pool_worktime_100

  • the max and mean time spent on each vnode running queries, both queueing to be run and actually running

worker_af1_pool_queuetime_mean"
worker_af1_pool_queuetime_100
worker_af1_pool_worktime_mean
worker_af1_pool_worktime_100
..... (i.e. repeated for af2_pool etc)

  • as above, but for each worker pool which can run queries (i.e. those for aae_folds, rebuilds etc)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants