Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RPC delay metrics #2293

Merged
merged 5 commits into from
Jul 3, 2024
Merged

RPC delay metrics #2293

merged 5 commits into from
Jul 3, 2024

Conversation

xemul
Copy link
Contributor

@xemul xemul commented Jun 12, 2024

The new metrics is total time it takes an RPC message to travel back and forth between verb caller and server. It implicitly consists of the time request and response messages spend in seastar and kernel queues on both sides and classical network RTT.

Calculated for verbs that require response only. First, client negotiates with the server that response frame includes the "handler duration" value, which is the handler callback execution time. Then client notices duration between sending request and receiving response minus handler execution time received from server. This is called "delay sample". Total number of samples and their duration summary are exported as counters in the metrics domain.

refs: #323
refs: #1753 (for what metrics domain is)

@xemul xemul changed the title RPC rount-trip time metrics RPC round-trip time metrics Jun 21, 2024
@xemul
Copy link
Contributor Author

xemul commented Jun 24, 2024

@scylladb/seastar-maint , please review

@nyh
Copy link
Contributor

nyh commented Jun 24, 2024

Can you write a bit about the motivation of this specific metric? Is it something you have seen in some other API and found useful (if so, a reference would be nice)? What do you want to use this new feature for?

What bothers me is that the subtraction you're doing - latency measured by client minus latency measured by server - isn't exactly the RTT of anything: It isn't the RTT of the full request (because that would include the handling time, not subtracting it), and it's not the "classic" network RTT because it includes includes queueing time on both client and server.

@xemul
Copy link
Contributor Author

xemul commented Jun 24, 2024

When looking for sources of network latency in scylladb/scylladb#9505 it was good to know how much time it takes for RPC frame to pass through seastar(send)-kernel(send)-network-kernel(recv)-seastar(recv)-and_back (for the record, one of the problematic places found was in seastar(send) -- #1067), so the idea to collect it as metrics popped up.

What bothers me is that the subtraction you're doing - latency measured by client minus latency measured by server - isn't exactly the RTT of anything

Why not?

  1. (cln) send_entry()
  2. (cln) request is queued into client outgoing queue
  3. (cln) request is written into the socket
  4. network processing, whatever it means
  5. (srv) request is read from socket
  6. (srv) call handler
  7. (srv) response is queued into client outgoing queue
  8. (srv) response is written into the socket
  9. network processing again
  10. (cln) response is read from socket
  11. (cln) find and wake so called "outstanding" entry
  12. (cln) response is parsed and returned back

If I measure timestamps of point 1 and 13, their difference is the total time it took request to process. If I measure the time step 7 took, and subtract it from that calculated difference, I get exactly the time it took the request to travel through seastars and kernels on both sides and the network in between.

and it's not the "classic" network RTT because it includes includes queueing time on both client and server.

But it includes the queuing time on both -- client and server, unless I mis-read the code and placed the timestamps not in points 1 and 13 as described above

@@ -487,7 +490,7 @@ private:
private:
future<> negotiate_protocol(feature_map map);
void negotiate(feature_map server_features);
future<std::tuple<int64_t, std::optional<rcv_buf>>>
future<std::tuple<int64_t, std::optional<uint32_t>, std::optional<rcv_buf>>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to document what the optional means.

Note: microseconds can overflow 32 bits for an rpc call.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ~1 hour. Are there handlers that last more?

@@ -430,6 +430,7 @@ class client : public rpc::connection, public weakly_referencable<client> {
struct reply_handler_base {
timer<rpc_clock_type> t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the client has two durations using which it can estimate the
rount-trip time of a packet -- the duration between sending request and
receiving response and the time it too handler to execute on the server
side. Difference is the RTT value.

What about time waiting in the send queue (usually none) or the receive queue? Or waiting for the rpc semaphore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new metrics include all of these.
Or does your question really mean "lets adding those metrics too"?

Copy link
Member

@avikivity avikivity Jun 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the difference is not RTT, it's RTT + waits in queues we don't (and can't) measure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the definition of RTT I get used to

Round-trip time (RTT) is the duration it takes for a network request to go from a starting point to a destination and back again to the starting point

and "starting point" here is whoever calls send_helper::operator(). It doesn't tell if there are or there are no queues/delays/throttlers/whatever on the way of the request.

I'm fine to rename it to be clear, but how do you define RTT then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RTT is the network time without any queues (which ping measures if it doesn't have a bad day).

Do you subtract service time from the measured round trip?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RTT is the network time without any queues (which ping measures if it doesn't have a bad day).

The "without any queues" is pretty optimistic requirement, I would say. I admit that outgoing is likely to take queue-less paths even in the kernel, but incoming all consists of queues.

Do you subtract service time from the measured round trip?

Depends on what "service time" is. If it's duration of the verb handler, then of course, that's the point (9bbe38c)

@@ -953,8 +964,13 @@ namespace rpc {
                           _error = true;
                       } else if (it != _outstanding.end()) {
                           auto handler = std::move(it->second);
+                          auto ht = std::get<1>(msg_id_and_data); // ht stands for "handler time"
                           _outstanding.erase(it);
                           (*handler)(*this, msg_id, std::move(data.value()));
+                          if (ht) {
+                              _stats.rtt_samples++;
+                              _stats.rtt_total += (rpc_clock_type::now() - handler->start) - std::chrono::microseconds(*ht);
+                          }
                       } else if (msg_id < 0) {
                           try {
                               std::rethrow_exception(unmarshal_exception(data.value()));

src/rpc/rpc.cc Outdated
res += m._c._stats.rtt_total;
}
return res.count();
}, sm::description("Total RTT time in seconds"), { domain_l }),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean? It will be an average of this-AZ RPC + other-AZ RPC + other-region RPC. Let's say the average is 10ms. What does it mean?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, no, it won't be averaged between different DCs. The metrics is per metrics domain, Scylla configures metrics domains per-DC and per-verb-group-index

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how: scylladb/scylladb#15785

The domain as this patch defines it includes two strings:

First, the datacenter the server lives in. This is because grouping metrics for connections to different datacenters makes little sense for several reasons. For example -- packet delays will differ for local-DC vs cross-DC traffic and mixing those latencies together is pointless. Another example -- the amount of traffic may also differ for local- vs cross-DC connections e.g. because of different usage of enryption and/or compression.

Second, each verb-idx gets its own domain. That's to be able to analyze e.g. query-related traffic from gossiper one. For that the existing isolation cookie is taken as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah wow

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share Avi's wow - I didn't even know a concept of "metric domains" existed. As the number of Scylla and Seastar developers grows, we need to try to be more diligent about documenting things that other developers will need to know. When I wrote Scylla's docs/dev/metrics.md, I spent many hours trying to figure out all the details of how metrics were used in Scylla and explain them all (I didn't write any of the metrics code, so I had to dig and talk to Amnon to find the details). But clearly, in the 6 years that have passed (time flies...) things have changed. We should update this document - and document more of Scylla's basic utilities that every developer needs to know. Moreover, we should perhaps think of documenting the metrics feature in Seastar's documentation, not just in Scylla's...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I wrote Scylla's docs/dev/metrics.md, I spent many hours trying to figure out all the details of how metrics were used in Scylla and explain them all (I didn't write any of the metrics code, so I had to dig and talk to Amnon to find the details).

It could be a nice blog post as well

We should update this document - and document more of Scylla's basic utilities that every developer needs to know. Moreover, we should perhaps think of documenting the metrics feature in Seastar's documentation, not just in Scylla's...

Is your proposal to update scylla metrics.md with the ... "details" of metrics that come from seastar?
In general I agree, it's worth describing seastar's metrics in more details than just code (and code comments).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would have been nice to have in Scylla metrics.md become a "book" that documents in length and in depth the meaning of every single metric (such documentation is sorely missing, I can rarely understand anything in the metrics because of the opaque one-sentence descriptions of metric), but what I focused in the current metrics.md is in the overall structure of all metrics and how to use them. For example, I described the metric labels "shard" and "instance" that every metric has. If recently we added per-scheduling-group labels, and per-"domain" (whatever that is) labels that applies to many different metrics, it would be nice to explain them. If Seastar brings its own set of RPC metrics, would be nice to explain it.
In general, I wouldn't complain if more documentation gets written :-)

xemul added 5 commits June 25, 2024 10:04
Ths code should stick to response_frame_headroom constant, not introduce
magic numbers on its own.

Signed-off-by: Pavel Emelyanov <[email protected]>
Note the time it took handler (server-side) callback to run and
propagate it via optional argument down to .respond() method. Error
handlers call respond() too, pass this argument disengaged in this case.

Signed-off-by: Pavel Emelyanov <[email protected]>
Introduce new negotiation parameter that tells if the client wants
server to send it back the handler execution duration. This is for
compatibility with older clients that expect server reply to be of some
fixed (shorter) size.

New response frame, if negotiated, includes the 32 bit handler duration
value in microseconds right next to the existing bits. If no duration
was provided (sending error response, for example), the value is 0xF..F.
After parsing, the duration is returned as optional value and 0xF..F
maps to disengaged one.

For now client makes no use of it, next patch will.

Signed-off-by: Pavel Emelyanov <[email protected]>
Now the client has two durations using which it can estimate the
rount-trip time of a packet -- the duration between sending request and
receiving response and the time it too handler to execute on the server
side. Difference is the delay value.

Those values from different calls are accumulated in a client counter as
well as the total number of delay samples received (errors and old
servers don't send it). Metrics export both as counters, so one can rate
it over time and monitor average delay.

Signed-off-by: Pavel Emelyanov <[email protected]>
By default RPC client uses new frames, that include handler execution
time. Tune up one simple test to validate that legacy frames still work
too.

Signed-off-by: Pavel Emelyanov <[email protected]>
@xemul xemul changed the title RPC round-trip time metrics RPC delay metrics Jun 25, 2024
@xemul
Copy link
Contributor Author

xemul commented Jun 25, 2024

upd:

  • renamed rtt to delay
  • updated doc/rpc.md with new negotiation parameter and extended response frame
  • handle potential 32bit overflow of microseconds duration

@xemul xemul requested review from nyh and avikivity July 3, 2024 06:28
@avikivity avikivity merged commit 826bc0b into scylladb:master Jul 3, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants