
Capture usage telemetry on queries #25916

Open
pauldix opened this issue Jan 26, 2025 · 7 comments

@pauldix
Member

pauldix commented Jan 26, 2025

We'll want to capture some telemetry on queries. Specifically, we'll want a histogram for each metric over the reporting interval (1h), listed here as name - [buckets]:

  • query_response_time_ms - [10, 30, 50, 100, 200, 400, 800, 2000, +]
  • query_file_total - [1, 2, 6, 12, 24, 140, 280, 432, 500, 1000, +]

We'll also want a counter for the number of query requests that returned with an error due to the file limit being hit.

@pauldix pauldix added the v3 label Jan 26, 2025
@mona-influx

when the data lands in s3, can we make sure the json files have a specific prefix (like query_), or go into a different folder? it'll be more efficient on my end to sort the json for processing/combining if they have a prefix or are in a separate folder, versus having the script open each file and determine which event type it is based on the json structure

@pauldix
Member Author

pauldix commented Jan 27, 2025

Hmmm, that's going to be a bit more difficult as it'll require changes on the backend. I was thinking this would be a change in InfluxDB and use the standard telemetry backend. Unless this is already supported @praveen-influx?

@praveen-influx
Contributor

Currently it only supports writing to a single file. Can we have them added as fields in the existing JSON, @mona-influx - so query_response_time_ms and query_file_total would be separate fields in the payload we already send to the server? Sending a separate file will require more work, as Paul says.

Having said that, I'm not sure what the most efficient format for a histogram is. If I sent something like the below:

{
  "query_response_time_ms_1h": [
    [[0, 10], 100],
    [[11, 30], 200],
    [[31, 50], 2],
    [[51, 100], 10],
    [[101, 200], 11],
    [[201, 400], 12],
    [[401, 800], 13],
    [[801, 2000], 0],
    [[2001, 1000000], 0]
  ]
}

In this case, for example, 100 queries finished in 0-10ms. They're overall totals for the whole 1h period. It's not the same format as Prometheus, but if this works, then it should be straightforward on the influxdb side and no changes will be needed on the server side.

@mona-influx

oh okay I misunderstood! I thought it was going to be a completely new event but go through the same architecture. if you're just going to be adding additional fields to the same payload, that's great. thanks!

@hiltontj
Contributor

@praveen-influx - FWIW, Prometheus histogram metrics are reported by omitting the lower bound on each interval. For that, though, the reported values need to be cumulative, e.g.,

{
  "query_response_time_ms_1h": {
    "<10": 100,
    "<30": 300,
    "<50": 302,
    "<100": 312,
// ...
  }
}

Would correspond to your example above. The count in the [31, 50] bucket is determined by subtracting <30 from <50.

Not sure it's worth the change for the number of bytes saved 🤷

@praveen-influx
Contributor

Thanks @hiltontj - I just came up with something to give an idea that it'd be included in the current payload. I don't know if any of our metrics collector mechanism (that feeds the /metrics API call) can be reused if we choose to do it the Prometheus way.

@mona-influx, do you have a preference here? Pre-calculated frequency for each interval (or bin) or a cumulative one as per Trevor's suggestion?

@mona-influx
Copy link

honestly it's not a big deal either way - pre-calc would be slightly quicker but I can very easily work with cumulative if that saves y'all a lot of work 😄
