
Capture usage telemetry on queries #25916

Open
pauldix opened this issue Jan 26, 2025 · 7 comments

@pauldix
Member

pauldix commented Jan 26, 2025

We'll want to capture some telemetry on queries. Specifically, we'll want a histogram for each metric over the reporting interval (1h), listed here as name - [buckets]:

  • query_response_time_ms - [10, 30, 50, 100, 200, 400, 800, 2000, +]
  • query_file_total - [1, 2, 6, 12, 24, 140, 280, 432, 500, 1000, +]

We'll also want a counter for the number of query requests that returned with an error due to the file limit being hit.

@pauldix pauldix added the v3 label Jan 26, 2025
@mona-influx

when the data lands in s3, can we make sure the json files have a specific prefix (like query_), or go into a different folder? it'll be more efficient on my end to sort the json for processing/combining if they have a prefix or are in a separate folder, versus having the script open each file and determine which event type it is based on the json structure

@pauldix
Member Author

pauldix commented Jan 27, 2025

Hmmm, that's going to be a bit more difficult as it'll require changes on the backend. I was thinking this would be a change in InfluxDB and use the standard telemetry backend. Unless this is already supported @praveen-influx?

@praveen-influx
Contributor

Currently it only supports writing to a single file. Can we have them added as fields in the existing JSON, @mona-influx - so query_response_time_ms and query_file_total would be separate fields in the payload we already send to the server? Sending a separate file will require more work, as Paul says.

Having said that, I'm not sure what the most efficient format for a histogram is. If I sent something like the below:

{
  "query_response_time_ms_1h": [
    [[0, 10], 100],
    [[11, 30], 200],
    [[31, 50], 2],
    [[51, 100], 10],
    [[101, 200], 11],
    [[201, 400], 12],
    [[401, 800], 13],
    [[801, 2000], 0],
    [[2001, 1000000], 0]
  ]
}

In this case, for example, 100 queries finished in 0-10ms. They're overall totals for the whole 1h period. It's not the same format as Prometheus, but if this works, then it should be straightforward on the influxdb side and no changes will be needed on the server side.

@mona-influx

oh okay I misunderstood! I thought it was going to be a completely new event but go through the same architecture. if you're just going to be adding additional fields to the same payload, that's great. thanks!

@hiltontj
Contributor

@praveen-influx - FWIW, Prometheus histogram metrics are reported by omitting the lower bound on each interval. For that, though, the reported values need to be cumulative, e.g.,

{
  "query_response_time_ms_1h": {
    "<10": 100,
    "<30": 300,
    "<50": 302,
    "<100": 312,
// ...
  }
}

Would correspond to your example above. The count in the [31, 50] bucket is determined by subtracting <30 from <50.

Not sure it's worth the change for the number of bytes saved 🤷

@praveen-influx
Contributor

Thanks @hiltontj - I just came up with something to give an idea that it'd be included in the current payload. I don't know if any of our metrics collector mechanism (that feeds the /metrics API call) can be reused if we choose to do it the Prometheus way.

@mona-influx, do you have a preference here? Pre-calculated frequency for each interval (or bin) or a cumulative one as per Trevor's suggestion?

@mona-influx
Copy link

honestly it's not a big deal either way - pre-calc would be slightly quicker but I can very easily work with cumulative if that saves y'all a lot of work 😄
