Capture usage telemetry on queries #25916
when the data lands in S3, can we make sure the JSON files have a specific prefix (like …)?
Hmmm, that's going to be a bit more difficult as it'll require changes on the backend. I was thinking this would be a change in InfluxDB and use the standard telemetry backend. Unless this is already supported @praveen-influx?
Currently it only supports writing to a single file. Can we have them added to the existing JSON fields @mona-influx? Having said that, I'm not sure what the most efficient format for a histogram is. If I sent something like below:

```json
{
  "query_response_time_ms_1h": [
    [[0, 10], 100],
    [[11, 30], 200],
    [[31, 50], 2],
    [[51, 100], 10],
    [[101, 200], 11],
    [[201, 400], 12],
    [[401, 800], 13],
    [[801, 2000], 0],
    [[2001, 1000000], 0]
  ]
}
```
In this case, for example, 100 queries finished in the 0–10 ms range.
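To make the per-bin format concrete, here is a minimal sketch of how such a payload could be maintained. The field name and bucket edges come from the example above; the helper names (`new_histogram`, `record`) are hypothetical, not actual InfluxDB telemetry code:

```python
# Hypothetical sketch of the per-bin histogram from the example above.
# Each entry is [[lower, upper], count] with inclusive bounds.
BINS = [(0, 10), (11, 30), (31, 50), (51, 100), (101, 200),
        (201, 400), (401, 800), (801, 2000), (2001, 1000000)]

def new_histogram():
    """One counter per bin, all starting at zero."""
    return [[list(bounds), 0] for bounds in BINS]

def record(hist, value_ms):
    """Increment the bin whose inclusive range contains value_ms."""
    for entry in hist:
        lo, hi = entry[0]
        if lo <= value_ms <= hi:
            entry[1] += 1
            return

hist = new_histogram()
for ms in (3, 7, 25, 1500):
    record(hist, ms)
# The payload field would then be {"query_response_time_ms_1h": hist}.
```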
oh okay I misunderstood! I thought it was going to be a completely new event but go through the same architecture. if you're just going to be adding additional fields to the same payload, that's great. thanks!
@praveen-influx - FWIW the histogram prometheus metrics are reported by omitting the lower bound on the interval. For that, though, the values reported need to be cumulative, e.g.,

```json
{
  "query_response_time_ms_1h": {
    "<10": 100,
    "<30": 300,
    "<50": 302,
    "<100": 312
    // ...
  }
}
```

would correspond to your example above. The count in each bucket includes the counts from all lower buckets. Not sure it's worth the change for the number of bytes saved 🤷
Thanks @hiltontj - I just came up with something to give an idea that it'd be included in the current payload. I don't know if any of our metrics collector mechanisms (that feed the …) support this. @mona-influx, do you have a preference here? Pre-calculated frequency for each interval (or bin), or a cumulative one as per Trevor's suggestion?
honestly it's not a big deal either way - pre-calc would be slightly quicker but I can very easily work with cumulative if that saves y'all a lot of work 😄
We'll want to capture some telemetry on queries. Specifically, we'll want a histogram for the reporting interval (1h) for each of the following:

| name | buckets |
| --- | --- |
| `query_response_time_ms` | [10, 30, 50, 100, 200, 400, 800, 2000, +] |
| `query_file_total` | [1, 2, 6, 12, 24, 140, 280, 432, 500, 1000, +] |
We'll also want a counter for the number of query requests that returned with an error due to the file limit being hit.
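One way to implement the bucketing described above is a binary search over the upper-bound edges, with one extra open-ended bucket for the trailing `+`. A sketch, assuming the edges from the table; the `Histogram` class, `observe` method, and `file_limit_errors` counter are all hypothetical names, not the actual implementation:

```python
import bisect

# Upper-bound bucket edges from the table above; the final "+" bucket
# catches anything beyond the last edge. All names here are hypothetical.
QUERY_TIME_EDGES = [10, 30, 50, 100, 200, 400, 800, 2000]

class Histogram:
    def __init__(self, edges):
        self.edges = edges
        self.counts = [0] * (len(edges) + 1)  # +1 for the "+" bucket

    def observe(self, value):
        # bisect_left returns the index of the first edge >= value,
        # so bucket i holds values <= edges[i]; overflow lands at the end.
        self.counts[bisect.bisect_left(self.edges, value)] += 1

query_time = Histogram(QUERY_TIME_EDGES)
file_limit_errors = 0  # counter for queries rejected by the file limit

for ms in (5, 10, 11, 3000):
    query_time.observe(ms)
# 5 and 10 land in the <=10 bucket, 11 in <=30, 3000 in the "+" bucket
```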