[feature request] MIN(time), MAX(time), FIRST(time), LAST(time), query only time field #5793
Comments
+1
@ericremoreynolds can you expand? Can you give an example of finding the maximum time? I think the maximum time is just
I guess you might want to do a query like `SELECT min(time) FROM whatever_measurement WHERE some_condition` without having to explicitly reference some non-time column in the select clause. I don't know if that makes sense?
Wouldn't the minimum time just be the first time for a field? In situations where you don't have a
Yeah, sure, but maybe in some situations you don't want to reference a specific field but just query the timestamp of the first/last point, regardless of which fields it has values for.
Just a thought; I guess there are indeed approximate workarounds as you say.
@pauldix what do you think about this? I've seen a few other issues where people don't really seem to care about which field they're selecting on. While it will be slower than a single column access, do we want to try to support this kind of query?
@jsternberg Thanks for considering this. But why should the query be slower? In the worst case, a `SELECT MIN(dt) FROM cpu` would perform exactly the same internal work as `SELECT FIRST(value) FROM cpu`, which also returns the timestamp. But as the value is not returned, it should theoretically even be (marginally) faster (fewer values to transfer and thus less encoding/decoding of results). However, it would make the query more readable, since it states exactly what we want to select. So, without knowing a lot about the internal handling of queries, I would guess that converting MIN(dt)/MAX(dt) to the equivalent existing FIRST/LAST queries should not result in additional overhead or performance loss.
@plinyGo because InfluxDB is a columnar database. To do this you'd be hitting every field and then taking the aggregates across those. Because we don't store these statistics for the measurement as a whole, you'd end up doing range scans for every field under that measurement. @jsternberg doing this as a query would be a horrible experience because of this. The only tractable way I can think of to do it would be to keep these summary statistics, but we don't have anything on the roadmap for that yet. I think it's difficult enough to implement that we'd really need to think about what the need is here and how we could engineer this to be performant.
I want to measure lag time itself, so having time, aggregates of time, and time arithmetic expressions in my SELECT clause all make sense. There is literally no other field I'm interested in; completely different visualizations will concern themselves with the other fields. My use case is visualizing the change in lag over time.

So how do I even get lag? Say I have several sensors as clients that report measured fields and their metadata tags once per minute. However, if there is memory, computational, or network latency contention, my process puts backpressure on chiming in to Influx. The result for a sensor might look like reporting in at 1:02:03, 1:03:04, 1:06:11, 1:08:45, 1:18:12, 11:19:13, etc.

My console wants to visualize lag over time and alert when any sensor is behind by more than 15 minutes. A SingleStat in Grafana would be nice: green when < 10m, orange when >= 10m, and red when >= 15m. When my alarm goes off, it would be nice to have a graph to see whether lag on that sensor is spiky, creeping up over time, or behaving the same as other sensors. This can help me differentiate between a troublesome sensor and network-to-Influx issues affecting all sensors.

What I don't want is the sensor itself querying Influx, doing the time logic, and then reporting those values into a different measurement in Influx. The lag detection has to be decoupled from the sensor itself; I can't detect a troublesome sensor if I'm depending on it to reliably report its own lag! I can have a dedicated client/process do this for all sensors on their behalf, but I'm having trouble justifying that when I feel Grafana or a Continuous Query should be able to do this.

What I need is: This lets me Template and Repeat on sensor_id. I can't do
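The exact query this comment had in mind was not preserved, and it predates Flux, but a rough Flux sketch of this kind of per-sensor lag check might look like the following; the bucket "telemetry", measurement "sensor_report", and tag "sensor_id" are hypothetical names, not anything from the original post:

```flux
// Hypothetical names: bucket "telemetry", measurement "sensor_report", tag "sensor_id".
from(bucket: "telemetry")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "sensor_report")
  |> group(columns: ["sensor_id"])   // one table per sensor
  |> sort(columns: ["_time"])        // make sure rows are ordered by timestamp
  |> last()                          // newest point per sensor, keeping its real _time
  |> map(fn: (r) => ({r with
      lag_minutes: float(v: uint(v: now()) - uint(v: r._time)) / 60000000000.0
  }))
```

In Grafana, the resulting lag_minutes column could then drive a stat panel with the green/orange/red thresholds described above.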
+1
We really need this type of query. Not being able to get the primary key is absurd.
+1 |
I'm going to close this out. There is no time without a corresponding field. Under the covers Influx is a columnar database where a column is: measurement + tag set + field. That combination represents a series (or a column), so it makes no sense to grab the time without a field. If you want the first or last time for a field, you can do:

```
-- first value and time returned
select usage_user from cpu limit 1

-- last value and time returned
select usage_user from cpu order by time desc limit 1
```

You can also add a `WHERE` clause to either query.
Please reconsider the decision to close this issue:
Is this issue still open? I need to know the period of time in a GROUP BY query.
But it doesn't sum price or return the max and min of time!
Edit: I had to add the timestamp as a field value to get min and max working!!! (Not the best way to achieve such a simple thing.)
I have a similar issue relating to time. When doing a time-bounded query, I don't want the timestamp of the beginning of the timeframe; I want to select which timestamp (first/last) of the series gets reported. `select last(value) from table where time >= now() - 3h group by "host"` will always give me a time of now() - 3h: it repeats the now() - 3h value for each host instead of showing the first or last timestamp. When looking at the data programmatically, this does nothing to help understand the true timeframe the data represents.
@rickalm: the query you show works for me (InfluxDB 1.5.2, Ubuntu 16.04). In the following example, I stopped collecting SNMP data from a particular host by IP address (10.12.255.1) and started collecting it by name instead (gw.example.com), so the "agent_host" tag changed.
The switchover was made about 40 minutes ago. If I reduce the time range then that host drops out completely:
In your case, maybe there are additional time series which are messing up the query. I suggest you try
+1 |
just use Clickhouse |
Having the same issue with aggregate queries. Without being able to get the min or max of the timestamps (or even just the first or last of them), I would need to do the aggregation manually outside of Influx to preserve that data. As it is now, aggregate queries completely mangle the original timestamps:
I have a problem of a similar nature.
I think a lot of people have similar problems.
I would also like to be able to work with the min and max of the timestamps of a query. I use Influx to log statistics from machine lines in manufacturing, such as machine cycle time per minute. The KPIs are machine speed over time, total machine cycles per day, and total run-time per day, displayed on a Grafana dashboard. Getting the total machine cycles is fairly easy, but I can't figure out a good way to display the time span between the first and last registrations per day. Maybe working directly with Influx isn't the best approach; perhaps pulling the data into a SQL database and doing the calculations from there.
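For what it's worth, a Flux sketch of that per-day runtime idea (first to last registration per day) could look like the following; the bucket "factory" and measurement "machine_cycles" are made-up names, and it assumes a single series per day (otherwise the relevant tag columns should be added to the join):

```flux
// Made-up names: bucket "factory", measurement "machine_cycles".
base = from(bucket: "factory")
  |> range(start: -7d)
  |> filter(fn: (r) => r._measurement == "machine_cycles")
  |> window(every: 1d)

firstPerDay = base |> first()   // first registration of each day, with its real _time
lastPerDay  = base |> last()    // last registration of each day, with its real _time

join(tables: {f: firstPerDay, l: lastPerDay}, on: ["_start", "_stop"])
  |> map(fn: (r) => ({
      day: r._start,
      runtime_hours: float(v: uint(v: r._time_l) - uint(v: r._time_f)) / 3600000000000.0
  }))
```

Whether this is nicer than exporting to SQL is debatable, but it does keep the calculation inside Influx.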
I also make statistics in manufacturing :-) A parameter could be used: one is already used to define the time unit, so why not add a second one? Maybe it could even be the default behaviour of ELAPSED() when a single data point is passed to it.
+1 |
+1 |
Use case: I want to be able to select the first and last timestamps per day of my solar panel measurements. The panels obviously don't produce at night, so the converter shuts off and stops sending data. To check availability of the data through a cron job, I want to know at what time the solar panel switches on for each day of the year (and then use this to do some sine fitting).

I find it interesting behaviour that the time is truncated to the start of the group, although I can see there are arguments for this. Maybe it is possible to have two time columns? One for the start of the group and one for the actually selected data point (be it first(), last(), min(), max() or perhaps others).

For my use case I found a bit of a work-around, only valid when there is a data column that is strictly increasing within the time group (for example the kWh produced that day): use the
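For anyone hitting this later with Flux available: the first-timestamp-per-day query can be expressed directly. A sketch, assuming a hypothetical bucket "solar" with measurement "inverter" and field "power"; because first() is a selector, each daily window keeps the real timestamp of its first point instead of the window start:

```flux
// Hypothetical names: bucket "solar", measurement "inverter", field "power".
from(bucket: "solar")
  |> range(start: -365d)
  |> filter(fn: (r) => r._measurement == "inverter" and r._field == "power")
  |> window(every: 1d)
  |> first()                              // first point of each day, with its own _time
  |> keep(columns: ["_start", "_time"])   // day boundary plus the actual switch-on time
```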
any update?... |
I really think minimum stored time and maximum stored time (or first and last) are very much required for a time series database, without having to know the field name. It is quite frustrating that this feature doesn't get the HIGHEST priority. This is the reason I switched to Prometheus.
any solution? |
+1 |
+1 |
+1 |
With my query
For anyone else who finds this in 2022, this is actually pretty straightforward if you are able to use Flux to query rather than InfluxQL. We have a number of IoT devices writing to buckets and we cannot rely on any particular device supporting a specific measurement. I can pull the timestamp of the most recent measurement with:
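The Flux query itself was not preserved above; a minimal sketch of that approach (the bucket name and time range are placeholders) might be:

```flux
from(bucket: "iot-devices")
  |> range(start: -30d)
  |> group()                      // merge every series in the bucket into one table
  |> sort(columns: ["_time"])     // order rows by timestamp
  |> last()                       // newest point, regardless of measurement or field
  |> keep(columns: ["_time"])
```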
Your query reports the wrong value when the number of records becomes large enough.
+1 |
+1 |
If someone still cares about this in 2024, do yourself a favor and check out ClickHouse.
These all return null when I tried them, but it would be very useful to be able to query for these.
Also "at least 1 non-time field must be queried" seems like a strange limitation.