
[feature request] MIN(time), MAX(time), FIRST(time), LAST(time), query only time field #5793

Closed
ericremoreynolds opened this issue Feb 23, 2016 · 38 comments

Comments

@ericremoreynolds

These all return null when I tried them, but it would be very useful to be able to query for these.

Also "at least 1 non-time field must be queried" seems like a strange limitation.

@dx034

dx034 commented Mar 9, 2016

+1
I need to keep statistics to check data quality; min_time and max_time are crucial for that.

@jsternberg
Contributor

@ericremoreynolds can you expand? Can you give an example of finding the maximum time? I think the maximum time is just last() of some point. I think #5926 might be what you need.

@ericremoreynolds
Author

I guess you might want to do a query like

SELECT min(time) FROM whatever_measurement WHERE some_condition

without having to explicitly reference some non-time column in the select clause.

I don't know if that makes sense?

@jsternberg
Contributor

Wouldn't the minimum time just be the first time for a field? In situations where you don't have a GROUP BY time(...), you can do this to find the minimum time of a field.

SELECT first(value) FROM cpu

@ericremoreynolds
Author

ericremoreynolds commented Mar 15, 2016 via email

@jsternberg
Contributor

@pauldix what do you think about this? I've seen a few other issues where people don't really seem to care about which field they're selecting on. While it will be slower than a single column access, do we want to try and support this kind of query?

@dx034

dx034 commented Mar 15, 2016

@jsternberg Thanks for considering this. But why should the query be slower? In the worst case, a "SELECT MIN(dt) FROM cpu" would perform exactly the same internal work as "SELECT FIRST(value) FROM cpu", which also returns the timestamp. But since the value is not returned, it should theoretically even be (marginally) faster (fewer values to transfer and thus less encoding/decoding of results). It would also make the code more readable, since the query states exactly what we want to select.

So without knowing a lot about the internal handling of queries, I would guess that converting MIN(dt),MAX(dt) to equivalent/existing FIRST/LAST queries should not result in additional overhead or performance loss.

@pauldix
Member

pauldix commented Mar 15, 2016

@plinyGo because InfluxDB is a columnar database. So to do this you'd be hitting every field and then taking the aggregates against those. Because we don't store these statistics as a whole, you'd end up doing range scans for every field under that measurement.

@jsternberg doing this as a query would be a horrible experience because of this. The only tractable way I can think of to do this would be to keep these summary statistics, but we don't have anything on the roadmap for that yet.

I think it's difficult enough to implement that we'd really need to think about what the need is here and how we could engineer this to be performant.

@toddiuszho

I want to measure lag time itself, so having time, aggregates of time, and time arithmetic expressions in my SELECT clause all make sense. There is literally no other field I'm interested in. Completely different visualizations will concern themselves with the other fields. But my use case is visualizing the change in lag over time.

So how do I even get lag? Let's say I have several sensors as clients that report measured fields and their metadata tags once per minute. However, if there are memory, computational, or network latency contentions, my process puts backpressure on chiming in to Influx. The result for a sensor might look like reporting in at 1:02:03, 1:03:04, 1:06:11, 1:08:45, 1:18:12, 11:19:13, etc.

My console wants to visualize lag over time and alert when any sensor is behind by more than 15 minutes. A SingleStat in Grafana would be nice. Green when <10m, Orange when >=10m, and Red when >=15m. When my alarm goes off, a nice thing would be a graph to see if lag on that sensor is spiky vs. creeping up over time vs. how it fares vs. other sensors. This can help me differentiate between a troublesome sensor vs. network-to-Influx issues for all sensors.

What I don't want is the sensor itself querying Influx, doing the time logic, and then reporting those values into a different measurement to Influx. The lag detection has to be decoupled from the sensor itself. I can't detect a troublesome sensor if I'm depending on it to reliably report its own lag! I can have a dedicated client/process do this for all sensors on their behalf, but I'm having trouble justifying this when I feel Grafana or a Continuous Query should be able to do this.

What I need is: SELECT last(now() - time) FROM sensors WHERE sensor_id =~ /$sensor_id$/ AND time < now() - 4h

This lets me Template and Repeat on sensor_id.

I can't do SELECT last(some_field) FROM sensors ... because I want to visualize the value of the time difference (lag size/duration) across time, not the value of some_field. You can't cheat and declare victory just because there's a timestamp somewhere in the JSON result. I need the difference returned as a first-class citizen so tools can visualize on it.
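Until something like this is supported server-side, the lag check described above can be approximated client-side: query the last point per sensor, then compute now() minus its timestamp. A minimal sketch in Python (the thresholds mirror the SingleStat colours above; the timestamp format assumed is the RFC3339 form the InfluxDB HTTP API returns):

```python
from datetime import datetime, timezone

def lag_seconds(last_point_time, now=None):
    """Seconds between the last reported point and 'now'.

    last_point_time is an RFC3339 timestamp string as returned by the
    InfluxDB HTTP API, e.g. "2016-03-15T01:02:03Z".
    """
    ts = datetime.strptime(last_point_time, "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return (now - ts).total_seconds()

def lag_status(seconds):
    """Grafana-style thresholds: green < 10m, orange >= 10m, red >= 15m."""
    if seconds >= 15 * 60:
        return "red"
    if seconds >= 10 * 60:
        return "orange"
    return "green"
```

For example, a point last seen at 1:02:03 and checked at 1:14:03 gives a lag of 720 seconds, i.e. orange.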

@TaylanT

TaylanT commented Jul 14, 2016

+1

@dondublon

We really need this type of query. The inability to get the primary key is absurd.

@joelmarty

+1

@pauldix
Member

pauldix commented Dec 19, 2016

I'm going to close this out. There is no time without a corresponding field. Under the covers Influx is a columnar database where a column is:

measurement,tagset,field

# for example
cpu,host=serverA,region=west,usage_user

That combination represents a series or a column. Within that range are value, time tuples. This isn't like a row oriented database where time is stored as a column value per row. Time is stored for each series (or column).

Thus it makes no sense to grab the time without a field. If you want to get the first or last time, which is equivalent to min and max respectively, do it by asking for a specific field like this:

-- first value and time returned
select usage_user from cpu limit 1
-- last value and time returned
select usage_user from cpu order by time desc limit 1

You can also add a where time > ... and time < ... clause to find the first and last times in a given range. So you can already get the summary statistics you're looking for.

@pauldix pauldix closed this as completed Dec 19, 2016
@DaRoli

DaRoli commented Aug 10, 2017

Please reconsider the decision to close this issue:
How can I get the min(time) and max(time) if I query a time range with grouping, e.g. GROUP BY time(1d)?
The query result always uses the start of the day as the returned timestamp of the daily aggregated entries, so I have no way to know the time of the first and last items within each aggregation.
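As a stopgap, the first and last timestamps per day can be recovered client-side from the raw (un-grouped) points. A sketch in Python, assuming the timestamps come back as RFC3339 UTC strings, whose lexicographic order equals chronological order:

```python
from collections import defaultdict

def first_last_per_day(timestamps):
    """Map each calendar day to the (first, last) timestamps seen that
    day -- the information GROUP BY time(1d) truncates away.

    timestamps are RFC3339 UTC strings, e.g. "2017-08-10T06:01:00Z".
    """
    buckets = defaultdict(list)
    for ts in timestamps:
        buckets[ts[:10]].append(ts)  # "YYYY-MM-DD" prefix is the day
    # min/max on same-format RFC3339 UTC strings is chronological order
    return {day: (min(pts), max(pts)) for day, pts in buckets.items()}
```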

@mdaliyan

mdaliyan commented Apr 11, 2018

Is this issue still open? I need to know the period of time covered in a group by query

select min(time), max(time), sum(item_price) as price, sum(view) as view from statistics group by item_invoice

But it doesn't sum the price or return the max and min of time!

name: statistics
tags: item_invoice=
time min max price view
---- --- --- ----- ----
0             7000     83

name: statistics
tags: item_invoice=067a1302-bac4-42cb-409f-c047b14417c3
time min max price view
---- --- --- ----- ----
0              2500    7

name: statistics
tags: item_invoice=cb9ed298-e355-4f6f-52ba-41b15f4afca7
time min max price view
---- --- --- ----- ----
0               6500   4

Edit: I had to add a timestamp to the values to get min and max working! (not the best way to achieve a simple thing)

@rickalm

rickalm commented May 9, 2018

I have a similar issue relating to time: when doing a time-bounded query, I don't want the timestamp of the start of the timeframe; I want to select which timestamp (first/last) of the series gets reported.

select last(value) from table where time >= now() - 3h group by "host"

will always give me a time of now-3h. It's not showing me the first or last timestamp; it's repeating the now()-3h value for each host. When looking at the data programmatically, this does nothing to help understand the true timeframe the data represents.

@candlerb

@rickalm: the query you show works for me (influxdb 1.5.2, Ubuntu 16.04)

In the following example, I stopped collecting SNMP data from a particular host by IP address (10.12.255.1) and started collecting it by name instead (gw.example.com), so the "agent_host" tag changed.

> select last(ifHCInOctets) from interface where time >= now() - 3h group by agent_host
name: interface
tags: agent_host=10.12.255.1
time                 last
----                 ----
2018-05-26T14:35:40Z 41823387483

name: interface
tags: agent_host=gw.example.com
time                 last
----                 ----
2018-05-26T15:09:40Z 41827702059

name: interface
tags: agent_host=sw1.example.com
time                 last
----                 ----
2018-05-26T15:08:41Z 0

The switchover was made about 40 minutes ago. If I reduce the time range then that host drops out completely:

> select last(ifHCInOctets) from interface where time >= now() - 30m group by agent_host
name: interface
tags: agent_host=gw.example.com
time                 last
----                 ----
2018-05-26T15:09:40Z 41827702059

name: interface
tags: agent_host=sw1.example.com
time                 last
----                 ----
2018-05-26T15:08:41Z 20981708378

In your case, maybe there are additional time series which are messing up the query. I suggest you try group by * instead of group by "host"; this may show you additional time series that you weren't expecting. group by "host" will give you the last() entry across all time series which have the same "host" tag.

@cby016

cby016 commented Dec 11, 2018

+1

@Tonsofattraction

just use Clickhouse

@chuckination

Having the same issue as well with aggregate queries. Without being able to get the min or max of the timestamps (or even just the first or last of them), I would need to do the aggregation manually outside of influx to preserve that data.

As it is now, aggregate queries completely mangle the original timestamps:

> select last(time), count(count) as count from count_progress_list where time >= 1564689300000000000 and time <= 1564689600000000000
name: count_progress_list
-------------------------
time                    last    count
1564689300000000000             1281

@ccucumber

I have a problem of a similar nature.
I want to watch for the first datapoint occurrence in a time range, to calculate e.g. when the user switched the monitoring device on today.
The returned value could be the number of minutes between the start of the time range and the timestamp of the first captured datapoint.
A solution might involve ELAPSED(), but both of these seem to be impossible:

  • calling ELAPSED on a series having only one point
  • combining two single-point series (merging FIRST() of the series and FIRST() of the time range filled with zeros)

I think a lot of people have similar problems.

@JC-85

JC-85 commented Aug 13, 2019

I would also like to be able to work with the min and max of the timestamps of a query.

I use Influx to log statistics from machine lines in manufacturing, such as machine cycle time per minute. The KPIs are machine speed over time, total machine cycles per day and total run-time per day, with the results displayed on a Grafana dashboard.

Getting the total machine cycles is fairly easy, but I can't figure out a good way to display the time span between the first and last registrations per day. Maybe working directly with Influx isn't the best approach; I could pull the data into a SQL database and then do the calculations from there.
But I would much rather be able to do proper statistical analysis directly on the measurements.

@ccucumber

ccucumber commented Aug 13, 2019

I also make statistics in manufacturing :-)
I think the solution that follows the spirit of InfluxQL most closely is extending the ELAPSED() function, allowing calculation of the time elapsed not only from datapoint to datapoint, but also from the start of the time range to a datapoint and from a datapoint to the end of the time range.

A parameter could be used for this - one is already used to define the time unit, so why not add a second one? It might even be the default behaviour of ELAPSED() when a single data point is passed to it.
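Pending such an extension, the proposed start-of-range-to-first-point behaviour is easy to compute client-side once the first point's timestamp is known. A hypothetical helper, sketched in Python with epoch seconds:

```python
def elapsed_from_range_start(range_start, timestamps):
    """Minutes from the start of the queried time range to the first
    datapoint -- the single-point ELAPSED() behaviour proposed above.

    range_start and timestamps are epoch seconds.
    """
    if not timestamps:
        return None  # no datapoints fell inside the range
    return (min(timestamps) - range_start) / 60.0
```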

@thepycoder

+1

1 similar comment
@Kevin-Mc-Callister

+1

@amfasis

amfasis commented Nov 29, 2019

Use case: I want to be able to select the first and last timestamps per day of my solar panel measurements. The panels obviously don't produce at night, and thus the converter shuts off and stops sending data. For checking availability of the data through a cron-job I want to know at what time the solar panel switches on for each day in the year (and then using this to do some sinus-fitting).

I find it interesting behavior that the time is truncated to the start of the group, although I can see there are arguments for this. Maybe it would be possible to have two time columns? One for the start of the group and one for the actually selected data point (be it first(), last(), min(), max() or perhaps others).

For my use case I found a bit of a workaround, valid only when there is a data column which is monotonically increasing within the time group (for example the kWh produced that day): use the bottom(field_key, 1) or top(field_key, 1) functions. Sharing for those interested!
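For reference, the workaround might look like this in InfluxQL (a sketch; the measurement name solar and the increasing field energy_kwh are hypothetical):

```sql
-- bottom()/top() keep the selected point's own timestamp instead of
-- truncating it to the start of the GROUP BY time() bucket
SELECT bottom(energy_kwh, 1) FROM solar GROUP BY time(1d)  -- first point per day (switch-on time)
SELECT top(energy_kwh, 1) FROM solar GROUP BY time(1d)     -- last point per day (switch-off time)
```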

@let4be

let4be commented Jan 22, 2020

any update?...

@meyers007

I really think minimum stored time and maximum stored time (or first and last) are very much required for a time series database, without having to know the field name - it is quite frustrating that this feature doesn't get the HIGHEST priority. This is the reason I switched to Prometheus.

@rasoul707

any solution?

@loblab

loblab commented Aug 17, 2020

+1
Need first(time) ~ last(time) group by tag

@heinzlea-eca

+1

@Swantu

Swantu commented Feb 21, 2021

+1

@toddiuszho

> I'm going to close this out. There is no time without a corresponding field. Under the covers Influx is a columnar database where a column is:
>
> measurement,tagset,field
>
> # for example
> cpu,host=serverA,region=west,usage_user
>
> That combination represents a series or a column. Within that range are value, time tuples. This isn't like a row oriented database where time is stored as a column value per row. Time is stored for each series (or column).
>
> Thus it makes no sense to grab the time without a field. If you want to get the first or last time, which is equivalent to min and max respectively, do it by asking for a specific field like this:
>
> -- first value and time returned
> select usage_user from cpu limit 1
> -- last value and time returned
> select usage_user from cpu order by time desc limit 1
>
> You can also add a where time > ... and time < ... clause to find the first and last times in a given range. So you can already get the summary statistics you're looking for.

With my query SELECT last(now() - time) FROM sensors WHERE sensor_id =~ /$sensor_id$/ AND time < now() - 4h I am indeed using a non-time field to query with!! The WHERE already efficiently pinpointed exactly what data to fetch. Yes, even by column store standards. Fetching is done. I've already satisfied the "at least 1 non-time field must be queried" requirement. I'm asking in the SELECT clause to whittle the result set down some before sending back to the client.

@theGOTOguy

For anyone else who finds this in 2022, this is actually pretty straightforward if you are able to use Flux to query rather than InfluxQL. We have a number of IoT devices writing to buckets and we cannot rely on any particular device supporting a specific measurement. I can pull the timestamp of the most recent measurement with:

from(bucketID:"your_bucket_id_here") |> range(start: 0) |> sort(columns: ["_time"], desc: true) |> limit(n:1)

@max0x7ba

max0x7ba commented Apr 29, 2022

> For anyone else who finds this in 2022, this is actually pretty straightforward if you are able to use Flux to query rather than InfluxQL. We have a number of IoT devices writing to buckets and we cannot rely on any particular device supporting a specific measurement. I can pull the timestamp of the most recent measurement with:
>
> from(bucketID:"your_bucket_id_here") |> range(start: 0) |> sort(columns: ["_time"], desc: true) |> limit(n:1)

Your query reports a wrong value once the number of records becomes large enough; your range(start: 0) only hides the problem. Look up the group and last functions.
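A hedged sketch of what that might look like in Flux (the bucket name is a placeholder; group() with no arguments merges every series into one table, so last() then picks the single newest record):

```flux
from(bucket: "your_bucket")
  |> range(start: 0)
  |> group()                 // drop all grouping keys: one table across all series
  |> last(column: "_time")   // newest record overall; its _time is max(time)
```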

@DanaJomar

+1

1 similar comment
@Duocervisia

+1

@earwin

earwin commented Oct 18, 2024

If someone still cares about this in 2024, do yourself a favor, check out Clickhouse.
