Add support using intermediate results in filter functions #425

adamperlin · 2018-07-03T22:07:05Z

From ifql created by nathanielc : influxdata/ifql#200

SELECT mean(value)
FROM cpu
WHERE time > now() - 24h
GROUP BY time(1h), host
SORDER max(value) 5m desc
SLIMIT 5

The equivalent IFQL query is:

topN = from(db:"telegraf")
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
|> range(start:-5m)
|> group(by:["host"])
|> max()
|> sort(cols:["_value"])
|> limit(n:5)

from(db:"telegraf")
|> filter(fn: (r) => r.host in topN.host and r._measurement == "cpu" and r._field == "usage_idle")
|> range(start:-24h)
|> window(every:1h)
|> mean()

influxdata/influxdb#1819
influxdata/influxdb#7894
influxdata/influxdb#2157


nathanielc · 2018-07-17T22:37:59Z

We discussed this heavily at InfluxDays London. One possible implementation is to transform all "in" queries into an inner join. The planner could then choose between an inner join and a simple array lookup.

One challenge will be expressing these operations in the query Spec.
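For readers arriving later, the inner-join idea can be sketched in present-day Flux. This is only an illustration under assumptions, not the planner transformation discussed above: it uses the sampledata package and the universe join() function, and it ungroups both streams first so the join keys line up in a single table. The names top and all are made up for the example, and the result has not been verified against a running instance.

```flux
import "sampledata"

// Tags of the series with the highest average value (n: 1 is arbitrary here).
top = sampledata.int()
    |> highestAverage(n: 1, groupColumns: ["tag"])
    |> keep(columns: ["tag"])

// Original, non-aggregated data, ungrouped so both inputs are single tables.
all = sampledata.int()
    |> group()

// Inner join on "tag": only rows whose tag appears in top survive.
join(tables: {all: all, top: top}, on: ["tag"])
```

Because both inputs are ungrouped, the output is one table; regrouping by tag afterwards would be needed to get one series per tag again.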

nathanielc · 2018-12-10T16:55:14Z

See #298; the IR approach will make this possible.

nathanielc · 2019-06-24T22:31:44Z

See #1321

ojdo · 2022-04-05T16:03:25Z

I am currently trying to accomplish what OP has shown above with InfluxDB 2.1.1. Is there any "crutch" or workaround I can use to replace the missing "in" operator? I have found that highestAverage makes it simpler to derive the topN tables/groups to keep, but I am unable to filter the original (i.e. not aggregated in time) series using that list.

Use case: reduce a large number of monitored traffic flows to the 10, 20, 50 most active ones in a graph.

nathanielc · 2022-04-06T18:16:43Z

Have a look at this function in Flux https://docs.influxdata.com/flux/v0.x/stdlib/universe/findrecord/ and its related functions.

Closing this issue as it's now generally possible to use intermediate results in queries.

Thanks for pinging on this issue.
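As a minimal sketch of the findRecord pattern referenced above: extract a single record from an intermediate stream and use one of its values inside a later filter. The threshold (the overall mean of the sample data) and the names rec and avg are assumptions chosen for illustration, not part of the original request.

```flux
import "sampledata"

// Intermediate result: the mean over all sample rows, read out as a record.
rec = sampledata.int()
    |> group()
    |> mean()
    |> findRecord(fn: (key) => true, idx: 0)

avg = rec._value

// Use the extracted scalar in a filter over the original data.
// mean() yields a float, so the integer values are converted before comparing.
sampledata.int()
    |> filter(fn: (r) => float(v: r._value) > avg)
```

The related findColumn function returns an array instead of a record, which fits membership tests with contains().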

ojdo · 2022-04-07T09:45:01Z

@nathanielc thanks for that prompt feedback! As it was not quite straightforward to discover this pattern for myself (the community forum helped very much, though), I'll document here how to do it:

import "sampledata"

N = 1

topN = sampledata.int()
|> highestAverage(n: N, groupColumns: ["tag"])
|> findColumn(column: "tag", fn: (key) => true)

sampledata.int()
|> filter(fn: (r) => contains(value: r.tag, set: topN))

nathanielc transferred this issue from another repository Dec 10, 2018

nathanielc added team/query and removed team/query labels Jul 1, 2019

russorat added enhancement and func/filter labels Mar 4, 2020

nathanielc closed this as completed Apr 6, 2022
