Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support using intermediate results in filter functions #425

Closed
adamperlin opened this issue Jul 3, 2018 · 6 comments
Closed

Add support using intermediate results in filter functions #425

adamperlin opened this issue Jul 3, 2018 · 6 comments
Labels

Comments

@adamperlin
Copy link
Contributor

From ifql created by nathanielc : influxdata/ifql#200

SELECT mean(value)
FROM cpu
WHERE time > now() - 24h
GROUP BY time(1h), host
SORDER max(value) 5m desc
SLIMIT 5

The equivalent IFQL query is:

topN = from(db:"telegraf")
|> filter(fn: (r) => r._measurement == "cpu" and r._field == "usage_idle")
|> range(start:-5m)
|> group(by:["host"])
|> max()
|> sort(cols:["_value"])
|> limit(n:5)

from(db:"telegraf")
|> filter(fn: (r) => r.host in topN.host and r._measurement == "cpu" and r._field == "usage_idle")
|> range(start:-24h)
|> window(every:1h)
|> mean()

influxdata/influxdb#1819
influxdata/influxdb#7894
influxdata/influxdb#2157

@nathanielc
Copy link
Contributor

We discussed this heavily at InfluxDays London. One possible implementation is to transform all in queries to an inner join. Maybe the planner can pick from inner join and simple array look up.

One challenge will be expressing these operations in the query Spec.

@nathanielc
Copy link
Contributor

See #298 the IR approach will make this possible

@nathanielc nathanielc transferred this issue from another repository Dec 10, 2018
@nathanielc
Copy link
Contributor

See #1321

@russorat russorat added enhancement New feature or request func/filter labels Mar 4, 2020
@ojdo
Copy link

ojdo commented Apr 5, 2022

I am currently trying to accomplish what OP has shown above with InfluxDB 2.1.1. Is there any "clutch" or work-around I can use to replace the missing in operator? I have found that highestAverage makes it simpler to derive the topN tables/groups to keep, but am unable to filter the original (i.e. not aggregated in time) series using that list.

Use case: reduce a large number of monitored traffic flows to the 10, 20, 50 most active ones in a graph.

@nathanielc
Copy link
Contributor

Have a look at this function in Flux https://docs.influxdata.com/flux/v0.x/stdlib/universe/findrecord/ and its related functions.

Closing this issue as its now generally possible to use intermediate results in queries.

Thanks for pinging on this issue.

@ojdo
Copy link

ojdo commented Apr 7, 2022

@nathanielc thanks for that prompt feedback! As it was not quite straightforward to discover this pattern for myself (the community forum helped very much, though), I'll document here how to do it:

import "sampledata"

N = 1

topN = sampledata.int()
|> highestAverage(n: N, groupColumns: ["tag"])
|> findColumn(column: "tag", fn: (key) => true)

sampledata.int()
|> filter(fn: (r) => contains(value: r.tag, set: topN))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants