fill(previous) should find most recent value, even if outside query time range #6878
Comments
Are there any good work-arounds for this right now? I'm collecting sparse data and trying to graph it using fill(previous), which results in the first several values being null because the previous value falls outside the desired query time range. The only thing I can think to do right now is to execute a second query to get last(field) with the time ending at the start time of the above query, then use that result to fill in the null values. |
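The client-side half of that workaround could look roughly like the sketch below. It is plain Python, not InfluxDB code. It assumes the main query's buckets arrive as a list of `(time, value)` pairs with `None` for empty buckets, and that `seed` is whatever the second `last(field)` query returned (or `None` if there was no earlier point). The name `fill_leading_nulls` is made up for illustration.

```python
from typing import List, Optional, Tuple

Bucket = Tuple[str, Optional[float]]  # (bucket time, value or None)


def fill_leading_nulls(buckets: List[Bucket], seed: Optional[float]) -> List[Bucket]:
    """Replace the nulls that precede the first real value with `seed`."""
    filled = []
    seen_value = False
    for time, value in buckets:
        if value is not None:
            seen_value = True
        if value is None and not seen_value:
            # Still before the first real point: use the value fetched
            # by the second query (stays None if there was none).
            value = seed
        filled.append((time, value))
    return filled
```

Only the leading nulls are touched, on the assumption that fill(previous) has already handled every bucket after the first real value.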
@jwheeler-gs that's the best workaround for now |
what happens if |
I... have no idea. I will have to get back to you on that. |
If there was no previous data, wouldn't it make sense to just return null until the first value is encountered? That would be the same as the current behavior but only in the case where there are no previous values. |
@jwheeler-gs yes, that is the current behavior and would be for this too. The issue is what happens when there is no data in that interval at all. Right now, fill will not fill a series that doesn't exist in the interval. @beckettsean was asking what would happen if |
Aaah, I get it now. That's actually going to be a possible case for my use as well. I'd expect (hope for) fill to return that previous value for the entire interval. But then again, I'd also expect it to not return any data past the current point in time. I'm using fill to provide data directly to a chart which can be adjusted to show any time window, past, present, or even future (where future data is filled in using prediction data from another dataset). What I really want is to see the previous value up to some specified point (the present timestamp) and then no data past that. This is probably a bit much to ask of influx and is outside the scope of what it really needs to provide. That can still be handled easily enough on the receiving end by nulling out all values past the current timestamp. |
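Nulling out values past the current timestamp on the receiving end can be sketched in a few lines. This is an illustrative Python sketch only: it assumes the filled result is a list of `(datetime, value)` pairs, and `null_after` is a hypothetical helper name.

```python
from datetime import datetime
from typing import List, Optional, Tuple

Point = Tuple[datetime, Optional[float]]


def null_after(points: List[Point], cutoff: datetime) -> List[Point]:
    """Keep values up to and including `cutoff`; blank out anything later."""
    return [(t, v if t <= cutoff else None) for t, v in points]
```

Passing the current time as `cutoff` keeps the filled value up to the present and leaves the rest of the chart window empty.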
Would it be possible to add fill(previous) for non-grouped queries? Something like
I know I can do
but that slows down the request considerably, and I get loads of empty lines for times in the time range where there are no measurements. |
@retorquere the issue with having |
@beckettsean I will defer to your superior knowledge on the matter of course, but conceptually, I'd figure it would return exactly the same points as with a regular non-grouped selected, just with the nulls filled in by the value in the column in one of the rows already selected. |
@retorquere InfluxDB does not store nulls. There are no nulls returned in a non-grouped query. |
If I submit this however:
I get this (where I'd love for there to be a way to have those nulls be replaced by 5)
|
That's an interesting use case, where one field is more densely populated
Sean Beckett |
+1 for this feature. |
+1 for this feature! |
Yes please! This could be really useful! |
Any updates on this? or anyone know a workaround or something? Currently needing this function |
+1 for this feature. Matter of fact, I thought "fill(previous)" and "fill(linear)" would do the job already. |
Wait so does Prometheus have this issue or not? This particular feature is critical for my use case of monitoring smart home equipment as they are only uploaded to the DB on changes and not on a regular interval. Also, is there any way to import/copy an InfluxDB into a Prometheus DB? I also like how the official FAQ touches on this issue, even referencing this issue thread, but just leave it at that with no update. |
I am not sure, to be honest. Maybe the Prometheus connector of Home Assistant is different from the Influx connector and writes the same value regularly. It did work in this combination (HA + Prometheus), so I did not investigate further. |
related issue: influxdata/flux#702 |
Add the last value to the request, an example that works in grafana |
@Kuzj - your function only works if you're querying the latest data, since it doesn't have the time filter in the second query. If I was querying some time in the past, it would give me the time-frame I'm asking for, plus the most recent value - not something I want. IE. I ask for data from the 1st of April to the 23rd of April, but I'm running the query in December - I end up getting not only the data I asked for, but one point of data for today. The only way you can make your query work is if you put the upper time threshold in the second query. And then it's still very inefficient, since you're running two queries where one really should be enough. It's simply a case of returning data from the time index before and the time index after your search range, and is a core feature of most time-series historian products out there in the commercial space that support compressed or sparse data (eDNA, Wonderware Historian, OSI PI, IP21 etc.). |
@Kuzj - And a minor note - with the GROUP BY statement you've got there, there is a distinct chance that your 2nd query will be out-of-phase with your 1st, resulting in that last timestamp not being regularly-spaced compared to the timestamps from the outer query. Ie. the samples in your time range begin at 10:45:22, while the samples in your entire history (which need to be scanned for that 2nd result to work) begin at 15:53:48, meaning that your last sample will be out-of-step with the others. If the system natively supported dealing with sparse data, it could help avoid this by resampling/back-filling/forward-filling the sparse data before the GROUP BY is performed. |
@FifTyz Kudos for your work-around! I have a setpoint of an airco unit that produces sparse data. I tried your workaround using a normal time series graph: It kind of works. The graph is drawing okay, but the tooltip is showing the wrong data. It looks like the tooltip data is in reverse order, while the graph does show it right. The data in de query explorer is also showing the data in descending order: I tried putting a 'Sort by' transformation on column 'Time', but that breaks the work-around. It's really unfortunate the community has to deal with this issue for nearly 6 years!! |
It is really sad that this issue is still not resolved. In my previous company in 2021 we migrated from InfluxDB to VictoriaMetrics. And now in my new company we also use InfluxDB that is having constant stability problems. Our sysops team prepared proof of concept with VictoriaMetrics. It supports InfluxDB protocol and free clustering! |
|
I can't believe this issue has been open for about 5 years. At least providing a query that works to join first, last and the data in the InfluxDB 2 query language (the one with the piping) would be nice. |
Still no news?? would be good to have fill(previous) working as expected... |
@FifTyz interesting idea, but A) That is a very complex query, and there should be a MUCH simpler method for such an obvious use case, and B) I often query for large numbers of channels, and I've found that performance for queries like this scales HORRIBLY. |
@FifTyz First of all, thank you for this elaborate post with excellent comments! |
@OptrixAU
@Kortenbach I don't understand why would you need it after graph stop. Can you give a print screen with a graph example? Maybe I can help... |
@FifTyz Sorry, no printscreen, but I'll try to explain. First the easy one (you just need the record before the start of the graph) Now suppose that it's not an on/off status but a capacity indicator (0..100%). The capacity changes a few percent per minute and is logged every 5 minutes. If the graph ends one minute before the last log then the last 4 minutes of the graph will show no line. If Influx would look ahead it would find the log that's 1 minute after the graph ends and use interpolation to estimate the capacity at the end of the graph. I must admit that case B is a lot less annoying but if you're going to address the problem then it should be taken into consideration I think. |
Oh,
|
@Kortenbach I believe the forward-loading is going to be significantly less common. There may be some fringe cases, but you can't really be confident about when the change actually occured, so interpolating into the future might be misleading - particularly over longer time frames. A lot of devices (like LORA sensors) will send an update on an exception or major change, so appreciable changes would be captured. For me, I'd rather show the value that I KNOW it was at the last sample time rather than the value it possibly had. If you honestly need the higher level of detail, better to actually sample at that detail rather than extrapolate. It's probably worth noting that industrial historians like OSI PI understand the expected sample time from each measurement channel, so they can tell when interpolation is desirable - so when you've got a sensor that samples every 10 seconds it will interpolate samples that are ~10 seconds apart, but won't interpolate samples that are 20 seconds apart. But Influx doesn't have this kind of facility. |
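The gap-aware rule described above (interpolate only when neighbouring samples are close enough, otherwise hold the last known value) can be sketched as follows. This is a hedged Python illustration of the idea, not how OSI PI or any other historian actually implements it. `value_at` and `max_gap` are hypothetical names, and times are plain seconds.

```python
from typing import Tuple

Sample = Tuple[float, float]  # (time in seconds, value)


def value_at(t: float, prev: Sample, nxt: Sample, max_gap: float) -> float:
    """Estimate the value at time t, where prev[0] <= t <= nxt[0]."""
    (t0, v0), (t1, v1) = prev, nxt
    gap = t1 - t0
    if gap <= 0 or gap > max_gap:
        # Samples too far apart (or degenerate): hold the last known value.
        return v0
    # Samples close enough: linear interpolation between them.
    return v0 + (v1 - v0) * (t - t0) / gap
```

With `max_gap` set a little above the expected sample interval, samples about 10 seconds apart are interpolated while samples 20 seconds apart fall back to the previous value.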
@OptrixAU I agree. The previous sample is much more important than the one after. |
@Kortenbach, with this solution, if you have the following time series:
Assuming you want to graph the time between 2022-10-06 00:00:00 and 2022-10-06 01:00:00 with this approach, the prefix query will select the record with time 2022-10-05 23:59:00 and value 23 and set its time with map() to 2022-10-06 00:00:00, the start of the graph. The main query will select from the record at 2022-10-06 00:00:00 with value 21 onward, and the two results are unioned together. So now you will have two records with time 2022-10-06 00:00:00 like this:
You will have two records with the same time at the beginning of the graph, which is definitely inaccurate. |
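One way to avoid the duplicate is to drop the re-stamped prefix record whenever a real record already sits exactly at the start of the range. The sketch below shows that rule in Python on the client side rather than in Flux. It assumes the main query's rows are sorted by time and use the same timestamp format as `start`. `prepend_seed` is a hypothetical helper, not part of any InfluxDB client.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float]  # (timestamp, value)


def prepend_seed(seed: Optional[Row], rows: List[Row], start: str) -> List[Row]:
    """Put the re-stamped seed row first unless a real row already sits at `start`."""
    if seed is None:
        # No earlier point exists, so there is nothing to prepend.
        return list(rows)
    if rows and rows[0][0] == start:
        # A real measurement at the range start wins over the synthetic one.
        return list(rows)
    return [(start, seed[1])] + list(rows)
```

With the records above, the seed (value 23) is discarded because the real record with value 21 already occupies 2022-10-06 00:00:00.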
Thanks for pointing that out! |
Is there any progress here? I'm dealing with workarounds that don't really work the way I need.
I need to have one value every second, that's why I added the aggregate there. But the fill previous doesn't take the data point from the first query into account. Moving the aggregate after the union leads to extreme query times. (I don't even know if it works, it takes so long.) |
Did you look at your prev data in a table or in Explore? Where did you get the column _stop from?! It does not exist in your prev query.
Or you could start filling from previous data, without union, as I mention above:
|
Although Influx allows irregular time series, downsampling does not really work: if you query a time span, it may omit a measurement from the result if no data point was encountered in this span. Likewise, if no measurement occurred in the earliest grouped-by interval(s), empty fields will be returned. There is frustrated discussion in influxdata/influxdb#6878 and no progress for ~6 years. Will evaluate QuestDB in a side branch.
Feature Request
Proposal: [Description of the feature]
When executing fill(previous) the query should always have a value for previous, even if there is no point with that field in the query time range.
Current behavior: [What currently happens]
Note the null value for the 16:10-16:15 bucket, despite there being a point at 16:09 with a value.
Desired behavior: [What you would like to happen]
Use case: [Why is this important (helps with prioritizing requests)]
Currently customers have to know when the last value was recorded in order to make sure that point is included in the time range. For irregular series that's a significant burden. If the system can always find the most recent value regardless of the lower time bound, then many state change queries become useful.
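The requested semantics can be illustrated outside the database with a small Python sketch. It assumes the aggregate is "last value per bucket", that points are `(minutes since midnight, value)` pairs sorted by time, and that buckets are half-open intervals. `bucket_fill_previous` is a hypothetical name, and none of this reflects InfluxDB internals.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # (minutes since midnight, value), sorted by time


def bucket_fill_previous(
    points: List[Point], start: int, stop: int, every: int
) -> List[Tuple[int, Optional[float]]]:
    """Last value per bucket, carrying forward the most recent earlier point,
    even when that point lies before `start`."""
    times = [t for t, _ in points]
    out = []
    for bucket_start in range(start, stop, every):
        # Number of points strictly before the end of this bucket.
        i = bisect_left(times, bucket_start + every)
        out.append((bucket_start, points[i - 1][1] if i > 0 else None))
    return out
```

With a point at 16:09 and a query starting at 16:10, the 16:10-16:15 bucket takes the 16:09 value instead of null. A bucket stays null only when no earlier point exists at all.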