fill should force a return of values, even if the query has none #6967
Comments
Whenever this comes up, the problem is that we don't know what series to return. In the use case given by the example, fill seems intuitive because you have no tags so there is only one series. In most use cases, there's more than one series. We can always use the index to determine what series exist, but then Until there is a way to determine what the user wants, I'm not sure there's anything we can do for this. My ears are open if there's some good way to do this because the current workflow that's missing is necessary, but I just haven't been able to come up with anything. |
@jsternberg thanks for reminding us of the tags issue. Obviously if we could return all the tags that the user intends, even though there are no tags in the interval, that would be ideal. However, InfluxDB cannot (yet) read minds, so it's also impractical. That leaves us with two choices, as I see it:
I suspect that the latter is less surprising to most people. At least it returns something, rather than nothing. It may not be the entirety of what is desired, but it does respect the intent of Is there a reason we can't just return one series, using only the measurement name and with no tag set, but respecting the |
I think the hard part would be determining which series to use. If they have a condition and it matches two series, should we output those or return nothing? This also becomes more complicated when you add multiple tags. I can constrain the query to only look at one tag through a If either of these is worth exploring though, we can add it as an exploratory option for 1.1 and see what it looks like to determine if it fits our users' needs. I just have a lingering concern that it will create an extra edge case that will be confusing to explain. |
@jsternberg I'm advocating for not even trying to determine the series. In essence, execute the query. If there are no buckets in the return, artificially create them with the E.g.
becomes
|
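The idea proposed above (run the query, and if no buckets come back, create them from the fill value) can be sketched on the client side. This is only an illustration under stated assumptions, not how InfluxDB implements fill(): the function name `fill_buckets`, the row shape, and the assumption that buckets are aligned to the start of the time range are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fill_buckets(rows, start, end, interval, fill_value):
    """Return one (bucket_start, value) pair per GROUP BY time() bucket.

    rows: (bucket_start, value) pairs returned by the query; may be empty.
    Buckets that are missing from rows, or that hold None, get fill_value.
    Assumption: bucket boundaries are aligned to `start`.
    """
    found = {ts: value for ts, value in rows}
    filled = []
    ts = start
    while ts < end:
        value = found.get(ts)
        filled.append((ts, fill_value if value is None else value))
        ts += interval
    return filled


start = datetime(2016, 7, 1, tzinfo=timezone.utc)
end = start + timedelta(hours=3)
# Even an empty result set yields three hourly buckets filled with 0.
empty_result = fill_buckets([], start, end, timedelta(hours=1), 0)
```

Because no series is inferred, the output carries only timestamps and values, which matches the "don't even try to determine the series" position taken in the comment.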
👍 |
I don't view these 2 ideas as exclusive of each other. In fact I think they support each other, and are completely intuitive. |
Hm, I don't necessarily think so, but I want to explore the idea a bit more before saying for certain. The first thing is that I think you may have some misunderstandings or be using the wrong words, so I'm just going to clarify a few things so that we can be on the same page. Fields aren't a part of a series. Tags are a part of a series. But I do think you bring up a good point. I just don't think it works when I try to make it more generic to all circumstances. If we assume that a measurement always has at least one series (the one where all tags are empty), this works well when you don't use
If I use the following query:
Does this have one or two series? I would say that this has one series ( Is this the behavior you mean? If this is what you mean, I'll think about it a little more. This may be possible, but I need to think if there are any other mitigating circumstances and if the special exception for when there are no series is worth it. |
I am also running into the same issue - trying to use FILL(0) on a time range that has no data. I was expecting to have all buckets with a value of 0, but instead I get nothing. As far as the issue that @jsternberg is bringing up, to me it would seem intuitive to still return the buckets, albeit with no tags i.e.:
If there is data in this time range with a "host" tag, then we can safely assume a single series with host=server01, and we can fill each bucket accordingly. This is the current normal behavior. If there is no data in this time range, then there are no values for the "host" tag, so we can safely assume a single series with host=null and FILL each bucket accordingly.
I would say you wouldn't drop the series with the empty tag key, rather you wouldn't create it to begin with. You would only create the empty tag series if the query turned up no results. It seems pretty straightforward and intuitive to me. Just my 2 cents. |
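A rough sketch of the rule described in this comment, assuming a simplified in-memory representation of query results: the empty-tag series is created only when the query turned up no series at all. The function name `apply_fill` and the data layout are hypothetical and are not part of InfluxDB.

```python
def apply_fill(series, bucket_times, fill_value, group_by_tags):
    """Fill every bucket of every series; synthesize one series if none exist.

    series: dict mapping a tag set such as (("host", "server01"),) to a
            dict of {bucket_time: value}.
    group_by_tags: tag keys named in the GROUP BY clause.
    """
    if not series:
        # No data in the range: create a single series whose grouped tags are null.
        null_tags = tuple((tag, None) for tag in group_by_tags)
        series = {null_tags: {}}

    def value_at(points, t):
        value = points.get(t)
        return fill_value if value is None else value

    return {
        tags: [(t, value_at(points, t)) for t in bucket_times]
        for tags, points in series.items()
    }


buckets = [0, 60, 120]  # bucket start times, in seconds, for illustration
with_data = apply_fill({(("host", "server01"),): {60: 4.0}}, buckets, 0, ["host"])
without_data = apply_fill({}, buckets, 0, ["host"])
```

When data exists, only the real series are filled and no null-tag series appears; the null-tag series is produced solely in the no-results case.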
Same problem here. Fixing this would help a lot. |
This incorporates some feedback received from a spike implementation of host status. For one, the uptime query (now named deltaUptime to better indicate this is a change in uptime) is now packed into the query to fetch the rest of the data for the host page. Also, a "show tag values" query has been added here to fetch hosts that will escape the time range selected by the group by in the deltaUptime query (see this issue: influxdata/influxdb#6967). While not implemented in this commit, it's possible now that we could show different treatments for those hosts that haven't been seen "recently" (as defined by the time selection on the deltaUptime query) and those that have.
Hi guys, is there any news on this? We are collecting metrics from SNMP devices, and most of the error-related metrics are permanently '0', e.g. dot3Stats, ifInErrors/Discards. To save a lot of disk space, we implemented a filter that only sends and stores data in InfluxDB when the value is different from zero on the same measurement, and always reports a nonzero metric on the measurement to let Influx create those field names. The behaviour we expected after that implementation was that InfluxDB would fill those null values with '0' via the fill(0) statement, but as explained in the comments above, when there is no data it returns nothing. |
Hi. Same question here. This is something we need, as our alert in Kapacitor never closes due to the lack of points in the query that detects failures. The Rgds. |
We have no progress on this. If you have any kind of idea that gets around the problem listed earlier in this thread for why we can't do this, we can debate if it works or not. |
We are having the same issue. Our
(Here We expect I understand that there is a generalization issue, as described above; however, it should be solved somehow. In terms I'm familiar with, my understanding of the issue is as follows: If in my query above, I didn't include the tag An example of this could be as follows:
This produces output for plenty of instruments, but only those with at least one datapoint (i.e., trade) in the period. My suggestion is to add a flag 'missing_option' to the fill syntax: (Scroll right.)
(edit: Most of the above syntax was taken from documentation. The The 'missing_option' takes either the value 'ignore' or 'keep'. 'ignore' is the default, using the current solution where tag values which aren't represented with datapoints are just ignored. 'keep' on the other hand spits out filled values for all tag values (in my case instruments) which have ever been produced. We will client side handle filtering of the ones we don't care about. Example usage:
Also notice that the 'missing_option' is optional and defaults to the current behavior, thus this would be backward compatible. Anyway. Those are my two cents. Whatever is done, something should be done with this. |
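The proposed 'ignore' / 'keep' behaviour can be emulated on the client side, as a sketch only. It assumes the full list of tag values is fetched separately (for example with a SHOW TAG VALUES query) and passed in as `known_tag_values`. The function name `fill_by_tag`, the data layout, and the instrument names are hypothetical; 'missing_option' itself is a proposal from this thread, not existing InfluxQL syntax.

```python
def fill_by_tag(series, known_tag_values, bucket_times, fill_value,
                missing_option="ignore"):
    """Fill buckets per tag value, optionally keeping tag values with no data.

    series: dict mapping a tag value to a dict of {bucket_time: value}.
    known_tag_values: every tag value ever written (assumed to be fetched
                      separately, e.g. via SHOW TAG VALUES).
    missing_option: 'ignore' drops tag values without data in the range
                    (current behaviour); 'keep' emits filled buckets for them.
    """
    if missing_option not in ("ignore", "keep"):
        raise ValueError("missing_option must be 'ignore' or 'keep'")
    tag_values = set(series)
    if missing_option == "keep":
        tag_values |= set(known_tag_values)
    result = {}
    for tag_value in sorted(tag_values):
        points = series.get(tag_value, {})
        result[tag_value] = [
            (t, fill_value if points.get(t) is None else points[t])
            for t in bucket_times
        ]
    return result


trades = {"instrument_a": {60: 3}}          # only one instrument traded
known = ["instrument_a", "instrument_b"]    # hypothetical instrument tags
buckets = [0, 60]
ignored = fill_by_tag(trades, known, buckets, 0)
kept = fill_by_tag(trades, known, buckets, 0, missing_option="keep")
```

With the default 'ignore', only `instrument_a` is returned; with 'keep', `instrument_b` also appears with every bucket filled, which is the backward-compatible split the comment describes.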
I'd also like to see this fixed. AndreCAndersen's suggestion seems like a good solution |
I'd be happy with either of the suggestions from @beckettsean or @AndreCAndersen. 👍 I'm running into many situations where reliable results can't be calculated (easily) because of this. This seems most obvious with count() queries, where you'll never get a 0 back for a single time range query, and only get 0's back in a data series if some elements have data to count. I think if #6412 weren't locked from all the +1 silliness, it would show a lot of support. But allowing fill() to always fill values would fix things and do what (I think) most people expect. |
The solution suggested by @AndreCAndersen and @narciero should be the default. When using InfluxDB with some tools that don't allow editing of the data, this feature would come in really handy. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Mr. Stale Bot was warned. No reason to close this out. There isn't much to talk about, it just needs a fix. |
hi,
i use "ping" on telegraf |
This is still an urgent problem that prevents the creation of alerts in Grafana based on lack of data. |
It's now been 4 years since this issue was raised, and we still don't have a solution, only people asking for one. |
@jsternberg any thoughts on the proposed change by @AndreCAndersen and @narciero above? |
@jsternberg @beckettsean or others, any thoughts on this? |
I'm struggling with the measurement-series gaps and was looking for a solution. |
we still need a way to fill table with no data |
While I was waiting for this feature I built and sold 2 startups that utilized time-series DBs, migrated from Influx to Clickhouse in the first one, had to re-learn Influx to try v2 which also was disappointing, while I was trying v2 the company decided to rewrite everything and announce v3. |
I'm really confused.... |
Any news on this? Don't want Stale Bot getting antsy again. |
It's been 8 years now 🙂. The expressions "Drop non-numeric values" and "Replace non-numeric data" don't work either. |
I figured out a way to fill the entire dataset using the fill() function as long as you can rely on an alternative tag from that measurement that you know has data. As long as the ConstantTag has data for each interval, this will correctly output the values for ReadTag whenever valid, and zeroes all other times, including when the time being displayed has no data for ReadTag.
Select count("B")*mean("A") FROM |
Feature Request
Could be considered a bug, really a matter of perspective.
Proposal: [Description of the feature]
If there is a fill() clause attached to the query, the query should return all buckets specified, even if there are no matching points.
Current behavior: [What currently happens]
If the query has nothing but null results, the fill() clause is ignored.
Simple measurement with one point from yesterday:
Selecting a SUM with a GROUP BY time() and fill() clauses returns all expected buckets, with the fill() clause applied to buckets with no data:
However, if every bucket has no data, fill() is not invoked.
Desired behavior: [What you would like to happen]
fill() should be used to populate all buckets that don't otherwise have data, including when all buckets are null.
Use case: [Why is this important (helps with prioritizing requests)]
This is not expected behavior and leads to user confusion when they expect fill() to always be applied:
https://groups.google.com/d/msgid/influxdb/6b233752-edd2-4523-9022-5c83dbcae344%40googlegroups.com
#6412
#6953