fill should force a return of values, even if the query has none #6967

beckettsean · 2016-07-06T16:27:57Z

Feature Request

Could be considered a bug, really a matter of perspective.

Proposal: [Description of the feature]
If there is a fill() clause attached to the query, the query should return all buckets specified, even if there are no matching points.

Current behavior: [What currently happens]
If the query has nothing but null results, the fill() clause is ignored.

Simple measurement with one point from yesterday:

> select * from fillfoo
name: fillfoo
-------------
time            value
2016-07-05T20:20:00Z    42

>

Selecting a SUM with a GROUP BY time() and fill() clauses returns all expected buckets, with the fill() clause applied to buckets with no data:

> select sum(value) from fillfoo where time > now() - 21h group by time(4h) fill(0)
name: fillfoo
-------------
time            sum
2016-07-05T16:00:00Z    0
2016-07-05T20:00:00Z    42
2016-07-06T00:00:00Z    0
2016-07-06T04:00:00Z    0
2016-07-06T08:00:00Z    0
2016-07-06T12:00:00Z    0
2016-07-06T16:00:00Z    0

However, if every bucket has no data, fill() is not invoked.

> select sum(value) from fillfoo where time > now() - 20h group by time(4h) fill(0)
>

Desired behavior: [What you would like to happen]
fill() should be used to populate all buckets that don't otherwise have data, including when all buckets are null.

> select sum(value) from fillfoo where time > now() - 20h group by time(4h) fill(0)
name: fillfoo
-------------
time            sum
2016-07-06T00:00:00Z    0
2016-07-06T04:00:00Z    0
2016-07-06T08:00:00Z    0
2016-07-06T12:00:00Z    0
2016-07-06T16:00:00Z    0

Use case: [Why is this important (helps with prioritizing requests)]

This is not expected behavior and leads to user confusion when they expect fill() to always be applied:

https://groups.google.com/d/msgid/influxdb/6b233752-edd2-4523-9022-5c83dbcae344%40googlegroups.com
#6412
#6953

The text was updated successfully, but these errors were encountered:

jsternberg · 2016-07-06T18:07:23Z

Whenever this comes up, the problem is that we don't know what series to return. In the use case given by the example, fill seems intuitive because you have no tags so there is only one series. In most use cases, there's more than one series. We can always use the index to determine what series exist, but then fill() may return filled in values for series that are long since dead. If you use AWS autoscaling, Mesos containers, Docker, etc. you might have many different series for all of the different instances you have. If you only want the last five minutes, should we fill in values for the last 5 minutes on a series that hasn't been relevant in a year? This seem unintuitive, but we would be required to do that for this to work. If we only use series that have a value within the last 5 minutes, you get the current behavior.

Until there is a way to determine what the user wants, I'm not sure there's anything we can do for this. My ears are open if there's some good way to do this because the current workflow that's missing is necessary, but I just haven't been able to come up with anything.

beckettsean · 2016-07-06T18:19:06Z

@jsternberg thanks for reminding us of the tags issue. Obviously if we could return all the tags that the user intends, even though there are no tags in the interval, that would be ideal. However, InfluxDB cannot (yet) read minds, so it's also impractical.

That leaves us with two choices, as I see it:

return nothing
return buckets for a single series

I suspect that the latter is less surprising to most people. At least it returns something, rather than nothing. It may not be the entirety of what is desired, but it does respect the intent of fill().

Is there a reason we can't just return one series, using only the measurement name and with no tag set, but respecting the fill() parameters?

jsternberg · 2016-07-06T18:36:13Z

I think the hard part would be determining which series to use. If they have a condition and it matches two series, should we output those or return nothing? This also becomes more complicated when you add multiple tags. I can constrain the query to only look at one tag through a WHERE condition, but maybe there's another tag that is contained within that series. I guess we could try allowing it only if there's one series that matches the conditions, but it seems like it might be unintuitive. I'm also not 100% certain the code is in a way where we can physically do that so I'm not promising anything yet.

If either of these is worth exploring though, we can add it as an exploratory option for 1.1 and see what it looks like to determine if it fits our users' needs. I just have a lingering concern that it will create an extra edge case that will be confusing to explain.

beckettsean · 2016-07-06T19:05:28Z

@jsternberg I'm advocating for not even trying to determine the series. In essence, execute the query. If there are no buckets in the return, artificially create them with the fill() condition for the series with no tags. Just return it with measurement only as the series.

E.g.

> select max(value) from fillfill where time > now() - 10h group by time(1h), * fill(7)
name: fillfill
tags: tag1=foo, tag2=bar
time            max
----            ---
2016-07-06T09:00:00Z    7
2016-07-06T10:00:00Z    42
2016-07-06T11:00:00Z    7
2016-07-06T12:00:00Z    7
2016-07-06T13:00:00Z    7
2016-07-06T14:00:00Z    7
2016-07-06T15:00:00Z    7
2016-07-06T16:00:00Z    7
2016-07-06T17:00:00Z    7
2016-07-06T18:00:00Z    7
2016-07-06T19:00:00Z    7

> select max(value) from fillfill where time > now() - 8h group by time(1h), * fill(7)
>

becomes

> select max(value) from fillfill where time > now() - 10h group by time(1h), * fill(7)
name: fillfill
tags: tag1=foo, tag2=bar
time            max
----            ---
2016-07-06T09:00:00Z    7
2016-07-06T10:00:00Z    42
2016-07-06T11:00:00Z    7
2016-07-06T12:00:00Z    7
2016-07-06T13:00:00Z    7
2016-07-06T14:00:00Z    7
2016-07-06T15:00:00Z    7
2016-07-06T16:00:00Z    7
2016-07-06T17:00:00Z    7
2016-07-06T18:00:00Z    7
2016-07-06T19:00:00Z    7

> select max(value) from fillfill where time > now() - 8h group by time(1h), * fill(7)
name: fillfill
time            max
----            ---
2016-07-06T11:00:00Z    7
2016-07-06T12:00:00Z    7
2016-07-06T13:00:00Z    7
2016-07-06T14:00:00Z    7
2016-07-06T15:00:00Z    7
2016-07-06T16:00:00Z    7
2016-07-06T17:00:00Z    7
2016-07-06T18:00:00Z    7
2016-07-06T19:00:00Z    7

enumag · 2016-08-23T12:34:26Z

👍

phemmer · 2016-09-03T17:11:29Z

jsternberg: I guess we could try allowing it only if there's one series that matches the conditions, but it seems like it might be unintuitive.

beckettsean: I'm advocating for not even trying to determine the series. In essence, execute the query. If there are no buckets in the return, artificially create them with the fill() condition for the series with no tags. Just return it with measurement only as the series.

I don't don't view these 2 ideas as exclusive of each other. In fact I think they support each other, and is completely intuitive.
The way I look at it, there is always at least one series. When you GROUP BY something, a grouping is all the available permutations of the different fields in the GROUP BY clause. If all fields are null (such as when they don't exist because there's no data), that's still one permutation (a.k.a. a bucket), thus you can do a fill() on the selected column for that permutation.
And in our case, we will always have a minimum of one non-null field, the time field, as it's artificially created by influxdb.

jsternberg · 2016-09-03T17:38:00Z

Hm, I don't necessarily think so, but I want to explore the idea a bit more before saying for certain. The first thing is I think you may have some misunderstandings or using the wrong words so I'm just going to clarify a few things so that we can be on the same page.

Fields aren't a part of a series. Tags are a part of a series. GROUP BY doesn't operate on fields. There is also no time field except one that is implicitly created from a field being written. If there are no fields, there is no time since we don't have a record of any data being inserted at that time. Each field is put into its own columnar section and they are merged when multiple fields are requested. This is useful for selecting a single field since you don't have to decode all of the fields at a certain time to get one field.

But I do think you bring up a good point. I just don't think it works when I try to make it more generic to all circumstances. If we assume that a measurement always has at least one series (the one where all tags are empty), this works well when you don't use GROUP BY tagkey. Imagine that I have one point written.

cpu,host=server01 value=2 10000

If I use the following query:

SELECT sum(value) FROM cpu WHERE time >= now() - 30m GROUP BY time(5m), host

Does this have one or two series? I would say that this has one series (cpu,host=server01), but if we always assume there is at least one series that has an empty value for every tag key, we would get two series (cpu and cpu,host=server01) and would have to return values for both. I guess we could say, "If there are any values for this GROUP BY tagkey and there are none for the empty, drop the series with an empty tag key", but then we're starting to get to special exceptions in behavior which is the part I find unintuitive.

Is this the behavior you mean? If this is what you mean, I'll think about it a little more. This may be possible, but I need to think if there are any other mitigating circumstances and if the special exception for when there are no series is worth it.

cnelissen · 2016-10-06T21:56:45Z

I am also running into the same issue - trying to use FILL(0) on a time range that has no data. I was expecting to have all buckets with a value of 0, but instead I get nothing.

As far as the issue that @jsternberg is bringing up, to me it would seem intuitive to still return the buckets, albeit with no tags i.e.:

cpu,host=server01 value=2 10000

SELECT sum(value) FROM cpu WHERE time >= now() - 30m GROUP BY time(5m), host

If there is data in this time range with a "host" tag, then we can safely assume a single series with host=server01, and we can fill each bucket accordingly. This is the current normal behavior.

If there is no data in this time range, then there are no values for the "host" tag, so we can safely assume a single series with host=null and FILL each bucket accordingly.

I guess we could say, "If there are any values for this GROUP BY tagkey and there are none for the empty, drop the series with an empty tag key", but then we're starting to get to special exceptions in behavior which is the part I find unintuitive.

I would say you wouldn't drop the series with the empty tag key, rather you wouldn't create it to begin with. You would only create the empty tag series if the query turned up no results. It seems pretty straightforward and intuitive to me. Just my 2 cents.

jimaek · 2016-10-19T12:34:34Z

Same problem here. Fixing this would help a lot.

This incorporates some feedback received from a spike implementation of host status. For one, the uptime query (now named deltaUptime to better indicate this is a change in uptime) is now packed into the query to fetch the rest of the data for the host page. Also, a "show tag values" query has been added here to fetch hosts that will escape the time range selected by the group by in the deltaUptime query (see this issue: influxdata/influxdb#6967). While not implemented in this commit, it's possible now that we could show different treatments for those hosts that haven't been seen "recently" (as defined by the time selection on the deltaUptime query) and those that have.

sbengo · 2017-01-25T13:32:01Z

Hi guys, there are some news on that?

We are collecting metrics from SNMP Devices most of error related metrics are permanently '0': i.e: dot3Stats, ifInErrors/Discards.

In order to save a lot of disk space we implemented a filter that it only sends and store data to InfluxDB when the value is different than zero on same measurement, and always report a nonzero metric on measurement to let Influx create those field names.

The expected behaviour after that implementation is that InfluxDB would fill those null values with '0' with the fill(0) statement but it is explained comments above, when there is no data it returns nothing.

AlexGrs · 2017-02-13T15:48:44Z

Hi.

Same question here? Is it something which we need as our alert in kapacitor never closed due to lacking point in query detecting failed. The join() in kapacitor is not executed and the alert is never closed.

Rgds.

jsternberg · 2017-02-13T15:51:00Z

We have no progress on this. If you have any kind of idea that gets around the problem listed earlier in this thread for why we can't do this, we can debate if it works or not.

andre-c-andersen · 2017-03-28T13:26:08Z

We are having the same issue. Our trades measurement, which include individual stock trades with trade size/volume and price (etc), doesn't have entries for all time periods we query:

select
    sum(volume) as trade_volume,
    count(price) as trade_count
from trades
where
    instrument_ticker = '%s' and exchange_ticker = '%s' and
    '%s' <= time and time < '%s'
group by time(%s) fill(0);

(Here instrument_ticker and exchange_ticker are tags)

We expect trade_volume and trade_count to be filled with zero for time bins/bars which have no trades (say every 5 minute "bar"). This works fine when there is at least one datapoint in the queried period. However, if the instrument and day we're looking at for some reason doesn't have any trades this returns no entries. This is just plain unintuitive and pushes the client side application to have to account for it. As it stands we now have to implement a special check for when influx doesn't return something in our client, such that we can emulate that influx returned something.

I understand that there is a generalization issues, as described above, however, it should be solved somehow.

In terms I'm familiar with, my understanding of the issue is as follows: If in my query above, I didn't include the tag instrument_ticker, thus asking for many instruments in one exchange, what should it return for instruments which are long gone and closed shop, i.e., doesn't have any trades anymore?

An example of this could be as follows:

select sum(volume) as trade_volume
from trades 
where 
    exchange_ticker = '%s' and 
    '%s' <= time and time < '%s' 
group by instrument_ticker, time(%s) fill(0)

This produces output for plenty of instruments, but only those with at least one datapoint (i.e., trade) in the period.

My suggestion is to add a flag 'missing_option' to the fill syntax:

(Scroll right.)

 SELECT <function>(<field_key>) FROM_clause WHERE <time_range> GROUP BY time(<time_interval>[,<offset_interval>])[,<tag_key>] [fill(<fill_option>[,<missing_option>])]

(edit: Most of the above syntax was taken from documentation. The time(...) and tag_key parts seems to be wrong. Tried to fix it here, but should probably be updated in docs. In any case, the most right part is what I am suggesting.)

The 'missing_option' takes either the value 'ignore' or 'keep'. 'ignore' is the default, using the current solution where tag values which aren't represented with datapoints are just ignored. 'keep' on the other hand spits out filled values for all tag values (in my case instruments) which have ever been produced. We will client side handle filtering of the ones we don't care about.

Example usage:

select sum(volume) as trade_volume
from trades 
where 
    exchange_ticker = '%s' and 
    '%s' <= time and time < '%s' 
group by instrument_ticker, time(%s) fill(0, keep)

Also notice that the 'missing_option' is optional and defaults to the current behavior, thus this would be backward compatible.

Anyway. Those are my two cents. Whatever is done, something should be done with this.

IvorNL · 2017-05-09T09:22:50Z

I'd also like to see this fixed. AndreCAndersen's suggestion seems like a good solution

Codelica · 2017-06-09T17:40:53Z

I'd be happy either of the suggestions from @beckettsean or @AndreCAndersen. 👍 I'm running into many situations where reliable results can't be calculated (easily) because of this. This seems most obvious with count() queries, where you'll never get a 0 back for a single time range query, and only get 0's back in a data series if some elements have data to count. I think if #6412 weren't locked from all the +1 silliness, it would show a lot of support. But allowing fill() to always fill values would fix things and do what (I think) most people expect.

icosac · 2019-10-14T13:41:18Z

The solution suggested by @AndreCAndersen and @narciero should be the default. When using InfluxDB with some tools that don't allow editing of the data, this feature would come really handy.

stale · 2020-01-12T13:41:49Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Codelica · 2020-01-12T16:49:53Z

Mr. Stale Bot was warned. No reason to close this out. There isn't much to talk about, it just needs a fix.

depfryer · 2020-02-28T11:11:58Z

hi,
you can also have this problem when you want to show a column who haven't data
example :
(host 10.10.10.41 is down)


> show tag values with key = url
name: ping
key value
--- -----
url 10.10.10.41
url www.google.com

> select * from "ping" WHERe time >= now() - 2h and url='10.10.10.41'
name: ping
time                 average_response_ms errors host     maximum_response_ms minimum_response_ms packets_received packets_transmitted percent_packet_loss percent_reply_loss reply_received result_code url
----                 ------------------- ------ ----     ------------------- ------------------- ---------------- ------------------- ------------------- ------------------ -------------- ----------- ---
2020-02-28T09:02:18Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:02:24Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:02:34Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:02:44Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:02:54Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:03:04Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:03:14Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:03:24Z                            host                                         0                1                   100                 100                0              0           10.10.10.41
2020-02-28T09:03:31Z                     100    host                                                                                                                                    2           10.10.10.41
>
> SELECT last("average_response_ms") FROM "ping" WHERE time >= now() - 2h GROUP BY time(30m), url fill(null)
name: ping
tags: url=www.google.com
time                 last
----                 ----
2020-02-28T08:30:00Z
2020-02-28T09:00:00Z 138
2020-02-28T09:30:00Z
2020-02-28T10:00:00Z
2020-02-28T10:30:00Z
>

i use "ping" on telegraf
10.10.10.41 appear in tag, have measurement (but not in all field) and so not appear a the end.....

M0rdecay · 2020-06-03T12:34:35Z

This is still an urgent problem that prevents the creation of alerts in Grafana based on lack of data.

SimonVillage · 2020-08-11T10:38:33Z

It's now 4 years ago that this issue was raised and we still don't have any solution but people asking for it.

dgnorton · 2020-08-11T17:12:31Z

@jsternberg any thoughts on the proposed change by @AndreCAndersen and @narciero above?

shafferj · 2020-10-17T14:52:05Z

@jsternberg @beckettsean or others, any thoughts on this?

cifwig · 2021-04-29T07:56:50Z

I'm struggling with the measurement-series gaps and was looking for a solution.
After some research I have found this:
https://blog.timescale.com/blog/sql-functions-for-time-series-analysis/
May be, their solution gives you an idea for an approach.

lb-ronyeh · 2021-08-23T11:43:37Z

we still need a way to fill table with no data

jimaek · 2021-08-23T12:02:32Z

While I was waiting for this feature I built and sold 2 startups that utilized time-series DBs, migrated from Influx to Clickhouse in the first one, had to re-learn Influx to try v2 which also was disappointing, while I was trying v2 the company decided to rewrite everything and announce v3.
I really recommend staying away from Influx at this point. Anything else will work better.

tgruse · 2021-10-04T13:27:59Z

I'm really confused....
Any feedback information ?

ayunema · 2022-09-15T19:04:27Z

Any news on this? Don't want Stale Bot getting antsy again.

rwader-swi · 2024-06-11T11:14:10Z

its been 8 years now 🙂. The expression's "Drop non-numeric values" or "Replace non-numeric data" don't work either.

thatfireguy7 · 2025-02-21T03:24:34Z

I figured out a way to fill the entire dataset using the fill() function as long as you can rely on an alternative tag from that measurement that you know has data. As long as the ConstantTag has data for each interval then this will correctly output the values for ReadTag whenever valid, and zeroes all other times, including when the time being displayed has no data for ReadTag

Select count("B")*mean("A") FROM
(SELECT mean("ReadTag") as A FROM "process_data" WHERE $timeFilter GROUP BY time(5m) fill(null)),
(SELECT mean("ConstantTag") as B FROM "process_data" WHERE $timeFilter GROUP BY time(5m) fill(null))
WHERE $timeFilter GROUP BY time(5m) fill(0)

beckettsean added area/queries area/influxql Issues related to InfluxQL query language labels Jul 6, 2016

This was referenced Jul 6, 2016

fill(previous) should find most recent value, even if outside query time range #6878

Open

FEI entry influxdata/docs.influxdata.com-ARCHIVE#522

Closed

mattheworiordan mentioned this issue Jul 13, 2016

[0.13] [Bug] Addition for empty column results in nil value irrespective of fill value #6953

Closed

beckettsean added the support label Jul 25, 2016

beckettsean mentioned this issue Aug 26, 2016

Add options to group by time to set where the time windows occur #1837

Closed

earthnut mentioned this issue Nov 3, 2016

Is there any workaround for fill(0) issue in influxDB influxdata/kapacitor#1016

Open

rossmcdonald mentioned this issue Dec 8, 2016

Spike: Use system metrics to indicate host status influxdata/chronograf#672

Merged

AlexGrs mentioned this issue Feb 13, 2017

Raise alert on same measurement but different tags influxdata/kapacitor#1194

Open

rbetts self-assigned this Jun 29, 2017

rbetts added the backlog/storage label Jun 29, 2017

rbetts modified the milestone: 1.4.0 Jun 29, 2017

stale bot removed the wontfix label Aug 28, 2019

garnier-quentin mentioned this issue Oct 24, 2019

[InfluxDB] Time calculation centreon/centreon-plugins#1690

Closed

stale bot added the wontfix label Jan 12, 2020

stale bot removed the wontfix label Jan 12, 2020

timhallinflux assigned 8none1 and unassigned rbetts Jan 12, 2020

jcmcken mentioned this issue Mar 18, 2020

Kapacitor issue - Batch joins not working as expected, one data stream not consistent data. influxdata/kapacitor#1106

Open

M0rdecay mentioned this issue Jun 3, 2020

InfluxDB fill() function does not work in alert requests grafana/grafana#24979

Closed

8none1 assigned dgnorton and unassigned 8none1 Jun 3, 2020

dgnorton assigned davidby-influx and unassigned dgnorton Mar 24, 2021

davidby-influx added area/2.x OSS 2.0 related issues and PRs and removed 1.x labels Mar 26, 2021

docmerlin self-assigned this Oct 22, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fill should force a return of values, even if the query has none #6967

fill should force a return of values, even if the query has none #6967

beckettsean commented Jul 6, 2016

jsternberg commented Jul 6, 2016

beckettsean commented Jul 6, 2016

jsternberg commented Jul 6, 2016

beckettsean commented Jul 6, 2016 •

edited

Loading

enumag commented Aug 23, 2016

phemmer commented Sep 3, 2016 •

edited

Loading

jsternberg commented Sep 3, 2016

cnelissen commented Oct 6, 2016

jimaek commented Oct 19, 2016

sbengo commented Jan 25, 2017

AlexGrs commented Feb 13, 2017

jsternberg commented Feb 13, 2017

andre-c-andersen commented Mar 28, 2017 •

edited

Loading

IvorNL commented May 9, 2017

Codelica commented Jun 9, 2017

icosac commented Oct 14, 2019

stale bot commented Jan 12, 2020

Codelica commented Jan 12, 2020

depfryer commented Feb 28, 2020

M0rdecay commented Jun 3, 2020

SimonVillage commented Aug 11, 2020

dgnorton commented Aug 11, 2020

shafferj commented Oct 17, 2020

cifwig commented Apr 29, 2021

lb-ronyeh commented Aug 23, 2021

jimaek commented Aug 23, 2021

tgruse commented Oct 4, 2021

ayunema commented Sep 15, 2022

rwader-swi commented Jun 11, 2024 •

edited

Loading

thatfireguy7 commented Feb 21, 2025

fill should force a return of values, even if the query has none #6967

fill should force a return of values, even if the query has none #6967

Comments

beckettsean commented Jul 6, 2016

Feature Request

jsternberg commented Jul 6, 2016

beckettsean commented Jul 6, 2016

jsternberg commented Jul 6, 2016

beckettsean commented Jul 6, 2016 • edited Loading

enumag commented Aug 23, 2016

phemmer commented Sep 3, 2016 • edited Loading

jsternberg commented Sep 3, 2016

cnelissen commented Oct 6, 2016

jimaek commented Oct 19, 2016

sbengo commented Jan 25, 2017

AlexGrs commented Feb 13, 2017

jsternberg commented Feb 13, 2017

andre-c-andersen commented Mar 28, 2017 • edited Loading

IvorNL commented May 9, 2017

Codelica commented Jun 9, 2017

icosac commented Oct 14, 2019

stale bot commented Jan 12, 2020

Codelica commented Jan 12, 2020

depfryer commented Feb 28, 2020

M0rdecay commented Jun 3, 2020

SimonVillage commented Aug 11, 2020

dgnorton commented Aug 11, 2020

shafferj commented Oct 17, 2020

cifwig commented Apr 29, 2021

lb-ronyeh commented Aug 23, 2021

jimaek commented Aug 23, 2021

tgruse commented Oct 4, 2021

ayunema commented Sep 15, 2022

rwader-swi commented Jun 11, 2024 • edited Loading

thatfireguy7 commented Feb 21, 2025

beckettsean commented Jul 6, 2016 •

edited

Loading

phemmer commented Sep 3, 2016 •

edited

Loading

andre-c-andersen commented Mar 28, 2017 •

edited

Loading

rwader-swi commented Jun 11, 2024 •

edited

Loading