Heatmap visualization does not show empty buckets with date histogram #100301

Closed
Tracked by #103633
nickpeihl opened this issue May 18, 2021 · 9 comments · Fixed by #118855
Labels
blocked bug Fixes for quality problems that affect the customer experience Feature:Heatmap Heatmap visualization impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@nickpeihl
Member

Kibana version:

master

Browser version:
Firefox 88.0.1

Original install method (e.g. download page, yum, from source, etc.):
from source

Describe the bug:

The Heatmap visualization does not show empty buckets from the y-axis when a date histogram is defined on the x-axis. Originally reported in Discuss.

Steps to reproduce:

  1. Load the ecommerce sample data
  2. Create a heatmap visualization where the y-axis is a histogram on the day_of_week_i field and enable "Show empty buckets".
  3. Add a date histogram on the x-axis below the y-axis
  4. Set the timepicker to exclude certain days of the week, for example, choose a Friday to Monday range.
  5. Notice the y-axis does not show the empty buckets.
  6. The empty buckets only show when the x-axis date histogram is disabled or removed.

Expected behavior:

I would expect empty buckets on the y-axis to appear when "Show empty buckets" is on and a date histogram is on the x-axis. The elasticsearch response includes the empty buckets, but they are not shown in the visualization.

Screenshots (if relevant):

heatmap-bug.mp4
@nickpeihl nickpeihl added bug Fixes for quality problems that affect the customer experience Team:Visualizations Visualization editors, elastic-charts and infrastructure labels May 18, 2021
@elasticmachine
Contributor

Pinging @elastic/kibana-app (Team:KibanaApp)

@timroes timroes added Feature:Heatmap Heatmap visualization impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. labels Jul 22, 2021
@flash1293
Contributor

It's hard to fix this in a nice way: with the given agg config there are no 0 rows in the resulting data table, so the chart doesn't even know about the missing lanes in the heatmap. One workaround is to use custom-defined ranges instead of a histogram (which works for this case because the possible days of the week are limited). I don't have a good idea for how to fix this issue, so I will move it into triaging state for now and get back to it later.

@flash1293 flash1293 added triage_needed and removed impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. labels Oct 13, 2021
@flash1293
Contributor

flash1293 commented Nov 4, 2021

@markov00 Do you think it would make sense to fix this in the visualization itself (elastic-charts in this case)? Basically, we would allow a non-categorical y-axis in heatmaps and pass in the interval, and the chart would fill in the missing rows even if they don't have data.

@markov00
Member

markov00 commented Nov 9, 2021

@flash1293 we can implement that in the chart; we already did it for date_histogram.
Porting the way ES builds and rounds buckets into our repo is not an easy task, but we can try.
I would love to know if/why we can't just supply min_doc_count:0 for such cases. It was suggested in #8411 and, except for a "possible" but not fully confirmed huge memory consumption, it seems to be a valid suggestion.

@flash1293
Contributor

@markov00 This doesn't help because of the way we build the table based on the data. An empty bucket in the histogram looks like this:

  {
            "1": {
              "buckets": []
            },
            "key": 1,
            "doc_count": 0
          },

The empty "buckets" is where the date histogram buckets would go, but Elasticsearch isn't sending them to us (because the histogram bucket is empty) so we can't turn it into rows of the table.
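To illustrate why the empty histogram bucket never reaches the chart, here is a minimal sketch of flattening a two-level bucket response into table rows. It is not Kibana's actual table-building code; `flatten_buckets`, the sample timestamps, and the row shape are assumptions made for illustration only.

```python
from typing import Any, Dict, List, Tuple


def flatten_buckets(
    outer_buckets: List[Dict[str, Any]], sub_agg_id: str
) -> List[Tuple[Any, Any, int]]:
    """Turn a two-level bucket response into (outer_key, inner_key, doc_count) rows."""
    rows = []
    for outer in outer_buckets:
        # One row is emitted per inner bucket, so an outer bucket whose
        # inner bucket list is empty contributes no rows at all.
        for inner in outer[sub_agg_id]["buckets"]:
            rows.append((outer["key"], inner["key"], inner["doc_count"]))
    return rows


# Hypothetical response fragment: histogram key 1 has no documents,
# so its nested date histogram bucket list is empty.
response_buckets = [
    {"key": 0, "doc_count": 3, "1": {"buckets": [{"key": 1621296000000, "doc_count": 3}]}},
    {"key": 1, "doc_count": 0, "1": {"buckets": []}},
    {"key": 2, "doc_count": 5, "1": {"buckets": [{"key": 1621468800000, "doc_count": 5}]}},
]

rows = flatten_buckets(response_buckets, "1")
outer_keys_in_table = sorted({row[0] for row in rows})
print(outer_keys_in_table)
```

Histogram key 1 is present in the response but absent from the flattened rows, which is all the chart gets to see.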

We could change the nesting and do the date histogram first, then the histogram agg as a sub-aggregation, but this doesn't catch all cases: an empty histogram bucket is only returned if at least one date histogram bucket has data in histogram buckets both below and above the empty one, like this:

 "buckets": [
                {
                  "metric": {
                    "value": 567
                  },
                  "key": 12,
                  "doc_count": 1
                },
                {
                  "key": 13,
                  "doc_count": 0
                },
                {
                  "metric": {
                    "value": 567
                  },
                  "key": 14,
                  "doc_count": 1
                }
              ]

If there's no date bucket which has data for some histogram bucket lower and higher than the empty histogram bucket (in this case 13), it won't show up in the result.

In summary it's difficult to fix this on the table building / querying side as well and IMHO this looks more like a visualization concern than a data querying concern to me. The histogram case should be much easier than the date histogram case because the buckets are always uniform (no DST and calendar interval complexity) and the first data point will be enough to calculate the others.
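A rough sketch of what filling uniform histogram rows on the chart side could look like, assuming the chart receives the interval, at least one observed bucket key, and the range to cover. `fill_histogram_keys` and its parameters are hypothetical names, not an elastic-charts API.

```python
import math
from typing import List


def fill_histogram_keys(
    observed_keys: List[float], interval: float, lower: float, upper: float
) -> List[float]:
    """Return every bucket key covering [lower, upper] on the grid implied by the observed keys."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if not observed_keys:
        return []
    # Buckets are uniform, so a single observed key fixes the whole grid.
    anchor = observed_keys[0]
    first_step = math.floor((lower - anchor) / interval)
    last_step = math.floor((upper - anchor) / interval)
    # Multiply by integer steps instead of adding repeatedly to avoid drift.
    return [anchor + step * interval for step in range(first_step, last_step + 1)]


# Days of the week where only Friday to Monday had data (assumed key values).
observed = [0, 1, 5, 6]
all_keys = fill_histogram_keys(observed, interval=1, lower=0, upper=6)
missing = [key for key in all_keys if key not in observed]
print(all_keys, missing)
```

The lower and upper bounds still have to come from somewhere (for example the field's min/max), which is the one piece of information a single data point does not provide.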

@markov00
Member

markov00 commented Nov 9, 2021

@flash1293 I'm pretty sure you can build your query and use extended_bounds to include every bucket in the specified bounds, with or without data/docs.

It works on both the histogram and date_histogram aggs and returns all the NxM buckets requested.

GET kibana_sample_data_logs/_search
{
  "aggs": {
    "by_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "1M",
        "extended_bounds": {
          "min": "2021-01-01T00:00:00.000Z",
          "max": "2022-12-31T23:59:59.999Z"
        }
      },
      "aggs": {
        "group_bytes": {
          "histogram": {
            "field": "bytes",
            "interval": 4096,
            "extended_bounds": {
              "min": "0",
              "max": "20000"
            }
          }
        }
      }
    }
  },
  "size": 0
}

or reversing the aggs

GET kibana_sample_data_logs/_search
{
  "aggs": {
    "group_bytes": {
      "histogram": {
        "field": "bytes",
        "interval": 4096,
        "min_doc_count": 0, 
        "extended_bounds": {
          "min": "0",
          "max": "20000"
        }
      },
      "aggs": {
        "by_time": {
          "date_histogram": {
            "field": "@timestamp",
            "calendar_interval": "1M",
            "extended_bounds": {
              "min": "2021-01-01T00:00:00.000Z",
              "max": "2022-12-31T23:59:59.999Z"
            }
          }
        }
      }
    }
  },
  "size": 0
}

The main drawback is that you also need to specify extended_bounds for the histogram. That value is already available anyway from the pre-flight request that gets the min/max values to compute the histogram interval.
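As a sketch of how the bounds could be wired in automatically, the snippet below builds a min/max request and then feeds its results into the reversed query above. The function names and the `min_value`/`max_value` aggregation names are assumptions for illustration; this is not Kibana's actual pre-flight implementation.

```python
from typing import Any, Dict


def build_preflight_request(field: str) -> Dict[str, Any]:
    """Request body that asks only for the min and max of a numeric field."""
    return {
        "size": 0,
        "aggs": {
            "min_value": {"min": {"field": field}},
            "max_value": {"max": {"field": field}},
        },
    }


def build_heatmap_request(
    field_min: float, field_max: float, interval: float, time_min: str, time_max: str
) -> Dict[str, Any]:
    """Histogram with a nested date histogram, both padded via extended_bounds."""
    return {
        "size": 0,
        "aggs": {
            "group_bytes": {
                "histogram": {
                    "field": "bytes",
                    "interval": interval,
                    "min_doc_count": 0,  # extended_bounds only has an effect with 0
                    "extended_bounds": {"min": field_min, "max": field_max},
                },
                "aggs": {
                    "by_time": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "calendar_interval": "1M",
                            "min_doc_count": 0,
                            "extended_bounds": {"min": time_min, "max": time_max},
                        }
                    }
                },
            }
        },
    }


# Hypothetical pre-flight result; real values would come from the min/max response.
preflight = {"aggregations": {"min_value": {"value": 0.0}, "max_value": {"value": 20000.0}}}
body = build_heatmap_request(
    preflight["aggregations"]["min_value"]["value"],
    preflight["aggregations"]["max_value"]["value"],
    4096,
    "2021-01-01T00:00:00.000Z",
    "2022-12-31T23:59:59.999Z",
)
```

The time bounds would presumably come from the time picker, while the numeric bounds come from the field's observed min/max.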

Having the full matrix from Elasticsearch is IMHO more reliable and represents a 1-1 copy of the data ES returned, with no postprocessing in the middle: we render what we receive. We would be happier to always get that dense dataset instead of a sparse one, also for time series and other histogram-related charts, but I can see how this can sometimes affect the size of the response payload.

@flash1293
Contributor

Automated extended bounds is a great idea, I didn't think about that. I'm going to check how it could be implemented.

@stratoula
Contributor

@flash1293 another case of this is a histogram on the x-axis.
In the vislib heatmap the empty buckets are shown
image (11)

while in EC
image (12)

I guess it can also be solved with automated extended bounds (?)

@flash1293 flash1293 added impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. impact:medium Addressing this issue will have a medium level of impact on the quality/strength of our product. and removed triage_needed impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. labels Nov 17, 2021
@flash1293
Contributor

Blocked by #119474 - we are working on handling this much more consistently


6 participants