[stack monitoring ui] apm-server events indexed to metricbeat-* not displayed in stack monitoring ui #112926

Closed
stuartnelson3 opened this issue Sep 23, 2021 · 10 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Stack Monitoring Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Team:Monitoring Stack Monitoring team

Comments

@stuartnelson3
Contributor

Kibana version: 7.16.0-SNAPSHOT

Elasticsearch version: 7.16.0-SNAPSHOT

Server OS version: CentOS Linux release 8.4.2105 (using docker.elastic.co/kibana/kibana:7.16.0-SNAPSHOT)

Browser version: Firefox Develop 93.0b4 (64-bit)

Browser OS version: Ubuntu 21.04

Original install method (e.g. download page, yum, from source, etc.): docker container

Describe the bug: metrics ingested via metricbeat to the default metricbeat-* index are not displayed in the stack monitoring ui, even when they are documents re-indexed from a .monitoring-* index (which are displayed in the stack monitoring ui).

Steps to reproduce:

  1. Start kibana and apm-server. apm-server is configured with http.enabled: true
  2. Monitor apm-server using the metricbeat instructions found here
  3. Navigate to the stack monitoring page, see that apm-server is available and it has a populated dashboard
  4. Re-index the documents from .monitoring-* to metricbeat-*, e.g.:
    POST _reindex
    {
      "dest": {
        "index": "metricbeat-7.15.0-2021.09.22-000001"
      },
      "source": {
        "index": ".monitoring-beats-7-mb-2021.09.22"
      }
    }
    
  5. Delete the .monitoring-* index
  6. Navigate to the stack monitoring page, apm-server is no longer available / there's no dashboard
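Steps 4 and 5 can be sketched as a small Python helper that only builds the two requests and does not send them. The function name `reindex_then_delete` is hypothetical; the index names are the ones from the example above, and the delete request assumes the standard Elasticsearch delete-index endpoint (`DELETE <index>`).

```python
import json


def reindex_then_delete(source_index, dest_index):
    """Return the requests for steps 4 and 5 as (method, path, body) tuples."""
    reindex_body = {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
    }
    return [
        ("POST", "_reindex", reindex_body),  # step 4: copy the documents
        ("DELETE", source_index, None),      # step 5: drop the original index
    ]


requests_to_send = reindex_then_delete(
    ".monitoring-beats-7-mb-2021.09.22",
    "metricbeat-7.15.0-2021.09.22-000001",
)
for method, path, body in requests_to_send:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Keeping the request construction separate from sending makes it easy to paste the output into the Kibana console or hand it to any HTTP client.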

Expected behavior: stack monitoring ui reads from both .monitoring-* and metricbeat-* indices to populate the apm-server section

Any additional context: This stems from a conversation where it was believed that stack monitoring had been updated to read from both indices.

@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@simianhacker
Member

Can you paste the modules.d/beats.yml module config? If you have xpack.enabled: true it should be writing to .monitoring-*

@stuartnelson3
Contributor Author

stuartnelson3 commented Sep 28, 2021

The issue is that (as I was told) the stack monitoring UI should read from metricbeat-* indices as well as .monitoring-* indices to display apm-server metrics. The metrics are written to .monitoring-* and then re-indexed to metricbeat-*. The original index is then deleted.

The beats.yml file used is the same as the one at the link in the issue description.

@simianhacker
Member

@stuartnelson3 Sorry, I missed the part where you are reindexing the data. Here is the query we run to determine if we have APM data. You will need to replace {CLUSTER_UUID} with the UUID of your cluster. Your schema will need to match the data being queried in this request.

POST *:.monitoring-beats-6-*,*:.monitoring-beats-7-*,.monitoring-beats-6-*,.monitoring-beats-7-*,metricbeat-*,*:metricbeat-*/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "type": "stats"
                }
              },
              {
                "term": {
                  "type": "beats_stats"
                }
              },
              {
                "term": {
                  "metricset.name": "stats"
                }
              },
              {
                "term": {
                  "metricset.name": "beats_stats"
                }
              }
            ]
          }
        },
        {
          "term": {
            "cluster_uuid": "{CLUSTER_UUID}"
          }
        },
        {
          "range": {
            "beats_stats.timestamp": {
              "gte": "now-15m",
              "lte": "now"
            }
          }
        },
        {
          "bool": {
            "must": {
              "term": {
                "beats_stats.beat.type": "apm-server"
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "beats_stats.beat.uuid",
        "precision_threshold": 10000
      }
    },
    "versions": {
      "terms": {
        "field": "beats_stats.beat.version"
      }
    },
    "ephemeral_ids": {
      "terms": {
        "field": "beats_stats.metrics.beat.info.ephemeral_id",
        "size": 10000
      },
      "aggs": {
        "min_events": {
          "min": {
            "field": "beats_stats.metrics.libbeat.pipeline.events.total"
          }
        },
        "max_events": {
          "max": {
            "field": "beats_stats.metrics.libbeat.pipeline.events.total"
          }
        },
        "min_mem": {
          "min": {
            "field": "beats_stats.metrics.beat.memstats.rss"
          }
        },
        "max_mem": {
          "max": {
            "field": "beats_stats.metrics.beat.memstats.rss"
          }
        }
      }
    },
    "min_events_total": {
      "sum_bucket": {
        "buckets_path": "ephemeral_ids>min_events"
      }
    },
    "max_events_total": {
      "sum_bucket": {
        "buckets_path": "ephemeral_ids>max_events"
      }
    },
    "min_mem_total": {
      "sum_bucket": {
        "buckets_path": "ephemeral_ids>min_mem"
      }
    },
    "max_mem_total": {
      "sum_bucket": {
        "buckets_path": "ephemeral_ids>max_mem"
      }
    }
  }
}
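To avoid sending the literal `{CLUSTER_UUID}` placeholder, the filter part of the request above could be generated from code. This is a minimal Python sketch, not how Kibana builds the query: the names `build_apm_filter` and `search_path` are hypothetical, and only the index patterns and filter clauses shown above are reproduced (the aggregations are left out).

```python
# Index patterns copied from the request above.
INDEX_PATTERNS = [
    "*:.monitoring-beats-6-*",
    "*:.monitoring-beats-7-*",
    ".monitoring-beats-6-*",
    ".monitoring-beats-7-*",
    "metricbeat-*",
    "*:metricbeat-*",
]


def search_path():
    """Return the request path used with POST."""
    return ",".join(INDEX_PATTERNS) + "/_search"


def build_apm_filter(cluster_uuid, start="now-15m", end="now"):
    """Build the bool filter from the request above for one cluster."""
    # Guard against the unreplaced template value.
    if not cluster_uuid or "{" in cluster_uuid:
        raise ValueError("cluster_uuid must be a real UUID, not a placeholder")
    type_terms = [
        {"term": {"type": "stats"}},
        {"term": {"type": "beats_stats"}},
        {"term": {"metricset.name": "stats"}},
        {"term": {"metricset.name": "beats_stats"}},
    ]
    return {
        "bool": {
            "filter": [
                {"bool": {"should": type_terms}},
                {"term": {"cluster_uuid": cluster_uuid}},
                {"range": {"beats_stats.timestamp": {"gte": start, "lte": end}}},
                {"bool": {"must": {"term": {"beats_stats.beat.type": "apm-server"}}}},
            ]
        }
    }
```

The returned dictionary goes under the `"query"` key of the search body, alongside the `"aggs"` section from the request above.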

FYI, we are planning on removing metricbeat-* from Stack Monitoring index patterns via: #104271

@jasonrhodes
Member

jasonrhodes commented Sep 29, 2021

I've been looking into this for a bit and I can easily reproduce the problem you describe, @stuartnelson3. The query Chris posted above has a number of problems with respect to the metricbeat mappings, and they cause failures even when the data is directly reindexed.

  1. "cluster_uuid": "{CLUSTER_UUID}" is probably a copy/paste error, but it won't work here.
  2. The metricbeat mappings don't have beats_stats, they only have beat, so beats_stats.beat.type becomes beat.type:
    {
      "bool": {
        "must": {
          "term": {
            "beats_stats.beat.type": "apm-server" // should be "beat.type": "apm-server"
          }
        }
      }
    }
  3. beats_stats doesn't exist at all in the metricbeat mappings, so all references to those fields in all of the aggs fail (e.g. beats_stats.beat.uuid, beats_stats.beat.version, etc.)
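One way to see how many parts of a request depend on the old field paths, as the last point above describes, is to walk the request body and collect every `beats_stats.*` reference. A minimal Python sketch; the helper name `collect_field_refs` is hypothetical, and the example body is a shortened excerpt of the query posted earlier, not the full request.

```python
def collect_field_refs(node, prefix="beats_stats."):
    """Return the sorted field paths under `prefix` referenced in a request body."""
    found = set()

    def walk(value):
        if isinstance(value, dict):
            for key, child in value.items():
                # Field used as a key, e.g. inside term or range clauses.
                if key.startswith(prefix):
                    found.add(key)
                # Field used as a value, e.g. {"field": "..."} in aggregations.
                if key == "field" and isinstance(child, str) and child.startswith(prefix):
                    found.add(child)
                walk(child)
        elif isinstance(value, list):
            for item in value:
                walk(item)

    walk(node)
    return sorted(found)


example = {
    "query": {"term": {"beats_stats.beat.type": "apm-server"}},
    "aggs": {
        "total": {"cardinality": {"field": "beats_stats.beat.uuid"}},
        "versions": {"terms": {"field": "beats_stats.beat.version"}},
    },
}
print(collect_field_refs(example))
```

Every path this returns is one that has to resolve in the target index's mappings for the query to match.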

The assumption that reading from metricbeat-* works seems to be false. We will need to look into this. @sayden @elastic/beats any insight you can provide here would be a huge help.

@jasonrhodes
Member

OK so I think I've narrowed this down. The data stored in .monitoring indices is in an old format, field-path-wise. So things like beats_stats.beat.version, or anything in beats_stats, are "old" field paths. The Metricbeat mappings don't include those, so reindexing docs from one to the other won't work ... yet.

This will work when the aliases are installed and backported (if you run Metricbeat from master, you would get the aliases installed). This work is being tracked by this ticket: elastic/beats#26480

Here you can see where the alias is set up for beats_stats.beat.version, which will be routed to beat.stats.beat.version.

In other words, once the aliases are installed, these queries should work as expected.
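For illustration, this is roughly what such an alias looks like as an Elasticsearch mapping, using the `alias` field type with a `path` pointing at the real field. This is a Python sketch with a hypothetical helper name (`alias_mapping`); the actual alias definitions are the ones tracked in elastic/beats#26480 and may differ in detail.

```python
import json


def alias_mapping(alias_path, target_path):
    """Build a nested `properties` mapping declaring alias_path as an alias of target_path."""
    # Innermost node: the alias field itself.
    mapping = {"type": "alias", "path": target_path}
    # Wrap it in one `properties` level per segment of the dotted alias path.
    for name in reversed(alias_path.split(".")):
        mapping = {"properties": {name: mapping}}
    return mapping


print(json.dumps(
    alias_mapping("beats_stats.beat.version", "beat.stats.beat.version"),
    indent=2,
))
```

With a mapping like this in place, a query or aggregation on the old path is resolved against the new field, which is why the existing queries would not need to change.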

@jasonrhodes
Member

As for this point from @simianhacker:

FYI, we are planning on removing metricbeat-* from Stack Monitoring index patterns via: #104271

This is true for now. I've talked with @simitt about this: if there is no way to write to .monitoring or to a new data stream with the aliases installed, then we will have to continue reading from metricbeat for this case. We would very much like to remove it, though, because it introduces large performance and error-handling costs for no benefit.

@stuartnelson3
Contributor Author

@jasonrhodes I've confirmed that if we expose /stats and /state from the elastic-agent subprocess monitoring endpoint (/processes/<subprocess>/{stats,state}, elastic/beats#28165), and a managed apm-server becomes aware of its cluster_uuid (elastic/elastic-agent#145), then we can use the beat module in metricbeat with xpack.enabled to index documents into a .monitoring-* index, and populate the stack monitoring ui as normal.

Since we have a path forward, and you're planning on removing metricbeat-* from stack monitoring index patterns, I'm happy to close this issue

@jasonrhodes
Member

@stuartnelson3 that sounds great! Then we can take a look at how to migrate to an APM monitoring "package" that indexes to data streams alongside the other work happening there.

4 participants