[Searchable Snapshot] Define additional metrics needed for searchable snapshots #4968

Closed
Tracked by #5087
andrross opened this issue Oct 28, 2022 · 1 comment

Comments

@andrross
Member

andrross commented Oct 28, 2022

This task focuses on defining the metrics required for searchable snapshots. Several questions from the original design discussion on caching here still need to be answered.

The goal of this task is to define any new metrics that need to be added for caching, querying, and storage for searchable snapshots.

Additional open questions:

  • Do we need to add to or augment the profile data returned in the query response?
  • Do we need a separate slowlog or is the existing search slowlog sufficient?
@andrross andrross added enhancement Enhancement or improvement to existing feature or request Indexing & Search labels Oct 28, 2022
@kotwanikunal kotwanikunal self-assigned this Jan 9, 2023
@kotwanikunal kotwanikunal changed the title [Searchable Snapshot] Define additional search-related metrics needed for searchable snapshots [Searchable Snapshot] Define additional metrics needed for searchable snapshots Jan 23, 2023
@kotwanikunal
Member

Path forward for the additional metrics for Searchable Snapshots. Comments and feedback are appreciated.

Overview

Searchable snapshots launched as an experimental feature in v2.4.0. As part of GA, we will add metrics to track the different components on search-capable nodes. This document outlines the metrics OpenSearch currently emits and lays out the path forward for searchable snapshot metrics.

Current Design Decisions

Searchable snapshots introduced indices that can be queried without downloading all of the segment files onto the node. Instead, parts of the segment files are fetched on demand as the query dictates. These parts, also known as blocks, are cached on the nodes. The design decisions for caching are listed here.

Currently Emitted Metrics

OpenSearch currently emits the following metrics around querying and caching, which will serve as the basis for the searchable snapshot metrics.

Metrics for querying

  1. search.query_total: Total number of searches served by the node/index in the query stage
  2. search.query_time_in_millis: Total amount of time spent serving the searches in the query stage
  3. search.fetch_total: Total number of searches served by the node/index in the fetch stage
  4. search.fetch_time_in_millis: Total amount of time spent serving the searches in the fetch stage

How to get these metrics? http://localhost:9200/<index_name>/_stats, http://localhost:9200/_nodes/stats
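As a rough illustration of how these counters can be used, the sketch below derives average per-request latencies from the `search` section of a stats response. The helper name `average_latency_ms` and the sample values are made up for illustration; only the four field names come from the list above.

```python
def average_latency_ms(search_stats: dict) -> dict:
    """Derive average query/fetch latency from cumulative search counters."""
    averages = {}
    for phase in ("query", "fetch"):
        total = search_stats.get(f"{phase}_total", 0)
        time_ms = search_stats.get(f"{phase}_time_in_millis", 0)
        # Guard against a node or index that has not served any request yet.
        averages[phase] = time_ms / total if total else 0.0
    return averages


# Illustrative values only, shaped like the "search" section of a stats response.
sample_search_stats = {
    "query_total": 200,
    "query_time_in_millis": 5000,
    "fetch_total": 50,
    "fetch_time_in_millis": 250,
}

print(average_latency_ms(sample_search_stats))
```

Since the counters are cumulative totals, taking the difference between two samples collected some time apart is likely to be more informative than a single reading.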

Metrics for caching (Query Cache)

  1. query_cache.memory_size: Total amount of memory utilized by the cache
  2. query_cache.evictions: The number of entries that were removed from the cache
  3. query_cache.hit_count: The number of entries that were located in the cache preventing further calculation/fetch
  4. query_cache.miss_count: The number of entries that were not present in the cache requiring further calculation/fetch

How to get these metrics? http://localhost:9200/_cat/nodes?h=i,qcm,qce,qchc,qcmc&v
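For reference, a hit ratio can be derived from the hit and miss counters. The sketch below parses a `_cat/nodes` style text table and computes that ratio per node. It assumes the header row echoes the requested column aliases (`qchc`, `qcmc`, and so on); the sample table and the helper names are illustrative, not captured from a real cluster.

```python
def parse_cat_table(text: str) -> list[dict]:
    """Parse whitespace-separated _cat output that includes a header row (the `v` flag)."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    headers = lines[0].split()
    return [dict(zip(headers, line.split())) for line in lines[1:]]


def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of cache lookups that were served from the cache."""
    lookups = hits + misses
    return hits / lookups if lookups else 0.0


# Illustrative output only; real values depend on the cluster.
sample_cat_output = """
i         qcm   qce qchc qcmc
10.0.0.1  1.2mb 3   90   10
10.0.0.2  0b    0   0    0
"""

for row in parse_cat_table(sample_cat_output):
    ratio = hit_ratio(int(row["qchc"]), int(row["qcmc"]))
    print(f'{row["i"]}: hit ratio {ratio:.2f}, evictions {row["qce"]}')
```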

Metrics for storage/node stats

  1. disk.total: The total amount of disk space that exists on the node
  2. disk.used: The amount of disk space that is currently in use on the node
  3. disk.avail: The amount of available disk space on the node

How to get these metrics? http://localhost:9200/_cat/nodes?h=i,dt,du,dup,d&v=true

Proposed Metrics

Metrics for querying

Query profiling provides the necessary breakdown of, and metrics for, query execution. Searchable snapshot query profiles will be updated with download metrics that show the time spent fetching data as well as the number of bytes downloaded from the repository.
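To make the idea of download metrics concrete, here is a minimal sketch of a per-query accumulator that tracks time spent and bytes downloaded. Everything in it (the `DownloadProfile` class, its field names, and the `fake_fetch` stand-in) is hypothetical and does not reflect the actual profile format.

```python
import time
from dataclasses import dataclass


@dataclass
class DownloadProfile:
    """Hypothetical per-query accumulator for repository downloads."""
    download_count: int = 0
    bytes_downloaded: int = 0
    download_time_nanos: int = 0

    def record(self, fetch_block):
        """Run fetch_block(), timing it and counting the bytes it returns."""
        start = time.perf_counter_ns()
        data = fetch_block()
        self.download_time_nanos += time.perf_counter_ns() - start
        self.download_count += 1
        self.bytes_downloaded += len(data)
        return data


# Stand-in for a repository read; a real fetch would go over the network.
def fake_fetch() -> bytes:
    return b"\x00" * 4096


profile = DownloadProfile()
profile.record(fake_fetch)
profile.record(fake_fetch)
print(profile)
```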

Future work: Expose an API to fetch node level query metrics and cache metrics for searchable snapshots.

Metrics for caching

Based on the query cache metrics above, we will define new cache metrics for search-capable nodes. The structure and data will be similar to those used by the query cache. The metrics will consist of:

  • disk_size, evictions, hit_count, miss_count

We will update the _cat/nodes API to emit this information for search capable nodes.
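The sketch below shows how the four proposed counters could be maintained by a block cache. It is a toy in-memory LRU used only to illustrate when each counter changes; the class names, the eviction policy, and the block keys are assumptions, not the actual file cache design.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class CacheStats:
    disk_size: int = 0   # bytes currently held by the cache
    evictions: int = 0
    hit_count: int = 0
    miss_count: int = 0


class BlockCache:
    """Toy LRU cache keyed by block name; stands in for an on-disk block cache."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.blocks = OrderedDict()
        self.stats = CacheStats()

    def get(self, key: str, load):
        if key in self.blocks:
            self.stats.hit_count += 1
            self.blocks.move_to_end(key)  # mark as most recently used
            return self.blocks[key]
        self.stats.miss_count += 1
        data = load()
        self.blocks[key] = data
        self.stats.disk_size += len(data)
        # Evict least recently used blocks until the cache fits again.
        while self.stats.disk_size > self.capacity_bytes and len(self.blocks) > 1:
            _, evicted = self.blocks.popitem(last=False)
            self.stats.disk_size -= len(evicted)
            self.stats.evictions += 1
        return data


cache = BlockCache(capacity_bytes=2048)
cache.get("seg_0.block_0", lambda: b"a" * 1024)  # miss
cache.get("seg_0.block_0", lambda: b"a" * 1024)  # hit
cache.get("seg_0.block_1", lambda: b"b" * 1024)  # miss
cache.get("seg_0.block_2", lambda: b"c" * 1024)  # miss, evicts block_0
print(cache.stats)
```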

Metrics for storage

The storage metrics already defined as part of OpenSearch will be sufficient for the overall storage metrics. The MonitorService, DiskThresholdDecider, and DiskThresholdMonitor will be updated so that the space reserved for the cache is subtracted from the available storage when making allocation decisions or monitoring metrics.
Additionally, we will emit the reserved storage as searcher.cache.reserved_size through the same node metrics as above.
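The subtraction described above can be sketched as follows. This is a simplified model, not the DiskThresholdDecider logic: it assumes the reservation is not already reflected in `disk.avail`, and the function names and the 0.85 watermark are illustrative choices.

```python
def adjusted_available_bytes(disk_avail: int, cache_reserved: int) -> int:
    """Free space left for regular shard data once the cache reservation is carved out."""
    return max(disk_avail - cache_reserved, 0)


def exceeds_watermark(disk_total: int, disk_avail: int, cache_reserved: int,
                      watermark_used_fraction: float) -> bool:
    """True if used space, counting the reservation as used, crosses the watermark."""
    available = adjusted_available_bytes(disk_avail, cache_reserved)
    used_fraction = 1 - available / disk_total
    return used_fraction >= watermark_used_fraction


GIB = 1024 ** 3
# Illustrative numbers: 100 GiB disk, 30 GiB free, 20 GiB reserved for the cache.
print(adjusted_available_bytes(30 * GIB, 20 * GIB) // GIB)
print(exceeds_watermark(100 * GIB, 30 * GIB, 0, 0.85))
print(exceeds_watermark(100 * GIB, 30 * GIB, 20 * GIB, 0.85))
```

With these numbers the node looks healthy when the reservation is ignored, but crosses the watermark once the reserved cache space is counted, which is the case the allocation and monitoring changes are meant to catch.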

Slow log

Slow logs are defined per index and help customers keep track of indexing or search operations that exceed a defined threshold. Docs: https://opensearch.org/docs/latest/opensearch/logs/#slow-logs
To begin with, we will use the existing slow logs to keep track of searchable snapshot indices. The one caveat is that network fetch times will be included in the time consumed by the query, which might trigger the thresholds.
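A small model of that caveat: the same query can stay under the thresholds when its blocks are cached and cross one when blocks have to be downloaded. The threshold values and timings are illustrative, and the comparison is a simplification of how the slow log evaluates thresholds.

```python
# Illustrative thresholds in milliseconds, mirroring per-index settings such as
# index.search.slowlog.threshold.query.warn and index.search.slowlog.threshold.query.info.
THRESHOLDS_MS = {"warn": 10_000, "info": 5_000}


def slowlog_level(took_ms: int):
    """Return the most severe level whose threshold is exceeded, or None."""
    for level in ("warn", "info"):
        if took_ms > THRESHOLDS_MS[level]:
            return level
    return None


local_query_ms = 3_000    # time spent executing against already cached blocks
network_fetch_ms = 4_000  # hypothetical time spent downloading missing blocks

print(slowlog_level(local_query_ms))
print(slowlog_level(local_query_ms + network_fetch_ms))
```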

Future work: Build on the framework defined by the above metrics for querying and caching to add details to the slow log about time spent on block downloads and cache usage statistics.
