diff --git a/MAINTAINERS.md b/MAINTAINERS.md index 683c66959..da0922e00 100644 --- a/MAINTAINERS.md +++ b/MAINTAINERS.md @@ -1,11 +1,18 @@ -# Observability Maintainers +## Overview -## Maintainers -| Maintainer | GitHub ID | Affiliation | -|---------------|-------------------------------------------------|-------------| -| David Cui | [davidcui1225](https://github.com/davidcui1225) | Amazon | -| Eric Wei | [mengweieric](https://github.com/mengweieric) | Amazon | -| Joshua Li | [joshuali925](https://github.com/joshuali925) | Amazon | -| Shenoy Pratik | [ps48](https://github.com/ps48) | Amazon | -| Kavitha Mohan | [kavithacm] (https://github.com/kavithacm) | Amazon | -| Eugene Lee | [eugenesk24] (https://github.com/eugenesk24) | Amazon | \ No newline at end of file +This document contains a list of maintainers in this repo. See [opensearch-project/.github/RESPONSIBILITIES.md](https://github.com/opensearch-project/.github/blob/main/RESPONSIBILITIES.md#maintainer-responsibilities) that explains what the role of maintainer means, what maintainers do in this and other repos, and how they should be doing it. If you're interested in contributing, and becoming a maintainer, see [CONTRIBUTING](CONTRIBUTING.md). 
+ +## Current Maintainers + +| Maintainer | GitHub ID | Affiliation | +| ----------------- | ------------------------------------------------- | ----------- | +| David Cui | [davidcui1225](https://github.com/davidcui1225) | Amazon | +| Eric Wei | [mengweieric](https://github.com/mengweieric) | Amazon | +| Joshua Li | [joshuali925](https://github.com/joshuali925) | Amazon | +| Shenoy Pratik | [ps48](https://github.com/ps48) | Amazon | +| Kavitha Mohan | [kavithacm](https://github.com/kavithacm) | Amazon | +| Eugene Lee | [eugenesk24](https://github.com/eugenesk24) | Amazon | +| Rupal Mahajan | [rupal-bq](https://github.com/rupal-bq) | Amazon | +| Derek Ho | [derek-ho](https://github.com/derek-ho) | Amazon | +| Lior Perry | [YANG-DB](https://github.com/YANG-DB) | Amazon | +| Peter Fitzgibbons | [pjfitzgibbons](https://github.com/pjfitzgibbons) | Amazon | diff --git a/release-notes/opensearch-observability.release-notes-2.6.0.0.md b/release-notes/opensearch-observability.release-notes-2.6.0.0.md index 8b8955eb3..a88edcd61 100644 --- a/release-notes/opensearch-observability.release-notes-2.6.0.0.md +++ b/release-notes/opensearch-observability.release-notes-2.6.0.0.md @@ -5,7 +5,7 @@ Compatible with OpenSearch and OpenSearch Dashboards Version 2.6.0 ### Infrastructure - Add publish snapshots to maven via GHA ([#1423](https://github.com/opensearch-project/observability/pull/1423)) - +- Add support for structured Metrics & Traces index using Simple Schema for Observability ([#1427](https://github.com/opensearch-project/observability/pull/1427)) ### Maintenance diff --git a/schema/README.md b/schema/README.md new file mode 100644 index 000000000..427ca3ffc --- /dev/null +++ b/schema/README.md @@ -0,0 +1,104 @@ +# Simple Schema for Observability + +## Background +Observability is the ability to measure a system’s current state based on the data it generates, such as logs, metrics, and traces. 
 Observability relies on telemetry derived from instrumentation of the endpoints and services. + +The telemetry signals (logs, metrics, traces) arriving from the system should contain all the information needed to observe and monitor it. + +Modern applications can have a complex distributed architecture that combines cloud-native and microservices layers. Each layer produces telemetry signals that may differ in structure and content. + +Using the Simple Schema for Observability telemetry schema, we can organize, correlate, and investigate system behavior in a standard, well-defined manner. + +The Observability telemetry schema defines the following components: **logs, traces, and metrics**. + +**Logs** provide comprehensive system details, such as a fault and the specific time when the fault occurred. By analyzing the logs, one can troubleshoot code and identify where and why the error occurred. + +**Traces** represent the entire journey of a request or action as it moves through all the layers of a distributed system. Traces allow you to profile and observe systems, especially containerized applications, serverless architectures, or microservices architectures. + +**Metrics** provide a numerical representation of data that can be used to determine a service's or component's overall behavior over time. + + +In many cases, correlating logs, traces, and metrics is necessary to monitor and understand how the system is behaving. In addition, the distributed nature of the application produces telemetry signals in multiple formats arriving from different components (network router, web server, database). + +To make such correlation possible, the industry has formulated several protocols and schemas for communicating these signals ([OTEL](https://github.com/open-telemetry), [ECS](https://github.com/elastic/ecs), [OpenMetrics](https://github.com/OpenObservability/OpenMetrics)): the Observability schemas.
+ +--- +## Schema Aware Components + +The Observability plugin was designed to allow maximum flexibility and not to impose a strict index structure on the data source. Nevertheless, the nature of modern distributed applications and the vast number of telemetry producers are changing this perception. + +Today, many Observability solutions (Splunk, Datadog, Dynatrace) recommend using a consolidated schema to represent the full variety of log/trace/metrics producers. + +This makes the monitoring, incident investigation, and correction processes simpler, more maintainable, and reproducible. + + +A Schema-Aware visualization component is a component that assumes the existence of specific index name patterns and expects these indices to have a specific structure: a schema. + +For example, **Trace-Analytics** is a schema-aware visual component, since it directly assumes that the traces & serviceMap indices exist and expects them to follow a specific structure. + +This definition doesn’t change the existing status of visualization components that are not “Schema Aware”; it only determines which visual components would benefit from using a schema and which will be agnostic of its content. + +Operational Panels, for example, are not “Schema Aware” since they don’t assume in advance the existence of a specific index, nor do they expect the index they display to have a specific structure. + +## Data Model + +Simple Schema for Observability allows ingestion of both formats (OTEL/ECS) and internally consolidates them, to the best of its capabilities, to present a unified Observability platform.
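To make the schema-aware definition concrete, here is a minimal Python sketch of the two assumptions such a component makes: a known index name pattern and a known document structure. It is not the plugin's implementation. `can_render` and `missing_fields` are hypothetical helpers, the pattern is copied from `index_patterns` in `schema/metrics/metrics-mapping.json`, and the field list is copied from the `required` list in `schema/metrics/metrics.schema`. It only checks that fields are present; a full check would validate against the JSON schema itself.

```python
from fnmatch import fnmatchcase

# Index pattern taken from "index_patterns" in schema/metrics/metrics-mapping.json.
METRICS_INDEX_PATTERN = "sso_metrics-*-*"

# Top-level fields listed under "required" in schema/metrics/metrics.schema.
REQUIRED_METRIC_FIELDS = ("name", "description", "unit", "kind", "@timestamp")


def missing_fields(document: dict, required: tuple = REQUIRED_METRIC_FIELDS) -> list:
    """Return the required top-level fields absent from a document."""
    return [field for field in required if field not in document]


def can_render(index_name: str, document: dict) -> bool:
    """Hypothetical schema-aware check: the index name must match the expected
    pattern AND the document must contain the expected top-level fields."""
    if not fnmatchcase(index_name, METRICS_INDEX_PATTERN):
        return False
    return not missing_fields(document)


if __name__ == "__main__":
    gauge = {
        "name": "lastLatency",
        "description": "The last API latency observed at collection interval",
        "unit": "ms",
        "kind": "GAUGE",
        "@timestamp": "2023-01-20T05:16:16.425669Z",
    }
    print(can_render("sso_metrics-default-namespace", gauge))  # True
    print(can_render("sso_logs-nginx-eu", gauge))              # False
    print(missing_fields({"name": "lastLatency"}))
```

A schema-agnostic component such as Operational Panels would skip both checks and display whatever index it is pointed at.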
+ +## Observability index naming + +The Observability indices follow the recommended ingestion pattern for immutable data, using the [data_stream concepts](https://opensearch.org/docs/latest/opensearch/data-streams/). + +Index names follow this naming template: `sso_{type}`-`{dataset}`-`{namespace}` + + - **type** - indicates the high-level observability type: "logs", "metrics", or "traces" (prefixed by the `sso_` schema convention) + - **dataset** - can contain anything that classifies the source of the data, such as `nginx.access` + - **namespace** - a user-defined namespace, mainly useful for grouping data, for example by environment (production) or by geography + +This strategy allows two degrees of naming freedom: dataset and namespace. For example, a customer may want to route the nginx logs from two geographical areas into two different indices: + + - `sso_logs-nginx-us` + - `sso_logs-nginx-eu` + +This distinction also allows crosscutting queries, either with the index pattern `sso_logs-nginx-*` or with a geography-based pattern such as `sso_logs-*-eu`. + +## Data index routing + +The [ingestion component](https://github.com/opensearch-project/data-prepper), which ingests the Observability signals, is responsible for routing the data into the relevant indices. + +The `sso_{type}-{dataset}-{namespace}` combination dictates the target index, where `{type}` (prefixed with `sso_`) is one of the supported types: + + - Traces - `sso_traces` + - Metrics - `sso_metrics` + - Logs - `sso_logs` + +For example, if the ingested signal contains the following section: +```json +{ + ... + "attributes": { + "data_stream": { + "type": "span", + "dataset": "mysql", + "namespace": "prod" + } + } +} +``` +This indicates that the target index for this observability signal is `sso_traces`-`mysql`-`prod`, which uses the traces schema mapping.
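The naming and routing rules above can be sketched in Python as follows. This is a minimal illustration, not Data Prepper's routing logic: `target_index`, `matches`, and the `TYPE_ALIASES` table are hypothetical. The `span` to traces entry follows the routing example above and `metric` to metrics follows the metric samples in this PR; the remaining aliases are assumptions. Crosscutting patterns are approximated with Python's `fnmatch`, whose `*` behaves like the index wildcard for simple patterns like these.

```python
from fnmatch import fnmatchcase

# Hypothetical table mapping data_stream.type values to the sso_ signal types.
# "span" follows the routing example above and "metric" follows the metric
# samples; the other aliases are assumptions for illustration.
TYPE_ALIASES = {
    "span": "traces",
    "traces": "traces",
    "metric": "metrics",
    "metrics": "metrics",
    "log": "logs",
    "logs": "logs",
}


def target_index(document: dict) -> str:
    """Build the sso_{type}-{dataset}-{namespace} index name for a signal."""
    stream = document["attributes"]["data_stream"]
    signal_type = TYPE_ALIASES.get(stream["type"])
    if signal_type is None:
        raise ValueError(f"unsupported data_stream type: {stream['type']!r}")
    return f"sso_{signal_type}-{stream['dataset']}-{stream['namespace']}"


def matches(index_name: str, pattern: str) -> bool:
    """Approximate an index wildcard pattern such as sso_logs-*-eu."""
    return fnmatchcase(index_name, pattern)


if __name__ == "__main__":
    span = {"attributes": {"data_stream": {"type": "span", "dataset": "mysql", "namespace": "prod"}}}
    print(target_index(span))                             # sso_traces-mysql-prod
    print(matches("sso_logs-nginx-eu", "sso_logs-*-eu"))  # True
    print(matches("sso_logs-nginx-us", "sso_logs-*-eu"))  # False
```

Raising on an unknown type, rather than defaulting to logs, keeps a malformed signal from silently landing in an index with the wrong schema mapping.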
+ +## Observability Index templates + +Since multiple Observability data providers are expected, and all of them need to be consolidated into a single common schema, the Observability plugin takes on the following responsibilities: + + - Define and create the **index templates** for all signals upon loading + - Publish a versioned schema file (JSON Schema) for each signal type, for general validation use by any third party + +## Observability Ingestion pipeline +The responsibility of an **Observability-ingestion-pipeline** is to create the actual `data_stream` into which it expects to ingest. + +This `data_stream` will use one of the Observability ready-made index templates (Metrics, Traces, and Logs) and conform to the above naming pattern (`sso_{type}`-`{dataset}`-`{namespace}`). + +**If the ingesting party needs to update the template's default index settings (shards, replicas), it may do so before the actual creation of the data_stream.** + +### Note +These new capabilities do not change or prevent existing customer usage of the system, and proprietary usage remains supported. diff --git a/schema/metrics/README.md b/schema/metrics/README.md new file mode 100644 index 000000000..78e940f51 --- /dev/null +++ b/schema/metrics/README.md @@ -0,0 +1,105 @@ +# Metrics Schema Support + +Observability refers to the ability to monitor and diagnose systems and applications in real time, in order to understand how they are behaving and identify potential issues. +Metrics are a critical component of observability, providing quantifiable data about the performance and behavior of systems and applications. +Supporting a structured metrics schema matters because it enables better analysis and understanding of system behavior. + +A structured schema provides a clear, consistent format, making it easier for observability tools to process and aggregate the data.
+This in turn makes it easier for engineers to understand the performance and behavior of their systems and to quickly identify potential issues. + +When metrics are unstructured, it can be difficult for observability tools to extract meaningful information from them. +For example, if the data for a particular metric is not consistently recorded in the same format, it can be difficult to compare and analyze performance data over time. +Similarly, if metrics are not consistently named or categorized, it can be difficult to understand their context and significance. + +With a structured schema in place, observability tools can automatically extract and aggregate data, making it easier to understand system behavior at a high level. +This can help teams quickly identify performance bottlenecks, track changes in system behavior over time, and make informed decisions about system performance optimization. + +## Details +The following section describes the Simple Schema for Observability metrics support, which conforms to the OTEL specification. + +- `metrics-mapping.json` presents the index template mapping for creating the Simple Schema for Observability metrics index +- `metrics.schema` presents the JSON schema used to verify that a metrics document conforms to the mapping structure + +## Metrics +See [OTEL metrics convention](https://opentelemetry.io/docs/reference/specification/metrics/) +See [OTEL metrics protobuf](https://github.com/open-telemetry/opentelemetry-proto/tree/main/opentelemetry/proto/metrics/v1) + +Simple Schema for Observability conforms to the OTEL metrics protocol, which defines the following data model: + +### Timestamp field +As part of the data-stream definition, the `@timestamp` field is mandatory. If the field is not present in the original signal, populate it with the `observedTimestamp` value. + +### Instrumentation scope +This is a logical unit of the application with which the emitted telemetry can be associated.
 It is typically the developer’s choice to decide what denotes a reasonable instrumentation scope. +The most common approach is to use the instrumentation library as the scope; however, other scopes are also common, e.g. a module, a package, or a class can be chosen as the instrumentation scope. + +The instrumentation scope may have zero or more additional attributes that provide additional information about the scope. For example, the field +`instrumentationScope.attributes.identification` is used to determine the resource origin of the signal and can be used to filter accordingly. + +### Overview +Metrics are a specific kind of telemetry data. They represent a snapshot of the current state for a set of data. +Metrics are distinct from logs or events, which focus on records or information about individual events. + +Metrics express all system states as numerical values: counts, current values, and so on. +Metrics tend to aggregate data temporally; while this can lose information, the reduction in overhead is an engineering trade-off commonly chosen in many modern monitoring systems. + +Time series are a record of changing information over time. While time series can support arbitrary strings or binary data, only numeric data is in our scope. +Common examples of metric time series are network interface counters, device temperatures, BGP connection states, and alert states. + +### Metric streams +Just as the data_stream attribute field represents the category of a signal, metric streams are grouped into individual Metric objects, identified by: + + - The originating Resource attributes + - The instrumentation Scope (e.g., instrumentation library name, version) + - The metric stream’s name + +### Metrics +A Metric object is defined by the following properties: + + - The data point type (e.g.
 Sum, Gauge, Histogram, ExponentialHistogram, Summary) + - The metric stream’s unit + - The data point properties, where applicable: AggregationTemporality, Monotonic + +The description is also present in the metrics object but is not one of the identifying fields: +_- The metric stream’s description_ + + +### Data Types + +**Values:** Metric values MUST be either floating-point numbers or integers. + +**Attributes:** Attributes (labels) are key-value pairs with strings as keys and values of any type (string, object, array). + +**MetricPoint:** Each MetricPoint consists of a set of values, depending on the MetricFamily type. + +**Metric:** Metrics are defined by a unique set of attributes (dimensions) within a MetricFamily. + +--- + +Metrics MUST contain a list of one or more MetricPoints. Metrics with the same name for a given MetricFamily SHOULD have the same set of label names in their LabelSet. + +* Metrics.name: string value describing the metric's purpose +* Metrics.type: valid values are "gauge", "counter", "histogram", and "summary". +* Metrics.unit: specifies the MetricFamily units. + +## Metric Types + +### Gauge +Gauges are current measurements, such as bytes of memory currently used or the number of items in a queue. For gauges, the absolute value is what is of interest to a user. +**_A MetricPoint in a Metric with the type gauge MUST have a single value._** +Gauges MAY increase, decrease, or stay constant over time. Even if they only ever go in one direction, they might still be gauges and not counters. + +### Counter +Counters measure discrete events. Common examples are the number of HTTP requests received, CPU seconds spent, or bytes sent. For counters, how quickly they are increasing over time is what is of interest to a user. +**_A MetricPoint in a Metric with the type Counter MUST have one value called Total._** + +### Histogram / Exponential-Histogram +Histograms measure distributions of discrete events.
 Common examples are the latency of HTTP requests, function runtimes, or I/O request sizes. +**_A Histogram MetricPoint MUST contain at least one bucket_**, and SHOULD contain Sum and Created values. Every bucket MUST have a threshold and a value. + +### Summary +Summaries also measure distributions of discrete events and MAY be used when Histograms are too expensive and/or an average event size is sufficient. +**_A Summary MetricPoint MAY consist of a Count, Sum, Created, and a set of quantiles._** +Semantically, Count and Sum values are counters, so they MUST NOT be NaN or negative; Count MUST be an integer. + + diff --git a/schema/metrics/metrics-mapping.json b/schema/metrics/metrics-mapping.json new file mode 100644 index 000000000..7f767f65d --- /dev/null +++ b/schema/metrics/metrics-mapping.json @@ -0,0 +1,288 @@ +{ + "index_patterns": [ + "sso_metrics-*-*" + ], + "data_stream": {}, + "template": { + "mappings": { + "_meta": { + "version": "0.1.0-dev" + }, + "_source": { + "enabled": true + }, + "dynamic_templates": [ + { + "attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "attributes.*" + } + }, + { + "resources_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "resource.*" + } + }, + { + "exemplar_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "exemplar.attributes.*" + } + }, + { + "instrumentation_scope_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "instrumentationScope.attributes.*" + } + } + ], + "properties": { + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "attributes": { + "type": "object", + "properties": { + "data_stream": { + "properties": { + "dataset": { + "ignore_above": 128, + "type": "keyword" + }, + "namespace": { + "ignore_above": 128, + "type": "keyword" + }, + "type": { + "ignore_above": 56, + "type": "keyword" + } + } + } + } + }, + "description": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword",
+ "ignore_above": 256 + } + } + }, + "unit": { + "type": "keyword", + "ignore_above": 128 + }, + "kind": { + "type": "keyword", + "ignore_above": 128 + }, + "aggregationTemporality": { + "type": "keyword", + "ignore_above": 128 + }, + "monotonic": { + "type": "boolean" + }, + "startTime": { + "type": "date" + }, + "@timestamp": { + "type": "date" + }, + "observedTimestamp": { + "type": "date_nanos" + }, + "value": { + "properties": { + "int": { + "type": "integer" + }, + "double": { + "type": "double" + } + } + }, + "buckets": { + "properties": { + "count": { + "type": "long" + }, + "sum": { + "type": "double" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "bucketCount": { + "type": "long" + }, + "bucketCountsList": { + "type": "long" + }, + "explicitBoundsList": { + "type": "float" + }, + "explicitBoundsCount": { + "type": "float" + }, + "quantiles": { + "properties": { + "quantile": { + "type": "double" + }, + "value": { + "type": "double" + } + } + }, + "quantileValuesCount": { + "type": "long" + }, + "positiveBuckets": { + "properties": { + "count": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "negativeBuckets": { + "properties": { + "count": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "negativeOffset": { + "type": "integer" + }, + "positiveOffset": { + "type": "integer" + }, + "zeroCount": { + "type": "long" + }, + "scale": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + }, + "sum": { + "type": "float" + }, + "count": { + "type": "long" + }, + "exemplar": { + "properties": { + "time": { + "type": "date" + }, + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + } + } + }, + "instrumentationScope": { + "properties": { + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + 
"ignore_above": 128 + } + } + }, + "version": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "droppedAttributesCount": { + "type": "integer" + }, + "schemaUrl": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + }, + "schemaUrl": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + }, + "settings": { + "index": { + "mapping": { + "total_fields": { + "limit": 10000 + } + }, + "refresh_interval": "5s" + } + } + }, + "version": 1, + "_meta": { + "description": "Observability Metrics Mapping Template" + } +} \ No newline at end of file diff --git a/schema/metrics/metrics.schema b/schema/metrics/metrics.schema new file mode 100644 index 000000000..59832a766 --- /dev/null +++ b/schema/metrics/metrics.schema @@ -0,0 +1,270 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "$id": "https://opensearch.org/schemas/Metrics", + "title": "OpenTelemetry Metrics", + "type": "object", + "properties": { + "name": { + "type": "string" + }, + "attributes": { + "$ref": "/schemas/Attributes" + }, + "resource": { + "type": "object" + }, + "description": { + "type": "string" + }, + "unit": { + "type": "string" + }, + "kind": { + "type": "string", + "enum": [ + "COUNTER", + "SUM", + "GAUGE", + "HISTOGRAM", + "EXPONENTIAL_HISTOGRAM" + ] + }, + "aggregationTemporality": { + "type": "string", + "enum": [ + "AGGREGATION_TEMPORALITY_UNSPECIFIED", + "AGGREGATION_TEMPORALITY_DELTA", + "AGGREGATION_TEMPORALITY_CUMULATIVE" + ] + }, + "monotonic": { + "type": "boolean" + }, + "startTime": { + "type": "string", + "format": "date-time" + }, + "@timestamp": { + "type": "string", + "format": "date-time" + }, + "observedTimestamp": { + "type": "string", + "format": "date-time" + }, + "value.int": { + "type": "integer" + }, + "value.double": { + "type": "number" + }, + "buckets": { + "type": "array", + "items": { + 
"$ref": "/schemas/Bucket" + } + }, + "bucketCount": { + "type": "integer" + }, + "bucketCountsList": { + "type": "array", + "items": { + "type": "number" + } + }, + "explicitBoundsCount": { + "type": "integer" + }, + "explicitBoundsList": { + "type": "array", + "items": { + "type": "number" + } + }, + "quantiles": { + "type": "array", + "items": { + "$ref": "/schemas/Quantile" + } + }, + "quantileValuesCount": { + "type": "number" + }, + "positiveBuckets": { + "type": "array", + "items": { + "$ref": "/schemas/Bucket" + } + }, + "negativeBuckets": { + "type": "array", + "items": { + "$ref": "/schemas/Bucket" + } + }, + "positiveOffset": { + "type": "array", + "items": { + "type": "number" + } + }, + "negativeOffset": { + "type": "array", + "items": { + "type": "number" + } + }, + "zeroCount": { + "type": "number" + }, + "scale": { + "type": "integer" + }, + "max": { + "type": "number" + }, + "min": { + "type": "number" + }, + "sum": { + "type": "number" + }, + "count": { + "type": "number" + }, + "exemplars": { + "type": "array", + "items": { + "$ref": "/schemas/Exemplar" + } + }, + "instrumentationScope": { + "$ref": "/schemas/InstrumentationScope" + }, + "schemaUrl": { + "type": "string" + } + }, + "required": [ + "name", + "description", + "unit", + "kind", + "@timestamp" + ], + "$defs": { + "Bucket": { + "$id": "/schemas/Bucket", + "type": "object", + "additionalProperties": false, + "properties": { + "count": { + "type": "number" + }, + "sum": { + "type": "number" + }, + "max": { + "type": "number" + }, + "min": { + "type": "number" + } + }, + "required": [ + "count", + "max", + "min" + ], + "title": "Bucket" + }, + "Quantile": { + "$id": "/schemas/Quantile", + "type": "object", + "additionalProperties": false, + "properties": { + "quantile": { + "type": "number" + }, + "value": { + "type": "number" + } + }, + "required": [ + "quantile", + "value" + ], + "title": "Quantile" + }, + "InstrumentationScope": { + "$id": "/schemas/InstrumentationScope", + "type": 
"object", + "additionalProperties": true, + "properties": { + "name": { + "type": "string" + }, + "version": { + "type": "string" + }, + "schemaUrl": { + "type": "string" + }, + "droppedAttributesCount": { + "type": "integer" + } + }, + "title": "InstrumentationScope" + }, + "Exemplar": { + "$id": "/schemas/Exemplar", + "type": "object", + "additionalProperties": true, + "properties": { + "time": { + "type": "string", + "format": "date-time" + }, + "spanId": { + "type": "string" + }, + "traceId": { + "type": "string" + } + }, + "required": [ + "time" + ], + "title": "Exemplar" + }, + "Attributes": { + "$id": "/schemas/Attributes", + "type": "object", + "additionalProperties": true, + "properties": { + "data_stream": { + "$ref": "/schemas/Dataflow" + } + }, + "title": "Attributes" + }, + "Dataflow": { + "$id": "/schemas/Dataflow", + "type": "object", + "additionalProperties": true, + "properties": { + "type": { + "type": "string" + }, + "namespace": { + "type": "string" + }, + "dataset": { + "type": "string" + } + }, + "title": "Dataflow" + } + } +} \ No newline at end of file diff --git a/schema/metrics/samples/gauge.json b/schema/metrics/samples/gauge.json new file mode 100644 index 000000000..e8ec59070 --- /dev/null +++ b/schema/metrics/samples/gauge.json @@ -0,0 +1,52 @@ +{ + "unit": "ms", + "exemplars": [], + "kind": "GAUGE", + "name": "lastLatency", + "flags": 0, + "description": "The last API latency observed at collection interval", + "startTime": "2023-01-20T05:16:16.425669Z", + "@timestamp": "2023-01-20T05:16:16.425669Z", + "value.double": 0.0, + "resource": { + "cloud@account@id": "123367104812", + "process@pid": 1, + "host@arch": "amd64", + "host@id": "i-0005de88c8ebe7dbb", + "host@image@id": "ami-093d4bc1f6d4a890b", + "telemetry@sdk@version": "1.19.0", + "service@name": "AOCDockerDemoService", + "process@runtime@name": "OpenJDK Runtime Environment", + "os@type": "linux", + "cloud@availability_zone": "us-west-2b", + "host@type": "c5.2xlarge", + 
"cloud@provider": "aws", + "telemetry@sdk@language": "java", + "host@name": "ip-172-16-42-233.amazon.com", + "process@runtime@description": "Debian OpenJDK 64-Bit Server VM 17.0.4+8-Debian-1deb11u1", + "service@namespace": "AOCDockerDemo", + "cloud@region": "us-west-2", + "process@executable@path": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java", + "process@command_line": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java -javaagent:/aws-observability/classpath/aws-opentelemetry-agent-1.19.0-SNAPSHOT.jar", + "process@runtime@version": "17.0.4+8-Debian-1deb11u1", + "cloud@platform": "aws_ec2", + "telemetry@sdk@name": "opentelemetry", + "container@id": "71301ad845e7d082911d846ac9af3cd9ba4f2144d82d7ac0dfd51f335b256a61", + "telemetry@auto@version": "1.19.0-aws-SNAPSHOT", + "os@description": "Linux 5.4.225-139.416.amzn2int.x86_64" + }, + "attributes": { + "statusCode": "", + "apiName": "", + "serviceName": "AOCDockerDemoService" + }, + + "instrumentationScope": { + "version": "1.0", + "name": "aws-otel", + "schemaUrl": "https://opentelemetry.io/schemas/1.13.0", + "attributes": { + "identification": "aws-ec2" + } + } +} diff --git a/schema/metrics/samples/histogram.json b/schema/metrics/samples/histogram.json new file mode 100644 index 000000000..3814aedb9 --- /dev/null +++ b/schema/metrics/samples/histogram.json @@ -0,0 +1,65 @@ +{ + "max": 652094078, + "kind": "HISTOGRAM", + "buckets": [ + { + "min": 3.4028234663852886e+38, + "max": 0, + "count": 0 + }, + { + "min": 0, + "max": 10000000, + "count": 0 + }, + { + "min": 10000000, + "max": 50000000, + "count": 5 + }, + { + "min": 50000000, + "max": 100000000, + "count": 1 + }, + { + "min": 100000000, + "max": 3.4028234663852886e+38, + "count": 10 + } + ], + "count": 16, + "bucketCountsList": [ + 0, + 0, + 5, + 1, + 10 + ], + "description": "Histogram of durationInNanos in the events", + "sum": 3136355061, + "unit": "seconds", + "aggregationTemporality": "AGGREGATION_TEMPORALITY_DELTA", + "min": 44606914, + "bucketCounts": 5, + 
 "name": "histogram", + "startTime": "2023-01-20T05:16:16.425669Z", + "explicitBoundsCount": 4, + "@timestamp": "2023-01-20T05:16:16.425669Z", + "explicitBoundsList": [ + 0, + 10000000, + 50000000, + 100000000 + ], + "attributes": { + "aggr_duration": 26709005000, + "serviceName": "AOCDockerDemoService", + "histogram_key": "durationInNanos", + "data_stream": { + "dataset": "histogram", + "namespace": "production", + "type": "metric" + } + } +} \ No newline at end of file diff --git a/schema/metrics/samples/load_samples.md b/schema/metrics/samples/load_samples.md new file mode 100644 index 000000000..96f1af7c6 --- /dev/null +++ b/schema/metrics/samples/load_samples.md @@ -0,0 +1,29 @@ +## Load samples +To load the samples, run the following request once the OpenSearch cluster (including the Observability plugin) has started: + + +`PUT sso_metrics-default-namespace/_bulk` +```json +{ "create":{ } } +{"max":652094078,"kind":"HISTOGRAM","buckets":[{"min":3.4028234663852886e+38,"max":0,"count":0},{"min":0,"max":10000000,"count":0},{"min":10000000,"max":50000000,"count":5},{"min":50000000,"max":100000000,"count":1},{"min":100000000,"max":3.4028234663852886e+38,"count":10}],"count":16,"bucketCountsList":[0,0,5,1,10],"description":"Histogram of durationInNanos in the events","sum":3136355061,"unit":"seconds","aggregationTemporality":"AGGREGATION_TEMPORALITY_DELTA","min":44606914,"bucketCounts":5,"name":"histogram","startTime":"2023-01-20T05:16:16.425669Z","explicitBoundsCount":4,"@timestamp":"2023-01-20T05:16:16.425669Z","explicitBoundsList":[0,10000000,50000000,100000000],"attributes":{"aggr_duration":26709005000,"serviceName":"AOCDockerDemoService","histogram_key":"durationInNanos","data_stream":{"dataset":"histogram","namespace":"production","type":"metric"}}} +{ "create":{ } } +{"unit":"ms","exemplars":[],"kind":"GAUGE","name":"lastLatency","flags":0,"description":"The last API latency observed at collection
interval","startTime":"2023-01-20T05:16:16.425669Z","@timestamp":"2023-01-20T05:16:16.425669Z","value.double":0,"resource":{"cloud@account@id":"123367104812","process@pid":1,"host@arch":"amd64","host@id":"i-0005de88c8ebe7dbb","host@image@id":"ami-093d4bc1f6d4a890b","telemetry@sdk@version":"1.19.0","service@name":"AOCDockerDemoService","process@runtime@name":"OpenJDK Runtime Environment","os@type":"linux","cloud@availability_zone":"us-west-2b","host@type":"c5.2xlarge","cloud@provider":"aws","telemetry@sdk@language":"java","host@name":"ip-172-16-42-233.amazon.com","process@runtime@description":"Debian OpenJDK 64-Bit Server VM 17.0.4+8-Debian-1deb11u1","service@namespace":"AOCDockerDemo","cloud@region":"us-west-2","process@executable@path":"/usr/lib/jvm/java-17-openjdk-amd64/bin/java","process@command_line":"/usr/lib/jvm/java-17-openjdk-amd64/bin/java -javaagent:/aws-observability/classpath/aws-opentelemetry-agent-1.19.0-SNAPSHOT.jar","process@runtime@version":"17.0.4+8-Debian-1deb11u1","cloud@platform":"aws_ec2","telemetry@sdk@name":"opentelemetry","container@id":"71301ad845e7d082911d846ac9af3cd9ba4f2144d82d7ac0dfd51f335b256a61","telemetry@auto@version":"1.19.0-aws-SNAPSHOT","os@description":"Linux 5.4.225-139.416.amzn2int.x86_64"},"attributes":{"statusCode":"","apiName":"","serviceName":"AOCDockerDemoService"},"instrumentationScope":{"version":"1.0","name":"aws-otel","schemaUrl":"https://opentelemetry.io/schemas/1.13.0","attributes":{"identification":"aws-ec2"}}} +{ "create":{ } } +{"kind":"SUM","flags":0,"description":"Queue Size 
 change","monotonic":false,"unit":"one","aggregationTemporality":"AGGREGATION_TEMPORALITY_CUMULATIVE","exemplars":[],"name":"queueSizeChange","startTime":"2023-01-20T05:16:16.425669Z","@timestamp":"2023-01-20T05:16:16.425669Z","value.double":0,"resource":{"cloud@account@id":"123367104812","process@pid":1,"host@arch":"amd64","host@id":"i-0005de88c8ebe7dbb","host@image@id":"ami-093d4bc1f6d4a890b","telemetry@sdk@version":"1.19.0","service@name":"AOCDockerDemoService","process@runtime@name":"OpenJDK Runtime Environment","os@type":"linux","cloud@availability_zone":"us-west-2b","host@type":"c5.2xlarge","cloud@provider":"aws","telemetry@sdk@language":"java","host@name":"ip-172-16-42-233.amazon.com","process@runtime@description":"Debian OpenJDK 64-Bit Server VM 17.0.4+8-Debian-1deb11u1","service@namespace":"AOCDockerDemo","cloud@region":"us-west-2","process@executable@path":"/usr/lib/jvm/java-17-openjdk-amd64/bin/java","process@command_line":"/usr/lib/jvm/java-17-openjdk-amd64/bin/java -javaagent:/aws-observability/classpath/aws-opentelemetry-agent-1.19.0-SNAPSHOT.jar","process@runtime@version":"17.0.4+8-Debian-1deb11u1","cloud@platform":"aws_ec2","telemetry@sdk@name":"opentelemetry","container@id":"71301ad845e7d082911d846ac9af3cd9ba4f2144d82d7ac0dfd51f335b256a61","telemetry@auto@version":"1.19.0-aws-SNAPSHOT","os@description":"Linux 5.4.225-139.416.amzn2int.x86_64"},"instrumentationScope":{"version":"1.0","name":"aws-otel","schemaUrl":"https://opentelemetry.io/schemas/1.13.0","attributes":{"identification":"aws-ec2"}},"attributes":{"serviceName":"AOCDockerDemoService","statusCode":"","apiName":"","data_stream":{"dataset":"sum","namespace":"production","type":"metric"}}} +``` + +- Run the following query to get the Histogram-type metrics: + +- `GET sso_metrics-default-namespace/_search` +```json +{ + "query":{ + "term": { + "kind":{ + "value":"HISTOGRAM" + } + } + } +} + +``` \ No newline at end of file diff --git a/schema/metrics/samples/sum.json
b/schema/metrics/samples/sum.json new file mode 100644 index 000000000..35e319b56 --- /dev/null +++ b/schema/metrics/samples/sum.json @@ -0,0 +1,59 @@ +{ + "kind": "SUM", + "flags": 0, + "description": "Queue Size change", + "monotonic": false, + "unit": "one", + "aggregationTemporality": "AGGREGATION_TEMPORALITY_CUMULATIVE", + "exemplars": [], + "name": "queueSizeChange", + "startTime": "2023-01-20T05:16:16.425669Z", + "@timestamp": "2023-01-20T05:16:16.425669Z", + "value.double": 0.0, + "resource": { + "cloud@account@id": "123367104812", + "process@pid": 1, + "host@arch": "amd64", + "host@id": "i-0005de88c8ebe7dbb", + "host@image@id": "ami-093d4bc1f6d4a890b", + "telemetry@sdk@version": "1.19.0", + "service@name": "AOCDockerDemoService", + "process@runtime@name": "OpenJDK Runtime Environment", + "os@type": "linux", + "cloud@availability_zone": "us-west-2b", + "host@type": "c5.2xlarge", + "cloud@provider": "aws", + "telemetry@sdk@language": "java", + "host@name": "ip-172-16-42-233.amazon.com", + "process@runtime@description": "Debian OpenJDK 64-Bit Server VM 17.0.4+8-Debian-1deb11u1", + "service@namespace": "AOCDockerDemo", + "cloud@region": "us-west-2", + "process@executable@path": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java", + "process@command_line": "/usr/lib/jvm/java-17-openjdk-amd64/bin/java -javaagent:/aws-observability/classpath/aws-opentelemetry-agent-1.19.0-SNAPSHOT.jar", + "process@runtime@version": "17.0.4+8-Debian-1deb11u1", + "cloud@platform": "aws_ec2", + "telemetry@sdk@name": "opentelemetry", + "container@id": "71301ad845e7d082911d846ac9af3cd9ba4f2144d82d7ac0dfd51f335b256a61", + "telemetry@auto@version": "1.19.0-aws-SNAPSHOT", + "os@description": "Linux 5.4.225-139.416.amzn2int.x86_64" + }, + "instrumentationScope": { + "version": "1.0", + "name": "aws-otel", + "schemaUrl": "https://opentelemetry.io/schemas/1.13.0", + "attributes": { + "identification": "aws-ec2" + } + }, + + "attributes": { + "serviceName": "AOCDockerDemoService", + "statusCode": 
"", + "apiName": "", + "data_stream": { + "dataset": "sum", + "namespace": "production", + "type": "metric" + } + } +} diff --git a/schema/traces/README.md b/schema/traces/README.md new file mode 100644 index 000000000..dcf16adcb --- /dev/null +++ b/schema/traces/README.md @@ -0,0 +1,153 @@ +# Traces Schema Support +Observability in the software industry is the ability to monitor and diagnose systems and applications in real time, in order to understand how they are behaving and identify potential issues. +Traces are a critical component of observability: they provide detailed information about the flow of requests through a system, including timing information and any relevant contextual data. + +Supporting a traces schema matters because it enables better analysis and understanding of system behavior. +A structured schema provides a clear, consistent format for traces, making it easier for observability tools to process and aggregate the data. +This in turn makes it easier for engineers to understand the performance and behavior of their systems and to identify potential issues quickly. + +When traces are unstructured, it can be difficult for observability tools to extract meaningful information from them. For example, if the timing information for a particular request is not consistently represented in the same format, +it can be difficult to compare and analyze performance data over time. Similarly, if contextual data is not consistently recorded, it can be difficult to understand the context in which a particular request was executed. + +With a structured schema in place, observability tools can automatically extract and aggregate data, making it easier to understand system behavior at a high level. +This can help teams quickly identify performance bottlenecks, track the root cause of errors, and resolve issues more efficiently.
+ +## Details +The following sections describe the Simple Schema for Observability support for traces, which conforms to the OTEL specification. + +- `traces-mapping.json` contains the index template mapping used to create the Simple Schema for Observability index +- `traces.schema` contains the JSON Schema used to verify that a trace document conforms to the mapping structure + +### data-stream +[Data streams](https://opensearch.org/docs/latest/opensearch/data-streams/) simplify the management of continuously generated time-series data and enforce a setup that best suits it: they are designed primarily for append-only data and ensure that each document has a timestamp field. +A data stream is internally composed of multiple backing indices. Search requests are routed to all the backing indices, while indexing requests are routed to the latest write index. + +As part of the Observability naming scheme, the values of the data stream fields combine to form the name of the actual data stream: + +`{data_stream.type}-{data_stream.dataset}-{data_stream.namespace}`. +This means the fields can only contain characters that are valid in data stream names. + +- **type**: one of the supported Observability signals (Traces, Logs, Metrics, Alerts) +- **dataset**: a user-defined field, mainly used to describe the origin of the signal +- **namespace**: a user-defined field that can describe any customer-specific domain classification + +#### Timestamp field +As part of the data stream definition, the `@timestamp` field is mandatory. If the field is not present, use the `observedTimestamp` value for it. +**Note**: `@timestamp` is the time the signal actually occurred, while `observedTimestamp` is the time the exporter read the event record. + +### Instrumentation scope +An instrumentation scope is a logical unit of the application with which the emitted telemetry can be associated.
It is typically up to the developer to decide what constitutes a reasonable instrumentation scope. +The most common approach is to use the instrumentation library as the scope; however, other scopes are also common, e.g. a module, a package, or a class can be chosen as the instrumentation scope. + +The instrumentation scope may have zero or more attributes that provide additional information about the scope. For example, the field +`instrumentationScope.attributes.identification` is used to determine the resource origin of the signal and can be used to filter accordingly. + +## Traces +See the [OTEL traces conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/main/semantic_conventions/trace). + +Traces are defined implicitly by their Spans. In particular, a Trace can be thought of as a directed acyclic graph (DAG) of Spans, where the edges between Spans are defined as parent/child relationships. + +## Spans +A span represents an operation within a transaction. Each Span encapsulates the following state: + +* An operation name +* A start and finish timestamp +* A list of key-value pairs (Attributes).
+* A set of Events, each of which is itself a tuple (timestamp, name, Attributes) +* The parent Span's identifier. +* Links to causally-related Spans (via the SpanContext of those related Spans). +* SpanContext information required to reference a Span. + +### SpanContext +A SpanContext represents all the information that identifies a Span in the Trace; it is propagated to child Spans and across process boundaries. +A **SpanContext** contains the tracing identifiers and the options that are propagated from parent to child Spans. + +* `TraceId` - A globally unique identifier (with practically sufficient probability) made of 16 randomly generated bytes, used to group all spans of a specific trace across all processes. +* `SpanId` - The identifier of a span, globally unique with practically sufficient probability because it is made of 8 randomly generated bytes. When passed to a child Span, this identifier becomes the parent span ID of the child Span. +* `Tracestate` - Carries tracing-system-specific context in a list of key-value pairs. Tracestate allows different vendors to propagate additional information and interoperate with their legacy ID formats. For more details, see the W3C Trace Context specification. + +Additional fields can be supported via the Attributes key-value store; see the [trace semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/semantic_conventions/README.md). + +### Structure +The default fields supported by traces are: +- **TraceId** : A globally unique identifier (with practically sufficient probability) made of 16 randomly generated bytes, used to group all spans of a specific trace across all processes. +- **SpanId** : The identifier of a span, globally unique with practically sufficient probability because it is made of 8 randomly generated bytes. When passed to a child Span, this identifier becomes the parent span ID of the child Span. +- **ParentSpanId** : The identifier of the span's parent span.
+- **TraceState** : Carries tracing-system-specific context in a list of key-value pairs. TraceState allows different vendors to propagate additional information and interoperate with their legacy ID formats. + +- **Name** : A string representing the span's name +- **Kind** : One of + - SpanKind.CLIENT + - SpanKind.SERVER + - SpanKind.CONSUMER + - SpanKind.PRODUCER + - SpanKind.INTERNAL + +- **StartTime** : Start time of the span +- **EndTime** : End time of the span +- **Attributes** + - An Attribute is a key-value pair; see [Attributes](https://github.com/open-telemetry/opentelemetry-specification/blob/b00980832b4b823155001df56dbf9203d4e53f98/specification/common/README.md#attribute) for its structure + +- **DroppedAttributesCount** : Integer counting the dropped attributes +- **Events** : A set of tuples (timestamp, name, Attributes) +- **DroppedEventsCount** : Integer counting the dropped events +- **Links** : Links to causally-related Spans +- **DroppedLinksCount** : Integer counting the dropped links +- **Status** : + + _The status code is an integer value; the status message is an optional text description of the status._ + + - `UNSET = 0` : The default status. + - `OK = 1` : The operation has been validated by an Application developer or Operator to have completed successfully. + - `ERROR = 2` : The operation contains an error.
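The TraceId/SpanId formats and the data stream naming pattern described in this README lend themselves to a quick pre-ingestion check. The following is a minimal Bash sketch, not part of this PR or of the Observability plugin; the function names `is_valid_trace_id`, `is_valid_span_id` and `data_stream_name` are hypothetical. It assumes IDs are stored as lowercase hex strings, as in the samples (so a 16-byte TraceId is 32 characters and an 8-byte SpanId is 16), and treats an all-zero ID as invalid, following the OpenTelemetry specification. The allowed characters for data stream components (lowercase letters, digits, `_` and `.`, keeping `-` as the separator) are also an assumption, stricter than "valid in data stream names".

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the Observability plugin: a sketch of the
# TraceId/SpanId formats and data stream naming pattern from the traces README.
# Assumes IDs are lowercase hex strings, as in the sample documents.

# TraceId: 16 random bytes -> 32 hex characters, at least one of them non-zero.
is_valid_trace_id() {
  [[ $1 =~ ^[0-9a-f]{32}$ && -n ${1//0/} ]]
}

# SpanId (and a non-empty parentSpanId): 8 random bytes -> 16 hex characters,
# at least one of them non-zero.
is_valid_span_id() {
  [[ $1 =~ ^[0-9a-f]{16}$ && -n ${1//0/} ]]
}

# Builds {data_stream.type}-{data_stream.dataset}-{data_stream.namespace}.
# Assumed rule: each part may only contain lowercase letters, digits, '_' and
# '.', so that '-' remains an unambiguous separator.
data_stream_name() {
  if [[ $# -ne 3 ]]; then
    echo "usage: data_stream_name TYPE DATASET NAMESPACE" >&2
    return 2
  fi
  local part
  for part in "$@"; do
    if [[ ! $part =~ ^[a-z0-9_.]+$ ]]; then
      echo "invalid data stream component: '$part'" >&2
      return 1
    fi
  done
  echo "$1-$2-$3"
}
```

For example, `data_stream_name span mysql exceptions` prints `span-mysql-exceptions` for the `data_stream` attributes in traceC.json, while `is_valid_span_id 55a698828fe06a42w2` fails, flagging the link `spanId` in that same sample as malformed (18 characters, including a non-hex `w`).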
\ No newline at end of file diff --git a/schema/traces/samples/load_samples.md b/schema/traces/samples/load_samples.md new file mode 100644 index 000000000..9536522c3 --- /dev/null +++ b/schema/traces/samples/load_samples.md @@ -0,0 +1,28 @@ +## Load samples +To load the samples, run the following request once the OpenSearch cluster with the Observability plugin has started: + + +`PUT sso_traces-default-namespace/_bulk` +```json +{ "create":{ } } +{"traceId":"4fa04f117be100f476b175e41096e736","spanId":"e275ac9d21929e9b","traceState":[],"parentSpanId":"","name":"client_checkout","kind":"INTERNAL","@timestamp":"2021-11-13T20:20:39+00:00","endTime":"2021-11-14T20:10:41+00:00","droppedAttributesCount":0,"droppedEventsCount":0,"droppedLinksCount":0,"resource":{"telemetry@sdk@name":"opentelemetry","telemetry@sdk@language":"python","telemetry@sdk@version":"0.14b0","service@name":"frontend-client","host@hostname":"ip-172-31-10-8.us-west-2.compute.internal"},"status":{"code":0}} +{ "create":{ } } +{"traceId":"15d30e4d211d79e10fcaeab97015c90d","spanId":"5bcca8ba513bb54a","traceState":[],"parentSpanId":"","name":"mysql","kind":"CLIENT","@timestamp":"2021-11-13T20:20:39+00:00","endTime":"2021-11-14T20:10:41+00:00","events":[{"@timestamp":"2021-03-25T17:21:03.044+00:00","name":"exception","attributes":{"exception@message":"1050 %2842S01%29: Table %27User_Carts%27 already exists","exception@type":"ProgrammingError","exception@stacktrace":"Traceback %28most recent call last :File /usr/lib/python3.6/site-packages/opentelemetry/sdk/trace/__init__.py, line 804, in use_span yield spanFile /usr/lib/python3.6/site-packages/opentelemetry/instrumentation/dbapi/__init__.py, line 354, in traced_executionraise exFile /usr/lib/python3.6/site-packages/opentelemetry/instrumentation/dbapi/__init__.py, line 345, in traced_executionresult = query_method%28%2Aargs, %2A%2Akwargs%29File
/usr/lib/python3.6/site-packages/mysql/connector/cursor.py"},"droppedAttributesCount":0}],"links":[],"droppedAttributesCount":0,"droppedEventsCount":0,"droppedLinksCount":0,"status":{"message":"1050 %2842S01%29: Table %27User_Carts%27 already exists","code":2},"attributes":{"data_stream":{"type":"span","dataset":"mysql"},"component":"mysql","db@user":"root","net@peer@name":"localhost","db@type":"sql","net@peer@port":3306,"db@instance":"","db@statement":"CREATE TABLE `User_Carts` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 ENGINE=InnoDB"},"resource":{"telemetry@sdk@language":"python","service@name":"database","telemetry@sdk@version":"0.14b0","service@instance@id":"140307275923408","telemetry@sdk@name":"opentelemetry","host@hostname":"ip-172-31-10-8.us-west-2.compute.internal"}} +{ "create":{ } } +{"traceId":"c1d985bd02e1dbb85b444011f19a1ecc","spanId":"55a698828fe06a42","traceState":[],"parentSpanId":"","name":"mysql","kind":"CLIENT","@timestamp":"2021-11-13T20:20:39+00:00","endTime":"2021-11-14T20:10:41+00:00","events":[{"@timestamp":"2021-03-25T17:21:03+00:00","name":"exception","attributes":{"exception@message":"1050 %2842S01%29: Table Inventory_Items already exists","exception@type":"ProgrammingError","exception@stacktrace":"Traceback most recent call last"},"droppedAttributesCount":0}],"links":[{"traceId":"c1d985bd02e1dbb85b444011f19a1ecc","spanId":"55a698828fe06a42w2","traceState":[],"attributes":{"db@user":"root","net@peer@name":"localhost","component":"mysql","db@type":"sql","net@peer@port":3306,"db@instance":"","db@statement":"CREATE TABLE `Inventory_Items` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 
ENGINE=InnoDB"},"droppedAttributesCount":0}],"droppedAttributesCount":0,"droppedEventsCount":0,"droppedLinksCount":0,"resource":{"telemetry@sdk@language":"python","telemetry@sdk@version":"0.14b0","service@instance@id":"140307275923408","service@name":"database","telemetry@sdk@name":"opentelemetry","host@hostname":"ip-172-31-10-8.us-west-2.compute.internal"},"status":{"code":2,"message":"1050 %2842S01%29: Table %27Inventory_Items%27 already exists"},"attributes":{"data_stream":{"type":"span","namespace":"exceptions","dataset":"mysql"},"db@user":"root","net@peer@name":"localhost","component":"mysql","db@type":"sql","net@peer@port":3306,"db@instance":"","db@statement":"CREATE TABLE `Inventory_Items` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 ENGINE=InnoDB"}} +``` + +Run the following query to retrieve spans of kind `CLIENT`: + +- `GET sso_traces-default-namespace/_search` +```json +{ + "query":{ + "term": { + "kind":{ + "value":"CLIENT" + } + } + } +} +``` \ No newline at end of file diff --git a/schema/traces/samples/traceA.json b/schema/traces/samples/traceA.json new file mode 100644 index 000000000..451ae8249 --- /dev/null +++ b/schema/traces/samples/traceA.json @@ -0,0 +1,23 @@ +{ + "traceId": "4fa04f117be100f476b175e41096e736", + "spanId": "e275ac9d21929e9b", + "traceState": [], + "parentSpanId": "", + "name": "client_checkout", + "kind": "INTERNAL", + "@timestamp": "2021-11-13T20:20:39+00:00", + "endTime": "2021-11-14T20:10:41+00:00", + "droppedAttributesCount": 0, + "droppedEventsCount": 0, + "droppedLinksCount": 0, + "resource": { + "telemetry@sdk@name": "opentelemetry", + "telemetry@sdk@language": "python", + "telemetry@sdk@version": "0.14b0", + "service@name": "frontend-client", + "host@hostname": "ip-172-31-10-8.us-west-2.compute.internal" + }, + "status": { + "code": 0 + } +} \ No newline at end of file diff --git a/schema/traces/samples/traceB.json b/schema/traces/samples/traceB.json new file mode 100644
index 000000000..cf5ab4979 --- /dev/null +++ b/schema/traces/samples/traceB.json @@ -0,0 +1,51 @@ +{ + "traceId": "15d30e4d211d79e10fcaeab97015c90d", + "spanId": "5bcca8ba513bb54a", + "traceState": [], + "parentSpanId": "", + "name": "mysql", + "kind": "CLIENT", + "@timestamp": "2021-11-13T20:20:39+00:00", + "endTime": "2021-11-14T20:10:41+00:00", + "events": [ + { + "@timestamp": "2021-03-25T17:21:03.044+00:00", + "name": "exception", + "attributes": { + "exception@message": "1050 %2842S01%29: Table %27User_Carts%27 already exists", + "exception@type": "ProgrammingError", + "exception@stacktrace": "Traceback %28most recent call last :File /usr/lib/python3.6/site-packages/opentelemetry/sdk/trace/__init__.py, line 804, in use_span yield spanFile /usr/lib/python3.6/site-packages/opentelemetry/instrumentation/dbapi/__init__.py, line 354, in traced_executionraise exFile /usr/lib/python3.6/site-packages/opentelemetry/instrumentation/dbapi/__init__.py, line 345, in traced_executionresult = query_method%28%2Aargs, %2A%2Akwargs%29File /usr/lib/python3.6/site-packages/mysql/connector/cursor.py" + }, + "droppedAttributesCount": 0 + } + ], + "links": [], + "droppedAttributesCount": 0, + "droppedEventsCount": 0, + "droppedLinksCount": 0, + "status": { + "message": "1050 %2842S01%29: Table %27User_Carts%27 already exists", + "code": 2 + }, + "attributes": { + "data_stream": { + "type": "span", + "dataset": "mysql" + }, + "component": "mysql", + "db@user": "root", + "net@peer@name": "localhost", + "db@type": "sql", + "net@peer@port": 3306, + "db@instance": "", + "db@statement": "CREATE TABLE `User_Carts` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 ENGINE=InnoDB" + }, + "resource": { + "telemetry@sdk@language": "python", + "service@name": "database", + "telemetry@sdk@version": "0.14b0", + "service@instance@id": "140307275923408", + "telemetry@sdk@name": "opentelemetry", + "host@hostname": 
"ip-172-31-10-8.us-west-2.compute.internal" + } +} \ No newline at end of file diff --git a/schema/traces/samples/traceC.json b/schema/traces/samples/traceC.json new file mode 100644 index 000000000..7da4fd7a7 --- /dev/null +++ b/schema/traces/samples/traceC.json @@ -0,0 +1,68 @@ +{ + "traceId": "c1d985bd02e1dbb85b444011f19a1ecc", + "spanId": "55a698828fe06a42", + "traceState": [], + "parentSpanId": "", + "name": "mysql", + "kind": "CLIENT", + "@timestamp": "2021-11-13T20:20:39+00:00", + "endTime": "2021-11-14T20:10:41+00:00", + "events": [ + { + "@timestamp": "2021-03-25T17:21:03+00:00", + "name": "exception", + "attributes": { + "exception@message": "1050 %2842S01%29: Table Inventory_Items already exists", + "exception@type": "ProgrammingError", + "exception@stacktrace": "Traceback most recent call last" + }, + "droppedAttributesCount": 0 + } + ], + "links": [ + { + "traceId": "c1d985bd02e1dbb85b444011f19a1ecc", + "spanId": "55a698828fe06a42w2", + "traceState": [], + "attributes": { + "db@user": "root", + "net@peer@name": "localhost", + "component": "mysql", + "db@type": "sql", + "net@peer@port": 3306, + "db@instance": "", + "db@statement": "CREATE TABLE `Inventory_Items` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 ENGINE=InnoDB" + }, + "droppedAttributesCount": 0 + } + ], + "droppedAttributesCount": 0, + "droppedEventsCount": 0, + "droppedLinksCount": 0, + "resource": { + "telemetry@sdk@language": "python", + "telemetry@sdk@version": "0.14b0", + "service@instance@id": "140307275923408", + "service@name": "database", + "telemetry@sdk@name": "opentelemetry", + "host@hostname": "ip-172-31-10-8.us-west-2.compute.internal" + }, + "status": { + "code": 2, + "message": "1050 %2842S01%29: Table %27Inventory_Items%27 already exists" + }, + "attributes": { + "data_stream": { + "type": "span", + "namespace": "exceptions", + "dataset": "mysql" + }, + "db@user": "root", + "net@peer@name": "localhost", + "component": 
"mysql", + "db@type": "sql", + "net@peer@port": 3306, + "db@instance": "", + "db@statement": "CREATE TABLE `Inventory_Items` %28 `ItemId` varchar%2816%29 NOT NULL, `TotalQty` int%2811%29 NOT NULL, PRIMARY KEY %28`ItemId`%29%29 ENGINE=InnoDB" + } +} diff --git a/schema/traces/traces-mapping.json b/schema/traces/traces-mapping.json new file mode 100644 index 000000000..4e8b26dbe --- /dev/null +++ b/schema/traces/traces-mapping.json @@ -0,0 +1,197 @@ +{ + "index_patterns": [ + "sso_traces-*-*" + ], + "data_stream": {}, + "template": { + "mappings": { + "_meta": { + "version": "0.1.0-dev" + }, + "dynamic_templates": [ + { + "attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "attributes.*" + } + }, + { + "events_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "events.attributes.*" + } + }, + { + "links_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "links.attributes.*" + } + }, + { + "instrumentation_scope_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "instrumentationScope.attributes.*" + } + }, + { + "resources_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "resource.*" + } + } + ], + "_source": { + "enabled": true + }, + "properties": { + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + }, + "traceState": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "parentSpanId": { + "ignore_above": 256, + "type": "keyword" + }, + "name": { + "ignore_above": 1024, + "type": "keyword" + }, + "kind": { + "ignore_above": 128, + "type": "keyword" + }, + "startTime": { + "type": "date_nanos" + }, + "endTime": { + "type": "date_nanos" + }, + "droppedAttributesCount": { + "type": "long" + }, + "droppedEventsCount": { + "type": "long" + }, + "droppedLinksCount": { + "type": "long" + }, + "status": { + "properties": { + "code": { + 
"ignore_above": 128, + "type": "keyword" + }, + "message": { + "ignore_above": 128, + "type": "keyword" + } + } + }, + "attributes": { + "type": "object", + "properties": { + "data_stream": { + "properties": { + "dataset": { + "ignore_above": 128, + "type": "keyword" + }, + "namespace": { + "ignore_above": 128, + "type": "keyword" + }, + "type": { + "ignore_above": 56, + "type": "keyword" + } + } + } + } + }, + "events": { + "type": "nested", + "properties": { + "name": { + "ignore_above": 1024, + "type": "keyword" + }, + "@timestamp": { + "type": "date_nanos" + }, + "observedTimestamp": { + "type": "date_nanos" + } + } + }, + "links": { + "type": "nested", + "properties": { + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + }, + "traceState": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + }, + "instrumentationScope": { + "properties": { + "name": { + "type": "keyword" + }, + "version": { + "type": "keyword" + }, + "droppedAttributesCount": { + "type": "integer" + }, + "schemaUrl": { + "type": "keyword" + } + } + }, + "schemaUrl": { + "type": "keyword" + } + } + } + }, + "version": 1, + "_meta": { + "description": "Observability Traces Mapping Template" + } +} \ No newline at end of file diff --git a/schema/traces/traces.schema b/schema/traces/traces.schema new file mode 100644 index 000000000..49d98b1fc --- /dev/null +++ b/schema/traces/traces.schema @@ -0,0 +1,249 @@ +{ + "$schema": "http://json-schema.org/draft-06/schema#", + "$id": "https://opensearch.org/schemas/Span", + "type": "object", + "additionalProperties": false, + "properties": { + "traceId": { + "type": "string" + }, + "spanId": { + "type": "string" + }, + "traceState": { + "type": "array", + "items": { + "$ref": "/schemas/KeyValue" + } + }, + "status": { + "type": "object", + "$ref": "/schemas/Status" + }, + "parentSpanId": { + "type": "string" + }, + "name": { + 
"type": "string" + }, + "kind": { + "type": "string", + "items": { + "type": "string", + "enum": [ + "SPAN_KIND_UNSPECIFIED", + "SPAN_KIND_INTERNAL", + "SPAN_KIND_SERVER", + "SPAN_KIND_CLIENT", + "SPAN_KIND_PRODUCER", + "SPAN_KIND_CONSUMER" + ] + } + }, + "startTime": { + "type": "string", + "format": "date-time" + }, + "endTime": { + "type": "string", + "format": "date-time" + }, + "resource": { + "type": "object" + }, + "attributes": { + "$ref": "/schemas/Attributes" + }, + "events": { + "type": "array", + "items": { + "$ref": "/schemas/Event" + } + }, + "links": { + "type": "array", + "items": { + "type": "object", + "additionalProperties": false, + "properties": { + "traceId": { + "type": "string" + }, + "spanId": { + "type": "string" + }, + "traceState": { + "type": "array", + "items": { + "type": "object" + } + }, + "attributes": { + "$ref": "/schemas/Attributes" + }, + "droppedAttributesCount": { + "type": "integer" + } + }, + "required": [ + "traceId", + "spanId", + "traceState" + ], + "title": "Links" + } + }, + "droppedAttributesCount": { + "type": "integer" + }, + "droppedEventsCount": { + "type": "integer" + }, + "droppedLinksCount": { + "type": "integer" + }, + "instrumentationScope": { + "$ref": "/schemas/InstrumentationScope" + }, + "schemaUrl": { + "type": "string" + } + }, + "required": [ + "traceId", + "spanId", + "startTime", + "endTime", + "kind", + "name", + "status" + ], + "$defs": { + "Status": { + "$id": "/schemas/Status", + "type": "object", + "additionalProperties": false, + "properties": { + "code": { + "type": "integer", + "items": { + "type": "integer", + "enum": [ + 0, + 1, + 2 + ] + } + }, + "message": { + "type": "string", + "items": { + "type": "string", + "enum": [ + "UNSET", + "OK", + "ERROR" + ] + } + } + }, + "required": [ + "code" + ], + "title": "Status" + }, + "Event": { + "$id": "/schemas/Event", + "type": "object", + "additionalProperties": false, + "properties": { + "@timestamp": { + "type": "string", + "format": 
"date-time" + }, + "observedTimestamp": { + "type": "string", + "format": "date-time" + }, + "name": { + "type": "string" + }, + "attributes": { + "$ref": "/schemas/Attributes" + }, + "droppedAttributesCount": { + "type": "integer" + } + }, + "required": [ + "attributes", + "droppedAttributesCount", + "name", + "@timestamp" + ], + "title": "Event" + }, + "Attributes": { + "$id": "/schemas/Attributes", + "type": "object", + "additionalProperties": true, + "properties": { + "data_stream": { + "$ref": "/schemas/Dataflow" + } + }, + "title": "Attributes" + }, + "KeyValue": { + "$id": "/schemas/KeyValue", + "type": "object", + "additionalProperties": true, + "properties": { + "key": { + "type": "string" + }, + "value": { + "type": "object" + } + }, + "title": "KeyValue" + }, + "InstrumentationScope": { + "$id": "/schemas/InstrumentationScope", + "type": "object", + "additionalProperties": true, + "properties": { + "name": { + "type": "string" + }, + "version": { + "type": "string" + }, + "schemaUrl": { + "type": "string" + }, + "droppedAttributesCount": { + "type": "integer" + } + }, + "title": "InstrumentationScope" + }, + "Dataflow": { + "$id": "/schemas/Dataflow", + "type": "object", + "additionalProperties": true, + "properties": { + "type": { + "type": "string" + }, + "namespace": { + "type": "string" + }, + "dataset": { + "type": "string" + } + }, + "title": "Attributes" + } + } +} diff --git a/scripts/build.sh b/scripts/build.sh new file mode 100755 index 000000000..4b2893f30 --- /dev/null +++ b/scripts/build.sh @@ -0,0 +1,82 @@ +#!/bin/bash + +# +# Copyright OpenSearch Contributors +# SPDX-License-Identifier: Apache-2.0 +# + +set -ex + +function usage() { + echo "Usage: $0 [args]" + echo "" + echo "Arguments:" + echo -e "-v VERSION\t[Required] OpenSearch version." + echo -e "-q QUALIFIER\t[Optional] Version qualifier." + echo -e "-s SNAPSHOT\t[Optional] Build a snapshot, default is 'false'." + echo -e "-p PLATFORM\t[Optional] Platform, ignored." 
+ echo -e "-a ARCHITECTURE\t[Optional] Build architecture, ignored." + echo -e "-o OUTPUT\t[Optional] Output path, default is 'artifacts'." + echo -e "-h help" +} + +while getopts ":hv:q:s:o:p:a:" arg; do + case $arg in + h) + usage + exit 1 + ;; + v) + VERSION=$OPTARG + ;; + q) + QUALIFIER=$OPTARG + ;; + s) + SNAPSHOT=$OPTARG + ;; + o) + OUTPUT=$OPTARG + ;; + p) + PLATFORM=$OPTARG + ;; + a) + ARCHITECTURE=$OPTARG + ;; + :) + echo "Error: -${OPTARG} requires an argument" + usage + exit 1 + ;; + ?) + echo "Invalid option: -${OPTARG}" + exit 1 + ;; + esac +done + +if [ -z "$VERSION" ]; then + echo "Error: You must specify the OpenSearch version" + usage + exit 1 +fi + +[[ ! -z "$QUALIFIER" ]] && VERSION=$VERSION-$QUALIFIER +[[ "$SNAPSHOT" == "true" ]] && VERSION=$VERSION-SNAPSHOT +[ -z "$OUTPUT" ] && OUTPUT=artifacts + +mkdir -p $OUTPUT + +./gradlew assemble --no-daemon --refresh-dependencies -DskipTests=true -Dopensearch.version=$VERSION -Dbuild.snapshot=$SNAPSHOT -Dbuild.version_qualifier=$QUALIFIER + +zipPath=$(find . -path \*build/distributions/*.zip) +distributions="$(dirname "${zipPath}")" + +echo "COPY ${distributions}/*.zip" +mkdir -p $OUTPUT/plugins +cp ${distributions}/*.zip ./$OUTPUT/plugins + +./gradlew publishPluginZipPublicationToZipStagingRepository -Dopensearch.version=$VERSION -Dbuild.snapshot=$SNAPSHOT -Dbuild.version_qualifier=$QUALIFIER +mkdir -p $OUTPUT/maven/org/opensearch +cp -r ./build/local-staging-repo/org/opensearch/.
$OUTPUT/maven/org/opensearch diff --git a/src/main/kotlin/org/opensearch/observability/ObservabilityPlugin.kt b/src/main/kotlin/org/opensearch/observability/ObservabilityPlugin.kt index 1fcae8a03..5a1e18f60 100644 --- a/src/main/kotlin/org/opensearch/observability/ObservabilityPlugin.kt +++ b/src/main/kotlin/org/opensearch/observability/ObservabilityPlugin.kt @@ -27,6 +27,8 @@ import org.opensearch.observability.action.DeleteObservabilityObjectAction import org.opensearch.observability.action.GetObservabilityObjectAction import org.opensearch.observability.action.UpdateObservabilityObjectAction import org.opensearch.observability.index.ObservabilityIndex +import org.opensearch.observability.index.ObservabilityMetricsIndex +import org.opensearch.observability.index.ObservabilityTracesIndex import org.opensearch.observability.resthandler.ObservabilityRestHandler import org.opensearch.observability.resthandler.ObservabilityStatsRestHandler import org.opensearch.observability.resthandler.SchedulerRestHandler @@ -34,6 +36,7 @@ import org.opensearch.observability.scheduler.ObservabilityJobParser import org.opensearch.observability.scheduler.ObservabilityJobRunner import org.opensearch.observability.settings.PluginSettings import org.opensearch.plugins.ActionPlugin +import org.opensearch.plugins.ClusterPlugin import org.opensearch.plugins.Plugin import org.opensearch.repositories.RepositoriesService import org.opensearch.rest.RestController @@ -47,7 +50,8 @@ import java.util.function.Supplier * Entry point of the OpenSearch Observability plugin. * This class initializes the rest handlers. 
*/ -class ObservabilityPlugin : Plugin(), ActionPlugin, JobSchedulerExtension { +@Suppress("TooManyFunctions") +class ObservabilityPlugin : Plugin(), ActionPlugin, ClusterPlugin, JobSchedulerExtension { companion object { const val PLUGIN_NAME = "opensearch-observability" @@ -81,9 +85,17 @@ class ObservabilityPlugin : Plugin(), ActionPlugin, JobSchedulerExtension { ): Collection<Any> { PluginSettings.addSettingsUpdateConsumer(clusterService) ObservabilityIndex.initialize(client, clusterService) + ObservabilityMetricsIndex.initialize(client, clusterService) + ObservabilityTracesIndex.initialize(client, clusterService) return emptyList() } + override fun onNodeStarted() { + ObservabilityIndex.afterStart() + ObservabilityTracesIndex.afterStart() + ObservabilityMetricsIndex.afterStart() + } + /** * {@inheritDoc} */ diff --git a/src/main/kotlin/org/opensearch/observability/index/ObservabilityIndex.kt b/src/main/kotlin/org/opensearch/observability/index/ObservabilityIndex.kt index 3363715f8..f287d16fc 100644 --- a/src/main/kotlin/org/opensearch/observability/index/ObservabilityIndex.kt +++ b/src/main/kotlin/org/opensearch/observability/index/ObservabilityIndex.kt @@ -20,6 +20,7 @@ import org.opensearch.action.search.SearchRequest import org.opensearch.action.update.UpdateRequest import org.opensearch.client.Client import org.opensearch.cluster.service.ClusterService +import org.opensearch.common.component.LifecycleListener import org.opensearch.common.unit.TimeValue import org.opensearch.common.xcontent.LoggingDeprecationHandler import org.opensearch.common.xcontent.NamedXContentRegistry @@ -48,7 +49,7 @@ import java.util.concurrent.TimeUnit * Class for doing OpenSearch index operation to maintain observability objects in cluster.
*/ @Suppress("TooManyFunctions") -internal object ObservabilityIndex { +internal object ObservabilityIndex : LifecycleListener() { private val log by logger(ObservabilityIndex::class.java) private const val INDEX_NAME = ".opensearch-observability" private const val NOTEBOOKS_INDEX_NAME = ".opensearch-notebooks" @@ -82,6 +83,14 @@ internal object ObservabilityIndex { this.mappingsUpdated = false } + /** + * Once the node lifecycle signals that start has completed, create the system index + */ + override fun afterStart() { + // create default index + createIndex() + } + /** * Create index using the mapping and settings defined in resource */ diff --git a/src/main/kotlin/org/opensearch/observability/index/ObservabilityMetricsIndex.kt b/src/main/kotlin/org/opensearch/observability/index/ObservabilityMetricsIndex.kt new file mode 100644 index 000000000..f3a30e6ac --- /dev/null +++ b/src/main/kotlin/org/opensearch/observability/index/ObservabilityMetricsIndex.kt @@ -0,0 +1,122 @@ +/* + * Copyright OpenSearch Contributors + * SPDX-License-Identifier: Apache-2.0 + */ +package org.opensearch.observability.index + +import org.opensearch.ResourceAlreadyExistsException +import org.opensearch.ResourceNotFoundException +import org.opensearch.action.admin.indices.template.get.GetIndexTemplatesRequest +import org.opensearch.action.admin.indices.template.put.PutComposableIndexTemplateAction +import org.opensearch.client.Client +import org.opensearch.cluster.metadata.ComposableIndexTemplate +import org.opensearch.cluster.metadata.Template +import org.opensearch.cluster.service.ClusterService +import org.opensearch.common.component.LifecycleListener +import org.opensearch.common.compress.CompressedXContent +import org.opensearch.common.settings.Settings +import org.opensearch.observability.ObservabilityPlugin.Companion.LOG_PREFIX +import org.opensearch.observability.settings.PluginSettings +import org.opensearch.observability.util.SecureIndexClient +import
org.opensearch.observability.util.logger +import java.util.* + +/** + * Handles the OpenSearch Metrics schema mapping template and default index initialization + */ +internal object ObservabilityMetricsIndex : LifecycleListener() { + private val log by logger(ObservabilityMetricsIndex::class.java) + private const val METRICS_MAPPING_TEMPLATE_NAME = "sso_metric_template" + private const val METRICS_MAPPING_TEMPLATE_FILE = "metrics-mapping-template.json" + private const val METRIC_PATTERN_NAME = "sso_metrics-*-*" + + private lateinit var client: Client + private lateinit var clusterService: ClusterService + + /** + * Initialize the class + * @param client The OpenSearch client + * @param clusterService The OpenSearch cluster service + */ + fun initialize(client: Client, clusterService: ClusterService): ObservabilityMetricsIndex { + this.client = SecureIndexClient(client) + this.clusterService = clusterService + return this + } + + /** + * Once the node lifecycle signals that start has completed, create the mapping template + */ + override fun afterStart() { + // create default mapping + createMappingTemplate() + } + + /** + * Create the pre-defined mapping template + */ + @Suppress("TooGenericExceptionCaught", "MagicNumber") + private fun createMappingTemplate() { + log.info("$LOG_PREFIX:createMappingTemplate $METRICS_MAPPING_TEMPLATE_NAME API called") + if (!isTemplateExists(METRICS_MAPPING_TEMPLATE_NAME)) { + val classLoader = ObservabilityMetricsIndex::class.java.classLoader + val indexMappingSource = classLoader.getResource(METRICS_MAPPING_TEMPLATE_FILE)?.readText()!!
+ val settings = Settings.builder() + .put("index.number_of_shards", 3) + .put("index.auto_expand_replicas", "0-2") + .build() + val template = Template(settings, CompressedXContent(indexMappingSource), null) + val request = PutComposableIndexTemplateAction.Request(METRICS_MAPPING_TEMPLATE_NAME) + .indexTemplate( + ComposableIndexTemplate( + listOf(METRIC_PATTERN_NAME), + template, + Collections.emptyList(), + 1, + 1, + Collections.singletonMap("description", "Observability Metrics Mapping Template") as Map<String, Any>?, + ComposableIndexTemplate.DataStreamTemplate() + ) + ) + try { + val validationException = request.validateIndexTemplate(null) + if (validationException != null && !validationException.validationErrors().isEmpty()) { + error("$LOG_PREFIX:Index Template $METRICS_MAPPING_TEMPLATE_NAME validation errors ${validationException.message}") + } + val actionFuture = client.admin().indices().execute(PutComposableIndexTemplateAction.INSTANCE, request) + val response = actionFuture.actionGet(PluginSettings.operationTimeoutMs) + if (response.isAcknowledged) { + log.info("$LOG_PREFIX:Mapping Template $METRICS_MAPPING_TEMPLATE_NAME creation Acknowledged") + } else { + error("$LOG_PREFIX:Mapping Template $METRICS_MAPPING_TEMPLATE_NAME creation not Acknowledged") + } + } catch (exception: ResourceAlreadyExistsException) { + log.warn("message: ${exception.message}") + } catch (exception: Exception) { + if (exception.cause !is ResourceAlreadyExistsException) { + throw exception + } + } + } + } + + /** + * Check if the mapping template is created and available.
+ * @param template name + * @return true if template is available, false otherwise + */ + @Suppress("TooGenericExceptionCaught", "SwallowedException", "RethrowCaughtException") + private fun isTemplateExists(template: String): Boolean { + try { + val indices = client.admin().indices() + val response = indices.getTemplates(GetIndexTemplatesRequest(template)).get() + return response.indexTemplates.isNotEmpty() + } catch (exception: ResourceNotFoundException) { + return false + } catch (exception: ResourceAlreadyExistsException) { + return true + } catch (exception: Exception) { + throw exception + } + } +} diff --git a/src/main/kotlin/org/opensearch/observability/index/ObservabilityTracesIndex.kt b/src/main/kotlin/org/opensearch/observability/index/ObservabilityTracesIndex.kt new file mode 100644 index 000000000..62d39be22 --- /dev/null +++ b/src/main/kotlin/org/opensearch/observability/index/ObservabilityTracesIndex.kt @@ -0,0 +1,122 @@ +/* + * Copyright OpenSearch Contributors + * SPDX-License-Identifier: Apache-2.0 + */ +package org.opensearch.observability.index + +import org.opensearch.ResourceAlreadyExistsException +import org.opensearch.ResourceNotFoundException +import org.opensearch.action.admin.indices.template.get.GetIndexTemplatesRequest +import org.opensearch.action.admin.indices.template.put.PutComposableIndexTemplateAction +import org.opensearch.client.Client +import org.opensearch.cluster.metadata.ComposableIndexTemplate +import org.opensearch.cluster.metadata.Template +import org.opensearch.cluster.service.ClusterService +import org.opensearch.common.component.LifecycleListener +import org.opensearch.common.compress.CompressedXContent +import org.opensearch.common.settings.Settings +import org.opensearch.observability.ObservabilityPlugin.Companion.LOG_PREFIX +import org.opensearch.observability.settings.PluginSettings +import org.opensearch.observability.util.SecureIndexClient +import org.opensearch.observability.util.logger +import java.util.* + 
+/** + * Handles the OpenSearch Traces schema mapping template and default index initialization + */ +internal object ObservabilityTracesIndex : LifecycleListener() { + private val log by logger(ObservabilityTracesIndex::class.java) + private const val TRACES_MAPPING_TEMPLATE_NAME = "sso_trace_template" + private const val TRACES_MAPPING_TEMPLATE_FILE = "traces-mapping-template.json" + private const val TRACES_PATTERN_NAME = "sso_traces-*-*" + + private lateinit var client: Client + private lateinit var clusterService: ClusterService + + /** + * Initialize the class + * @param client The OpenSearch client + * @param clusterService The OpenSearch cluster service + */ + fun initialize(client: Client, clusterService: ClusterService): ObservabilityTracesIndex { + this.client = SecureIndexClient(client) + this.clusterService = clusterService + return this + } + + /** + * Once the node lifecycle signals that start has completed, create the mapping template + */ + override fun afterStart() { + // create default mapping + createMappingTemplate() + } + + /** + * Create the pre-defined mapping template + */ + @Suppress("TooGenericExceptionCaught", "MagicNumber") + private fun createMappingTemplate() { + log.info("$LOG_PREFIX:createMappingTemplate $TRACES_MAPPING_TEMPLATE_NAME API called") + if (!isTemplateExists(TRACES_MAPPING_TEMPLATE_NAME)) { + val classLoader = ObservabilityTracesIndex::class.java.classLoader + val indexMappingSource = classLoader.getResource(TRACES_MAPPING_TEMPLATE_FILE)?.readText()!!
+ val settings = Settings.builder() + .put("index.number_of_shards", 3) + .put("index.auto_expand_replicas", "0-2") + .build() + val template = Template(settings, CompressedXContent(indexMappingSource), null) + val request = PutComposableIndexTemplateAction.Request(TRACES_MAPPING_TEMPLATE_NAME) + .indexTemplate( + ComposableIndexTemplate( + listOf(TRACES_PATTERN_NAME), + template, + Collections.emptyList(), + 1, + 1, + Collections.singletonMap("description", "Observability Traces Mapping Template") as Map<String, Any>?, + ComposableIndexTemplate.DataStreamTemplate() + ) + ) + try { + val validationException = request.validateIndexTemplate(null) + if (validationException != null && !validationException.validationErrors().isEmpty()) { + error("$LOG_PREFIX:Index Template $TRACES_MAPPING_TEMPLATE_NAME validation errors ${validationException.message}") + } + val actionFuture = client.admin().indices().execute(PutComposableIndexTemplateAction.INSTANCE, request) + val response = actionFuture.actionGet(PluginSettings.operationTimeoutMs) + if (response.isAcknowledged) { + log.info("$LOG_PREFIX:Mapping Template $TRACES_MAPPING_TEMPLATE_NAME creation Acknowledged") + } else { + error("$LOG_PREFIX:Mapping Template $TRACES_MAPPING_TEMPLATE_NAME creation not Acknowledged") + } + } catch (exception: ResourceAlreadyExistsException) { + log.warn("message: ${exception.message}") + } catch (exception: Exception) { + if (exception.cause !is ResourceAlreadyExistsException) { + throw exception + } + } + } + } + + /** + * Check if the mapping template is created and available.
+ * @param template name + * @return true if template is available, false otherwise + */ + @Suppress("TooGenericExceptionCaught", "SwallowedException", "RethrowCaughtException") + private fun isTemplateExists(template: String): Boolean { + try { + val indices = client.admin().indices() + val response = indices.getTemplates(GetIndexTemplatesRequest(template)).get() + return response.indexTemplates.isNotEmpty() + } catch (exception: ResourceNotFoundException) { + return false + } catch (exception: ResourceAlreadyExistsException) { + return true + } catch (exception: Exception) { + throw exception + } + } +} diff --git a/src/main/resources/metrics-mapping-template.json b/src/main/resources/metrics-mapping-template.json new file mode 100644 index 000000000..a428e5706 --- /dev/null +++ b/src/main/resources/metrics-mapping-template.json @@ -0,0 +1,266 @@ +{ + "_meta": { + "version": "1.0" + }, + "_source": { + "enabled": true + }, + "dynamic_templates": [ + { + "attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "attributes.*" + } + }, + { + "resources_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "resource.*" + } + }, + { + "exemplar_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "exemplar.attributes.*" + } + }, + { + "instrumentation_scope_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "instrumentationScope.attributes.*" + } + } + ], + "properties": { + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "attributes": { + "type": "object", + "properties": { + "data_stream": { + "properties": { + "dataset": { + "ignore_above": 128, + "type": "keyword" + }, + "namespace": { + "ignore_above": 128, + "type": "keyword" + }, + "type": { + "ignore_above": 56, + "type": "keyword" + } + } + } + } + }, + "description": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + 
}, + "unit": { + "type": "keyword", + "ignore_above": 128 + }, + "kind": { + "type": "keyword", + "ignore_above": 128 + }, + "aggregationTemporality": { + "type": "keyword", + "ignore_above": 128 + }, + "monotonic": { + "type": "boolean" + }, + "startTime": { + "type": "date" + }, + "@timestamp": { + "type": "date" + }, + "observedTimestamp": { + "type": "date_nanos" + }, + "value": { + "properties": { + "int": { + "type": "integer" + }, + "double": { + "type": "double" + } + } + }, + "buckets": { + "properties": { + "count": { + "type": "long" + }, + "sum": { + "type": "double" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "bucketCount": { + "type": "long" + }, + "bucketCountsList": { + "type": "long" + }, + "explicitBoundsList": { + "type": "float" + }, + "explicitBoundsCount": { + "type": "float" + }, + "quantiles": { + "properties": { + "quantile": { + "type": "double" + }, + "value": { + "type": "double" + } + } + }, + "quantileValuesCount": { + "type": "long" + }, + "positiveBuckets": { + "properties": { + "count": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "negativeBuckets": { + "properties": { + "count": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + } + } + }, + "negativeOffset": { + "type": "integer" + }, + "positiveOffset": { + "type": "integer" + }, + "zeroCount": { + "type": "long" + }, + "scale": { + "type": "long" + }, + "max": { + "type": "float" + }, + "min": { + "type": "float" + }, + "sum": { + "type": "float" + }, + "count": { + "type": "long" + }, + "exemplar": { + "properties": { + "time": { + "type": "date" + }, + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + } + } + }, + "instrumentationScope": { + "properties": { + "name": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 128 + } + } + }, + 
"version": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "droppedAttributesCount": { + "type": "integer" + }, + "schemaUrl": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + }, + "schemaUrl": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } +} \ No newline at end of file diff --git a/src/main/resources/traces-mapping-template.json b/src/main/resources/traces-mapping-template.json new file mode 100644 index 000000000..357f71132 --- /dev/null +++ b/src/main/resources/traces-mapping-template.json @@ -0,0 +1,185 @@ +{ + "_meta": { + "version": "1.0" + }, + "_source": { + "enabled": true + }, + "dynamic_templates": [ + { + "attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "attributes.*" + } + }, + { + "events_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "events.attributes.*" + } + }, + { + "links_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "links.attributes.*" + } + }, + { + "instrumentation_scope_attributes_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "instrumentationScope.attributes.*" + } + }, + { + "resources_map": { + "mapping": { + "type": "keyword" + }, + "path_match": "resource.*" + } + } + ], + "properties": { + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + }, + "traceState": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + }, + "parentSpanId": { + "ignore_above": 256, + "type": "keyword" + }, + "name": { + "ignore_above": 1024, + "type": "keyword" + }, + "kind": { + "ignore_above": 128, + "type": "keyword" + }, + "startTime": { + "type": "date_nanos" + }, + "endTime": { + "type": "date_nanos" + }, + "droppedAttributesCount": { + "type": "long" + }, + 
"droppedEventsCount": { + "type": "long" + }, + "droppedLinksCount": { + "type": "long" + }, + "status": { + "properties": { + "code": { + "ignore_above": 128, + "type": "keyword" + }, + "message": { + "ignore_above": 128, + "type": "keyword" + } + } + }, + "attributes": { + "type": "object", + "properties": { + "data_stream": { + "properties": { + "dataset": { + "ignore_above": 128, + "type": "keyword" + }, + "namespace": { + "ignore_above": 128, + "type": "keyword" + }, + "type": { + "ignore_above": 56, + "type": "keyword" + } + } + } + } + }, + "events": { + "type": "nested", + "properties": { + "name": { + "ignore_above": 1024, + "type": "keyword" + }, + "@timestamp": { + "type": "date_nanos" + }, + "observedTimestamp": { + "type": "date_nanos" + } + } + }, + "links": { + "type": "nested", + "properties": { + "traceId": { + "ignore_above": 256, + "type": "keyword" + }, + "spanId": { + "ignore_above": 256, + "type": "keyword" + }, + "traceState": { + "type": "text", + "fields": { + "keyword": { + "type": "keyword", + "ignore_above": 256 + } + } + } + } + }, + "instrumentationScope": { + "properties": { + "name": { + "type": "keyword" + }, + "version": { + "type": "keyword" + }, + "droppedAttributesCount": { + "type": "integer" + }, + "schemaUrl": { + "type": "keyword" + } + } + }, + "schemaUrl": { + "type": "keyword" + } + } +} diff --git a/src/test/kotlin/org/opensearch/observability/ObservabilityPluginIT.kt b/src/test/kotlin/org/opensearch/observability/ObservabilityPluginIT.kt index 0b9984ec7..442bfc37a 100644 --- a/src/test/kotlin/org/opensearch/observability/ObservabilityPluginIT.kt +++ b/src/test/kotlin/org/opensearch/observability/ObservabilityPluginIT.kt @@ -11,6 +11,7 @@ import org.opensearch.cluster.health.ClusterHealthStatus import org.opensearch.plugins.PluginInfo import org.opensearch.test.OpenSearchIntegTestCase +@Suppress("TooManyFunctions") class ObservabilityPluginIT : OpenSearchIntegTestCase() { fun testPluginsAreInstalled() { val request = 
ClusterHealthRequest() diff --git a/src/test/kotlin/org/opensearch/observability/PluginRestTestCase.kt b/src/test/kotlin/org/opensearch/observability/PluginRestTestCase.kt index c5a1d4115..fdd030db2 100644 --- a/src/test/kotlin/org/opensearch/observability/PluginRestTestCase.kt +++ b/src/test/kotlin/org/opensearch/observability/PluginRestTestCase.kt @@ -9,6 +9,7 @@ import com.google.gson.JsonObject import org.apache.http.HttpHost import org.junit.After import org.junit.AfterClass +import org.junit.Assert import org.junit.Before import org.opensearch.client.Request import org.opensearch.client.RequestOptions @@ -23,6 +24,8 @@ import org.opensearch.common.xcontent.NamedXContentRegistry import org.opensearch.common.xcontent.XContentType import org.opensearch.commons.ConfigConstants import org.opensearch.commons.rest.SecureRestClientBuilder +import org.opensearch.rest.RestRequest +import org.opensearch.rest.RestStatus import org.opensearch.test.rest.OpenSearchRestTestCase import java.io.BufferedReader import java.io.IOException @@ -54,33 +57,63 @@ abstract class PluginRestTestCase : OpenSearchRestTestCase() { } open fun preserveOpenSearchIndicesAfterTest(): Boolean = false + open fun preserveOpenSearchDataStreamsAfterTest(): Boolean = true @Throws(IOException::class) @After open fun wipeAllOpenSearchIndices() { - if (preserveOpenSearchIndicesAfterTest()) return - val response = client().performRequest(Request("GET", "/_cat/indices?format=json&expand_wildcards=all")) - val xContentType = XContentType.fromMediaType(response.entity.contentType.value) - xContentType.xContent().createParser( - NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, - response.entity.content - ).use { parser -> - for (index in parser.list()) { - val jsonObject: Map<*, *> = index as java.util.HashMap<*, *> - val indexName: String = jsonObject["index"] as String - // .opendistro_security isn't allowed to delete from cluster - if (".opendistro_security" != indexName) { - 
val request = Request("DELETE", "/$indexName") - // TODO: remove PERMISSIVE option after moving system index access to REST API call - val options = RequestOptions.DEFAULT.toBuilder() - options.setWarningsHandler(WarningsHandler.PERMISSIVE) - request.options = options.build() - adminClient().performRequest(request) + if (!preserveOpenSearchDataStreamsAfterTest()) { + var getResponse = executeRequest( + RestRequest.Method.GET.name, + "/_data_stream/*", + "", + RestStatus.OK.status + ) + Assert.assertNotNull(!getResponse.get("data_streams").asJsonArray.isEmpty) + for (stream in getResponse.get("data_streams").asJsonArray) { + dataStreamCleanUp(stream.asJsonObject.get("name").asString) + } + } + if (!preserveOpenSearchIndicesAfterTest()) { + val response = client().performRequest(Request("GET", "/_cat/indices?format=json&expand_wildcards=all")) + val xContentType = XContentType.fromMediaType(response.entity.contentType.value) + xContentType.xContent().createParser( + NamedXContentRegistry.EMPTY, DeprecationHandler.THROW_UNSUPPORTED_OPERATION, + response.entity.content + ).use { parser -> + for (index in parser.list()) { + val jsonObject: Map<*, *> = index as java.util.HashMap<*, *> + val indexName: String = jsonObject["index"] as String + // .opendistro_security isn't allowed to delete from cluster and same for data-stream backing indices + if (".opendistro_security" != indexName && !indexName.startsWith(".ds-", false)) { + val request = Request("DELETE", "/$indexName") + // TODO: remove PERMISSIVE option after moving system index access to REST API call + val options = RequestOptions.DEFAULT.toBuilder() + options.setWarningsHandler(WarningsHandler.PERMISSIVE) + request.options = options.build() + adminClient().performRequest(request) + } } } } } + fun dataStreamCleanUp(stream: String) { + // remove data-stream for house cleaning + var response = executeRequest( + RestRequest.Method.DELETE.name, + "/_data_stream/$stream", + "", + RestStatus.OK.status + ) + 
Assert.assertNotNull(response.get("acknowledged")) + Assert.assertEquals(true, response.get("acknowledged").asBoolean) + } + + open fun cleanUp() { + // Override for specific cleanup behaviour + } + /** * Returns the REST client settings used for super-admin actions like cleaning up after the test has completed. */ diff --git a/src/test/kotlin/org/opensearch/observability/rest/AssemblyValidationIT.kt b/src/test/kotlin/org/opensearch/observability/rest/AssemblyValidationIT.kt new file mode 100644 index 000000000..bb9d7af5c --- /dev/null +++ b/src/test/kotlin/org/opensearch/observability/rest/AssemblyValidationIT.kt @@ -0,0 +1,43 @@ +/* + * Copyright OpenSearch Contributors + * SPDX-License-Identifier: Apache-2.0 + */ + +package org.opensearch.observability.rest + +import org.junit.Assert +import org.opensearch.observability.PluginRestTestCase +import org.opensearch.rest.RestRequest +import org.opensearch.rest.RestStatus + +class AssemblyValidationIT : PluginRestTestCase() { + companion object { + private const val TRACES_MAPPING_TEMPLATE_NAME = "sso_trace_template" + private const val METRICS_MAPPING_TEMPLATE_NAME = "sso_metric_template" + } + + fun `test observability traces and metrics templates were created`() { + // verify traces mapping template was created successfully as part of the plugin initialization + Thread.sleep(1000) + var response = executeRequest( + RestRequest.Method.GET.name, + "/_index_template/$TRACES_MAPPING_TEMPLATE_NAME", + "", + RestStatus.OK.status + ) + Thread.sleep(1000) + Assert.assertNotNull(response.get("index_templates")) + Assert.assertFalse(response.get("index_templates").asJsonArray.isEmpty) + + // verify metrics mapping template was created successfully as part of the plugin initialization + response = executeRequest( + RestRequest.Method.GET.name, + "/_index_template/$METRICS_MAPPING_TEMPLATE_NAME", + "", + RestStatus.OK.status + ) + Thread.sleep(1000) + Assert.assertNotNull(response.get("index_templates")) +
Assert.assertFalse(response.get("index_templates").asJsonArray.isEmpty) + } +}
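The composable templates registered above only apply to data streams whose names match `sso_traces-*-*` or `sso_metrics-*-*`. The POSIX shell sketch below shows which names those patterns accept. The `sso_<type>-<dataset>-<namespace>` naming convention and the helpers `sso_stream_name` and `sso_matching_template` are hypothetical. They are inferred from the index patterns and the `data_stream` (`type`, `dataset`, `namespace`) attributes, not taken from the plugin. Shell `case` globbing stands in for the template wildcard matching: both treat `*` as any run of characters, but this approximates the cluster's matcher and does not call it.

```sh
#!/bin/sh
# Sketch only: naming convention and helper names are hypothetical.

# Build a data stream name as sso_<type>-<dataset>-<namespace> (assumed convention).
sso_stream_name() {
  printf 'sso_%s-%s-%s\n' "$1" "$2" "$3"
}

# Print the template whose index pattern matches the given name, or "none".
# `case` globbing approximates the template's '*' wildcard matching.
sso_matching_template() {
  case "$1" in
    sso_traces-*-*) echo "sso_trace_template" ;;
    sso_metrics-*-*) echo "sso_metric_template" ;;
    *) echo "none" ;;
  esac
}

sso_matching_template "$(sso_stream_name traces nginx prod)"   # sso_trace_template
sso_matching_template "$(sso_stream_name metrics nginx prod)"  # sso_metric_template
sso_matching_template "sso_traces-nginx"                       # none: no "-<namespace>" part
```

A name with only one `-` after the prefix, such as `sso_traces-nginx`, falls outside both patterns. Such an index would be created without the SSO mappings or data stream settings.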