diff --git a/.env.override b/.env.override index 83771825eb..846a8f1945 100644 --- a/.env.override +++ b/.env.override @@ -17,5 +17,4 @@ KAFKA_SERVICE_DOCKERFILE=./src/kafka/Dockerfile.elastic # ********************* COLLECTOR_CONTRIB_IMAGE=docker.elastic.co/beats/elastic-agent:8.16.0 OTEL_COLLECTOR_CONFIG=./src/otelcollector/otelcol-elastic-config.yaml -OTEL_COLLECTOR_CONFIG_EXTRAS=./src/otelcollector/otelcol-elastic-config-extras.yaml ELASTIC_AGENT_OTEL=true diff --git a/.github/README.md b/.github/README.md index b683369a40..a0a30ecc84 100644 --- a/.github/README.md +++ b/.github/README.md @@ -12,11 +12,11 @@ Additionally, the OpenTelemetry Contrib collector has also been changed to the [ ## Docker compose -1. Start a free trial on [Elastic Cloud](https://cloud.elastic.co/) and copy the `endpoint` and `secretToken` from the Elastic APM setup instructions in your Kibana. -1. Open the file `src/otelcollector/otelcol-elastic-config-extras.yaml` in an editor and replace the following two placeholders: - - `YOUR_APM_ENDPOINT_WITHOUT_HTTPS_PREFIX`: your Elastic APM endpoint (*without* `https://` prefix) that *must* also include the port (example: `1234567.apm.us-west2.gcp.elastic-cloud.com:443`). - - `YOUR_APM_SECRET_TOKEN`: your Elastic APM secret token. -1. Start the demo with the following command from the repository's root directory: +1. Start a free trial on [Elastic Cloud](https://cloud.elastic.co/) and copy the `Elasticsearch endpoint` and the `API Key` from the `Help -> Connection details` drop down instructions in your Kibana. These variables will be used by the [elasticsearch exporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticsearchexporter#elasticsearch-exporter) to authenticate and transmit data to your Elasticsearch instance. +2. 
Open the file `src/otelcollector/otelcol-elastic-config.yaml` in an editor and replace the following two placeholders: + - `YOUR_ELASTICSEARCH_ENDPOINT`: your Elasticsearch endpoint (*with* `https://` prefix example: `https://1234567.us-west2.gcp.elastic-cloud.com:443`). + - `YOUR_ELASTICSEARCH_API_KEY`: your Elasticsearch API Key +3. Start the demo with the following command from the repository's root directory: ``` make start ``` @@ -27,27 +27,23 @@ Additionally, the OpenTelemetry Contrib collector has also been changed to the [ - Set up [kubectl](https://kubernetes.io/docs/reference/kubectl/). - Set up [Helm](https://helm.sh/). -### Start the Demo + +### Start the Demo (Kubernetes deployment) 1. Setup Elastic Observability on Elastic Cloud. -1. Create a secret in Kubernetes with the following command. +2. Create a secret in Kubernetes with the following command. ``` - kubectl create secret generic elastic-secret \ - --from-literal=elastic_apm_endpoint='YOUR_APM_ENDPOINT_WITHOUT_HTTPS_PREFIX' \ - --from-literal=elastic_apm_secret_token='YOUR_APM_SECRET_TOKEN' + kubectl create secret generic elastic-secret-otel \ + --from-literal=elastic_endpoint='YOUR_ELASTICSEARCH_ENDPOINT' \ + --from-literal=elastic_api_key='YOUR_ELASTICSEARCH_API_KEY' ``` Don't forget to replace - - `YOUR_APM_ENDPOINT_WITHOUT_HTTPS_PREFIX`: your Elastic APM endpoint (*without* `https://` prefix) that *must* also include the port (example: `1234567.apm.us-west2.gcp.elastic-cloud.com:443`). - - `YOUR_APM_SECRET_TOKEN`: your Elastic APM secret token, include the Bearer or ApiKey but not the "Authorization=" part e.g. Bearer XXXXXX or ApiKey XXXXX below is an example: - ``` - kubectl create secret generic elastic-secret \ - --from-literal=elastic_apm_endpoint='12345.apm.us-west2.gcp.elastic-cloud.com:443' \ - --from-literal=elastic_apm_secret_token='Bearer 123456789123456YE2' - ``` -1. 
Execute the following commands to deploy the OpenTelemetry demo to your Kubernetes cluster: + - `YOUR_ELASTICSEARCH_ENDPOINT`: your Elasticsearch endpoint (*with* `https://` prefix example: `https://1234567.us-west2.gcp.elastic-cloud.com:443`). + - `YOUR_ELASTICSEARCH_API_KEY`: your Elasticsearch API Key +3. Execute the following commands to deploy the OpenTelemetry demo to your Kubernetes cluster: ``` # clone this repository git clone https://github.com/elastic/opentelemetry-demo - + # switch to the kubernetes/elastic-helm directory cd opentelemetry-demo/kubernetes/elastic-helm @@ -61,30 +57,29 @@ Additionally, the OpenTelemetry Contrib collector has also been changed to the [ helm install -f deployment.yaml my-otel-demo open-telemetry/opentelemetry-demo ``` -#### Kubernetes monitoring +Additionally, this EDOT Collector configuration includes the following components for comprehensive Kubernetes monitoring: + - [K8s Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver): Captures detailed information about Kubernetes objects. + - [K8s Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver): Collects metrics and metadata about the overall cluster state. -This demo already enables cluster level metrics collection with `clusterMetrics` and -Kubernetes events collection with `kubernetesEvents`. +#### Kubernetes monitoring (daemonset) -In order to add Node level metrics collection we can run an additional Otel collector Daemonset with the following: +The `daemonset` EDOT collector is configured with the components to monitor node-level metrics and logs, ensuring detailed insights into individual Kubernetes nodes: -1. Create a secret in Kubernetes with the following command. 
- ``` - kubectl create secret generic elastic-secret-ds \ - --from-literal=elastic_endpoint='YOUR_ELASTICSEARCH_ENDPOINT' \ - --from-literal=elastic_api_key='YOUR_ELASTICSEARCH_API_KEY' - ``` - Don't forget to replace - - `YOUR_ELASTICSEARCH_ENDPOINT`: your Elasticsearch endpoint (*with* `https://` prefix example: `https://1234567.us-west2.gcp.elastic-cloud.com:443`). - - `YOUR_ELASTICSEARCH_API_KEY`: your Elasticsearch API Key +- [Host Metrics Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetrics): Collects system-level metrics such as CPU, memory, and disk usage from the host. +- [Kubelet Stats Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/kubeletstats): Gathers pod and container metrics directly from the kubelet. +- [Filelog Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filelog): Ingests and parses log files from nodes, providing detailed log analysis. -2. Execute the following command to deploy the OpenTelemetry Collector to your Kubernetes cluster, in the same directory `kubernetes/elastic-helm` in this repository. +To deploy the EDOT Collector to your Kubernetes cluster, ensure the `elastic-secret-otel` Kubernetes secret is created (if it doesn't already exist). Then, run the following command from the `kubernetes/elastic-helm` directory in this repository. 
``` # deploy the Elastic OpenTelemetry collector distribution through helm install helm install otel-daemonset open-telemetry/opentelemetry-collector --values daemonset.yaml ``` +#### Kubernetes architecture diagram + +![Deployment architecture](../kubernetes/elastic-helm/elastic-architecture.png "K8s architecture") + ## Explore and analyze the data With Elastic ### Service map diff --git a/kubernetes/elastic-helm/daemonset.yaml b/kubernetes/elastic-helm/daemonset.yaml index 8ed45de5e4..6deb903f90 100644 --- a/kubernetes/elastic-helm/daemonset.yaml +++ b/kubernetes/elastic-helm/daemonset.yaml @@ -18,17 +18,20 @@ securityContext: runAsGroup: 0 extraEnvs: + # Work around for open /mounts error: https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35990 + - name: HOST_PROC_MOUNTINFO + value: "" - name: ELASTIC_AGENT_OTEL value: "true" - name: ELASTIC_ENDPOINT valueFrom: secretKeyRef: - name: elastic-secret-ds + name: elastic-secret-otel key: elastic_endpoint - name: ELASTIC_API_KEY valueFrom: secretKeyRef: - name: elastic-secret-ds + name: elastic-secret-otel key: elastic_api_key - name: K8S_NODE_NAME valueFrom: @@ -61,7 +64,7 @@ config: exporters: debug: verbosity: basic - elasticsearch: + elasticsearch/ecs: endpoints: - ${env:ELASTIC_ENDPOINT} api_key: ${env:ELASTIC_API_KEY} @@ -71,32 +74,55 @@ config: enabled: true mapping: mode: ecs + elasticsearch/otel: + endpoints: + - ${env:ELASTIC_ENDPOINT} + api_key: ${env:ELASTIC_API_KEY} + logs_dynamic_index: + enabled: true + metrics_dynamic_index: + enabled: true + mapping: + mode: otel processors: batch: {} elasticinframetrics: add_system_metrics: true add_k8s_metrics: true - resourcedetection/eks: - detectors: [env, eks] + drop_original: true + resourcedetection/cluster: + detectors: [env, eks, gcp, aks, eks, k8snode] timeout: 15s override: true + k8snode: + auth_type: serviceAccount eks: resource_attributes: k8s.cluster.name: enabled: true - resourcedetection/gcp: - detectors: [env, gcp] - timeout: 
2s - override: true - resource/k8s: + aks: + resource_attributes: + k8s.cluster.name: + enabled: true + resource/k8s: # Resource attributes tailored for services within Kubernetes. attributes: - - key: service.name - from_attribute: app.label.component + - key: service.name # Set the service.name resource attribute based on the well-known app.kubernetes.io/name label + from_attribute: app.label.name action: insert - attributes/k8s_logs_dataset: - actions: - - key: data_stream.dataset - value: "kubernetes.container_logs" + - key: service.name # Set the service.name resource attribute based on the k8s.container.name attribute + from_attribute: k8s.container.name + action: insert + - key: app.label.name # Delete app.label.name attribute previously used for service.name + action: delete + - key: service.version # Set the service.version resource attribute based on the well-known app.kubernetes.io/version label + from_attribute: app.label.version + action: insert + - key: app.label.version # Delete app.label.version attribute previously used for service.version + action: delete + resource/hostname: + attributes: + - key: host.name + from_attribute: k8s.node.name action: upsert attributes/dataset: actions: @@ -311,12 +337,16 @@ config: pipelines: logs: receivers: [filelog] - processors: [batch, k8sattributes, resourcedetection/system, resourcedetection/eks, resourcedetection/gcp, resource/demo, resource/k8s, resource/cloud, attributes/k8s_logs_dataset] - exporters: [debug, elasticsearch] + processors: [batch, k8sattributes, resourcedetection/cluster, resource/hostname, resource/demo, resource/k8s, resource/cloud] + exporters: [debug, elasticsearch/otel] metrics: receivers: [hostmetrics, kubeletstats] - processors: [batch, k8sattributes, elasticinframetrics, resourcedetection/system, resource/demo, resourcedetection/eks, resourcedetection/gcp, resource/k8s, resource/cloud, attributes/dataset, resource/process] - exporters: [debug, elasticsearch] + processors: [batch, 
k8sattributes, elasticinframetrics, resourcedetection/cluster, resource/hostname, resource/demo, resource/k8s, resource/cloud, attributes/dataset, resource/process] + exporters: [debug, elasticsearch/ecs] + metrics/otel: + receivers: [kubeletstats] + processors: [batch, k8sattributes, resourcedetection/cluster, resource/hostname, resource/demo, resource/k8s, resource/cloud] + exporters: [debug, elasticsearch/otel] traces: null telemetry: metrics: diff --git a/kubernetes/elastic-helm/deployment.yaml b/kubernetes/elastic-helm/deployment.yaml index bc29904d8c..a37359fa5b 100644 --- a/kubernetes/elastic-helm/deployment.yaml +++ b/kubernetes/elastic-helm/deployment.yaml @@ -37,6 +37,18 @@ opentelemetry-collector: repository: docker.elastic.co/beats/elastic-agent tag: 8.16.0 mode: "deployment" + useGOMEMLIMIT: false + resources: + # The high resource limits set here are due to the usage of the lsminterval processor. + # The # [LSM Interval Processor](https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/lsmintervalprocessor) + # aggregates metrics in a db-backed over a defined interval and periodically + # forwards the latest values to the next component in the pipeline. 
+ limits: + cpu: 1500m + memory: 1500Mi + requests: + cpu: 250m + memory: 1500Mi presets: kubernetesAttributes: enabled: true @@ -48,16 +60,22 @@ opentelemetry-collector: extraEnvs: - name: ELASTIC_AGENT_OTEL value: "true" - - name: ELASTIC_APM_ENDPOINT + - name: ELASTIC_ENDPOINT valueFrom: secretKeyRef: - name: elastic-secret - key: elastic_apm_endpoint - - name: ELASTIC_APM_SECRET_TOKEN + name: elastic-secret-otel + key: elastic_endpoint + - name: ELASTIC_API_KEY valueFrom: secretKeyRef: - name: elastic-secret - key: elastic_apm_secret_token + name: elastic-secret-otel + key: elastic_api_key + - name: GOMAXPROCS + valueFrom: + resourceFieldRef: + resource: limits.cpu + - name: GOMEMLIMIT + value: "1025MiB" alternateConfig: extensions: @@ -65,13 +83,256 @@ opentelemetry-collector: endpoint: ${env:MY_POD_IP}:13133 connectors: spanmetrics: {} + # [Signal To Metrics Connector](https://github.com/elastic/opentelemetry-collector-components/tree/main/connector/signaltometricsconnector) + signaltometrics: # Produces metrics from all signal types (traces, logs, or metrics). 
+ logs: + - name: service_summary + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: metricset.name + default_value: service_summary + sum: + value: "1" + datapoints: + - name: service_summary + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: metricset.name + default_value: service_summary + sum: + value: "1" + spans: + - name: service_summary + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: metricset.name + default_value: service_summary + sum: + value: Int(AdjustedCount()) + - name: transaction.duration.histogram + description: APM service transaction aggregated metrics as histogram + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: transaction.root + - key: transaction.type + - key: metricset.name + default_value: service_transaction + - key: elasticsearch.mapping.hints + default_value: [_doc_count] + unit: us + exponential_histogram: + value: Microseconds(end_time - start_time) + max_size: 2 + - name: transaction.duration.summary + description: APM service transaction aggregated metrics as summary + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + 
- key: transaction.root + - key: transaction.type + - key: metricset.name + default_value: service_transaction + - key: elasticsearch.mapping.hints + default_value: [aggregate_metric_double] + unit: us + histogram: + buckets: [1] + value: Microseconds(end_time - start_time) + - name: transaction.duration.histogram + description: APM transaction aggregated metrics as histogram + ephemeral_resource_attribute: true + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + - key: container.id + - key: k8s.pod.name + - key: service.version + - key: service.instance.id # service.node.name + - key: process.runtime.name # service.runtime.name + - key: process.runtime.version # service.runtime.version + - key: telemetry.sdk.version # service.language.version?? + - key: host.name + - key: os.type # host.os.platform + - key: faas.instance + - key: faas.name + - key: faas.version + - key: cloud.provider + - key: cloud.region + - key: cloud.availability_zone + - key: cloud.platform # cloud.servicename + - key: cloud.account.id + attributes: + - key: transaction.root + - key: transaction.name + - key: transaction.type + - key: transaction.result + - key: event.outcome + - key: metricset.name + default_value: transaction + - key: elasticsearch.mapping.hints + default_value: [_doc_count] + unit: us + exponential_histogram: + value: Microseconds(end_time - start_time) + max_size: 2 + - name: transaction.duration.summary + description: APM transaction aggregated metrics as summary + ephemeral_resource_attribute: true + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + - key: container.id + - key: k8s.pod.name + - key: service.version + - key: service.instance.id # 
service.node.name + - key: process.runtime.name # service.runtime.name + - key: process.runtime.version # service.runtime.version + - key: telemetry.sdk.version # service.language.version?? + - key: host.name + - key: os.type # host.os.platform + - key: faas.instance + - key: faas.name + - key: faas.version + - key: cloud.provider + - key: cloud.region + - key: cloud.availability_zone + - key: cloud.platform # cloud.servicename + - key: cloud.account.id + attributes: + - key: transaction.root + - key: transaction.name + - key: transaction.type + - key: transaction.result + - key: event.outcome + - key: metricset.name + default_value: transaction + - key: elasticsearch.mapping.hints + default_value: [aggregate_metric_double] + unit: us + histogram: + buckets: [1] + value: Microseconds(end_time - start_time) + - name: span.destination.service.response_time.sum.us + description: APM span destination metrics + ephemeral_resource_attribute: true + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: span.name + - key: event.outcome + - key: service.target.type + - key: service.target.name + - key: span.destination.service.resource + - key: metricset.name + default_value: service_destination + unit: us + sum: + value: Double(Microseconds(end_time - start_time)) + - name: span.destination.service.response_time.count + description: APM span destination metrics + ephemeral_resource_attribute: true + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: span.name + - key: event.outcome + - key: service.target.type + - key: service.target.name + - key: span.destination.service.resource + - key: metricset.name + 
default_value: service_destination + sum: + value: Int(AdjustedCount()) + # event.success_count is populated using 2 metric definition with different conditions + # and value for the histogram bucket based on event outcome. Both metric definition + # are created using same name and attribute and will result in a single histogram. + # We use mapping hint of aggregate_metric_double, so, only the sum and the count + # values are required and the actual histogram bucket is ignored. + - name: event.success_count + description: Success count as a metric for service transaction + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: transaction.root + - key: transaction.type + - key: metricset.name + default_value: service_transaction + - key: elasticsearch.mapping.hints + default_value: [aggregate_metric_double] + conditions: + - attributes["event.outcome"] != nil and attributes["event.outcome"] == "success" + unit: us + histogram: + buckets: [1] + count: Int(AdjustedCount()) + value: Int(AdjustedCount()) + - name: event.success_count + description: Success count as a metric for service transaction + include_resource_attributes: + - key: service.name + - key: deployment.environment # service.environment + - key: telemetry.sdk.language # service.language.name + - key: agent.name # set via elastictraceprocessor + attributes: + - key: transaction.root + - key: transaction.type + - key: metricset.name + default_value: service_transaction + - key: elasticsearch.mapping.hints + default_value: [aggregate_metric_double] + conditions: + - attributes["event.outcome"] != nil and attributes["event.outcome"] != "success" + unit: us + histogram: + buckets: [0] + count: Int(AdjustedCount()) + value: Double(0) exporters: debug: {} - otlp/elastic: - endpoint: ${env:ELASTIC_APM_ENDPOINT} - compression: none - 
headers: - Authorization: ${env:ELASTIC_APM_SECRET_TOKEN} + elasticsearch/otel: + endpoints: + - ${env:ELASTIC_ENDPOINT} + api_key: ${env:ELASTIC_API_KEY} + logs_dynamic_index: + enabled: true + metrics_dynamic_index: + enabled: true + traces_dynamic_index: + enabled: true + mapping: + mode: otel processors: batch: {} resource: @@ -79,7 +340,137 @@ opentelemetry-collector: - key: deployment.environment value: "opentelemetry-demo" action: upsert + # Transform processor to remove services high cardinality on span names + transform/services: + error_mode: ignore + trace_statements: + - context: span + conditions: + - IsMatch(name, "^[A-Z]+\\s+.+") + statements: + - merge_maps(attributes, ExtractPatterns(name, "^(?P<method>\\S+)"), "upsert") + - set(name, attributes["method"]) + # [Elastic Trace Processor](https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/elastictraceprocessor) + elastictrace: {} # The processor enriches traces with elastic specific requirements. + # [LSM Interval Processor](https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/lsmintervalprocessor) + lsminterval: + intervals: + - duration: 1m + statements: + - set(resource.attributes["metricset.interval"], "1m") + - set(attributes["data_stream.dataset"], Concat([attributes["metricset.name"], "1m"], ".")) + - set(attributes["processor.event"], "metric") + - duration: 10m + statements: + - set(resource.attributes["metricset.interval"], "10m") + - set(attributes["data_stream.dataset"], Concat([attributes["metricset.name"], "10m"], ".")) + - set(attributes["processor.event"], "metric") + # [Resource Detection Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourcedetectionprocessor) + resourcedetection/eks: + detectors: [env, eks] # Detects resources from environment variables and EKS (Elastic Kubernetes Service). 
+ timeout: 15s + override: true + eks: + resource_attributes: + k8s.cluster.name: + enabled: true + resourcedetection/gcp: + detectors: [env, gcp] # Detects resources from environment variables and GCP (Google Cloud Platform). + timeout: 2s + override: true + resourcedetection/aks: + detectors: [env, aks] # Detects resources from environment variables and AKS (Azure Kubernetes Service). + timeout: 2s + override: true + aks: + resource_attributes: + k8s.cluster.name: + enabled: true + # [Resource Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/resourceprocessor) + resource/k8s: # Resource attributes tailored for services within Kubernetes. + attributes: + - key: service.name # Set the service.name resource attribute based on the well-known app.kubernetes.io/name label + from_attribute: app.label.name + action: insert + - key: service.name # Set the service.name resource attribute based on the k8s.container.name attribute + from_attribute: k8s.container.name + action: insert + - key: app.label.name # Delete app.label.name attribute previously used for service.name + action: delete + - key: service.version # Set the service.version resource attribute based on the well-known app.kubernetes.io/version label + from_attribute: app.label.version + action: insert + - key: app.label.version # Delete app.label.version attribute previously used for service.version + action: delete + resource/hostname: + attributes: + - key: host.name + from_attribute: k8s.node.name + action: upsert + # [K8s Attributes Processor](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/k8sattributesprocessor) + k8sattributes: + passthrough: false # Annotates resources with the pod IP and does not try to extract any other metadata. 
+ pod_association: + # Below association takes a look at the k8s.pod.ip and k8s.pod.uid resource attributes or connection's context, and tries to match it with the pod having the same attribute. + - sources: + - from: resource_attribute + name: k8s.pod.ip + - sources: + - from: resource_attribute + name: k8s.pod.uid + - sources: + - from: connection + extract: + metadata: + - "k8s.namespace.name" + - "k8s.deployment.name" + - "k8s.replicaset.name" + - "k8s.statefulset.name" + - "k8s.daemonset.name" + - "k8s.cronjob.name" + - "k8s.job.name" + - "k8s.node.name" + - "k8s.pod.name" + - "k8s.pod.ip" + - "k8s.pod.uid" + - "k8s.pod.start_time" + labels: + - tag_name: app.label.name + key: app.kubernetes.io/name + from: pod + - tag_name: app.label.version + key: app.kubernetes.io/version + from: pod receivers: + # [K8s Objects Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sobjectsreceiver) + k8sobjects: + objects: + - name: events + mode: "watch" + group: "events.k8s.io" + exclude_watch_type: + - "DELETED" + # [K8s Cluster Receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver) + k8s_cluster: + auth_type: serviceAccount # Determines how to authenticate to the K8s API server. This can be one of none (for no auth), serviceAccount (to use the standard service account token provided to the agent pod), or kubeConfig to use credentials from ~/.kube/config. 
+ node_conditions_to_report: + - Ready + - MemoryPressure + allocatable_types_to_report: + - cpu + - memory + metrics: + k8s.pod.status_reason: + enabled: true + resource_attributes: + k8s.kubelet.version: + enabled: true + os.description: + enabled: true + os.type: + enabled: true + k8s.container.status.last_terminated_reason: + enabled: true httpcheck/frontendproxy: targets: - endpoint: http://example-frontendproxy:8080 @@ -97,36 +488,74 @@ opentelemetry-collector: extensions: - health_check pipelines: - logs: + metrics/k8s: exporters: - debug - - otlp/elastic + - elasticsearch/otel processors: - - batch - - resource + - k8sattributes + - resourcedetection/eks + - resourcedetection/gcp + - resourcedetection/aks + - resource/k8s + - resource/hostname receivers: - - otlp - metrics: + - k8s_cluster + logs/k8s: + receivers: + - k8sobjects + processors: + - resourcedetection/eks + - resourcedetection/gcp + - resourcedetection/aks + - resource/hostname exporters: - - otlp/elastic - debug + - elasticsearch/otel + logs: + exporters: + - debug + - elasticsearch/otel + - signaltometrics + processors: + - batch + - resource + receivers: + - otlp + metrics: + exporters: + - elasticsearch/otel + - signaltometrics + - debug processors: - - batch - - resource + - batch + - resource receivers: - - httpcheck/frontendproxy - - otlp - - spanmetrics + - httpcheck/frontendproxy + - otlp + - spanmetrics traces: exporters: - - otlp/elastic - - debug - - spanmetrics + - elasticsearch/otel + - debug + - spanmetrics + - signaltometrics processors: - - batch - - resource + - transform/services + - batch + - elastictrace + - resource + receivers: + - otlp + metrics/aggregated-otel-metrics: receivers: - - otlp + - signaltometrics + processors: + - batch + - lsminterval + exporters: + - debug + - elasticsearch/otel telemetry: metrics: address: ${env:MY_POD_IP}:8888 diff --git a/kubernetes/elastic-helm/elastic-architecture.png b/kubernetes/elastic-helm/elastic-architecture.png new file mode 
100644 index 0000000000..bbd6544268 Binary files /dev/null and b/kubernetes/elastic-helm/elastic-architecture.png differ diff --git a/src/otelcollector/otelcol-elastic-config-extras.yaml b/src/otelcollector/otelcol-elastic-config-extras.yaml deleted file mode 100644 index fe26fbc1f3..0000000000 --- a/src/otelcollector/otelcol-elastic-config-extras.yaml +++ /dev/null @@ -1,22 +0,0 @@ -exporters: - otlp/elastic: - # !!! Elastic APM https endpoint WITHOUT the "https://" prefix - endpoint: "YOUR_APM_ENDPOINT_WITHOUT_HTTPS_PREFIX" - compression: none - headers: - Authorization: "YOUR_APM_SECRET_TOKEN" - -service: - pipelines: - traces: - receivers: [otlp] - processors: [batch] - exporters: [spanmetrics, otlp/elastic] - metrics: - receivers: [otlp, spanmetrics] - processors: [batch] - exporters: [otlp/elastic] - logs: - receivers: [otlp] - processors: [batch] - exporters: [otlp/elastic] diff --git a/src/otelcollector/otelcol-elastic-config.yaml b/src/otelcollector/otelcol-elastic-config.yaml index 1f65f74885..f3446bb8b2 100644 --- a/src/otelcollector/otelcol-elastic-config.yaml +++ b/src/otelcollector/otelcol-elastic-config.yaml @@ -9,12 +9,330 @@ receivers: allowed_origins: - "http://*" - "https://*" + httpcheck/frontendproxy: + targets: + - endpoint: http://frontendproxy:${env:ENVOY_PORT} exporters: debug: + elasticsearch/otel: + endpoints: + - "YOUR_ELASTICSEARCH_ENDPOINT" + api_key: "YOUR_ELASTICSEARCH_API_KEY" + logs_dynamic_index: + enabled: true + metrics_dynamic_index: + enabled: true + traces_dynamic_index: + enabled: true + mapping: + mode: otel processors: batch: + transform: + error_mode: ignore + trace_statements: + - context: span + statements: + # could be removed when https://github.com/vercel/next.js/pull/64852 is fixed upstream + - replace_pattern(name, "\\?.*", "") + - replace_match(name, "GET /api/products/*", "GET /api/products/{productId}") + # [Elastic Trace 
Processor](https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/elastictraceprocessor) + elastictrace: {} # The processor enriches traces with elastic specific requirements. + # [LSM Interval Processor](https://github.com/elastic/opentelemetry-collector-components/tree/main/processor/lsmintervalprocessor) + lsminterval: + intervals: + - duration: 1m + statements: + - set(resource.attributes["metricset.interval"], "1m") + - set(attributes["data_stream.dataset"], Concat([attributes["metricset.name"], "1m"], ".")) + - set(attributes["processor.event"], "metric") + - duration: 10m + statements: + - set(resource.attributes["metricset.interval"], "10m") + - set(attributes["data_stream.dataset"], Concat([attributes["metricset.name"], "10m"], ".")) + - set(attributes["processor.event"], "metric") connectors: spanmetrics: + # [Signal To Metrics Connector](https://github.com/elastic/opentelemetry-collector-components/tree/main/connector/signaltometricsconnector) + signaltometrics: # Produces metrics from all signal types (traces, logs, or metrics). 
+    logs:
+      - name: service_summary
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: metricset.name
+            default_value: service_summary
+        sum:
+          value: "1"
+    datapoints:
+      - name: service_summary
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: metricset.name
+            default_value: service_summary
+        sum:
+          value: "1"
+    spans:
+      - name: service_summary
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: metricset.name
+            default_value: service_summary
+        sum:
+          value: Int(AdjustedCount())
+      - name: transaction.duration.histogram
+        description: APM service transaction aggregated metrics as histogram
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: transaction.root
+          - key: transaction.type
+          - key: metricset.name
+            default_value: service_transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [_doc_count]
+        unit: us
+        exponential_histogram:
+          value: Microseconds(end_time - start_time)
+          max_size: 2
+      - name: transaction.duration.summary
+        description: APM service transaction aggregated metrics as summary
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: transaction.root
+          - key: transaction.type
+          - key: metricset.name
+            default_value: service_transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [aggregate_metric_double]
+        unit: us
+        histogram:
+          buckets: [1]
+          value: Microseconds(end_time - start_time)
+      - name: transaction.duration.histogram
+        description: APM transaction aggregated metrics as histogram
+        ephemeral_resource_attribute: true
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+          - key: container.id
+          - key: k8s.pod.name
+          - key: service.version
+          - key: service.instance.id # service.node.name
+          - key: process.runtime.name # service.runtime.name
+          - key: process.runtime.version # service.runtime.version
+          - key: telemetry.sdk.version # service.language.version??
+          - key: host.name
+          - key: os.type # host.os.platform
+          - key: faas.instance
+          - key: faas.name
+          - key: faas.version
+          - key: cloud.provider
+          - key: cloud.region
+          - key: cloud.availability_zone
+          - key: cloud.platform # cloud.servicename
+          - key: cloud.account.id
+        attributes:
+          - key: transaction.root
+          - key: transaction.name
+          - key: transaction.type
+          - key: transaction.result
+          - key: event.outcome
+          - key: metricset.name
+            default_value: transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [_doc_count]
+        unit: us
+        exponential_histogram:
+          value: Microseconds(end_time - start_time)
+          max_size: 2
+      - name: transaction.duration.summary
+        description: APM transaction aggregated metrics as summary
+        ephemeral_resource_attribute: true
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+          - key: container.id
+          - key: k8s.pod.name
+          - key: service.version
+          - key: service.instance.id # service.node.name
+          - key: process.runtime.name # service.runtime.name
+          - key: process.runtime.version # service.runtime.version
+          - key: telemetry.sdk.version # service.language.version??
+          - key: host.name
+          - key: os.type # host.os.platform
+          - key: faas.instance
+          - key: faas.name
+          - key: faas.version
+          - key: cloud.provider
+          - key: cloud.region
+          - key: cloud.availability_zone
+          - key: cloud.platform # cloud.servicename
+          - key: cloud.account.id
+        attributes:
+          - key: transaction.root
+          - key: transaction.name
+          - key: transaction.type
+          - key: transaction.result
+          - key: event.outcome
+          - key: metricset.name
+            default_value: transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [aggregate_metric_double]
+        unit: us
+        histogram:
+          buckets: [1]
+          value: Microseconds(end_time - start_time)
+      - name: span.destination.service.response_time.sum.us
+        description: APM span destination metrics
+        ephemeral_resource_attribute: true
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: span.name
+          - key: event.outcome
+          - key: service.target.type
+          - key: service.target.name
+          - key: span.destination.service.resource
+          - key: metricset.name
+            default_value: service_destination
+        unit: us
+        sum:
+          value: Double(Microseconds(end_time - start_time))
+      - name: span.destination.service.response_time.count
+        description: APM span destination metrics
+        ephemeral_resource_attribute: true
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: span.name
+          - key: event.outcome
+          - key: service.target.type
+          - key: service.target.name
+          - key: span.destination.service.resource
+          - key: metricset.name
+            default_value: service_destination
+        sum:
+          value: Int(AdjustedCount())
+      # event.success_count is populated using two metric definitions with different
+      # conditions and a different histogram bucket value depending on the event outcome.
+      # Both metric definitions use the same name and attributes, so they result in a
+      # single histogram. We use the aggregate_metric_double mapping hint, so only the
+      # sum and count values are required and the actual histogram buckets are ignored.
+      - name: event.success_count
+        description: Success count as a metric for service transaction
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: transaction.root
+          - key: transaction.type
+          - key: metricset.name
+            default_value: service_transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [aggregate_metric_double]
+        conditions:
+          - attributes["event.outcome"] != nil and attributes["event.outcome"] == "success"
+        unit: us
+        histogram:
+          buckets: [1]
+          count: Int(AdjustedCount())
+          value: Int(AdjustedCount())
+      - name: event.success_count
+        description: Success count as a metric for service transaction
+        include_resource_attributes:
+          - key: service.name
+          - key: deployment.environment # service.environment
+          - key: telemetry.sdk.language # service.language.name
+          - key: agent.name # set via elastictraceprocessor
+        attributes:
+          - key: transaction.root
+          - key: transaction.type
+          - key: metricset.name
+            default_value: service_transaction
+          - key: elasticsearch.mapping.hints
+            default_value: [aggregate_metric_double]
+        conditions:
+          - attributes["event.outcome"] != nil and attributes["event.outcome"] != "success"
+        unit: us
+        histogram:
+          buckets: [0]
+          count: Int(AdjustedCount())
+          value: Double(0)
+
+service:
+  pipelines:
+    logs:
+      exporters:
+        - debug
+        - elasticsearch/otel
+        - signaltometrics
+      processors:
+        - batch
+      receivers:
+        - otlp
+    metrics:
+      exporters:
+        - elasticsearch/otel
+        - signaltometrics
+        - debug
+      processors:
+        - batch
+      receivers:
+        - httpcheck/frontendproxy
+        - otlp
+        - spanmetrics
+    traces:
+      exporters:
+        - elasticsearch/otel
+        - debug
+        - spanmetrics
+        - signaltometrics
+      processors:
+        - transform
+        - batch
+        - elastictrace
+      receivers:
+        - otlp
+    metrics/aggregated-otel-metrics:
+      receivers:
+        - signaltometrics
+      processors:
+        - batch
+        - lsminterval
+      exporters:
+        - debug
+        - elasticsearch/otel
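
The comment above the two `event.success_count` definitions says they merge into a single histogram of which only the sum and the count are kept. The following is a rough Python sketch of that arithmetic, not the connector's implementation. The `Span` class and `success_count` function are hypothetical names made up for illustration, `adjusted_count` stands in for `AdjustedCount()`, and reading `sum / count` as a success rate is an assumption about how the stored values are consumed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Span:
    """Hypothetical stand-in for a span seen by the connector."""
    outcome: Optional[str]   # value of attributes["event.outcome"], None if unset
    adjusted_count: int = 1  # stand-in for AdjustedCount()


def success_count(spans: List[Span]) -> Tuple[float, int]:
    """Return (sum, count) as the two merged definitions would accumulate them."""
    total_sum = 0.0
    total_count = 0
    for span in spans:
        if span.outcome is None:
            # Neither condition matches when event.outcome is missing.
            continue
        # Both definitions add AdjustedCount() to the count.
        total_count += span.adjusted_count
        if span.outcome == "success":
            # Only the "success" definition contributes to the sum;
            # the other definition records a value of 0.
            total_sum += span.adjusted_count
    return total_sum, total_count


spans = [
    Span("success"),
    Span("success"),
    Span("failure"),
    Span(None),
    Span("success", adjusted_count=2),
]
metric_sum, metric_count = success_count(spans)
print(metric_sum, metric_count, metric_sum / metric_count)
```

Under these assumptions the sum counts successful events and the count covers every event with a known outcome, which is why the bucket boundaries in the two definitions do not matter once `aggregate_metric_double` is used.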