diff --git a/docs/apm-components.asciidoc b/docs/apm-components.asciidoc
new file mode 100644
index 00000000000..1d9adaa02c0
--- /dev/null
+++ b/docs/apm-components.asciidoc
@@ -0,0 +1,64 @@
+[[apm-components]]
+=== Components and documentation
+
+Elastic APM consists of four components: *APM agents*, *Elastic Agent*, *Elasticsearch*, and *Kibana*.
+
+image::./images/apm-architecture.png[Architecture of Elastic APM]
+
+[float]
+==== APM Agents
+
+APM agents are open source libraries written in the same language as your service.
+You may only need one, or you might use all of them.
+You install them into your service as you would install any other library.
+They instrument your code and collect performance data and errors at runtime.
+This data is buffered for a short period and sent on to APM Server.
+
+Each agent has its own documentation:
+
+* {apm-go-ref-v}/introduction.html[Go agent]
+* {apm-ios-ref-v}/intro.html[iOS agent]
+* {apm-java-ref-v}/intro.html[Java agent]
+* {apm-dotnet-ref-v}/intro.html[.NET agent]
+* {apm-node-ref-v}/intro.html[Node.js agent]
+* {apm-php-ref-v}/intro.html[PHP agent]
+* {apm-py-ref-v}/getting-started.html[Python agent]
+* {apm-ruby-ref-v}/introduction.html[Ruby agent]
+* {apm-rum-ref-v}/intro.html[JavaScript Real User Monitoring (RUM) agent]
+
+[float]
+==== Elastic Agent
+
+Elastic Agent is a single, unified way to add monitoring for logs, metrics, traces, and other types of data to each host.
+A single agent makes it easier and faster to deploy monitoring across your infrastructure.
+The agent's single, unified policy makes it easier to add integrations for new data sources.
+
+The APM integration within Elastic Agent receives performance data from your APM agents,
+validates and processes it, and then transforms the data into {es} documents.
+Removing this logic from APM agents helps keep them light, prevents certain security risks,
+and improves compatibility across the Elastic Stack.
+
+[float]
+==== Elasticsearch
+
+{ref}/index.html[Elasticsearch] is a highly scalable, free and open, full-text search and analytics engine.
+It allows you to store, search, and analyze large volumes of data quickly and in near real time.
+Elasticsearch is used to store APM performance data and to power aggregations over it.
+
+[float]
+==== Kibana APM app
+
+{kibana-ref}/index.html[Kibana] is a free and open analytics and visualization platform designed to work with Elasticsearch.
+You use Kibana to search, view, and interact with data stored in Elasticsearch.
+
+Since application performance monitoring is all about visualizing data and detecting bottlenecks,
+it's crucial that you understand how to use the {kibana-ref}/xpack-apm.html[APM app] in Kibana.
+The following sections will help you get started:
+
+* {apm-app-ref}/apm-ui.html[Set up]
+* {apm-app-ref}/apm-getting-started.html[Get started]
+* {apm-app-ref}/apm-how-to.html[How-to guides]
+
+APM also has built-in integrations with machine learning. To learn more about this feature,
+or the anomaly detection feature that's built on top of it,
+refer to {kibana-ref}/machine-learning-integration.html[Machine learning integration].
diff --git a/docs/apm-distributed-tracing.asciidoc b/docs/apm-distributed-tracing.asciidoc
index ed653c92f5c..3cfd69cc652 100644
--- a/docs/apm-distributed-tracing.asciidoc
+++ b/docs/apm-distributed-tracing.asciidoc
@@ -101,9 +101,9 @@ with each agent's API.
 
 Sending services must add the `traceparent` header to outgoing requests.
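+
+For example, the following is a minimal sketch of manual header injection using the Elastic
+APM Java agent's public API. It assumes the `apm-agent-api` dependency is on the classpath,
+and the upstream URL is a placeholder:
+
+[source,java]
+----
+import co.elastic.apm.api.ElasticApm;
+import co.elastic.apm.api.Span;
+
+import java.io.IOException;
+import java.net.HttpURLConnection;
+import java.net.URL;
+
+public class UpstreamClient {
+    void callUpstreamService() throws IOException {
+        HttpURLConnection connection =
+            (HttpURLConnection) new URL("http://upstream-service/api").openConnection(); // placeholder URL
+        // Hand every distributed tracing header (including traceparent) to the outgoing request
+        Span span = ElasticApm.currentSpan();
+        span.injectTraceHeaders(connection::setRequestProperty);
+        connection.getInputStream().close();
+    }
+}
+----
+
+Agents that instrument supported HTTP clients add this header automatically; the manual API is
+typically only needed for unsupported protocols.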
-// --
-// include::../../tab-widgets/distributed-trace-send-widget.asciidoc[]
-// --
+--
+include::./tab-widgets/distributed-trace-send-widget.asciidoc[]
+--
 
 [float]
 [[distributed-tracing-incoming]]
@@ -112,9 +112,9 @@ Sending services must add the `traceparent` header to outgoing requests.
 
 Receiving services must parse the incoming `traceparent` header,
 and start a new transaction or span as a child of the received context.
 
-// --
-// include::../../tab-widgets/distributed-trace-receive-widget.asciidoc[]
-// --
+--
+include::./tab-widgets/distributed-trace-receive-widget.asciidoc[]
+--
 
 [float]
 [[distributed-tracing-rum]]
diff --git a/docs/apm-overview.asciidoc b/docs/apm-overview.asciidoc
index c6a2083f653..fcee5225d9b 100644
--- a/docs/apm-overview.asciidoc
+++ b/docs/apm-overview.asciidoc
@@ -22,5 +22,6 @@ like JVM metrics in the Java Agent, and Go runtime metrics in the Go Agent.
 
 [float]
 === Give Elastic APM a try
-// Learn more about the <> that make up Elastic APM,
+Learn more about the <> that make up Elastic APM.
+// ,
 // or jump right into the <>.
diff --git a/docs/apm-rum.asciidoc b/docs/apm-rum.asciidoc
new file mode 100644
index 00000000000..c39eaa2a85a
--- /dev/null
+++ b/docs/apm-rum.asciidoc
@@ -0,0 +1,12 @@
+[[apm-rum]]
+=== Real User Monitoring (RUM)
+Real User Monitoring captures user interaction with clients such as web browsers.
+The {apm-rum-ref-v}[JavaScript Agent] is Elastic's RUM Agent.
+// To use it you need to {apm-server-ref-v}/configuration-rum.html[enable RUM support] in the APM Server.
+
+Unlike Elastic APM backend agents, which monitor requests and responses,
+the RUM JavaScript agent monitors the real user experience and interaction within your client-side application.
+The RUM JavaScript agent is also framework-agnostic, which means it can be used with any frontend JavaScript application.
+
+You will be able to measure metrics such as "Time to First Byte", `domInteractive`,
+and `domComplete`, which help you discover performance issues within your client-side application
+as well as issues that relate to the latency of your server-side application.
diff --git a/docs/cross-cluster-search.asciidoc b/docs/cross-cluster-search.asciidoc
new file mode 100644
index 00000000000..38097a27dd2
--- /dev/null
+++ b/docs/cross-cluster-search.asciidoc
@@ -0,0 +1,42 @@
+[[cross-cluster-search]]
+=== Cross-cluster search
+
+Elastic APM utilizes Elasticsearch's cross-cluster search functionality.
+Cross-cluster search lets you run a single search request against one or more
+{ref}/modules-remote-clusters.html[remote clusters] --
+making it easy to search APM data across multiple sources.
+This means you can also have deployments per data type, making sizing and scaling more predictable,
+and allowing for better performance while managing multiple observability use cases.
+
+[float]
+[[set-up-cross-cluster-search]]
+==== Set up cross-cluster search
+
+*Step 1. Set up remote clusters.*
+
+If you're using the Hosted Elasticsearch Service, see {cloud}/ec-enable-ccs.html[Enable cross-cluster search].
+
+You can add remote clusters directly in Kibana, under *Management* > *Elasticsearch* > *Remote clusters*.
+All you need is a name for the remote cluster and the seed node(s).
+Remember the names of your remote clusters; you'll need them in step two.
+See {ref}/ccr-getting-started.html[managing remote clusters] for detailed information on the setup process.
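+
+Alternatively, you can define remote clusters directly in Elasticsearch's `elasticsearch.yml` file.
+As a rough sketch (the cluster name `obs-1` and the seed host are placeholders, and the exact
+settings may vary with your Elasticsearch version):
+
+[source,yml]
+----
+cluster:
+  remote:
+    obs-1: <1>
+      seeds:
+        - 10.0.1.5:9300 <2>
+----
+<1> The remote cluster name, referenced later in APM index patterns
+<2> The transport port of a seed node (not the HTTP port)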
+The {ref}/modules-remote-clusters.html#configuring-remote-clusters[Configure remote clusters]
+documentation lists all of the supported `elasticsearch.yml` settings.
+
+*Step 2. Edit the default APM app index pattern.*
+
+The APM app index pattern determines which clusters and indices the app displays data from.
+Index patterns follow this convention: `cluster_name:index_pattern`.
+The default value is `apm-*`, which displays data from any index beginning with `apm-`.
+
+To display data from the local cluster and all remote clusters,
+you'll need to update the index pattern to: `*:apm-*,apm-*`.
+You can also specify certain clusters to display data from, for example, `obs-1:apm-*,obs-2:apm-*`.
+
+There are two ways to edit the default index pattern:
+
+* In the APM app -- Navigate to *APM* > *Settings* > *Indices*, and change all `xpack.apm.indices.*` values to
+include the new index pattern, e.g., `*:apm-*,apm-*`.
+* In `kibana.yml` -- All of the {kibana-ref}/apm-settings-kb.html[`xpack.apm.indices.*`] configuration values must
+include the new index pattern, e.g., `*:apm-*,apm-*`.
diff --git a/docs/features.asciidoc b/docs/features.asciidoc
index d8ec7bbab47..e16b55c9030 100644
--- a/docs/features.asciidoc
+++ b/docs/features.asciidoc
@@ -7,25 +7,22 @@
 
 * <>
 * <>
-// * <>
-// * <>
-// * <>
-// * <>
-// * <>
-// * <>
+* <>
+* <>
+* <>
+* <>
+* <>
 
 include::./apm-data-security.asciidoc[]
 
 include::./apm-distributed-tracing.asciidoc[]
 
-// include::./rum.asciidoc[]
+include::./apm-rum.asciidoc[]
 
-// include::./trace-sampling.asciidoc[]
+include::./sampling.asciidoc[]
 
-// include::./opentracing.asciidoc[]
+include::./open-telemetry.asciidoc[]
 
-// include::./opentelemetry-elastic.asciidoc[]
+include::./log-correlation.asciidoc[]
 
-// include::./obs-integrations.asciidoc[]
-
-// include::./cross-cluster-search.asciidoc[]
\ No newline at end of file
+include::./cross-cluster-search.asciidoc[]
diff --git a/docs/images/apm-architecture.png b/docs/images/apm-architecture.png
new file mode 100644
index 00000000000..372ea225586
Binary files /dev/null and b/docs/images/apm-architecture.png differ
diff --git a/docs/images/dt-sampling-example.png b/docs/images/dt-sampling-example.png
new file mode 100644
index 00000000000..015b7c67e7f
Binary files /dev/null and b/docs/images/dt-sampling-example.png differ
diff --git a/docs/integrations-index.asciidoc b/docs/integrations-index.asciidoc
index 83e4c7a096d..2b7a46e1435 100644
--- a/docs/integrations-index.asciidoc
+++ b/docs/integrations-index.asciidoc
@@ -9,6 +9,8 @@ include::{asciidoc-dir}/../../shared/attributes.asciidoc[]
 
 include::apm-overview.asciidoc[]
 
+include::apm-components.asciidoc[]
+
 == Quick start
 
 Include quick start file from obs-docs repo
diff --git a/docs/log-correlation.asciidoc b/docs/log-correlation.asciidoc
new file mode 100644
index 00000000000..d60bf16fbb5
--- /dev/null
+++ b/docs/log-correlation.asciidoc
@@ -0,0 +1,176 @@
+[[log-correlation]]
+=== Logging integration
+
+Many applications use logging frameworks to help record, format, and append an application's logs.
+Elastic APM now offers a way to make your application logs even more useful
+by integrating with the most popular logging frameworks in their respective languages.
+This means you can easily inject trace information into your logs,
+allowing you to explore logs in the {observability-guide}/monitor-logs.html[Logs app],
+then jump straight into the corresponding APM traces -- all while preserving the trace context.
+
+To get started:
+
+. Enable log correlation
+. Add APM identifiers to your logs
+. Ingest your logs into Elasticsearch
+
+[float]
+==== Enable log correlation
+
+Some Agents require you to first enable log correlation in the Agent.
+This is done with a configuration variable, and is different for each Agent.
+See the relevant https://www.elastic.co/guide/en/apm/agent/index.html[Agent documentation] for further information.
+
+// Not enough of the Agent docs are ready yet.
+// Commenting these out and will replace when ready.
+// * *Java*: {apm-java-ref-v}/config-logging.html#config-enable-log-correlation[`enable_log_correlation`]
+// * *.NET*: {apm-dotnet-ref-v}/[]
+// * *Node.js*: {apm-node-ref-v}/[]
+// * *Python*: {apm-py-ref-v}/[]
+// * *Ruby*: {apm-ruby-ref-v}/[]
+// * *Rum*: {apm-rum-ref-v}/[]
+
+[float]
+==== Add APM identifiers to your logs
+
+Once log correlation is enabled,
+you must ensure your logs contain APM identifiers.
+In some supported frameworks, this is already done for you.
+In other scenarios, like for unstructured logs,
+you'll need to add APM identifiers to your logs in an easy-to-parse manner.
+
+The identifiers we're interested in are: {ecs-ref}/ecs-tracing.html[`trace.id`] and
+{ecs-ref}/ecs-tracing.html[`transaction.id`]. Certain Agents also support the `span.id` field.
+
+The process for adding these fields differs based on the Agent you're using, the logging framework,
+and the type and structure of your logs.
+
+See the relevant https://www.elastic.co/guide/en/apm/agent/index.html[Agent documentation] to learn more.
+
+// Not enough of the Agent docs have been backported yet.
+// Commenting these out and will replace when ready.
+// * *Go*: {apm-go-ref-v}/supported-tech.html#supported-tech-logging[Logging frameworks]
+// * *Java*: {apm-java-ref-v}/[] NOT merged yet https://github.com/elastic/apm-agent-java/pull/854
+// * *.NET*: {apm-dotnet-ref-v}/[]
+// * *Node.js*: {apm-node-ref-v}/[]
+// * *Python*: {apm-py-ref-v}/[]
+// * *Ruby*: {apm-ruby-ref-v}/[] Not backported yet https://www.elastic.co/guide/en/apm/agent/ruby/master/log-correlation.html
+// * *Rum*: {apm-rum-ref-v}/[]
+
+[float]
+==== Ingest your logs into Elasticsearch
+
+Once your logs contain the appropriate identifiers (fields), you need to ingest them into Elasticsearch.
+Luckily, we've got a tool for that -- Filebeat is Elastic's log shipper.
+The {filebeat-ref}/filebeat-installation-configuration.html[Filebeat quick start]
+guide will walk you through the setup process.
+
+Because logging frameworks and formats vary greatly between different programming languages,
+there is no one-size-fits-all approach for ingesting your logs into Elasticsearch.
+The following tips should hopefully get you going in the right direction:
+
+**Download Filebeat**
+
+There are many ways to download and get started with Filebeat.
+Read the {filebeat-ref}/filebeat-installation-configuration.html[Filebeat quick start] guide to determine which is best for you.
+
+**Configure Filebeat**
+
+Modify the {filebeat-ref}/configuring-howto-filebeat.html[`filebeat.yml`] configuration file to your needs.
+Here are some recommendations:
+
+* Set `filebeat.inputs` to point to the source of your logs
+* Point Filebeat to the same Elastic Stack that is receiving your APM data
+** If you're using Elastic Cloud, set `cloud.id` and `cloud.auth`, as shown in the fuller example below.
+** If you're using a manual setup, use `output.elasticsearch.hosts`, as shown in the sketch below.
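+
+For the manual setup, the output section of `filebeat.yml` might look like the following
+sketch (the host, username, and password are placeholders for your own values):
+
+[source,yml]
+----
+output.elasticsearch:
+  hosts: ["https://my-es-cluster:9200"] <1>
+  username: "filebeat_writer" <2>
+  password: "YOUR_PASSWORD"
+----
+<1> The Elasticsearch instance(s) Filebeat sends log data to
+<2> Credentials with permission to write to the target indices
+
+A fuller example that uses Elastic Cloud instead: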
+
+[source,yml]
+----
+filebeat.inputs:
+- type: log <1>
+  paths: <2>
+    - /var/log/*.log
+cloud.id: "staging:dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRjZWMNjN2Q3YTllOTYyNTc0Mw==" <3>
+cloud.auth: "elastic:YOUR_PASSWORD" <4>
+----
+<1> Configures the `log` input
+<2> Path(s) that must be crawled to fetch the log lines
+<3> Used to resolve the Elasticsearch and Kibana URLs for Elastic Cloud
+<4> Authorization token for Elastic Cloud
+
+**JSON logs**
+
+For JSON logs you can use the {filebeat-ref}/filebeat-input-log.html[`log` input] to read lines from log files.
+Here's what a sample configuration might look like:
+
+[source,yml]
+----
+filebeat.inputs:
+- type: log
+  paths:
+    - /var/log/*.log
+  json.keys_under_root: true <1>
+  json.add_error_key: true <2>
+  json.message_key: message <3>
+----
+<1> `true` copies JSON keys to the top level in the output document
+<2> Tells Filebeat to add an `error.message` and `error.type: json` key in case of JSON unmarshalling errors
+<3> Specifies the JSON key on which to apply line filtering and multiline settings
+
+**Parsing unstructured logs**
+
+Consider the following log that is decorated with the `transaction.id` and `trace.id` fields:
+
+[source,log]
+----
+2019-09-18 21:29:49,525 - django.server - ERROR - "GET / HTTP/1.1" 500 27 | elasticapm transaction.id=fcfbbe447b9b6b5a trace.id=f965f4cc5b59bdc62ae349004eece70c span.id=None
+----
+
+All that's needed now is an {filebeat-ref}/configuring-ingest-node.html[ingest node processor] to pre-process your logs and
+extract these structured fields before they are indexed in Elasticsearch.
+To do this, you'd need to create a pipeline that uses Elasticsearch's {ref}/grok-processor.html[Grok Processor].
+Here's an example:
+
+[source, json]
+----
+PUT _ingest/pipeline/log-correlation
+{
+  "description": "Parses the log correlation IDs out of the raw plain-text log",
+  "processors": [
+    {
+      "grok": {
+        "field": "message", <1>
+        "patterns": ["%{GREEDYDATA:message} | elasticapm transaction.id=%{DATA:transaction.id} trace.id=%{DATA:trace.id} span.id=%{DATA:span.id}"] <2>
+      }
+    }
+  ]
+}
+----
+<1> The field to use for grok expression parsing
+<2> An ordered list of grok expressions to match and extract named captures with:
+`%{DATA:transaction.id}` captures the value of `transaction.id`,
+`%{DATA:trace.id}` captures the value of `trace.id`, and
+`%{DATA:span.id}` captures the value of `span.id`.
+
+NOTE: Depending on how you've added APM data to your logs,
+you may need to tweak this grok pattern for it to work with your setup.
+In addition, it's possible to extract more structure out of your logs.
+Make sure to follow the {ecs-ref}/ecs-field-reference.html[Elastic Common Schema]
+when defining which fields you are storing in Elasticsearch.
+
+Then, configure Filebeat to use the processor in `filebeat.yml`:
+
+[source,yml]
+----
+output.elasticsearch:
+  pipeline: "log-correlation"
+----
+
+If your logs contain messages that span multiple lines of text (common in Java stack traces),
+you'll also need to configure {filebeat-ref}/multiline-examples.html[multiline settings].
+
+The following example shows how to configure Filebeat to handle a multiline message where the first line of the message begins with a bracket ([).
+
+[source,yml]
+----
+multiline.pattern: '^\['
+multiline.negate: true
+multiline.match: after
+----
diff --git a/docs/open-telemetry.asciidoc b/docs/open-telemetry.asciidoc
new file mode 100644
index 00000000000..8ef147e4ce2
--- /dev/null
+++ b/docs/open-telemetry.asciidoc
@@ -0,0 +1,404 @@
+[[open-telemetry]]
+=== OpenTelemetry integration
+
+:ot-spec: https://github.com/open-telemetry/opentelemetry-specification/blob/master/README.md
+:ot-contrib: https://github.com/open-telemetry/opentelemetry-collector-contrib
+:ot-repo: https://github.com/open-telemetry/opentelemetry-collector
+:ot-pipelines: https://opentelemetry.io/docs/collector/configuration/#service
+:ot-extension: {ot-repo}/blob/master/extension/README.md
+:ot-scaling: {ot-repo}/blob/master/docs/performance.md
+
+:ot-collector: https://opentelemetry.io/docs/collector/getting-started/
+:ot-dockerhub: https://hub.docker.com/r/otel/opentelemetry-collector-contrib
+
+https://opentelemetry.io/docs/concepts/what-is-opentelemetry/[OpenTelemetry] is a set
+of APIs, SDKs, tooling, and integrations that enable the capture and management of
+telemetry data from your services for greater observability. For more information about the
+OpenTelemetry project, see the {ot-spec}[spec].
+
+Elastic OpenTelemetry integrations allow you to reuse your existing OpenTelemetry
+instrumentation to quickly analyze distributed traces and metrics to help you monitor
+business KPIs and technical components with the {stack}.
+
+[float]
+[[open-telemetry-native]]
+==== APM Server native support of OpenTelemetry protocol
+
+IMPORTANT: The https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticexporter#legacy-opentelemetry-collector-exporter-for-elastic[OpenTelemetry Collector exporter for Elastic]
+was deprecated in 7.13 and replaced by native support of the OpenTelemetry protocol (OTLP) in
+Elastic Observability. To learn more, see
+https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/elasticexporter#migration[migration].
+
+Elastic APM Server natively supports the OpenTelemetry protocol.
+This means trace data and metrics collected from your applications and infrastructure can
+be sent directly to Elastic APM Server using the OpenTelemetry protocol.
+
+image::./../legacy/guide/images/open-telemetry-protocol-arch.png[OpenTelemetry Elastic architecture diagram]
+
+[float]
+[[instrument-apps-otel]]
+===== Instrument applications
+
+To export traces and metrics to APM Server, ensure that you have instrumented your services and applications
+with the OpenTelemetry API, SDK, or both. For example, if you are a Java developer, you need to instrument your Java app using the
+https://github.com/open-telemetry/opentelemetry-java-instrumentation[OpenTelemetry agent for Java].
+
+By defining the following environment variables, you can configure the OTLP endpoint so that the OpenTelemetry agent communicates with
+APM Server.
+
+[source,bash]
+----
+export OTEL_RESOURCE_ATTRIBUTES=service.name=checkoutService,service.version=1.1,deployment.environment=production
+export OTEL_EXPORTER_OTLP_ENDPOINT=https://apm_server_url:8200
+export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer an_apm_secret_token"
+java -javaagent:/path/to/opentelemetry-javaagent-all.jar \
+  -classpath lib/*:classes/ \
+  com.mycompany.checkout.CheckoutServiceServer
+----
+
+|===
+
+| `OTEL_RESOURCE_ATTRIBUTES` | Attributes that identify your application, such as the service name, version, and deployment environment (`service.name`, `service.version`, and `deployment.environment` above).
+
+| `OTEL_EXPORTER_OTLP_ENDPOINT` | APM Server URL. The host and port that APM Server listens for events on.
+
+| `OTEL_EXPORTER_OTLP_HEADERS` | Authorization header that includes the Elastic APM secret token or API key: `"Authorization=Bearer an_apm_secret_token"` or `"Authorization=ApiKey an_api_key"`.
+
+For information on how to format an API key, see our {apm-server-ref-v}/api-key.html[API key] docs.
+
+Please note the required space between `Bearer` and `an_apm_secret_token`, and `ApiKey` and `an_api_key`.
+
+| `OTEL_EXPORTER_OTLP_CERTIFICATE` | Certificate for TLS credentials of the gRPC client. (optional)
+
+|===
+
+You are now ready to collect traces and <> before <>
+and <> in {kib}.
+
+[float]
+[[connect-open-telemetry-collector]]
+===== Connect OpenTelemetry Collector instances
+
+If you use OpenTelemetry Collector instances in your architecture, you can connect them to Elastic Observability using the OTLP exporter.
+
+[source,yaml]
+----
+receivers: <1>
+  # ...
+  otlp:
+
+processors: <2>
+  # ...
+  memory_limiter:
+    check_interval: 1s
+    limit_mib: 2000
+  batch:
+
+exporters:
+  logging:
+    loglevel: warn <3>
+  otlp/elastic: <4>
+    # Elastic APM server https endpoint without the "https://" prefix
+    endpoint: "${ELASTIC_APM_SERVER_ENDPOINT}" <5> <7>
+    headers:
+      # Elastic APM Server secret token
+      Authorization: "Bearer ${ELASTIC_APM_SERVER_TOKEN}" <6> <7>
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+    metrics:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+----
+<1> The receivers, such as the https://github.com/open-telemetry/opentelemetry-collector/tree/main/receiver/otlpreceiver[OTLP receiver] that forwards data emitted by APM agents, or the https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver[host metrics receiver].
+<2> We recommend using the https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/batchprocessor/README.md[Batch processor] and also suggest using the https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/memorylimiter/README.md[memory limiter processor]. For more information, see https://github.com/open-telemetry/opentelemetry-collector/blob/main/processor/README.md#recommended-processors[Recommended processors].
+<3> The https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/loggingexporter[logging exporter] is helpful for troubleshooting and supports various logging levels: `debug`, `info`, `warn`, and `error`.
+<4> Elastic Observability endpoint configuration. To learn more, see https://github.com/open-telemetry/opentelemetry-collector/tree/main/exporter/otlpexporter[OpenTelemetry Collector > OTLP gRPC exporter].
+<5> Hostname and port of the APM Server endpoint. For example, `elastic-apm-server:8200`.
+<6> Credential for Elastic APM {apm-server-ref-v}/secret-token.html[secret token authorization] (`Authorization: "Bearer a_secret_token"`) or {apm-server-ref-v}/api-key.html[API key authorization] (`Authorization: "ApiKey an_api_key"`).
+<7> Environment-specific configuration parameters can be conveniently passed in as environment variables documented https://opentelemetry.io/docs/collector/configuration/#configuration-environment-variables[here] (e.g. `ELASTIC_APM_SERVER_ENDPOINT` and `ELASTIC_APM_SERVER_TOKEN`).
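+
+For example, if you run the collector with Docker, you could pass these environment variables
+in as shown in the following sketch. The config file name, mount path, and credential values
+are placeholders, and the expected configuration location can vary between collector versions:
+
+[source,bash]
+----
+docker run --rm \
+  -e ELASTIC_APM_SERVER_ENDPOINT=elastic-apm-server:8200 \
+  -e ELASTIC_APM_SERVER_TOKEN=an_apm_secret_token \
+  -v "$(pwd)/otel-collector.yaml:/etc/otel-collector.yaml" \
+  otel/opentelemetry-collector-contrib \
+  --config /etc/otel-collector.yaml
+----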
+
+TIP: When collecting infrastructure metrics, we recommend evaluating {metricbeat-ref}/metricbeat-overview.html[{metricbeat}] to get a mature collector with more integrations
+and built-in dashboards.
+
+You're now ready to export traces and metrics from your services and applications.
+
+[float]
+[[open-telemetry-collect-metrics]]
+==== Collect metrics
+
+IMPORTANT: When collecting metrics, please note that the https://www.javadoc.io/doc/io.opentelemetry/opentelemetry-api/latest/io/opentelemetry/api/metrics/DoubleValueRecorder.html[`DoubleValueRecorder`]
+and https://www.javadoc.io/doc/io.opentelemetry/opentelemetry-api/latest/io/opentelemetry/api/metrics/LongValueObserver.html[`LongValueRecorder`] metrics are not yet supported.
+
+Here's an example of how to capture business metrics from a Java application.
+
+[source,java]
+----
+// initialize metric
+Meter meter = GlobalMetricsProvider.getMeter("my-frontend");
+DoubleCounter orderValueCounter = meter.doubleCounterBuilder("order_value").build();
+
+public void createOrder(HttpServletRequest request) {
+
+   // create order in the database
+   ...
+   // increment business metrics for monitoring
+   orderValueCounter.add(orderPrice);
+}
+----
+
+See the https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md[OpenTelemetry Metrics API]
+for more information.
+
+[float]
+[[open-telemetry-verify-metrics]]
+===== Verify OpenTelemetry metrics data
+
+Use *Discover* to validate that metrics are successfully reported to {kib}.
+
+. Launch {kib}:
++
+--
+include::./tab-widgets/open-kibana-widget.asciidoc[]
+--
+
+. Open the main menu, then click *Discover*.
+. Select `apm-*` as your index pattern.
+. Filter the data to only show documents with metrics: `processor.name :"metric"`
+. Narrow your search with a known OpenTelemetry field. For example, if you have an `order_value` field, add `order_value: *` to your search to return
+only OpenTelemetry metrics documents.
+
+[float]
+[[open-telemetry-visualize]]
+===== Visualize in {kib}
+
+TSVB within {kib} is the recommended visualization for OpenTelemetry metrics. TSVB is a time series data visualizer that allows you to use the
+full power of the {es} aggregation framework. With TSVB, you can combine an infinite number of aggregations to display complex data.
+
+In this example eCommerce OpenTelemetry dashboard, there are four visualizations: sales, order count, product cache, and system load. The dashboard provides business
+KPI metrics, along with performance-related metrics.
+
+[role="screenshot"]
+image::./../legacy/guide/images/ecommerce-dashboard.png[OpenTelemetry visualizations]
+
+Let's look at how this dashboard was created, specifically the Sales USD and System load visualizations.
+
+. Open the main menu, then click *Dashboard*.
+. Click *Create dashboard*.
+. Click *Save*, enter the name of your dashboard, and then click *Save* again.
+. Let's add a Sales USD visualization. Click *Edit*.
+. Click *Create new* and then select *TSVB*.
+. For the label name, enter Sales USD, and then select the following:
++
+* Aggregation: `Positive Rate`.
+* Field: `order_sum`.
+* Scale: `auto`.
+* Group by: `Everything`.
+. Click *Save*, enter Sales USD as the visualization name, and then click *Save and return*.
+. Now let's create a visualization of load averages on the system. Click *Create new*.
+. Select *TSVB*.
+. Select the following:
++
+* Aggregation: `Average`.
+* Field: `system.cpu.load_average.1m`.
+* Group by: `Terms`.
+* By: `host.ip`.
+* Top: `10`.
+* Order by: `Doc Count (default)`.
+* Direction: `Descending`.
+. Click *Save*, enter System load per host IP as the visualization name, and then click *Save and return*.
++
+Both visualizations are now displayed on your custom dashboard.
+
+IMPORTANT: By default, Discover shows data for the last 15 minutes. If you have a time-based index
+and no data displays, you might need to increase the time range.
+
+[float]
+[[open-telemetry-aws-lambda]]
+==== AWS Lambda support
+
+AWS Lambda functions can be instrumented with OpenTelemetry and monitored with Elastic Observability.
+
+To get started, follow the official AWS Distro for OpenTelemetry Lambda https://aws-otel.github.io/docs/getting-started/lambda[getting started documentation] and configure the OpenTelemetry Collector to output traces and metrics to your Elastic cluster.
+
+[float]
+[[open-telemetry-aws-lambda-java]]
+===== Instrumenting AWS Lambda Java functions
+
+NOTE: For a better startup time, we recommend using SDK-based instrumentation, i.e. manual instrumentation of the code, rather than auto instrumentation.
+
+To instrument AWS Lambda Java functions, follow the official https://aws-otel.github.io/docs/getting-started/lambda/lambda-java[AWS Distro for OpenTelemetry Lambda Support For Java].
+
+Noteworthy configuration elements:
+
+* AWS Lambda Java functions should implement `com.amazonaws.services.lambda.runtime.RequestHandler`,
++
+[source,java]
+----
+public class ExampleRequestHandler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {
+    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent event, Context context) {
+        // add your code ...
+    }
+}
+----
+
+* When using SDK-based instrumentation, frameworks you want to gain visibility into should be manually instrumented.
+** The example below instruments https://square.github.io/okhttp/4.x/okhttp/okhttp3/-ok-http-client/[OkHttpClient] with the OpenTelemetry instrumentation library https://search.maven.org/artifact/io.opentelemetry.instrumentation/opentelemetry-okhttp-3.0/1.3.1-alpha/jar[io.opentelemetry.instrumentation:opentelemetry-okhttp-3.0:1.3.1-alpha]
++
+[source,java]
+----
+import io.opentelemetry.api.GlobalOpenTelemetry;
+import io.opentelemetry.instrumentation.okhttp.v3_0.OkHttpTracing;
+
+import okhttp3.OkHttpClient;
+
+OkHttpClient httpClient = new OkHttpClient.Builder()
+    .addInterceptor(OkHttpTracing.create(GlobalOpenTelemetry.get()).newInterceptor())
+    .build();
+----
+
+* The configuration of the OpenTelemetry Collector, with the definition of the Elastic Observability endpoint, can be added to the root directory of the Lambda binaries (e.g.
defined in `src/main/resources/opentelemetry-collector.yaml`)
++
+[source,yaml]
+----
+# Copy opentelemetry-collector.yaml in the root directory of the lambda function
+# Set an environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to '/var/task/opentelemetry-collector.yaml'
+receivers:
+  otlp:
+    protocols:
+      http:
+      grpc:
+
+exporters:
+  logging:
+    loglevel: debug
+  otlp/elastic:
+    # Elastic APM server https endpoint without the "https://" prefix
+    endpoint: "${ELASTIC_OTLP_ENDPOINT}" <1>
+    headers:
+      # Elastic APM Server secret token
+      Authorization: "Bearer ${ELASTIC_OTLP_TOKEN}" <1>
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+    metrics:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+----
+<1> Environment-specific configuration parameters can be conveniently passed in as environment variables: `ELASTIC_OTLP_ENDPOINT` and `ELASTIC_OTLP_TOKEN`
+
+* Configure the AWS Lambda Java function with:
+** https://docs.aws.amazon.com/lambda/latest/dg/API_Layer.html[Function
+layer]: The latest https://aws-otel.github.io/docs/getting-started/lambda/lambda-java[AWS
+Lambda layer for OpenTelemetry] (e.g. `arn:aws:lambda:eu-west-1:901920570463:layer:aws-otel-java-wrapper-ver-1-2-0:1`)
+** https://docs.aws.amazon.com/lambda/latest/dg/API_TracingConfig.html[TracingConfig / Mode] set to `PassThrough`
+** https://docs.aws.amazon.com/lambda/latest/dg/API_FunctionConfiguration.html[FunctionConfiguration / Timeout] set to more than 10 seconds to support the longer cold start inherent to the Lambda Java Runtime
+** Export the environment variables:
+*** `AWS_LAMBDA_EXEC_WRAPPER="/opt/otel-proxy-handler"` for wrapping handlers proxied through the API Gateway (see https://aws-otel.github.io/docs/getting-started/lambda/lambda-java#enable-auto-instrumentation-for-your-lambda-function[here])
+*** `OTEL_PROPAGATORS="tracecontext, baggage"` to override the default setting that also enables X-Ray headers, which cause interference between OpenTelemetry and X-Ray
+*** `OPENTELEMETRY_COLLECTOR_CONFIG_FILE="/var/task/opentelemetry-collector.yaml"` to specify the path to your OpenTelemetry Collector configuration
+
+[float]
+[[open-telemetry-aws-lambda-java-terraform]]
+===== Instrumenting AWS Lambda Java functions with Terraform
+
+We recommend using an infrastructure as code solution like Terraform or Ansible to manage the configuration of your AWS Lambda functions.
+
+Here is an example of an AWS Lambda Java function managed with Terraform and the https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function[AWS Provider / Lambda Functions]:
+
+* Sample Terraform code: https://github.com/cyrille-leclerc/my-serverless-shopping-cart/tree/main/checkout-function/deploy
+* Note that the Terraform code to manage the HTTP API Gateway (https://github.com/cyrille-leclerc/my-serverless-shopping-cart/tree/main/utils/terraform/api-gateway-proxy[here]) is copied from the official OpenTelemetry Lambda sample https://github.com/open-telemetry/opentelemetry-lambda/tree/e72467a085a2a6e57af133032f85ac5b8bbbb8d1/utils[here]
+
+[float]
+[[open-telemetry-aws-lambda-nodejs]]
+===== Instrumenting AWS Lambda Node.js functions
+
+NOTE: For a better startup time, we recommend using SDK-based instrumentation, i.e. manual instrumentation of the code, rather than auto instrumentation.
+
+To instrument AWS Lambda Node.js functions, see https://aws-otel.github.io/docs/getting-started/lambda/lambda-js[AWS Distro for OpenTelemetry Lambda Support For JS].
+
+The configuration of the OpenTelemetry Collector, with the definition of the Elastic Observability endpoint, can be added to the root directory of the Lambda binaries: `src/main/resources/opentelemetry-collector.yaml`.
+
+[source,yaml]
+----
+# Copy opentelemetry-collector.yaml in the root directory of the lambda function
+# Set an environment variable 'OPENTELEMETRY_COLLECTOR_CONFIG_FILE' to '/var/task/opentelemetry-collector.yaml'
+receivers:
+  otlp:
+    protocols:
+      http:
+      grpc:
+
+exporters:
+  logging:
+    loglevel: debug
+  otlp/elastic:
+    # Elastic APM server https endpoint without the "https://" prefix
+    endpoint: "${ELASTIC_OTLP_ENDPOINT}" <1>
+    headers:
+      # Elastic APM Server secret token
+      Authorization: "Bearer ${ELASTIC_OTLP_TOKEN}" <1>
+
+service:
+  pipelines:
+    traces:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+    metrics:
+      receivers: [otlp]
+      exporters: [logging, otlp/elastic]
+----
+<1> Environment-specific configuration parameters can be conveniently passed in as environment variables: `ELASTIC_OTLP_ENDPOINT` and `ELASTIC_OTLP_TOKEN`
+
+Configure the AWS Lambda Node.js function:
+
+* https://docs.aws.amazon.com/lambda/latest/dg/API_Layer.html[Function
+layer]: The latest https://aws-otel.github.io/docs/getting-started/lambda/lambda-js[AWS
+Lambda layer for OpenTelemetry]. For example, `arn:aws:lambda:eu-west-1:901920570463:layer:aws-otel-nodejs-ver-0-23-0:1`
+* https://docs.aws.amazon.com/lambda/latest/dg/API_TracingConfig.html[TracingConfig / Mode] set to `PassThrough`
+* https://docs.aws.amazon.com/lambda/latest/dg/API_FunctionConfiguration.html[FunctionConfiguration / Timeout] set to more than 10 seconds to support the cold start of the Lambda JS Runtime
+* Export the environment variables:
+** `AWS_LAMBDA_EXEC_WRAPPER="/opt/otel-handler"` for wrapping handlers proxied through the API Gateway. See https://aws-otel.github.io/docs/getting-started/lambda/lambda-js#enable-auto-instrumentation-for-your-lambda-function[enable auto instrumentation for your lambda-function].
+** `OTEL_PROPAGATORS="tracecontext"` to override the default setting that also enables X-Ray headers, which cause interference between OpenTelemetry and X-Ray
+** `OPENTELEMETRY_COLLECTOR_CONFIG_FILE="/var/task/opentelemetry-collector.yaml"` to specify the path to your OpenTelemetry Collector configuration
+** `OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:55681/v1/traces"`: this environment variable must be set until https://github.com/open-telemetry/opentelemetry-js/pull/2331[PR #2331] is merged and released.
+** `OTEL_TRACES_SAMPLER="AlwaysOn"` defines the required sampler strategy if it is not sent from the caller. Note that `AlwaysOn` can potentially create a very large amount of data, so in production set the correct sampling configuration, as per the https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling[specification].
+
+[float]
+[[open-telemetry-aws-lambda-nodejs-terraform]]
+===== Instrumenting AWS Lambda Node.js functions with Terraform
+
+To manage the configuration of your AWS Lambda functions, we recommend using an infrastructure as code solution like Terraform or Ansible.
+
+Here is an example of an AWS Lambda Node.js function managed with Terraform and the https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_function[AWS Provider / Lambda Functions]:
+
+* https://github.com/michaelhyatt/terraform-aws-nodejs-api-worker-otel/tree/v0.23[Sample Terraform code]
+
+[float]
+[[open-telemetry-known-limitations]]
+==== Limitations
+
+[float]
+[[open-telemetry-traces-limitations]]
+===== OpenTelemetry traces
+
+* Traces of applications using `messaging` semantics might be wrongly displayed or not shown in the APM UI. You may only see `spans` coming from such services, but no `transactions` https://github.com/elastic/apm-server/issues/5094[#5094]
+* Inability to see stack traces in spans
+* Inability to view "Time Spent by Span Type" in APM views https://github.com/elastic/apm-server/issues/5747[#5747]
+* Metrics derived from traces (throughput, latency, and errors) are not accurate when traces are sampled before being ingested by Elastic Observability (i.e. by an OpenTelemetry Collector, or an OpenTelemetry APM agent or SDK) https://github.com/elastic/apm/issues/472[#472]
+
+[float]
+[[open-telemetry-metrics-limitations]]
+===== OpenTelemetry metrics
+
+* Inability to see host metrics in the Elastic Metrics Infrastructure view when using the OpenTelemetry Collector host metrics receiver https://github.com/elastic/apm-server/issues/5310[#5310]
+
+[float]
+[[open-telemetry-logs-limitations]]
+===== OpenTelemetry logs
+
+* OpenTelemetry logs are not yet supported https://github.com/elastic/apm-server/issues/5491[#5491]
diff --git a/docs/sampling.asciidoc b/docs/sampling.asciidoc
new file mode 100644
index 00000000000..8776fde0e12
--- /dev/null
+++ b/docs/sampling.asciidoc
@@ -0,0 +1,108 @@
+[[sampling]]
+=== Transaction sampling
+
+Elastic APM supports head-based, probability sampling.
+_Head-based_ means the sampling decision for each trace is made when that trace is initiated.
+_Probability sampling_ means that each trace has a defined and equal probability of being sampled.
+
+For example, a sampling value of `.2` indicates a transaction sample rate of `20%`.
+This means that only `20%` of traces will send and retain all of their associated information.
+The remaining traces will drop contextual information to reduce the transfer and storage size of the trace.
+
+[float]
+==== Why sample?
+
+Distributed tracing can generate a substantial amount of data,
+and storage can be a concern for users running `100%` sampling -- especially as they scale.
+
+The goal of probability sampling is to provide you with a representative set of data that allows
+you to make statistical inferences about the entire group of data.
+In other words, in most cases, you can still find anomalous patterns in your applications, detect outages, track errors,
+and lower MTTR, even when sampling at less than `100%`.
+
+[float]
+==== What data is sampled?
+
+A sampled trace retains all data associated with it.
+
+Non-sampled traces drop <> data.
+Spans contain more granular information about what is happening within a transaction,
+like external requests or database calls.
+Spans also contain contextual information and labels.
+
+Regardless of the sampling decision, all traces retain transaction and error data.
+This means the following data will always accurately reflect *all* of your application's requests, regardless of the configured sampling rate:
+
+* Transaction duration and transactions per minute
+* Transaction breakdown metrics
+* Errors, error occurrence, and error rate
+
+// To turn off the sending of all data, including transaction and error data, set `active` to `false`.
+
+[float]
+==== Sample rates
+
+What's the best sampling rate? Unfortunately, there isn't one.
+Sampling depends on your data, the throughput of your application, data retention policies, and other factors.
+Any sampling rate from `.1%` to `100%` could be considered normal.
+You may even decide to have a unique sample rate per service -- for example, if a certain service
+experiences considerably more or less traffic than another.
+
+// Regardless, cost conscious customers are likely to be fine with a lower sample rate.
+
+[float]
+==== Sampling with distributed tracing
+
+The initiating service makes the sampling decision in a distributed trace,
+and all downstream services respect that decision.
+
+In each example below, `Service A` initiates four transactions.
+In the first example, `Service A` samples at `.5` (`50%`). In the second, `Service A` samples at `1` (`100%`).
+Each subsequent service respects the initial sampling decision, regardless of its configured sample rate.
+The result is a sampling percentage that matches the initiating service:
+
+image::./images/dt-sampling-example.png[How sampling impacts distributed tracing]
+
+[float]
+==== APM app implications
+
+Because the transaction sample rate is respected by downstream services,
+the APM app always knows which transactions have and haven't been sampled.
+This prevents the app from showing broken traces.
+In addition, because transaction and error data is never dropped by sampling,
+you can always expect metrics and errors to be accurately reflected in the APM app.
+
+*Service maps*
+
+Service maps rely on distributed traces to draw connections between services.
+A minimum version of APM agents is required for Service maps to work.
+See {kibana-ref}/service-maps.html[Service maps] for more information.
+
+// Follow-up: Add link from https://www.elastic.co/guide/en/kibana/current/service-maps.html#service-maps-how
+// to this page.
+
+[float]
+==== Adjust the sample rate
+
+There are three ways to adjust the transaction sample rate of your APM agents:
+
+Dynamic::
+The transaction sample rate can be changed dynamically (no redeployment necessary) on a per-service and per-environment
+basis with {kibana-ref}/agent-configuration.html[APM Agent Configuration] in Kibana.
+
+Kibana API::
+APM Agent configuration exposes an API that can be used to programmatically change
+your agents' sampling rate.
+An example is provided in the {kibana-ref}/agent-config-api.html[Agent configuration API reference].
+
+Configuration::
+Each agent provides a configuration value used to set the transaction sample rate.
+See the relevant agent's documentation for more details: + +* Go: {apm-go-ref-v}/configuration.html#config-transaction-sample-rate[`ELASTIC_APM_TRANSACTION_SAMPLE_RATE`] +* Java: {apm-java-ref-v}/config-core.html#config-transaction-sample-rate[`transaction_sample_rate`] +* .NET: {apm-dotnet-ref-v}/config-core.html#config-transaction-sample-rate[`TransactionSampleRate`] +* Node.js: {apm-node-ref-v}/configuration.html#transaction-sample-rate[`transactionSampleRate`] +* PHP: {apm-php-ref-v}/configuration-reference.html#config-transaction-sample-rate[`transaction_sample_rate`] +* Python: {apm-py-ref-v}/configuration.html#config-transaction-sample-rate[`transaction_sample_rate`] +* Ruby: {apm-ruby-ref-v}/configuration.html#config-transaction-sample-rate[`transaction_sample_rate`] \ No newline at end of file
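+
+For example, to sample `20%` of transactions in a service instrumented with the Go agent,
+you could export the option listed above before starting the service. This is a minimal
+sketch; it assumes the agent reads its configuration from the environment, which is the
+default behavior for the Go agent:
+
+[source,bash]
+----
+# Keep 20% of traces; non-sampled traces drop span and context data
+export ELASTIC_APM_TRANSACTION_SAMPLE_RATE=0.2
+----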