Skip to content

Latest commit

 

History

History
283 lines (203 loc) · 11.9 KB

tracing.md

File metadata and controls

283 lines (203 loc) · 11.9 KB

Overview

  1. Idioms for Tracing (and associated Logging)
  2. How to use tracing and OpenTelemetry

Setup

Summary of key ideas

  1. tracing lib handles both tracing and logging
    1. Migration from log lib
    2. Migration from slog
  2. tracing API is close enough to OpenTelemetry
    1. Compatibility lib
      1. tracing::Span -> OpenTelemetry::Span
      2. tracing::Event -> OpenTelemetry::Event
    2. Differences:
      1. OpenTelemetry spans lack level

Gotchas

  1. Performance overhead for instrumentation
  2. Performance overhead for reporting spans over the network (or even local disk IO
  3. OpenTelemetry regularly breaks semver guarantees
  4. OpenTelemetry libs can version lock you into old dependencies
  5. Types & traits with the same name in different packages (designed for frustration/confusion)
  6. Jaeger UI didn't support auth
  7. Jaeger: Sustainable external storage is either very complex or expensive (Cassandra, Elastic, ...)
  8. Correct internal Propagation is hard (async, threads, ...)
  9. Propagation at servers and clients is hard
  10. tokio-console has no features for finding perf bottlenecks (eg. slowest method). It's not a profiler
  11. Upgrading is hard because some otel deps bump their dependencies and others don't
    1. Example: tracing-opentelemetry v0.26 depends on opentelemetry v0.25

tracing lib (part of Tokio)

  1. Concepts
    1. Span: TODO
    2. Event struct: a log associated with a Span
    3. Subscriber trait: TODO
      1. Subscriber responsible for associating spans with threads
    4. Layer trait: TODO
  2. Pro: Uses idiomatic rust
  3. Pro: Compatible with Futures, example
  4. Pro: Compatible with tokio
  5. Pro: Has convenient macros
  6. Pro: Includes thread attributes by default
  7. Cons: Sometimes unintuitive
    1. eg. https://docs.rs/tracing-core/0.1.30/tracing_core/trait.Subscriber.html#tymethod.record
  8. Cons: Non-standard when compared to OpenTelemetry usage in other coding languages
  9. Cons: Different span lifecycle than OpenTelemetry

Setup tracing

Usage

use tracing::{instrument, Span};
...

    let span = info_span!("my_span");
    let guard = span.enter();

    /* -- alternative:
        let span = info_span!("my_span").entered();
        defer! { drop(span) }  // See https://github.com/rodrigocfd/defer-lite/
    */

    // let _ = info_span!(...) // GOTCHA: DON'T DO THIS, immediately drops your span, so the hierarchy breaks

    span.record("foo", "bar"); // must be defined in the span macro

    // auto associated with span
    info!("something happened"); // or warn!(...) or error!(...) or debug!(...)

    span.record("error", anyhow!("something went wrong").into());

    // If required, reference the current span (last entered span on this thread):
    // tracing::Span::current()
    let current_span = Span::current();

    // manually set parent span
    //TODO: verify
    let _ = info_span!(parent: span.id(), "foo_bar");

    // -- Either drop or use https://github.com/rodrigocfd/defer-lite/
    // drop(guard); // if you don't drop, rustc might drop your span early
  1. See https://docs.rs/tracing/latest/tracing/macro.span.html

Wrap a function in a Span

  1. Not as flexible as code, but often convenient
  2. See https://docs.rs/tracing/latest/tracing/attr.instrument.html
use tracing_attributes::instrument;
...


#[instrument]
fn do_something(foo: &str) -> Result<String, anyhow::Error> {

    // these are automatically associated with the span
    debug!(aa = 7, "not that important");
    info!("interesting");
    warn!("hm...");
    error!("ooh no!");

    //GOTCHA: attributes/fields must be declared in the attribute

    Ok("output".to_owned())
}

#[instrument(err)]
fn do_something_dangerous() -> Result<String, anyhow::Error> {
    todo!("boo")
}

Attributes

record extra span attributes (after span created)

  1. 32 fields/attributes max
    1. Standard fields/attributes: https://opentelemetry.io/docs/specs/semconv/attributes-registry/http/

Get current span

info_span!("...", foo = tracing::field::Empty).entered();  // not .enter()
// allocate field `foo`, so we can set later
...

Span::current().record("foo", 88); // record field previously allocated

Set current span

let _ = span.entered();

// ... do some work

// _ is dropped at the end

Associating logs with spans, adding attributes

use tracing::{debug, error, info, info_span, warn};
...

let span = info_span!("do_something").entered();

...
// attributes do NOT need to be declared on span first
debug!(message = "example debug msg", aa = 7);
info!(message = "example info msg", b = true);
warn!(message = "example warn log event", f = 3.2);

error!("ooh no!");  // This marks span as error (in Jaeger too)

// -- or put the message last
info!(cc = "cheese", "something happened");
info!(dd = %foo1, "whatever") // use fmt::Display
info!(ee = ?foo2, "something") // use fmt::Debug
  1. ? sigil prefix means record with fmt::Debug
  2. % prefix means record with fmt::Display

Recording errors

error!(err = ?some_err, "failed to ...")
  1. See also https://docs.rs/tracing/latest/tracing/index.html#recording-fields
  2. See also: https://opentelemetry.io/docs/reference/specification/trace/semantic_conventions/exceptions/#attributes

Propagation: HTTP client

  1. TODO

Propagation: HTTP server

  1. TODO

Reporting delay in main fn

  1. Add a brief delay at the end of your program to report any outstanding spans
root_span.exit();

// -- Give the last span time to report to Jaeger
//sleep(Duration::from_millis(300));
opentelemetry::global::shutdown_tracer_provider();

Propagation (child function within process)

  1. Single-threaded: This is automatic
  2. Multi-threaded: (see below)

Example propagating span across threads

  1. tokio::tracing relies on thread-local, so we must manually propagate the span
use tracing_opentelemetry::OpenTelemetrySpanExt;
...

fn outer() -> Result<(), anyhow::Error> {
    let parent = info_span!("outer-fn").entered();
    let parent_cx = parent.context();

    for i in 0..3 {
        let parent_cx = parent_cx.clone();

        thread::spawn(move || {
            let inner = info_span!("inner-fn", the_index = i).entered();
            inner.set_parent(parent_cx); // link to parent
        })
        .join()
        .expect("failed to join");
    }

    Ok(())
}

Flow from source to jaeger

  1. ::
  2. tracing::dispatcher::get_default(...) (src in tracing-core)
  3. tracing::Span::new(...) (src intracing)
  4. tracing::Span::new_with(...) (src in tracing)
  5. tracing::Dispatch::new_span(...) (src in tracing-core)
    1. thin wrapper around Subscriber
    2. returns new tracing::Id
  6. Subscriber::new_span(...) (src intracing-core)
    1. seems like they are renaming to "Collector"
  7. Layered::new_span(...) as Subscriber (src in tracing-subscriber)
  8. Registry::new_span(...) as Subscriber (src in tracing-subscriber)
    1. Checks for parent
    2. Ensures local span ids are unique
  9. OpenTelemetryLayer::on_new_span(...) as Layer (src in tracing-opentelemetry)
    1. checks parent context for active span
    2. stores extra data in span extension (mutates extensions)
  10. SimpleSpanProcessor::on_end(...) as SimpleSpanProcessor (src in opentelemetry_sdk)
  11. SimpleSpanProcessor crossbeam_channel with SpanData
    1. BatchSpanProcessor is similar
  12. Exporter::export as trace::SpanExporter
  13. convert_otel_span_into_jaeger_span(...) (src in opentelemetry-jaeger)
  14. Uploader::upload(jaeger::Batch) (src in opentelemetry-jaeger)

TODO/Unorganized

  • Env vars
    • RUST_LOG=debug|info|warn|trace
    • RUSTFLAGS=--cfg tokio_unstable

Other Resources

  1. https://docs.rs/tracing-opentelemetry/latest/tracing_opentelemetry/
  2. https://docs.rs/tracing/latest/tracing/
  3. https://docs.rs/log/latest/log/
  4. https://crates.io/crates/opentelemetry-jaeger