[Bug]: Missing parent span used in Condvar in Jaeger exports #1595

Closed
Mossaka opened this issue Mar 4, 2024 · 2 comments
Labels
bug (Something isn't working), triage:todo (Needs to be triaged)

Mossaka commented Mar 4, 2024

What happened?

I made a demo, using the tracing and OpenTelemetry crates with the Jaeger exporter, that shows the parent span of a2 is missing.

https://github.com/Mossaka/otel-condvar/blob/main/src/main.rs

This program does a fork-exec and then waits on a Condvar until it is woken up. The parent process sleeps for three seconds in a thread and then signals the Condvar. I expect both a1 and a2 to have a shim_main parent span. a1 has one, but a2 does not.
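For readers without the linked repository at hand, the wait-and-signal pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the code from the repro: the function name `wait_for_signal` and the short delay are assumptions, and the fork-exec and tracing parts are left out.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: a background thread sleeps for `delay`, sets a flag
/// and signals the condvar, while the caller blocks until the flag is set.
/// Returns the final value of the flag.
fn wait_for_signal(delay: Duration) -> bool {
    let pair = Arc::new((Mutex::new(false), Condvar::new()));
    let signaller = Arc::clone(&pair);

    let handle = thread::spawn(move || {
        thread::sleep(delay);
        let (lock, cvar) = &*signaller;
        // Set the flag while holding the lock, then wake the waiter.
        *lock.lock().unwrap() = true;
        cvar.notify_one();
    });

    let (lock, cvar) = &*pair;
    // `wait_while` re-checks the flag on every wakeup, so spurious wakeups
    // and a signal sent before the wait starts are both handled.
    let guard = cvar
        .wait_while(lock.lock().unwrap(), |ready| !*ready)
        .unwrap();
    let result = *guard;
    drop(guard);

    handle.join().unwrap();
    result
}

fn main() {
    // The issue describes a three second sleep; a shorter delay is used here.
    let woke = wait_for_signal(Duration::from_millis(50));
    println!("woken: {woke}");
}
```

Guarding the wait with a flag matters because a bare `wait` can miss a notification that was sent before the waiter started waiting, in which case it blocks forever.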

[image: Jaeger screenshot]

API Version

0.21

SDK Version

0.21

What Exporters are you seeing the problem on?

OTLP

Relevant log output

No response

Mossaka added the bug and triage:todo labels Mar 4, 2024
@stormshield-fabs
Contributor

The problem you're facing is related to inter-process trace propagation (see also OTel Context Propagation): when the second instance of the program starts, it has no idea the caller process was instrumented so it cannot set the parent ID as you'd expect.

You need to use a TraceContextPropagator. In the first process it injects information about the current span (trace ID, span ID) into a carrier that you forward to the second process. The second process then extracts that information and uses it to set the parent span ID properly.
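To make the inject and extract halves concrete, here is a std-only sketch of the W3C Trace Context `traceparent` value, which is the format a trace context propagator carries between processes. It is a hand-rolled illustration under stated assumptions, not the `opentelemetry_sdk` implementation: `ParentContext`, `inject` and `extract` are hypothetical names, only version `00` is handled, and `tracestate` is ignored.

```rust
use std::collections::HashMap;

/// Hypothetical type: identifiers of the span that should become the parent.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ParentContext {
    trace_id: String, // 32 lowercase hex characters
    span_id: String,  // 16 lowercase hex characters
    sampled: bool,
}

fn is_lower_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// "Inject" half: write `traceparent` into the carrier as
/// `version-trace_id-span_id-flags`.
fn inject(ctx: &ParentContext, carrier: &mut HashMap<String, String>) {
    let flags = if ctx.sampled { "01" } else { "00" };
    carrier.insert(
        "traceparent".to_string(),
        format!("00-{}-{}-{}", ctx.trace_id, ctx.span_id, flags),
    );
}

/// "Extract" half: read `traceparent` back.
/// Returns `None` when the entry is missing or malformed.
fn extract(carrier: &HashMap<String, String>) -> Option<ParentContext> {
    let value = carrier.get("traceparent")?;
    let parts: Vec<&str> = value.split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    let (trace_id, span_id, flags) = (parts[1], parts[2], parts[3]);
    if !is_lower_hex(trace_id, 32) || !is_lower_hex(span_id, 16) || !is_lower_hex(flags, 2) {
        return None;
    }
    // All-zero trace and span IDs are invalid.
    if trace_id.bytes().all(|b| b == b'0') || span_id.bytes().all(|b| b == b'0') {
        return None;
    }
    let flag_bits = u8::from_str_radix(flags, 16).ok()?;
    Some(ParentContext {
        trace_id: trace_id.to_string(),
        span_id: span_id.to_string(),
        sampled: flag_bits & 0x01 == 1,
    })
}

fn main() {
    let parent = ParentContext {
        trace_id: "4bf92f3577b34da6a3ce929d0e0e4736".to_string(),
        span_id: "00f067aa0ba902b7".to_string(),
        sampled: true,
    };
    let mut carrier = HashMap::new();
    inject(&parent, &mut carrier);
    // prints "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    println!("{}", carrier["traceparent"]);
    assert_eq!(extract(&carrier), Some(parent));
}
```

The carrier is a plain string map, so it can be forwarded by any channel the two processes share, for example a command-line argument or an environment variable.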

Here's a modified version of your repro (please note I've added serde and serde_json). I've also stripped the CondVar-related code, because it's not relevant to the problem and appears to not be working (the second instance seems to block on wait and never exits, so it also holds back the related span).

main.rs
use std::{collections::HashMap, env, process::Command};

use opentelemetry::{
    global::{self, shutdown_tracer_provider},
    trace::TraceError,
    KeyValue,
};
use opentelemetry_otlp::WithExportConfig;
use opentelemetry_sdk::{propagation::TraceContextPropagator, trace as sdktrace, Resource};
use tracing::{instrument, Span};
use tracing_opentelemetry::OpenTelemetrySpanExt;
use tracing_subscriber::{layer::SubscriberExt, Registry};

#[tokio::main]
async fn main() {
    let tracer = init_tracer().expect("Failed to initialize tracer.");
    let telemetry = tracing_opentelemetry::layer().with_tracer(tracer);
    global::set_text_map_propagator(TraceContextPropagator::new());
    let subscriber = Registry::default().with(telemetry);

    tracing::subscriber::with_default(subscriber, || {
        shim_main();
    });

    shutdown_tracer_provider();
    println!("Shutdown");
}

#[instrument]
fn shim_main() {
    let args: Vec<_> = env::args().collect();
    match args.get(1) {
        Some(arg) if arg == "1" => {
            a1();
            spawn();
        }
        Some(arg) if arg == "2" => {
            if let Some(trace_context) = env::args().nth(2) {
                let extractor: HashMap<String, String> =
                    serde_json::from_str(&trace_context).unwrap();
                let context =
                    global::get_text_map_propagator(|propagator| propagator.extract(&extractor));
                Span::current().set_parent(context);
            }
            a2();
        }
        _ => println!("Usage: {} <1|2 trace_context>", args[0]),
    }
}

#[instrument]
pub fn a1() {
    println!("a1");
}

#[instrument]
pub fn a2() {
    println!("a2");
}

#[instrument]
fn spawn() {
    let cmd = env::current_exe().unwrap();
    let cwd = env::current_dir().unwrap();
    let mut command = Command::new(cmd);

    let mut injector: HashMap<String, String> = HashMap::default();
    global::get_text_map_propagator(|propagator| {
        // We must explicitly retrieve the context from `tracing`
        propagator.inject_context(&Span::current().context(), &mut injector);
    });
    let trace_context = serde_json::to_string(&injector).unwrap();

    command.current_dir(cwd).arg("2").arg(trace_context);
    command.spawn().unwrap();
}

fn init_tracer() -> Result<opentelemetry_sdk::trace::Tracer, TraceError> {
    opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(
            opentelemetry_otlp::new_exporter()
                .tonic()
                .with_endpoint("http://localhost:4317"),
        )
        .with_trace_config(
            sdktrace::config().with_resource(Resource::new(vec![KeyValue::new(
                "service.name",
                "instance3",
            )])),
        )
        .install_simple()
}

Finally, here's the collected trace in Jaeger:
[image: collected trace in Jaeger]

Author

Mossaka commented Apr 8, 2024

Thanks! (Sorry for getting back to you late, I was away for other stuff.) :>

Ah yes, the TraceContextPropagator is what was really missing. I really appreciate your help there!
