Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datadog.tracer.active_correlation not available in Sidekiq logging #1068

Open
jeffblake opened this issue Jun 8, 2020 · 13 comments · May be fixed by #3823
Open

Datadog.tracer.active_correlation not available in Sidekiq logging #1068

jeffblake opened this issue Jun 8, 2020 · 13 comments · May be fixed by #3823
Assignees
Labels
bug Involves a bug community Was opened by a community member integrations Involves tracing integrations

Comments

@jeffblake
Copy link

With Sidekiq 5 (released in 2017), job logging is no longer a middleware, and now sits above all middleware in the stack.

I think this is why Datadog.tracer.active_correlation is not available in the new JobLogger, because the correlation is flushed in the Sidekiq::ServerTracer middleware.

Seems crazy that no one has noticed this before, so maybe I have it wrong?

My setup is like

Sidekiq.configure_server do |config|
  config.log_formatter = Logging::Formatters::Json.new
  config.options[:job_logger] = Logging::Sidekiq::JobLogger

    config.server_middleware do |chain|
    chain.add Logging::Sidekiq::Server::JobLogger
  end
end

Datadog.tracer.active_correlation is always 0 for space_id and trace_id in the JobLogger

Datadog.tracer.active_correlation in my Logging::Sidekiq::Server::JobLogger middleware has a span_Id, but it seems to be one of the inner spans, e.g. a postgres query.

I suppose I could grab the root span in my custom middleware, store it in Thread.current[:dd_trace] = Datadog.tracer.active_root_span and then retrieve it in the JobLogger class for logging, (with an ensure block to flush it) ? But since other people must have this problem and I can't find a documented solution, thought I'd bring it up here.

@jeffblake
Copy link
Author

FYI I don't think the stale tracer is the cause here. In the Sidekiq source, https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/processor.rb#L161
You can see that the logging of jobs (JobLogger) "wraps" the middleware, which means that the tracer is started and finished, and then yielded back to the JobLogger (who now no longer has an active trace/span).

@delner
Copy link
Contributor

delner commented Jun 9, 2020

Yeah, after looking at this I arrived at the same conclusion, I think we'll have to investigate this separately.

@wasabigeek
Copy link

@jeffblake we found this recently as well, I was wondering if you implemented the thread variable successfully?

@marcotc
Copy link
Member

marcotc commented Aug 20, 2020

👋 @jeffblake and @wasabigeek, as @jeffblake pointed out, job_logger is invoked before any middleware are processed:

  1. job_logger is invoked here: https://github.com/mperham/sidekiq/blob/v5.2.9/lib/sidekiq/processor.rb#L126
  2. Middleware are invoked here: https://github.com/mperham/sidekiq/blob/v5.2.9/lib/sidekiq/processor.rb#L137 (which invokes this block)

Given that, your JobLogger#call method won't be able to reach a #active_correlation, as all middleware and job work is performed and finished when you yield from JobLogger#call: all interesting work is inaccessible to the job_logger unfortunately.

This seems to me like there's a two tiered systems of Sidekiq server-side hooks:

  1. The logger: only one instance exists.
  2. Sidekiq middleware: a chained collection of processors.

One approach is for us to move our Datadog middleware to outside the context of Sidekiq middleware: we could make ddtrace a job_logger, and properly wrap any user provided logger inside ours. The downside is that this is not very maintainable, specially across Sidekiq versions and specially when users have another instrumentation library similar to ddtrace in their code-base.
Another option is to monkey-patch the top-level Sidekiq::Processor#dispatch method instead: this has the same downside of making it brittle and not interacting nicely with other instrumentation frameworks.

Whenever a library has proper middleware, like Sidekiq does, using them is normally a much better maintainability experience for both users and library maintainers.

The main issue I see in this case is that a custom job_logger could be a regular middleware, instead of a different class of processing logic inside Sidekiq.

I took a look at the interface provided by both Sidekiq::JobLogger and the middleware interface and, given all the information is available to the middleware as far as I can tell, my suggestion here would be to move the logic from the job_logger (Logging::Sidekiq::JobLogger) into your middleware (Logging::Sidekiq::Server::JobLogger), and make sure to add it to Sidekiq's server_middleware after ddtrace inserts our instrumentation middleware.
ddtrace adds its middleware to the end of Sidekiq's server stack when the Datadog.configure{|c| c.use :sidekiq} block completes.

Let me know if I'm missing something in your use case that can't be covered by this suggestion.

@jeffblake
Copy link
Author

@marcotc Your analysis seems spot on. The sidekiq creator acknowledged this when I brought it up, but expressed no interest to go back to logging being a middleware (which is the better design in my opinion).

ddtrace could provide a custom JobLogger, and consumers of the gem could use this as a guideline in their own custom JogLogger if they choose to override it?

Regarding your last paragraph, correct me if I'm wrong, but doing this in middleware seems like a no-go, because all middleware thread-variables would be flushed before the JobLogger gets a chance to log the span/trace ID? Or are you saying move logging out of JobLogger altogether? I don't like that because I prefer to follow whatever design Sidekiq gives us, whether it's correct or not...

@delner delner added the community Was opened by a community member label Apr 21, 2021
@delner delner added bug Involves a bug integrations Involves tracing integrations labels Apr 21, 2021
@icelynjennings
Copy link

Is this fixed?

@agrobbin
Copy link
Contributor

@icelynjennings if you use Sidekiq through Active Job, then it is sort of fixed. Most logs are correlated through Active Job integration. If, however, you're using Sidekiq "natively", then unfortunately it isn't fixed.

@icelynjennings
Copy link

icelynjennings commented Oct 13, 2021

Thank you for your ActiveJob solution @agrobbin. I wonder if the recent solving of this Sidekiq issue sidekiq/sidekiq#5021 via commit sidekiq/sidekiq@90535ab could somehow be leveraged to inherit Rails DD tracing into sidekiq logs

@agrobbin
Copy link
Contributor

@icelynjennings it'd be great if the functionality could be expanded, but I don't think the solution in sidekiq/sidekiq@90535ab actually does anything to make that easier. It broadcasts any Rails.logger output to Sidekiq.logger, but the log correlation happens within the Active Job job middleware stack, so the native Sidekiq logs will still not be available (since they occur outside of the Active Job job middleware stack).

@marcotc
Copy link
Member

marcotc commented Oct 15, 2021

For some more context: what information are we missing today because of the lack of trace-logs correlation at the job logger level?

Is it only these three ("start", "done", "fail") log lines: https://github.com/mperham/sidekiq/blob/1f05bd60102419d12733afa267d96524ece8e756/lib/sidekiq/job_logger.rb#L11-L20. Or is there more rich data that is going uncaptured today?

@icelynjennings
Copy link

Hi @marcotc, I'm new to sidekiq so I cannot answer but a DataDog support ticket I had filed led me to believe that native sidekiq log correlation is not currently available in DD. What I'm missing at the moment is any traceID within our sidekiq job logs.

@highwaybobbery
Copy link

For some more context: what information are we missing today because of the lack of trace-logs correlation at the job logger level?

Is it only these three ("start", "done", "fail") log lines: https://github.com/mperham/sidekiq/blob/1f05bd60102419d12733afa267d96524ece8e756/lib/sidekiq/job_logger.rb#L11-L20. Or is there more rich data that is going uncaptured today?

Just my personal experience: I've got a lot of work done within workers, multiple queues, multiple workers per queue, multiple threads per worker, but the only context I get in DD is that it's Worker.x.

We really need to be able to isolate what each individual thread in each worker in each queue is doing over time.

"Filter logs by thread x on worker y in queue z" is the real function I need.

@alexevanczuk
Copy link
Contributor

Hey all, does anyone know if anything has changed about this in the past year and a half and if it's possible to now correlate sidekiq logs?

@marcotc marcotc linked a pull request Aug 1, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Involves a bug community Was opened by a community member integrations Involves tracing integrations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants