Skip to content
This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

Span::context() should return value, not reference #74

Closed
isaachier opened this issue Apr 13, 2018 · 27 comments
Closed

Span::context() should return value, not reference #74

isaachier opened this issue Apr 13, 2018 · 27 comments
Assignees

Comments

@isaachier
Copy link
Contributor

isaachier commented Apr 13, 2018

Jaeger tries to implement a multithreaded Span by keeping the SpanContext immutable (for the most part, details about sampling flag not relevant here). Whenever the baggage changes, the span swaps the current SpanContext instance with a new SpanContext object with an identical baggage map other than the appended key-value pair. The problem arises with the following code.

std::unique_ptr<opentracing::Span> span = tracer->StartSpan("operation");

// Thread #1
SpanContext ctx(span->context());

// Thread #2
span->SetBaggageItem("key", "value");

Here is the issue with this paradigm.

Thread #1: acquire lock on span
Thread #1: get reference to span's context
Thread #1: release lock on span
Thread #2: acquire lock on span
Thread #1: return reference to caller
Thread #2: swap span's context, overwriting initial context

The code above shows a potential interleaving of instructions in such a way that the caller of Span::context() is left with undefined behavior. There is no way for the caller to prevent the existing memory from being overwritten. I proposed changing the return type to a value so we can safely copy the existing span context into a new memory location in the caller's domain, eliminating all race conditions.

@rnburn
Copy link
Contributor

rnburn commented Apr 13, 2018

Why doesn't jaeger modify the existing span context when you set a baggage item?

@isaachier
Copy link
Contributor Author

isaachier commented Apr 13, 2018

OK I see the LightStep C++ client uses a lock in the span context to avoid this (thanks @rnburn for reference here: https://github.com/lightstep/lightstep-tracer-cpp/blob/master/src/lightstep_span_context.h#L59). I wonder what @yurishkuro has to say about changing the C++ client to lock the baggage and maintaining a lock in the SpanContext instead of locking the Span alone.

@isaachier
Copy link
Contributor Author

To be fair, the reason the C++ client is the way it is has to do with the Go client. The Go client does not have a concept of returning via reference, only pointer, and no one thought that was a good idea... so I'm wondering if we should follow the opentracing-go spec here.

@yurishkuro
Copy link
Member

If the context is immutable it doesn't really matter if it's returned by value or by reference. In Go client the lock is held while reading the context (ref) from the span. I'm not seeing in your threading example what the problem is. If you have two threads doing read and write, there's no way to ensure which version the reader gets - it's a race in the usage, not in the implementation of the context () method. The potential memory leak is another story, but specific to C++.

@isaachier
Copy link
Contributor Author

The problem runs a little deeper than that. The issue is potentially reading in the middle of the write to the same memory location. Replacing old context with new one is not an atomic operation. Caller may be handed a partially written location with some mixed state of old and new values.

@isaachier
Copy link
Contributor Author

Even returning by value is impossible here given the slicing problem. Can we return a shared pointer @rnburn? It is probably a decent compromise for both of us even though I hate changing the standard.

@rnburn
Copy link
Contributor

rnburn commented Apr 13, 2018

I'm still not clear why this is even a problem with the interface. What's the issue with implementing locking within the span context?

And if you want to support copying why not use this non-breaking approach #56?

@isaachier
Copy link
Contributor Author

I'm waiting on @yurishkuro's opinion regarding changing the Jaeger implementation as that clearly resolves this too. About the shared pointer, that is more compatible with the current LightStep client than returning a copy would be. So I leave it up to you to decide if we can return by value instead.

@yurishkuro
Copy link
Member

I don't think I understand the issue, still.

The issue is potentially reading in the middle of the write to the same memory location. Replacing old context with new one is not an atomic operation.

So we're not writing to the context since it's immutable. And replacing span.context field with a new instance is done inside the lock for the whole span. How is it not atomic?

@isaachier
Copy link
Contributor Author

Wait a second, when I say immutable I just meant logically not practically. Welcome to the annoying world of C++ object assignment. I'll post the equivalent in Go code in a moment.

@isaachier
Copy link
Contributor Author

isaachier commented Apr 13, 2018

type SpanContext struct {
    flags uint8
    baggage map[string]string
}

func (c *SpanContext) assign(newValue SpanContext) {
    c.flags = newValue.flags
    c.baggage = newValue.baggage
}

type Span struct {
    sync.Mutex
    // ...
    context SpanContext
}

func (s *Span) Context() *SpanContext {
    s.Lock()
    defer s.Unlock()
    return &s.context
}

func (s *Span) SetBaggageItem(k, v string) {
    s.Lock()
    defer s.Unlock()
    newBaggage := s.context.Baggage()
    newBaggage[k] = v
    context := s.context.WithBaggage(newBaggage)
    s.context.assign(context)
}

If the caller uses the pointer, he or she has no idea if the reassignment took place, did not take place, or is in the middle of taking place.

@isaachier
Copy link
Contributor Author

I mean if we limit the code to the extent that you literally cannot do anything with it other that iterate over keys and values, as is the case now, we should be okay.

@isaachier
Copy link
Contributor Author

Again, this will affect any code interested in doing IsSampled in Jaeger or whatever vendor-specific API call since the assignment is not atomic. Meaning, flags could be from a new value and baggage could be from an old one in the example above.

@yurishkuro
Copy link
Member

func (c *SpanContext) assign(newValue SpanContext) {
    c.flags = newValue.flags
    c.baggage = newValue.baggage
}

this is not what the Go client does. Why can't the context be immutable? This ^ one clearly is not.

@isaachier
Copy link
Contributor Author

Right but C++ assignment = really calls operator= which is exactly what assign does above. The reason for this is mostly the low level nature. C++ assignment works like this works in every other language:

int x = 1;
int y = 2;
x = y;  // Copy value of y to x in place.

@isaachier
Copy link
Contributor Author

This can be solved with another lock in SpanContext but that sort of ruins the immutable aspect.

@yurishkuro
Copy link
Member

once again, you returned a reference from context() and then you override its internal values in assign - it's not immutable context. Immutable context means you create a new instance in SetBaggage, or you return a copy from context().

@rnburn
Copy link
Contributor

rnburn commented Apr 13, 2018

Having context() as a reference to the state of the span's context is really the most natural and efficient approach for C and C++. (The OpenTracing spec is suppose to be tweaked to match the target language, no?)

If you're going to change this for C++, then I expect you'd need to make a corresponding change for C. Code like this that you have in the README:

   span = tracer->start_span(tracer, "test-inject");
    return_code =
        tracer->inject_text_map(tracer,
                                (opentracing_text_map_writer*) &writer,
                                span->span_context(span));

would have to do something like this, right?

   span = tracer->start_span(tracer, "test-inject");
   opentracing_span_context* context = span->span_context(span); // Get a copy
    return_code =
        tracer->inject_text_map(tracer,
                                (opentracing_text_map_writer*) &writer,
                                context);
   free(context);

An approach like that that makes copies all the time when you want to reference the context is not going to be as performant. If you want an immutable copy, then let's make that explicit with something like a clone function for the span context (What's done in #56).

@isaachier
Copy link
Contributor Author

Definitely! The copy semantics in C are even worse so I appreciate the reminder to change that as well. Unfortunately, true immutability means creating a new object in a new memory address every time you update the baggage. This is fundamentally difficult in C++ which limits dynamic memory. I imagine Rust will have an even harder time with this particular issue.

The copy helps but not unless the context method returns a copy. In that case, how would you handle that in LightStep? That goes back to the idea of returning a smart pointer (TBD std::unique_ptr or std::shared_ptr), if only because of the slicing problem. Inheritance hierarchies are always difficult in C++ when you want to return the superclass to conform with the API.

@rnburn
Copy link
Contributor

rnburn commented Apr 13, 2018

Yes, but why do we need to interpret a SpanContext as immutable like that? If it's absolutely necessary to have that concept, then why don't we distinguish it with a type like ImmutableSpanContext?

@isaachier
Copy link
Contributor Author

I think @yurishkuro has strong preference for immutability here. Reminds me of Clojure's persistent data structures. I find it elegant but difficult without causing a breaking change in the OT C++ spec.

@isaachier
Copy link
Contributor Author

Even in Go I believe span.context = newValue can't be atomic unless span.context is itself a pointer.

@yurishkuro
Copy link
Member

Even in Go I believe span.context = newValue can't be atomic unless span.context is itself a pointer.

It can be, under lock on the Span.

Anyway, the main reason for making the context immutable in Go was to avoid having to kep another lock in it, in addition to the lock on the span. I am not very happy with the read/write nature of baggage on the span because we already ran into logical race conditions with it. I.e. I would've preferred if the baggage could only be defined on span creation, making the span context trully immutable.

@isaachier
Copy link
Contributor Author

I meant if someone has a pointer to the SpanContext and is not locking the Span when reading it, Go cannot guarantee the Span won't clobber the read by updating the context struct.

@yurishkuro
Copy link
Member

Well out Go client simply doesn't do that - it returns the context by value, but even if we change it to return a pointer the write operation always replaces the context in the span, so the old pointer is unchanged.

@isaachier
Copy link
Contributor Author

OK so I think it's probably better for me to lock the baggage to keep compatibility for any existing instrumentations. Unfortunately, SpanContext immutability would definitely break the code for any instrumented packages.

@ringerc
Copy link

ringerc commented Aug 31, 2018

My original thought when I brought this up was to make SpanContext copy-constructable via a factory interface exposed in the OpenTracing context, but this looks like it makes sense.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants