Skip to content
This repository has been archived by the owner on May 23, 2023. It is now read-only.

Add structured message support to Semantic Conventions #134

Open
4 tasks
ndrwrbgs opened this issue Nov 4, 2018 · 17 comments
Open
4 tasks

Add structured message support to Semantic Conventions #134

ndrwrbgs opened this issue Nov 4, 2018 · 17 comments

Comments

@ndrwrbgs
Copy link

ndrwrbgs commented Nov 4, 2018

Problem

The naive approach when a user wants to log a message is to fill in the message details and send the message. However, this prevents structured logging and hurts searchability. While a user or library could address this problem themselves, it seems like something that could benefit from being standardized.

Proposal

Provide a note about how to log structured messages in a compatible way in the Semantics Conventions document.

E.g. as key value pairs like this

Key Value
message.format I like to eat {0}s and {1}s
message.param1 apple
message.param2 banana

Questions to address

  • Best language agnostic format for the message format (above is C#'s)
  • Naming of the prefix (message? structuredLog?)
  • Language agnostic naming of the format (template? format?)
  • Naming of the indexed template parameters
@yurishkuro
Copy link
Member

yurishkuro commented Nov 5, 2018

in Python:

span.log_kv({
    'event': 'lunch',
    'message': 'I bought {numApples} apples and {numBananas}',
    'numApples': 3,
    'numBananas': 1313131313,
})

@ndrwrbgs
Copy link
Author

ndrwrbgs commented Nov 5, 2018

Thanks Yuri, this unfortunately doesn’t satisfy the usage scenario as it requires the use of named template fields which are not always available. See C#’s FormattableString type which is the language support for templates strings.

Also the spec doesn’t support the standard of using a later KVP value in a former one ala “{numApples}” in your case. If that were part of the spec, then I could achieve my goal by arbitrary naming.

@yurishkuro
Copy link
Member

You said language agnostic format. It's irrelevant what C# has because the string interpolation will be done by the tracing backend.

@ndrwrbgs
Copy link
Author

ndrwrbgs commented Jan 10, 2019

Thanks @yurishkuro but there is nothing in the spec that says {numApples} should refer to 'numApples' log key. If the spec can be updated to do that, that would satisfy my concern. But right now your convention in the code above is not documented on the semantic conventions, which would mean if someone instrumented that your tracer should consider {numApples} to literally mean "{numApples}".

@ndrwrbgs
Copy link
Author

You said language agnostic format. It's irrelevant what C# has because the string interpolation will be done by the tracing backend.

FYI in case you missed it in my earlier message before you said that:

Also the spec doesn’t support the standard of using a later KVP value in a former one ala “{numApples}” in your case. If that were part of the spec, then I could achieve my goal by arbitrary naming.

Completely agree that it doesn't matter what C# has, as this is the specification, but see the later clause wherein if we do codify what you're using as a convention then I can unblock the representation of the data.

@austinlparker
Copy link
Member

Would it be worthwhile to extend the conventions to say that tracers should attempt string interpolation of messages, using available fields of the log message?

@ndrwrbgs
Copy link
Author

ndrwrbgs commented Jan 10, 2019

Thanks @austinlparker, I would shy away from that actually, as I think the tracer itself should preserve what it is given (for structured logging scenarios where the backend db can index based on the template if given it). This is why I would put it as a 'convention' about the data rather than a recommendation to the tracer itself.. This is branching into an area that I think OT will need to tackle eventually (if we want pervasive tooling) which is focused around UI and interop between non-uniform tracing systems - it's more a statement about what tooling, storage systems, and UIs should do with the trace data by convention rather than that the tracer itself should interpolate.

What I need is an official statement on what style this interpolation pattern is expected to be and then I can finish things like https://github.com/ndrwrbgs/OpenTracing-SemanticConventions/blob/88b5a4e7b98c9374cb39bc479a2ee3c6b541e3fe/src/Library/LogBasedExtensions.cs#L62

@ndrwrbgs
Copy link
Author

Should I send a PR for this, or would one of the common editors of the document prefer to immortalize the above?

@yurishkuro
Copy link
Member

The example that I showed above does not require any changes to the specification, and can be used right now. For a human reading that span log it's very obvious how to interpret it.

In order to make it a part of the specification, we need to have backends that are actually capable of interpreting that format. I realize it's a chicken & egg problem, but semantic specification is meant to capture existing practices.

@ndrwrbgs
Copy link
Author

ndrwrbgs commented Feb 15, 2019

For a human reading that span log

Unfortunately, humans reading the span logs would remove the need for any conventions, specifications help computers read the logs which is what I need here. I did mention this above. Thanks for coming back to the conversation, but could you reply to the callouts to you from before that were missed? They may help you to regain context on this and why it should be in the conventions.

Based on your earlier statements, this proposal is already an existing practice for your tracers, so it does not sound like a chicken & egg problem from where I'm standing - it seems like filling a gap in the conventions that even you made implicit assumptions of being the standard (re {numApples}). To address your concerns more directly - could you enumerate what, from your point of view, would be the required next step to unblock this work. I'd like if it doesn't stagnate with a non-answer for many more months considering there doesn't appear to be any disagreement on this 'obvious' proposal.

@yurishkuro
Copy link
Member

I'm not aware of any existing practice, that's the issue, that's why I'm reluctant to include in the spec something that nobody supports. My comment about humans is meant to say that if you start using that notation today in your instrumentation, it will be very readable to human in a tracing system that doesn't extrapolate the strings (eg in Jaeger), yet your data will be fully structured & suitable for machine processing. In other words, you don't need to wait for a standard to start using this notation. If we then have at least 1 tracing backend actually support it in some way (eg a PR to Jaeger UI to extrapolate the string,😉), then we can propose it to the specification based on established practice.

@ndrwrbgs
Copy link
Author

I didn't realize that the OpenTracing specification was driven off of what the Jaeger product has pre-existing support for, apologies for my missing that :(

@yurishkuro
Copy link
Member

I did not say that, I said "e.g. Jaeger", as example. The point is that proposing a convention based on an existing implementation (especially open source one) is quite different from proposing a convention that no system actually supports.

@ndrwrbgs
Copy link
Author

In that case let me note that my company as a tracer that does do this, an existing implementation (does not meet your especially but meets the non parenthetical conditions) - so can we look at the proposal for it's merits?

If it's a chicken-and-egg problem as you say then it doesn't seem fair to choose a single member's decision of which is the chicken over anyone else's, without that being established somewhere -- and from what I understand OT is the root, as it was supposed to be vendor agnostic so programming it to a specific vendor would be silly - and waiting for EVERY vendor would be nigh impossible without a standard (like OT?) to be working off of.

But I digress, if you're so against this proposal. It doesn't block me, it just seemed like something that could help your users.

@yurishkuro
Copy link
Member

You misunderstand me, but that's ok. I am not against the parameterized string. I do prefer named version over param.# syntax, in fact I would change my example to shell-style variables, which is universally recognizable. It's also more search-friendly, numApples = 3 vs. param1 = 3.

span.log_kv({
    'event': 'lunch',
    'message': 'I bought ${numApples} apples and ${numBananas}',
    'numApples': 3,
    'numBananas': 1313131313,
})

@jpkrohling
Copy link

Other than logging frameworks, SQL libraries also have a similar feature when dealing with "prepared statements". In that case, ? used to be the standard, until named parameters started being used for different reasons.

I personally prefer the named variant as well, especially because it removes ambiguity.

In time: the original proposal has the message with 0-based indexes, whereas the first parameter is named "param1" (instead of "param0"). If that's indeed how we'll move forward, we should at least be consistent :-)

@ndrwrbgs
Copy link
Author

The discussion wasn't about which to use, as, as I said, the named parameters one is a superset of the prefixed 'param0' one. The discussion as I understood it was me saying 'let's do it' - others chiming in 'yes', but then when it comes to doing it I was told 'not until Jaeger supports it'.

It sounds again like we all agree on it, so I reiterate my previous question - would one of the maintainers like to add or or will I be blocked if I do so?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants