-
Notifications
You must be signed in to change notification settings - Fork 897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Representation of multi-valued attributes #341
Comments
Other use case for multi-valued attributes: HTTP headers (would require a multimap), list of HTTP proxies used. |
See also #76, a call for structured attribute values. |
This has been discussed a while ago and we didn't have a specific use-case to demonstrate the need. I think this shows the need clearly. The relevant part of the spec [1] that needs to be changed is:
Note that this also requires a support in exporting protocols. Proposed OTLP [2] can support this, so can Jaeger Protobuf [3]. OpenCensus proto cannot support this directly because attributes are a map there [4], so we will need to come up with a reasonable way to translate to formats that require attribute keys to be unique. [1] - https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-tracing.md#set-attributes |
Why not support list-valued attributes? I would prefer not to change the specification that states:
|
List of what thought? Is it a list of primitives only? Can it be a list of lists (and recursively so)? That makes it equivalent to #76 This is also difficult to support in exporters. How do we translate a list-valued attribute to Jaeger tags for example? Do we flatten the list? If most of exporters are going to flatten the list then why don't we start with a flat list in the first place? |
I suggest this because it would allow future exporters to enrich their data structure and support any JSON-type value. In a structured logging system, it will be very natural to have structured key-values, recursively so, and I'd like to maintain consistency between span event key-values and span attributes. |
Also to answer your question, I would format these values as JSON and set them as strings. |
I am not against structured values but I think that's a separate topic. Even with structured values I would still suggest to support multi-values (essentially creating a tree of multimaps). Let me keep this debate going though since I am not convinced yet one way or another. I want to see more arguments in favor of the approaches. One thing I don't like about list-valued attributes proposal is that it results in multi- and single-value attributes to have a different data structure, e.g. given the usecase in this issue I can have one of the following 2 structures depending on whether I am getting a single or multiple keys from Memchached: {
"keys": "my-first-key"
} {
"keys": ["my-first-key", "my-second-key"]
} If I now want to search for all Spans representing Memcached get of "my-first-key" I need to have a complicated (possibly recursive) tree search. If it was a multi-valued list my search approach would most likely be the same regardless of how many values of the same key I have. Assuming I am storing the data in a RDBMS the flat version is more natural and likely faster unless my RDBMS supports JSON indexing (and even if it does my search is likely still more complicated than the flat version). |
For the record, the LightStep tracer supports JSON-valued inputs. The OpenTracing APIs support this to some extent, so for example the Golang To me, multimap semantics raise all sorts of new complicated questions about the API. Should we support a separate API to overwrite? Should we support deletion by key:value? I see your concerns about RDBMS representation as out of place at the API level. If you received a JSON-encoded list, you could certainly choose to index it by flattening the list. If you receive a JSON-encoded dictionary, you could recursively expand it and index flat key:values. These are questions about the schema of the data. That leaves only one real question, to me. It's convenient to say there are multi-map semantics, because it lets a user call "set" multiple times without remembering any state on their own. If we only support unique map semantics, then the user is required to remember all their values and assemble a list on their own. This would be my preference because it simplifies the API considerably. Set has overwrite semantics, and if you want to set multiple times with a list, you can. But it doesn't require any new API surface to replace an already-set value. |
This is true. I think I agree with what you are saying. As soon as we are able to support multiple values unambiguously and there is a clear convention on how it must be done I am OK with using array values for that. If we have a convention the backend is free to flatten or store however it wants. @fbogsany thoughts on this? Does it work for you? |
👍 I'm fine with a clear convention. |
Here is a proposal to support multi-value attributes. Please comment. API ChangesIn Tracing API specification, Set Attributes section replace this:
By this:
Protocol SupportOTLPAdd support for array types to
Protocols with no built-in support for arraysTo store an array of values encode it as a JSON and store as a string value. For example to store an array of 3 elements
Semantic ConventionsAdd the following paragraph to Semantic Conventions document:
|
I would not require support for non-homogenous arrays and I would even go as far as forbidding semantic conventions to use them. From the top of my head, I can't think of any use cases for them and I think that they will lead to complicated or ugly APIs in some languages. |
Agreed. My use case only requires homogenous arrays. I'm looking for an array of strings, but I could also see a use for arrays of booleans (which keys in a memcached multi |
Updated the proposal to require homogeneous arrays. |
Note that the requirement to have homogeneous arrays also places limits in arbitrary nested structures that we may want to support in the future. If such support is added most likely there will be a desire to allow non-homogeneous array elements (so that you can e.g. directly represent JSON data). |
I don't think the json representation is useful for Metrics. Hanse one more reason to not have the same representation for attributes in tracing and labels in metrics |
Resolves open-telemetry#341 Array values are useful for representing attributes that can have multiple values (e.g. representing a Memcached `get` that can be over multiple keys and making the keys a span attribute). This change allows homogeneous array values for span attributes and specifies that array values should be JSON encoded stirng when exporting using protocols that do not natively support array values.
Resolves open-telemetry#341 Array values are useful for representing attributes that can have multiple values (e.g. representing a Memcached `get` that can be over multiple keys and making the keys a span attribute). This change allows homogeneous array values for span attributes and specifies that array values should be JSON encoded stirng when exporting using protocols that do not natively support array values.
The homogeneous requirement is fine, thanks! |
Resolves #341 Array values are useful for representing attributes that can have multiple values (e.g. representing a Memcached `get` that can be over multiple keys and making the keys a span attribute). This change allows homogeneous array values for span attributes and specifies that array values should be JSON encoded stirng when exporting using protocols that do not natively support array values.
simple joining on comma is cheaper to index split and join and more supportable vs json arrays. I know folks are already doing this and it requires no more coordination on backend than making a json array. the only thing you gain with json is a an escaping syntax. I know ship has sailed, but this one needs an alternative opinion mentioned for posterity |
also the use case of tagging the input to multi-get is generalizable as request tagging. it is very common to have json strings when serializing requests into tags. A concrete non-request tagging use case for array values are things like experiment IDs. Usually these are bounded to much lower cardinality, yet need to be indexed for lookup. It is possible that someone wants to index all redis keys, also, but yeah I think this is more a request logging thing vs anything else. hope the context helps for the next similar discussions |
Resolves open-telemetry#341 Array values are useful for representing attributes that can have multiple values (e.g. representing a Memcached `get` that can be over multiple keys and making the keys a span attribute). This change allows homogeneous array values for span attributes and specifies that array values should be JSON encoded stirng when exporting using protocols that do not natively support array values.
Memcached operations such as
get
can specify multiple keys. If we want to add an attributekeys
with a list of the keys included in the operation, we can do so with an arbitrary separator (e.g.span.set_attribute('keys', keys.join('|'))
), however this requires coordination between instrumentation and analysis backends. Can we either define semantic conventions for representation of multi-valued attributes, or allow for multiple attributes on a span with the same key?cc @tigrannajaryan
The text was updated successfully, but these errors were encountered: