-
Notifications
You must be signed in to change notification settings - Fork 50
Conversation
```go | ||
type Resource { | ||
Type string | ||
Labels map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your implementation had Labels as Tags. Are we going with Labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we changed it from labels to tags. @Ramonza asked elsewhere yesterday whether "labels" or "properties" would make more sense.
Personally I don't feel strongly – there are arguments for either. Happy to change it to whatever you decide on.
The key MUST be seperated from the value by a single `=`. Values MAY be quoted with a single | ||
leading and trailing `"`. If a values contains whitespaces, `=`, or `"` characters it MUST be | ||
quoted and `"` characters MUST be escaped. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please give examples of OC_RESOURCE_TYPE and OC_RESOURCE_LABELS to demonstrate the format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
} | ||
``` | ||
|
||
Exporter libraries MAY provide a default translation for well-known input resource types and labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to add a list of all known label keys here in the future.
Then, implementors can look up for the whole list and decide what they can map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the known label keys should be specific to the auto-detection implementations. The one's living in census-ecosystem can be documented there.
In general users may implement their own detection with their own set of keys – thus maintaining an authoritative set in the core spec seems problematic. Hence the proposal to use domain namespacing.
resource/README.md
Outdated
@@ -0,0 +1,10 @@ | |||
# OpenCensus Library Resource Package | |||
This documentation serves to document the "look and feel" of the open source resource package. | |||
It describes they key types and overall behavior. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/they/their
resource/README.md
Outdated
|
||
The primary purpose of resources as a first-class concept in the core library is decoupling | ||
of discovery of resource information from exporters. This allows for independent development | ||
of either and easy customization for users that need to integrate with closed source environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove "either and"?
resource/Resource.md
Outdated
The resource library primarily defines a type that captures about information about the entity | ||
for which stats or traces are recorded. It further provides a framework for detection of | ||
resource information from the environment and progressive population as signals propagate | ||
from the core instrumentation library to the a backend's exporter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the a" -> "a"
resource/README.md
Outdated
@@ -0,0 +1,10 @@ | |||
# OpenCensus Library Resource Package | |||
This documentation serves to document the "look and feel" of the open source resource package. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/open source/OpenCensus
resource/Resource.md
Outdated
unicode character. | ||
|
||
Implementations MAY define a `Resource` data type, constructed from the parameters above. | ||
Resource MAY have getters for retrieving all the information used in `Resource` definition. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getters is a "must".
utils/MonitoredResource.md
Outdated
@@ -1,84 +0,0 @@ | |||
# Monitored Resource |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this page can be left as additional information for the "Auto-detection" section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My thinking was that it's more scalable to keep these docs close to the code they belong to. OTOH, with spread across languages, this repo may indeed still be the better option – maybe with separate file per detection mechanism then as those will probably grow.
resource/Resource.md
Outdated
@@ -0,0 +1,133 @@ | |||
# Resource API Overview | |||
The resource library primarily defines a type that captures about information about the entity |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; s/about information/the information/
resource/Resource.md
Outdated
|
||
## Populating resources | ||
Resource information MAY be populated at any point between startup of the instrumented | ||
application and passing it the a backend-specific exporter. This explicitly includes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit; s/passing it the a/passing it to a/
resource/Resource.md
Outdated
|
||
Additionally, exporters SHOULD provide configuration hooks for users to provide their own | ||
translation unless the exporter's backend does not support resources at all. For such backends, | ||
exporters SHOULD allow attaching converting resource labels to metric labels. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what you mean. what are metric labels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be tags probably.
My assumptions about Resource.md:
Thoughts on these approaches: i. metadataI see two mechanisms for this (although there might be others):
Probing doesn't feel like a great option because it makes a lot of assumptions that are liable to break (e.g. what if I'm using a custom image, what if I have a default deny policy on my machine, etc). Importing also doesn't feel like a great option as it requires a recompilation to deploy to a new environment as I assume it would have an import that looks like this; import _ "oc/metadata/aws" I'm sure they can be made to work well but I don't think they're ideal as they feel too much like implicit magic.I can't deny their convenience in the contexts where it would just work. ii. environment variablesI think you could iterate over variables with a given prefix (e.g. AWS
k8s
My assumption with the above is that OC would truncate the prefix and down-case the remaining variable. ii. config file(s)I think simple property files could be used here with a default path relative to the binary and optionally overridden by environment variable or config option. It's late where I am will come back to this. |
This documentation serves to document the "look and feel" of the open source resource package. | ||
It describes they key types and overall behavior. | ||
|
||
The primary purpose of resources as a first-class concept in the core library is decoupling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to start with a definition of what a resource is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other file starts with the definiton
The resource library primarily defines a type that captures
information about the entity for which stats or traces are recorded.
If that sounds reasonable, I can move it to the readme as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please resolve this. I think it is important to specify what a "resource" is, maybe include examples.
I saw that def but it is not precise. Specifically, is the use of term "entity" intended to generalize to more than just an application process / workload? I.e. what else an entity could be, an endpoint? |
I agree that "entity" is as generic as it gets and I believe that's the best we can do in the end. Signals can refer to pretty much anything (consider IoT for example – "toothbrush" could be perfectly valid) and I don't think an instrumentation library should constrain that set artificially, especially since we can't draw any benefits from it. |
@nfisher regarding the approaches: I think the user should be enabled to use what works for their environment. Thus the goal is to not enforce a single specific solution but to provide a thin integration package for resources – with envvars being the fallthrough, that can always be relied on. In some environments that can mean automatic detection of resource information, which should be provided through libraries. As you said, this is unideal in the sense that it adds dependencies to the code. For environments where auto-detection is not possible, runtime configuration is ultimately required. Envvars seemed like an easy way to integrate. Config files are also a way to integrate of course. My main concern is that they make deployment more tedious (generate file, get file into the right place, permissions, ...) whereas envvars are easy to inject in most systems I'm aware of. Does that provide some background on motivation? Anything you'd like to see in the spec to clarify things? |
@fabxc agree on flexibility especially if controlled by the user as to when and how it happens. Will place a review for any further comments. To clarify for my own understanding where would you see these labels being attached during the initialisation process? func initTracer() {
sdexporter, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "testing"})
if err != nil {
return err
}
trace.RegisterExporter(sdexporter)
trace.ApplyConfig(trace.Config{DefaultSampler: trace.ProbabilitySampler(0.01)})
http.ListenAndServe(":50030", &ochttp.Handler{})
} |
The resource library primarily defines a type that captures information about the entity | ||
for which stats or traces are recorded. It further provides a framework for detection of | ||
resource information from the environment and progressive population as signals propagate | ||
from the core instrumentation library to a backend's exporter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a difference between core library
and core instrumentation library
? Core library seems to be most consistently used throughout.
Is it worth giving an explicit definition of what the core library
is? I assume it is any libraries provided under the Github census-instrumentation
org which provide a common abstraction to backend exporters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd agree with the definition you gave, especially as exporters are moved to census-ecosystem.
What's the official take @bogdandrutu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think core library is better.
resource/Resource.md
Outdated
an agent attaches further labels about the underlying VM, the cluster, or geo-location. | ||
|
||
### From environment variables | ||
Population of resource information from environment varibales MUST be provided by the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/varibales/variables
|
||
For example, process-identifying information may be populated through the library while | ||
an agent attaches further labels about the underlying VM, the cluster, or geo-location. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding a From OC data structure
section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh, what do you mean specifically? Like when a resource is just passed around via internal OC APIs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Labels defined in files. Main reason I arrived here to burden you with my nonsense was a tweet by @rakyll about difficulty of label discovery in k8s.
The downward API that you referenced in the deleted document seems the most reasonable/consistent approach within k8s to get those details but I'm not sure how well its constraints map to the previously outlined mechanisms (e.g. OC_RESOURCE_LABELS
env variable and auto-detection).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with the downward API is that it can be used to set arbitrary environment variables. Currently the auto-detection OC's Stackdriver exporter does relies on a documentation example but it's not standardized.
We could define a standard of course, which users have to adhere to. But in practice no one wants to configure every single deployment/pod with the downward API or something else. Using Kubernetes' admission control to inject this information automatically is much more feasible and powerful. And when using that, one can simply resort to the generic OC envvars proposed here without adding yet another one-off standard.
I have a working implementation for this and can hopefully release a design soon.
That said, I'd like to understand a use case where a file with information is easier than an environment variable. Even with admission control in Kubernetes as described above, implementation is a lot simpler with environment variables and a lot less likely to collide with other admission restrictions, e.g. "pods in this namespace must not mount volumes".
A resource object MUST NOT be mutated further once it is passed to a backend-specific exporter. | ||
From the provided resource information, the exporter MAY transform, drop, or add information | ||
to build the resource identifying data type specific to its backend. | ||
If the passed resource does not contain sufficient information, an exporter MAY drop |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the feedback loop on dropped/ignored data? (e.g. is it a silent error, printout to stderr, returned error, etc)
My preference would be towards failing fast with a validation error at the point of library re/initialisation. Alternatively but less preferred would be a returned error at the point of capture.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as this would happen during initialization, fail-fast would be best. Will add that.
For the update case described above, emitting an error on update failures and continuing best effort with the old resource? The risk is data being a bit skewed. That seems better though than dropping all data. Especially since only small pieces of information are likely to change, which may not even be used by the exporter.
@nfisher thanks for all your comments. This is really helpful and I added some clarifications. |
Added back @nfisher I take we largely agree modulo potentially adding a file-based generic path? That could be added without conflicts at a later point. @jbd @bogdandrutu could you agree on whether you'd prefer to go with resource labels or tags? |
I have no strong opinions on tags vs labels. But you should consider that there is a cost of introducing new terminology. We are working on this field full time and can automatically can associate them. It is not true for an average user. @bogdandrutu, any more to add? |
@fabxc hrm common auto-detected path would need to think about that could see it working with JSON files in the format you've outlined? But generally speaking the more I think about files the more I think it might be best left as an exercise for the user that doesn't need to be encoded in the spec. It would be nice to have a reference implementation elsewhere though. RE: downward API vs admission control I think users are likely to arrive at a solution from a number of directions as I think "best practises" are still emerging with k8s. Also what you might do with a small number of services (e.g. 10's is likely to be different to 100's, or 1,000's) so I'm not sure there's a 1 size fits all there. With a lack of a common standard/best practise I'm not sure auto-detection will work with k8s. In general my leaning is towards explicit over magic. I'm ok with a little explicit wire-up, especially if there's a mechanism to "verify" my work. Assuming it doesn't exist already it would good to have a |
With respect to @rakyll comment. I would opt to use whatever OpenTracing and Dapper paper use. |
@nfisher Good point on providing the labels as debug output to a zpage. That should definitely be possible with the proposed APIs, i.e. the recommended detector functionality, will provide a function as you suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some high-level comment. I think we have 2 things to solve here: uniqueness for timeseries, usability (query capability) for metrics (show me latency per container). This document kind of tries to address both but I struggle to understand how do we solve the first part.
resource/Resource.md
Outdated
|
||
Type strings and label keys MAY start with a domain name and MAY be further namespaced through | ||
single slashes. All other characters MUST be alphanumeric. Label values MAY contain any valid | ||
unicode character. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you are using strings you need to define an encoding for them, otherwise use bytes. We prefer to support a subset of characters maybe consider printable ascii.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, that should've just said "utf8" probably. But I'm fine with a subset, which it will be anyway in practice.
Only thing I'd be worried about is, that that might not add up with backends and thus make OC restrictive without much gain.
Prometheus supports utf8 strings for label values and it hasn't been an issue yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We (ab)use prometheus at work and have much wider range in use than ASCII. 👍 to UTF-8.
application and passing it to a backend-specific exporter. This explicitly includes | ||
the path through future OpenCensus components such as agents or services. | ||
|
||
For example, process-identifying information may be populated through the library while |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have problems to understand what is a resource vs what is a task/process identifying. Seems that resource can identify a VM instances or containers which means the process identity is lacking.
Do we have to ensure uniqueness of timeseries exported via the resource? What if in a GCE VM we have multiple linux tasks how does the gce_instance resource uniquely identifies a timeseries?
Not sure I understand how you envision this to be used. We currently also added in our exporters a notion of Node (see https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/agent/common/v1/common.proto#L33) does this completely replace that? Not sure that the current proposal can completely replace that unless we try to add some mandatory fields that can get us some guarantees about uniqueness of generated timeseries.
Seems to me that this is a bit too fragile for users (out of the box experience will not work properly), unless they set the env variables or they run the agent (may work) or they link one of the libraries that we have for auto detection of the resources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that resource can identify a VM instances or containers which means the process identity is lacking.
The resource can be as granular as the OC user likes. I don't believe this is something we should or even can solve generically.
If the user wants to export with resource labels granular enough down to a process, they can. But this will not scale for many backends. Imagine something spawns a new process very often but never runs it concurrently. If each spawning of the process created a new set of series, virtually all backends would break under the load.
I believe at some point the user has to check at which level they can ensure no collisions take place and go with that – e.g. if they know their containers only run a single process exposing signals.
Note that this proposal does not amend anything about the <language>-<pid>@<hostname>
label attached to all metrics right now, which theoretically guarantees what you are concerned about.
Though I think it raises the problems mentioned above and that should be revisited eventually.
What if in a GCE VM we have multiple linux tasks how does the gce_instance resource uniquely identifies a timeseries?
Then a pure VM-based resource is the wrong choice (it almost always is). More granular labels must be provided.
Not sure that the current proposal can completely replace that unless we try to add some mandatory fields that can get us some guarantees about uniqueness of generated timeseries.
Sounds like valuable information that exporters can make use of additional to the resource info in building the final samples they send.
Is this intended to replace the fixed <language>-<pid>@<hostname>
metric label?
* `type`: a string which describes a well-known type of entity. It SHOULD be namespaced | ||
to avoid collisions across different environment, e.g. `k8s.io/container`, | ||
`cloud.google.com/gce/instance`. | ||
* `labels`: a dictionary of labels with string keys and values that provide information |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment on the strings here. Strings in go for example are encoded utf8 but in java default is utf16 what do we do here? I think it is always better to start with printable characters or a set smaller than "everything". Also if we support any character we need to do sanitization for almost all the backends, and the biggest problem is that after sanitization the keys may no longer be unique for example hence you break the protocol.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh, does the in-memory encoding really matter? Sounds like this is up to the exporter to convert/normalize. Even if we restricted to alphanum, dashes and underscores, some backend may not like dashes and the exporter has to mutate before sending.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the in-memory representation matters for one reason: we are giving people an option to set these strings via environment variables so we have to define what a valid input it is for the key/value strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need more details on intended use of API. If it's just a local endpoint scenario - I'd suggest to name is such, Resource
is a very rich term, let's not waste it
@@ -0,0 +1,154 @@ | |||
# Resource API Overview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this API also addresses this issue: #135? Does it cover only local endpoint or also a remote endpoint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The resource does not aim describe individual service endpoints. In particular for remote endpoints, that's straight out impossible to do in the same way I'd say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it does try to describe a local endpoint, right? Should it be called local endpoint then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean renaming resource
to local endpoint
? That possibly makes sense from a tracing perspective but not so much for other signals I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what other signals it wouldn't make sense for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A process doing file process can still expose metrics about its internal state without doing any request/response flows at all. Same for logs/events should they become part of OC at some point.
resource/Resource.md
Outdated
|
||
## Resource type | ||
A `Resource` describes the entity for which a signal was collected through two fields: | ||
* `type`: a string which describes a well-known type of entity. It SHOULD be namespaced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to have some semantical, predefined fields on top of labels. Like name
and instanceId
. Those fiedls can be extracted from labels to provide some out-of-the box expirience
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd only consider doing this in the form of standardized tag keys potentially – which can be done later on when we've a better grasp of how people are using resources in different environments.
Dedicated fields will inevitably make user-side configuration more complex and the overall system harder to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd imagine system will want to draw applicaton map with the names taken from resource identity. In this case having name
or role
will be helpful as UI shouldn't worry about knowing all the different "providers" of such information.
Alernative may be to define a well-known postfix for those properties so exporter can do generic translation of labels to strongly-defined fields .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are getting at. I agree that in the system where the data ends up, one wants to have more structure to enable certain workflows.
This proposal does not extend as far as telling backend system to have a resource presentation in form of a type string and tag map. Instead, the specific exporters should convert the rather structure-less resource notion proposed here into something more structured.
For Stackdriver for example, the exporter is configured with a mapping that converts known resource type and tag keys to Stackdriver Monitored Resources (e.g. k8s_container) which have fixed fields.
Exporters will by default do a mapping of tags set by well-known resource detectors.
So ultimately, what's proposed here is so flexible to accomodate for the fact that backends may handle this very differently. For example, some backends may rather convert the OC resource into a single string like container.k8s.io/namespace/ns1/pod/nginx-1234/container/nginx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. What I'm trying to accomplish is some hint from resource-auto-discovery logic to exporter on what is what. So perhaps some postfix in names that help determine whether this tag is about unique ID of instance or logical group of instances (role). Or have those two on the level with type
. Otherwise every exporter will need to know about every resource auto-detector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect the number of active detectors in a given environment to be very low so that configuration isn't a particularly big problem. Plus, if most common environments are supported by default, only a small fraction of users has to do configuration at all.
Coming up with a canonical set of schemas (in the fashion of schema.org) will eat lots of time and is hard to get right. If it only relieves a few lines if configuration for a minority of users, I'm not sure that's worth it.
Again Prometheus as a data point – the provided data has even weaker structure and no default behavior at all, i.e. all users have to write rather awkward configuration. Regardless, they largely consider this to be a feature.
@@ -0,0 +1,154 @@ | |||
# Resource API Overview |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading "resource" there is also a scenario when you want to record operation on certain resource. Like query all traces that created, updated, read properties of, etc of a certain resource. Where resource may be some business entity. Is this an intendent use of an API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Resource is really about identifying the source of signal data and with that would be generally static. I didn't really see individual signals mutating information about their own source.
Could you give an example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When Azure creates, modifies, etc. VMs it marks all traces performing operations on them with the VM ID. We call it resource ID =)
I can imagine the same thing applicable everywhere where business defines some resources that has a lifetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the traces get tagged with the VM they are coming from?
That would essentially be exactly what this proposal aims at – just that machine identifiers are often to coarse and in the case of metrics, basically always.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, with the VMs they operate on. VM here is a customer VM created via call to Azure. Sorry for confusion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see, the Azure API itself here being what's instrumented.
I think in that case the VM ID would just be part of the signal tag itself, i.e. a tag of trace rather than a tag of the resource of the trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment comes from the very short intro of this document. Maybe make it more explicit what is resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
## Populating resources | ||
Resource information MAY be populated at any point between startup of the instrumented | ||
application and passing it to a backend-specific exporter. This explicitly includes | ||
the path through future OpenCensus components such as agents or services. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the property like "health" be a valid field for it? If one want to mark the node as unhealthy and then healthy again? Or it's set once, change never kind of properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, volatile metadata isn't helpful in identifying a resource – so I'd say no. At the same time, such data could be useful for filtering noisy/spammy signals before ingesting them.
So I think it should be allowed but exporters wouldn't actually make this data part of signal data itself.
Basically, for the near future, only static properties are of relevance IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so resource may have property whether it's "master" or configuration "version" that is currently being used.
Since this PR defines resources - I'd suggest to define resources lifetime and health as well.
Mostly my comment comes from the term "resource". It is so wide and general that I natirally see the desire to use it to design health and lifetime management of that resource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think whether a resource has health or lifetime info to begin with really depends on the resource. If it has this information, a library implementing a resource detector can easily provide this info through dedicated labels.
Though, as hinted at in other comments, I think things should mostly be constrained to static resource information. Backends can pull into volatile metadata at query time from other sources.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
health and lifetime tracking requires more responsive mechanism to detect changes that pulling. If this is a scenario - it should be taken into consideration
core library. It provides the user with an ubiquitious way to manually provide information | ||
that may not be detectable automatically through available integration libraries. | ||
|
||
Two environment variables are used: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the assumption is that there is a single resource type per process? I'm still trying to wrap my head around what is resource thus the question. Will there be option to have admin
part of site reporting resource separately from user
part?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that information should generally remain part of the signal tags/labels. The resource is about identifying the signal source itself. In 90% of cases, that just means being able to identify the process through a set of tags that are meaningful within the context of the environment later, e.g. the k8s_container resource in Stackdriver.
Currently, this is only possible through the <language>-<pid>@<hostname>
label on the signal itself, which rarely makes sense to an end user later.
I get that the mental line can get kinda of blurry though.
For example, if the actual source of signals isn't able to export data itself (e.g. some embedded device) but instead a third process exports signals about N sources by proxy. Then being able to set different resources explicitly is valuable.
That would generally be an exception case that can be added by further API extensions. I'd defer that though so the design of such an extension can be guided by real-world use cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does process defines the boundary or DI container? It needs to be clearly outlined in spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My opinion differs on this a bit. I think OC should provide clear docs around best practices – we can probably also some SHOULD/SHOULD NOT sentences to the spec.
But a clear line is not possible with this in my experience. There'll always be things we'd clearly see as "MUST NOT do this" that people might have a valid one-off use case for. If we don't enable those, even if they don't yield ideal results, OC becomes significantly less valuable for those users.
If 1 out of 20 teams in an org cannot use the instrumentation system of choice, the negative impact is far greater than 5%.
resource/Resource.md
Outdated
|
||
Two environment variables are used: | ||
* `OC_RESOURCE_TYPE`: defines the resource type. Leading and trailing whitespaces are trimmed. | ||
* `OC_RESOURCE_LABELS`: defines resource labels as a comma-seperated list of key/value pairs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BNF definition would be useful here
|
||
### Merging | ||
As different mechanisms are run to gain information about a resource, their information | ||
has to be merged into a single resulting resource. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
automatically or on demand? It feels that there is a singleton for a resource (to my previous question). Would this singleton be per tracer? Or it should be global?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would happen explicitly during initialization/setup.
See this PR for the core library extension I proposed (Go).
With that, the only change in general would be an option when setting up the exporter, e.g.:
exporter, err := stackdriver.NewExporter(stackdriver.Options{
ResourceDetector: resource.ChainedDetector(
resource.FromEnv,
aws.Detect,
azure.Detect,
gcp.Detect,
),
})
Or, with a helper package that chains the common auto-detectors together:
exporter, err := stackdriver.NewExporter(stackdriver.Options{
ResourceDetector: auto.Detect,
})
resource/Resource.md
Outdated
### Updates | ||
Over the runtime of an application, resource information may change in some cases. To | ||
allow exporters to incorporate those changes, they SHOULD take a resource getter as a | ||
configuration parameter. The exporter SHOULD periodically rerun the getter and incorporate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should there be notification mechanism for subscribers on updates? It will be useful to have if you have an exporter that needs to react on these changes. Otherwise there will be a mess of threads checking for updates from each other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now I think a periodic rerun of the detector and comparing with the previous result to detect changes will be sufficient. But there may very well be other use cases I didn't think of.
Note that the exporter-specific part of this spec is largely SHOULD, so there's enough flexibility to deviate/extend where necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heatlth and lifetime management are scenarios that might require more immidiate update of properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKanzhelev I would like to understand the "heatlth and lifetime management are scenarios". Can you give me more details about what you have in mind.
My personal opinion is to keep simplicity and actually not support runtime config for these in the library at all, unless we have a clear use-case which I miss here, but I can be convinced if you have one in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key of heatlth and lifetime management is that different modules may contribute into the common state and make decisions based on it. Like if CPU is too high - make sampling more aggressive and report yellow state of the app on this node. Or if exporter failed to upload telemetry with many retries - switch an app to red state and potentially node out of rotation. Or when app was swapped into production from staging - start collecting more metrics.
I agree with you that resource concept like proposed can be implemented if there is a way to set global attributes like proposed in #167. Maybe make it more elaborate and implement it via callbacks to resource auto-discovery modules (without calling them that).
So static and almost static informaiton is not that hard to implement without introducing new concepts (except global attributes or attributes callbacks).
resource/Resource.md
Outdated
``` | ||
|
||
### Updates | ||
Over the runtime of an application, resource information may change in some cases. To |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some words on how often are required. Is health change or config update notification should be using this API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per above, I'd see whether users report use cases.
Prometheus has a similar resource info model (called target info there) – different integrations update via watches/notification or just via periodic refreshes. I cannot recall a single use cases where users complained about periodic refreshes being too laggy.
It may also be interesting to see what (meta)data we added to different integrations over time based on user demand.
Most non-identifying information was added a) for filtering or b) because people wanted to use attach it to the signal directly.
While b) should be discourage, I think OpenCensus must allow for odd one-off use cases since it becomes most valuable, when users can use it for everything.
vendors, MUST be implemented outside of the core libraries in third party or | ||
[census-ecosystem][census-ecosystem] repositories. | ||
|
||
### Merging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how the type
property will be merged?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already set labels or type fields MUST NOT be overwritten.
Thanks for the detailed feedback @SergeyKanzhelev. I tried to answer all your questions and concerns with some context. |
@fabxc thank you! I understand it much better now. Since it's all in-proc I believe more responsive update notification mechanism may be beneficial for health and lifetime tracking of the resource. Also some strongly typed fields might be helpful to simplify writing exporters. Do you think it's something this spec can accomodate? |
I personally struggle a bit on this so I would like to clarify the overall picture as I understand it. I think we need two different concepts (we may have overlap between them, that's probably the cause of confusion for me at least):
I think we can conclude that one process can produce signals that belong to different resources, so the resource is a property of the signal not a property of the producer task. There are different use-cases for
Questions:
I really hope that I did not confused people more, and I would like to here others opinion about this. |
@bogdandrutu it makes sense. If you think of resources as a way to design this type of hierarchy - there are also layers like role (set of endpoints - like "health", "admin") and endpoint itself. It also calls to association of an individual span with the resource. Not for exporter to pull data from resources. Is it what you are thinking about? |
Signed-off-by: Fabian Reinartz <[email protected]>
Signed-off-by: Fabian Reinartz <[email protected]>
Signed-off-by: Fabian Reinartz <[email protected]>
Added some clarifications around the allowed character set (it's more restrictive now and matches tags). @SergeyKanzhelev I changed the section on updates following a discussion with @mayurkale22 and @bogdandrutu to discourage usage of any mutable attributes in resource labels. On the one hand this is definitely the safe route and we can always lift restrictions based on the future feedback. On the other hand, it seems like mutable runtime information like health serves a quite different purpose from the resource feature discussed here. |
This documentation serves to document the "look and feel" of the open source resource package. | ||
It describes they key types and overall behavior. | ||
|
||
The primary purpose of resources as a first-class concept in the core library is decoupling |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please resolve this. I think it is important to specify what a "resource" is, maybe include examples.
`Resource` MUST have getters for retrieving all the information used in `Resource` definition. | ||
|
||
Example in Go: | ||
```go |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a TODO to add a link to the proto definition.
resource/Resource.md
Outdated
type Detector func(context.Context) (*Resource, error) | ||
|
||
// Returns a detector that always returns a specific resource. | ||
func NewDetectorFromResource(*Resource) Detector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more like a helper method. Not necessary to have it in the specs.
func NewDetectorFromResource(*Resource) Detector | ||
|
||
// Returns a detector that runs all input detectors sequentially and merges their results. | ||
func ChainedDetector(...Detector) Detector |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chained does not sound very good to me. Maybe ResolveResourceFromDetector or maybe you have better suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mh, it doesn't resolve a resource though but rather just turns a list of Detectors into a single Detector again that has to be called first.
Best alternative I can think of is CombinedDetector
.
resource/Resource.md
Outdated
``` | ||
|
||
### Updates | ||
Over the runtime of an application, attributes of a resource may change in some cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here this sentence implies we support dynamic changes of the resource fields. Or maybe I am just wrong in understanding this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SergeyKanzhelev what is your opinion on this?
LGTM for me with small comments sent earlier.
Signed-off-by: Fabian Reinartz <[email protected]>
@SergeyKanzhelev I would like you to take a final look at this and also here where we define some examples: |
@SergeyKanzhelev did you have a chance to take a look at this? |
@fabxc thank you for addressing feedback. Looks good to me. Sorry for delay |
This spec aims to cover my proposal for a resource package in the core library (Go reference implementation).
I dropped the existing spec that was covering specific cloud vendor auto-detection since part of the proposal is that those live independently in
census-ecosystem
or entirely elsewhere.I took my best guesses on MAY vs. SHOULD vs. MUST based on existing specs but don't feel strongly in most cases. Any guidance is appreciated.