This repository has been archived by the owner on Oct 3, 2023. It is now read-only.

Add resource specs #162

Merged
merged 6 commits into from
Oct 30, 2018

Conversation

fabxc
Contributor

@fabxc fabxc commented Aug 23, 2018

This spec aims to cover my proposal for a resource package in the core library (Go reference implementation).
I dropped the existing spec that was covering specific cloud vendor auto-detection since part of the proposal is that those live independently in census-ecosystem or entirely elsewhere.

I took my best guesses on MAY vs. SHOULD vs. MUST based on existing specs but don't feel strongly in most cases. Any guidance is appreciated.

```go
type Resource struct {
	Type   string
	Labels map[string]string
}
```
Contributor

Your implementation had Labels as Tags. Are we going with Labels?

Contributor Author

Yes, we changed it from labels to tags. @Ramonza asked elsewhere yesterday whether "labels" or "properties" would make more sense.
Personally I don't feel strongly – there are arguments for either. Happy to change it to whatever you decide on.

The key MUST be separated from the value by a single `=`. Values MAY be quoted with a single
leading and trailing `"`. If a value contains whitespace, `=`, or `"` characters it MUST be
quoted and `"` characters MUST be escaped.

Contributor

Can you please give examples of OC_RESOURCE_TYPE and OC_RESOURCE_LABELS to demonstrate the format?

Contributor Author

Done.
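For reference, values consistent with the format quoted above could look like this (illustrative only, not necessarily the exact examples added to the spec):

OC_RESOURCE_TYPE="k8s.io/container"
OC_RESOURCE_LABELS=k8s.io/namespace="default",k8s.io/pod_name="pod-xyz-123"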


Exporter libraries MAY provide a default translation for well-known input resource types and labels.
Contributor

It might be good to add a list of all known label keys here in the future.

Then, implementors can look up the whole list and decide what they can map.

Contributor Author

I think the known label keys should be specific to the auto-detection implementations. The ones living in census-ecosystem can be documented there.
In general users may implement their own detection with their own set of keys – thus maintaining an authoritative set in the core spec seems problematic. Hence the proposal to use domain namespacing.

@@ -0,0 +1,10 @@
# OpenCensus Library Resource Package
This documentation serves to document the "look and feel" of the open source resource package.
It describes they key types and overall behavior.
Contributor

nit: s/they/their


The primary purpose of resources as a first-class concept in the core library is decoupling
of discovery of resource information from exporters. This allows for independent development
of either and easy customization for users that need to integrate with closed source environments.
Contributor

Remove "either and"?

The resource library primarily defines a type that captures about information about the entity
for which stats or traces are recorded. It further provides a framework for detection of
resource information from the environment and progressive population as signals propagate
from the core instrumentation library to the a backend's exporter.
Contributor

"the a" -> "a"

@@ -0,0 +1,10 @@
# OpenCensus Library Resource Package
This documentation serves to document the "look and feel" of the open source resource package.
Contributor

nit: s/open source/OpenCensus

unicode character.

Implementations MAY define a `Resource` data type, constructed from the parameters above.
Resource MAY have getters for retrieving all the information used in `Resource` definition.
Contributor

I think getters is a "must".

@@ -1,84 +0,0 @@
# Monitored Resource
Contributor

I think this page can be left as additional information for the "Auto-detection" section.

Contributor Author

My thinking was that it's more scalable to keep these docs close to the code they belong to. OTOH, with it spread across languages, this repo may indeed still be the better option – maybe with a separate file per detection mechanism then, as those will probably grow.

@@ -0,0 +1,133 @@
# Resource API Overview
The resource library primarily defines a type that captures about information about the entity
Contributor

nit; s/about information/the information/


## Populating resources
Resource information MAY be populated at any point between startup of the instrumented
application and passing it the a backend-specific exporter. This explicitly includes
Contributor

nit; s/passing it the a/passing it to a/


Additionally, exporters SHOULD provide configuration hooks for users to provide their own
translation unless the exporter's backend does not support resources at all. For such backends,
exporters SHOULD allow attaching converting resource labels to metric labels.
Contributor

not sure what you mean. what are metric labels?

Contributor Author

That should be tags probably.

@nfisher

nfisher commented Aug 24, 2018

My assumptions about Resource.md:

  • this document is to outline how to associate environment related labels to spans and metrics (e.g. zone, region, pod, etc).

  • The primary goal in doing so is to provide flexibility for drill-down, and fine-grained groupings in downstream tools.

  • There appear to be 3 approaches being considered for associating the labels:

    1. metadata API end-points.
    2. environment variables.
    3. config file(s).

Thoughts on these approaches:

i. metadata

I see two mechanisms for this (although there might be others):

  1. probing.
  2. import at build time.

Probing doesn't feel like a great option because it makes a lot of assumptions that are liable to break (e.g. what if I'm using a custom image, what if I have a default deny policy on my machine, etc).

Importing also doesn't feel like a great option, as it requires a recompilation to deploy to a new environment; I assume it would have an import that looks like this:

import _ "oc/metadata/aws"

I'm sure they can be made to work well but I don't think they're ideal as they feel too much like implicit magic. I can't deny their convenience in the contexts where it would just work.

ii. environment variables

I think you could iterate over variables with a given prefix (e.g. OC_LABEL_) and use them as labels. The terms region, zone, host, application, application_group can map to most cloud providers:

AWS

OC_LABEL_REGION=ca-central-1
OC_LABEL_ZONE=ca-central-1a
OC_LABEL_HOST=hostname.aws
OC_LABEL_APPLICATION=my-app
OC_LABEL_APPLICATION_GROUP=my-app-asg

k8s

OC_LABEL_REGION=ca-central-1
OC_LABEL_ZONE=ca-central-1a
OC_LABEL_HOST=pod-instance-id1234
OC_LABEL_APPLICATION=my-app
OC_LABEL_APPLICATION_GROUP=pod

My assumption with the above is that OC would truncate the prefix and down-case the remaining variable.
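A minimal Go sketch of the prefix-scanning idea described above (the OC_LABEL_ prefix and the down-casing are this comment's proposal, not part of the spec):

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// labelsFromEnv collects all environment variables with the given prefix,
// strips the prefix, and lower-cases the remaining name, as proposed above.
// Sketch only; values are used verbatim.
func labelsFromEnv(prefix string) map[string]string {
	labels := map[string]string{}
	for _, kv := range os.Environ() {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 || !strings.HasPrefix(parts[0], prefix) {
			continue
		}
		key := strings.ToLower(strings.TrimPrefix(parts[0], prefix))
		labels[key] = parts[1]
	}
	return labels
}

func main() {
	fmt.Println(labelsFromEnv("OC_LABEL_"))
}
```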

iii. config file(s)

I think simple property files could be used here with a default path relative to the binary and optionally overridden by environment variable or config option.

It's late where I am; I'll come back to this.

This documentation serves to document the "look and feel" of the open source resource package.
It describes they key types and overall behavior.

The primary purpose of resources as a first-class concept in the core library is decoupling


It would be useful to start with a definition of what a resource is.

Contributor Author

@fabxc fabxc Aug 24, 2018

The other file starts with the definition

The resource library primarily defines a type that captures
information about the entity for which stats or traces are recorded.

If that sounds reasonable, I can move it to the readme as well.

Contributor

Please resolve this. I think it is important to specify what a "resource" is, maybe include examples.

@yurishkuro

I saw that def but it is not precise. Specifically, is the use of the term "entity" intended to generalize to more than just an application process / workload? I.e. what else could an entity be, an endpoint?

@fabxc
Contributor Author

fabxc commented Aug 27, 2018

I saw that def but it is not precise. Specifically, is the use of the term "entity" intended to generalize to more than just an application process / workload? I.e. what else could an entity be, an endpoint?

I agree that "entity" is as generic as it gets and I believe that's the best we can do in the end. Signals can refer to pretty much anything (consider IoT for example – "toothbrush" could be perfectly valid) and I don't think an instrumentation library should constrain that set artificially, especially since we can't draw any benefits from it.

@fabxc
Contributor Author

fabxc commented Aug 27, 2018

@nfisher regarding the approaches: I think the user should be enabled to use what works for their environment. Thus the goal is to not enforce a single specific solution but to provide a thin integration package for resources – with envvars being the fallthrough, that can always be relied on.

In some environments that can mean automatic detection of resource information, which should be provided through libraries. As you said, this is not ideal in the sense that it adds dependencies to the code.
That may be perfectly fine for some users – after all they'd be doing the same for the exporters they use. Both could eventually be offloaded to an OC agent/proxy that includes those dependencies once, while application binaries only send their data through a generic protocol.
The alternative is that the user has to perform the same actions the library does as a deployment step. That's certainly less magical, but also more error prone in total and a worse user experience.
Client libraries for cloud vendors also use what seems like magic to determine credentials so the user doesn't have to care about that. The auto-detection of resource information hooks into the exact same mechanisms.
In general, dumping the gathered resource information should be a relatively straightforward way to debug what information was or wasn't gathered.

For environments where auto-detection is not possible, runtime configuration is ultimately required. Envvars seemed like an easy way to integrate.
Using prefix-based variables is indeed an option. However, the character set becomes automatically limited and namespaced label keys such as k8s.io/pod_name aren't possible. This is rather important as the goal is to detach gathering of resource information from interpreting it, which happens in the exporters. That ensures that exporters are truly swappable.

Config files are also a way to integrate of course. My main concern is that they make deployment more tedious (generate file, get file into the right place, permissions, ...) whereas envvars are easy to inject in most systems I'm aware of.
I think both could exist, but maybe it's worth waiting until a need for files arises?

Does that provide some background on motivation? Anything you'd like to see in the spec to clarify things?

@nfisher

nfisher commented Aug 27, 2018

@fabxc agree on flexibility especially if controlled by the user as to when and how it happens. Will place a review for any further comments.

To clarify for my own understanding where would you see these labels being attached during the initialisation process?

func initTracer() error {
    sdexporter, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "testing"})
    if err != nil {
        return err
    }
    trace.RegisterExporter(sdexporter)
    trace.ApplyConfig(trace.Config{DefaultSampler: trace.ProbabilitySampler(0.01)})
    return http.ListenAndServe(":50030", &ochttp.Handler{})
}

The resource library primarily defines a type that captures information about the entity
for which stats or traces are recorded. It further provides a framework for detection of
resource information from the environment and progressive population as signals propagate
from the core instrumentation library to a backend's exporter.

Is there a difference between core library and core instrumentation library? Core library seems to be most consistently used throughout.

Is it worth giving an explicit definition of what the core library is? I assume it is any libraries provided under the Github census-instrumentation org which provide a common abstraction to backend exporters?

Contributor Author

I'd agree with the definition you gave, especially as exporters are moved to census-ecosystem.
What's the official take @bogdandrutu?

Contributor

I think core library is better.

an agent attaches further labels about the underlying VM, the cluster, or geo-location.

### From environment variables
Population of resource information from environment varibales MUST be provided by the

s/varibales/variables


For example, process-identifying information may be populated through the library while
an agent attaches further labels about the underlying VM, the cluster, or geo-location.


Is it worth adding a From OC data structure section?

Contributor Author

Mh, what do you mean specifically? Like when a resource is just passed around via internal OC APIs?


Labels defined in files. The main reason I arrived here to burden you with my nonsense was a tweet by @rakyll about the difficulty of label discovery in k8s.

The downward API that you referenced in the deleted document seems the most reasonable/consistent approach within k8s to get those details but I'm not sure how well its constraints map to the previously outlined mechanisms (e.g. OC_RESOURCE_LABELS env variable and auto-detection).

Contributor Author

The issue with the downward API is that it can be used to set arbitrary environment variables. Currently, the auto-detection that OC's Stackdriver exporter does relies on a documentation example, but it's not standardized.
We could define a standard of course, which users have to adhere to. But in practice no one wants to configure every single deployment/pod with the downward API or something else. Using Kubernetes' admission control to inject this information automatically is much more feasible and powerful. And when using that, one can simply resort to the generic OC envvars proposed here without adding yet another one-off standard.

I have a working implementation for this and can hopefully release a design soon.

That said, I'd like to understand a use case where a file with information is easier than an environment variable. Even with admission control in Kubernetes as described above, implementation is a lot simpler with environment variables and a lot less likely to collide with other admission restrictions, e.g. "pods in this namespace must not mount volumes".

A resource object MUST NOT be mutated further once it is passed to a backend-specific exporter.
From the provided resource information, the exporter MAY transform, drop, or add information
to build the resource identifying data type specific to its backend.
If the passed resource does not contain sufficient information, an exporter MAY drop

What is the feedback loop on dropped/ignored data? (e.g. is it a silent error, printout to stderr, returned error, etc)

My preference would be towards failing fast with a validation error at the point of library re/initialisation. Alternatively but less preferred would be a returned error at the point of capture.

Contributor Author

Yes, as this would happen during initialization, fail-fast would be best. Will add that.
For the update case described above, emitting an error on update failures and continuing best effort with the old resource? The risk is data being a bit skewed. That seems better though than dropping all data. Especially since only small pieces of information are likely to change, which may not even be used by the exporter.

@fabxc
Contributor Author

fabxc commented Aug 29, 2018

@nfisher thanks for all your comments. This is really helpful and I added some clarifications.

@fabxc
Contributor Author

fabxc commented Aug 31, 2018

Added back utils/MonitoredResource.md since this will only correspond to the independent new resource package. It can be replaced with new docs in parallel when the corresponding resource auto-detection mechanisms are added.

@nfisher I take it we largely agree modulo potentially adding a file-based generic path? That could be added without conflicts at a later point.

@jbd @bogdandrutu could you agree on whether you'd prefer to go with resource labels or tags?

@rakyll
Contributor

rakyll commented Aug 31, 2018

I have no strong opinions on tags vs labels. But you should consider that there is a cost of introducing new terminology. We are working on this field full time and can automatically associate them. It is not true for an average user. @bogdandrutu, any more to add?

@nfisher

nfisher commented Aug 31, 2018

@fabxc hrm, a common auto-detected path; I'd need to think about that, but I could see it working with JSON files in the format you've outlined. Generally speaking, the more I think about files the more I think it might be best left as an exercise for the user that doesn't need to be encoded in the spec. It would be nice to have a reference implementation elsewhere though.

RE: downward API vs admission control: I think users are likely to arrive at a solution from a number of directions, as I think "best practises" are still emerging with k8s. Also, what you might do with a small number of services (e.g. 10's) is likely to be different to 100's or 1,000's, so I'm not sure there's a one-size-fits-all there. With the lack of a common standard/best practise I'm not sure auto-detection will work with k8s. In general my leaning is towards explicit over magic. I'm OK with a little explicit wire-up, especially if there's a mechanism to "verify" my work. Assuming it doesn't exist already, it would be good to have a func ResolvedLabels(w io.Writer) or ResolvedLabels() []Labels that would allow the user to inspect what OpenCensus views as its label set for the process, which could also be added to a pagez end-point. I would probably use it to print the resolved labels as part of my start-up process. Not sure if it fits this spec though.
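For illustration, the kind of debug helper described above might look roughly like this (the name and signature are hypothetical, not part of the proposed API):

```go
package main

import (
	"fmt"
	"io"
	"os"
	"sort"
)

// resolvedLabels writes the resource type and labels the library has resolved,
// so a user can verify their wire-up at start-up or from a zPages-style
// endpoint. Sketch only.
func resolvedLabels(w io.Writer, resType string, labels map[string]string) {
	fmt.Fprintf(w, "resource type: %s\n", resType)
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output makes comparing restarts easy
	for _, k := range keys {
		fmt.Fprintf(w, "  %s=%q\n", k, labels[k])
	}
}

func main() {
	resolvedLabels(os.Stdout, "k8s.io/container", map[string]string{
		"k8s.io/pod_name": "pod-xyz-123",
	})
}
```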

@nfisher

nfisher commented Aug 31, 2018

With respect to @rakyll's comment, I would opt to use whatever OpenTracing and the Dapper paper use.

@fabxc
Contributor Author

fabxc commented Sep 6, 2018

@nfisher Good point on providing the labels as debug output to a zpage. That should definitely be possible with the proposed APIs, i.e. the recommended detector functionality; will provide a function as you suggested.

Contributor

@bogdandrutu bogdandrutu left a comment

Some high-level comments. I think we have 2 things to solve here: uniqueness for timeseries, and usability (query capability) for metrics (show me latency per container). This document kind of tries to address both, but I struggle to understand how we solve the first part.


Type strings and label keys MAY start with a domain name and MAY be further namespaced through
single slashes. All other characters MUST be alphanumeric. Label values MAY contain any valid
unicode character.
Contributor

Because you are using strings you need to define an encoding for them, otherwise use bytes. We prefer to support a subset of characters; maybe consider printable ASCII.

Contributor Author

Right, that should've just said "utf8" probably. But I'm fine with a subset, which it will be anyway in practice.
The only thing I'd be worried about is that this might not line up with backends and thus make OC restrictive without much gain.

Prometheus supports utf8 strings for label values and it hasn't been an issue yet.


We (ab)use prometheus at work and have a much wider range in use than ASCII. 👍 to UTF-8.

application and passing it to a backend-specific exporter. This explicitly includes
the path through future OpenCensus components such as agents or services.

For example, process-identifying information may be populated through the library while
Contributor

I have trouble understanding what a resource identifies vs. what identifies a task/process. It seems that a resource can identify a VM instance or container, which means the process identity is lacking.

Do we have to ensure uniqueness of timeseries exported via the resource? What if in a GCE VM we have multiple Linux tasks, how does the gce_instance resource uniquely identify a timeseries?

Not sure I understand how you envision this to be used. We currently also added in our exporters a notion of Node (see https://github.com/census-instrumentation/opencensus-proto/blob/master/src/opencensus/proto/agent/common/v1/common.proto#L33); does this completely replace that? Not sure that the current proposal can completely replace that unless we try to add some mandatory fields that can get us some guarantees about uniqueness of generated timeseries.

Seems to me that this is a bit too fragile for users (the out-of-the-box experience will not work properly), unless they set the env variables or they run the agent (may work) or they link one of the libraries that we have for auto-detection of the resources.

Contributor Author

It seems that a resource can identify a VM instance or container, which means the process identity is lacking.

The resource can be as granular as the OC user likes. I don't believe this is something we should or even can solve generically.
If the user wants to export with resource labels granular enough down to a process, they can. But this will not scale for many backends. Imagine something spawns a new process very often but never runs it concurrently. If each spawning of the process created a new set of series, virtually all backends would break under the load.
I believe at some point the user has to check at which level they can ensure no collisions take place and go with that – e.g. if they know their containers only run a single process exposing signals.

Note that this proposal does not amend anything about the <language>-<pid>@<hostname> label attached to all metrics right now, which theoretically guarantees what you are concerned about.
Though I think it raises the problems mentioned above and that should be revisited eventually.

What if in a GCE VM we have multiple Linux tasks, how does the gce_instance resource uniquely identify a timeseries?

Then a pure VM-based resource is the wrong choice (it almost always is). More granular labels must be provided.

Not sure that the current proposal can completely replace that unless we try to add some mandatory fields that can get us some guarantees about uniqueness of generated timeseries.

Sounds like valuable information that exporters can make use of, in addition to the resource info, in building the final samples they send.
Is this intended to replace the fixed <language>-<pid>@<hostname> metric label?

* `type`: a string which describes a well-known type of entity. It SHOULD be namespaced
to avoid collisions across different environments, e.g. `k8s.io/container`,
`cloud.google.com/gce/instance`.
* `labels`: a dictionary of labels with string keys and values that provide information
Contributor

Same comment on the strings here. Strings in Go, for example, are encoded as UTF-8, but in Java the default is UTF-16; what do we do here? I think it is always better to start with printable characters or a set smaller than "everything". Also, if we support any character we need to do sanitization for almost all the backends, and the biggest problem is that after sanitization the keys may no longer be unique, for example, and hence you break the protocol.

Contributor Author

Mh, does the in-memory encoding really matter? Sounds like this is up to the exporter to convert/normalize. Even if we restricted to alphanum, dashes and underscores, some backend may not like dashes and the exporter has to mutate before sending.

Contributor

I think the in-memory representation matters for one reason: we are giving people an option to set these strings via environment variables, so we have to define what a valid input is for the key/value strings.

Member

@SergeyKanzhelev SergeyKanzhelev left a comment

I need more details on the intended use of the API. If it's just a local endpoint scenario, I'd suggest naming it as such. Resource is a very rich term, let's not waste it.

@@ -0,0 +1,154 @@
# Resource API Overview
Member

Does this API also address this issue: #135? Does it cover only a local endpoint or also a remote endpoint?

Contributor Author

The resource does not aim to describe individual service endpoints. In particular for remote endpoints, that's outright impossible to do in the same way, I'd say.

Member

But it does try to describe a local endpoint, right? Should it be called local endpoint then?

Contributor Author

You mean renaming resource to local endpoint? That possibly makes sense from a tracing perspective but not so much for other signals I believe.

Member

What other signals wouldn't it make sense for?

Contributor Author

A process doing file processing can still expose metrics about its internal state without doing any request/response flows at all. Same for logs/events, should they become part of OC at some point.


## Resource type
A `Resource` describes the entity for which a signal was collected through two fields:
* `type`: a string which describes a well-known type of entity. It SHOULD be namespaced
Member

It would be useful to have some semantic, predefined fields on top of labels, like name and instanceId. Those fields can be extracted from labels to provide some out-of-the-box experience.

Contributor Author

I'd only consider doing this in the form of standardized tag keys potentially, which can be done later on when we have a better grasp of how people are using resources in different environments.
Dedicated fields will inevitably make user-side configuration more complex and the overall system harder to understand.

Member

I'd imagine systems will want to draw an application map with the names taken from the resource identity. In this case, having name or role will be helpful as the UI shouldn't have to worry about knowing all the different "providers" of such information.

An alternative may be to define a well-known postfix for those properties so the exporter can do a generic translation of labels to strongly-defined fields.

Contributor Author

@fabxc fabxc Sep 24, 2018

I see what you are getting at. I agree that in the system where the data ends up, one wants to have more structure to enable certain workflows.

This proposal does not extend as far as telling backend systems to have a resource representation in the form of a type string and tag map. Instead, the specific exporters should convert the rather structure-less resource notion proposed here into something more structured.
For Stackdriver for example, the exporter is configured with a mapping that converts known resource type and tag keys to Stackdriver Monitored Resources (e.g. k8s_container) which have fixed fields.
Exporters will by default do a mapping of tags set by well-known resource detectors.

So ultimately, what's proposed here is flexible enough to accommodate the fact that backends may handle this very differently. For example, some backends may rather convert the OC resource into a single string like container.k8s.io/namespace/ns1/pod/nginx-1234/container/nginx.
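A sketch of the kind of per-backend mapping described above (not the actual Stackdriver exporter code; the k8s label key names used here are assumptions):

```go
// toMonitoredResource illustrates how an exporter might translate well-known
// resource types and label keys into a backend's fixed-schema monitored
// resource. Sketch only.
func toMonitoredResource(resType string, labels map[string]string) (string, map[string]string) {
	switch resType {
	case "k8s.io/container":
		return "k8s_container", map[string]string{
			"namespace_name": labels["k8s.io/namespace_name"],
			"pod_name":       labels["k8s.io/pod_name"],
			"container_name": labels["k8s.io/container_name"],
		}
	default:
		// Unknown resources fall back to a generic type; their labels could
		// instead be attached as metric tags, as discussed elsewhere in this PR.
		return "global", map[string]string{}
	}
}
```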

Member

Makes sense. What I'm trying to accomplish is some hint from the resource auto-discovery logic to the exporter on what is what. So perhaps some postfix in names that helps determine whether a tag is the unique ID of an instance or a logical group of instances (role). Or have those two on the same level as type. Otherwise every exporter will need to know about every resource auto-detector.

Contributor Author

I'd expect the number of active detectors in a given environment to be very low so that configuration isn't a particularly big problem. Plus, if most common environments are supported by default, only a small fraction of users has to do configuration at all.

Coming up with a canonical set of schemas (in the fashion of schema.org) will eat lots of time and is hard to get right. If it only relieves a few lines of configuration for a minority of users, I'm not sure that's worth it.

Again Prometheus as a data point – the provided data has even weaker structure and no default behavior at all, i.e. all users have to write rather awkward configuration. Regardless, they largely consider this to be a feature.

@@ -0,0 +1,154 @@
# Resource API Overview
Member

When reading "resource" there is also a scenario when you want to record operation on certain resource. Like query all traces that created, updated, read properties of, etc of a certain resource. Where resource may be some business entity. Is this an intendent use of an API?

Contributor Author

Resource is really about identifying the source of signal data and with that would generally be static. I didn't really see individual signals mutating information about their own source.
Could you give an example?

Member

When Azure creates, modifies, etc. VMs it marks all traces performing operations on them with the VM ID. We call it resource ID =)

I can imagine the same thing being applicable everywhere the business defines some resources that have a lifetime.

Contributor Author

So the traces get tagged with the VM they are coming from?
That would essentially be exactly what this proposal aims at – just that machine identifiers are often too coarse, and in the case of metrics, basically always.

Member

No, with the VMs they operate on. VM here is a customer VM created via a call to Azure. Sorry for the confusion.

Contributor Author

Oh I see, the Azure API itself here being what's instrumented.

I think in that case the VM ID would just be part of the signal tag itself, i.e. a tag of the trace rather than a tag of the resource of the trace.

Member

This comment comes from the very short intro of this document. Maybe make it more explicit what a resource is.

Contributor Author

👍

## Populating resources
Resource information MAY be populated at any point between startup of the instrumented
application and passing it to a backend-specific exporter. This explicitly includes
the path through future OpenCensus components such as agents or services.
Member

Would a property like "health" be a valid field for it, if one wants to mark the node as unhealthy and then healthy again? Or is it a set-once, never-change kind of property?

Contributor Author

In general, volatile metadata isn't helpful in identifying a resource – so I'd say no. At the same time, such data could be useful for filtering noisy/spammy signals before ingesting them.

So I think it should be allowed but exporters wouldn't actually make this data part of signal data itself.
Basically, for the near future, only static properties are of relevance IMO.

Member

So a resource may have a property for whether it's the "master", or for the configuration "version" that is currently being used.

Since this PR defines resources, I'd suggest defining resource lifetime and health as well.

Mostly my comment comes from the term "resource". It is so wide and general that I naturally see the desire to use it to design health and lifetime management of that resource.

Contributor Author

I think whether a resource has health or lifetime info to begin with really depends on the resource. If it has this information, a library implementing a resource detector can easily provide this info through dedicated labels.

Though, as hinted at in other comments, I think things should mostly be constrained to static resource information. Backends can pull in volatile metadata at query time from other sources.

Member

Health and lifetime tracking requires a more responsive mechanism to detect changes than polling. If this is a scenario, it should be taken into consideration.

core library. It provides the user with a ubiquitous way to manually provide information
that may not be detectable automatically through available integration libraries.

Two environment variables are used:
Member

So the assumption is that there is a single resource type per process? I'm still trying to wrap my head around what a resource is, thus the question. Will there be an option to have the admin part of a site report its resource separately from the user part?

Contributor Author

No, that information should generally remain part of the signal tags/labels. The resource is about identifying the signal source itself. In 90% of cases, that just means being able to identify the process through a set of tags that are meaningful within the context of the environment later, e.g. the k8s_container resource in Stackdriver.
Currently, this is only possible through the <language>-<pid>@<hostname> label on the signal itself, which rarely makes sense to an end user later.

I get that the mental line can get kind of blurry though.
For example, if the actual source of signals isn't able to export data itself (e.g. some embedded device) but instead a third process exports signals about N sources by proxy. Then being able to set different resources explicitly is valuable.
That would generally be an exception case that can be added by further API extensions. I'd defer that though so the design of such an extension can be guided by real-world use cases.

Member

So does the process define the boundary, or the DI container? It needs to be clearly outlined in the spec.

Contributor Author

My opinion differs on this a bit. I think OC should provide clear docs around best practices – we can probably also add some SHOULD/SHOULD NOT sentences to the spec.

But a clear line is not possible with this in my experience. There'll always be things we'd clearly see as "MUST NOT do this" that people might have a valid one-off use case for. If we don't enable those, even if they don't yield ideal results, OC becomes significantly less valuable for those users.
If 1 out of 20 teams in an org cannot use the instrumentation system of choice, the negative impact is far greater than 5%.


Two environment variables are used:
* `OC_RESOURCE_TYPE`: defines the resource type. Leading and trailing whitespaces are trimmed.
* `OC_RESOURCE_LABELS`: defines resource labels as a comma-separated list of key/value pairs.
Member

BNF definition would be useful here
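As an illustration only (not a normative grammar), the format quoted above could be parsed roughly like this in Go; the sketch assumes commas do not appear inside values, a case the quoted text does not settle:

```go
package main

import (
	"fmt"
	"strings"
)

// parseResourceLabels is a rough sketch of parsing the OC_RESOURCE_LABELS
// format: comma-separated key=value pairs, with values optionally wrapped in
// double quotes and `"` escaped as `\"`.
func parseResourceLabels(s string) (map[string]string, error) {
	labels := map[string]string{}
	for _, pair := range strings.Split(s, ",") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("invalid label pair %q", pair)
		}
		key := strings.TrimSpace(kv[0])
		val := strings.TrimSpace(kv[1])
		if len(val) >= 2 && strings.HasPrefix(val, `"`) && strings.HasSuffix(val, `"`) {
			val = strings.ReplaceAll(val[1:len(val)-1], `\"`, `"`)
		}
		labels[key] = val
	}
	return labels, nil
}

func main() {
	fmt.Println(parseResourceLabels(`k8s.io/namespace="default",k8s.io/pod_name="pod-xyz-123"`))
}
```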


### Merging
As different mechanisms are run to gain information about a resource, their information
has to be merged into a single resulting resource.
Member

Automatically or on demand? It feels like there is a singleton for a resource (to my previous question). Would this singleton be per tracer? Or should it be global?

Contributor Author

That would happen explicitly during initialization/setup.

See this PR for the core library extension I proposed (Go).

With that, the only change in general would be an option when setting up the exporter, e.g.:

exporter, err := stackdriver.NewExporter(stackdriver.Options{
    ResourceDetector: resource.ChainedDetector(
        resource.FromEnv,
        aws.Detect,
        azure.Detect,
        gcp.Detect,
    ),
})

Or, with a helper package that chains the common auto-detectors together:

exporter, err := stackdriver.NewExporter(stackdriver.Options{
    ResourceDetector: auto.Detect,
})
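For illustration only (not the actual census-ecosystem implementation), a chained detector honoring the merge semantics discussed in this thread (already set type and label values are not overwritten) could look roughly like this:

```go
package resource

import "context"

// Types from this spec: the Resource struct and the Detector signature.
type Resource struct {
	Type   string
	Labels map[string]string
}

type Detector func(context.Context) (*Resource, error)

// ChainedDetector runs the given detectors in order and merges their results.
// Values detected earlier win, i.e. already set type/label values are kept.
func ChainedDetector(detectors ...Detector) Detector {
	return func(ctx context.Context) (*Resource, error) {
		res := &Resource{Labels: map[string]string{}}
		for _, detect := range detectors {
			r, err := detect(ctx)
			if err != nil {
				return nil, err
			}
			if r == nil {
				continue
			}
			if res.Type == "" {
				res.Type = r.Type
			}
			for k, v := range r.Labels {
				if _, ok := res.Labels[k]; !ok {
					res.Labels[k] = v
				}
			}
		}
		return res, nil
	}
}
```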

### Updates
Over the runtime of an application, resource information may change in some cases. To
allow exporters to incorporate those changes, they SHOULD take a resource getter as a
configuration parameter. The exporter SHOULD periodically rerun the getter and incorporate
Member

Should there be a notification mechanism for subscribers on updates? It would be useful to have if you have an exporter that needs to react to these changes. Otherwise there will be a mess of threads checking for updates from each other.

Contributor Author

Right now I think a periodic rerun of the detector and comparing with the previous result to detect changes will be sufficient. But there may very well be other use cases I didn't think of.

Note that the exporter-specific part of this spec is largely SHOULD, so there's enough flexibility to deviate/extend where necessary.
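Building on the Detector and Resource types sketched earlier, the periodic rerun could look roughly like this (names and behavior are illustrative, not part of the proposed spec):

```go
package resource

import (
	"context"
	"reflect"
	"time"
)

// watchResource periodically reruns a detector and calls onChange when the
// result differs from the previous one. Sketch only.
func watchResource(ctx context.Context, detect Detector, interval time.Duration, onChange func(*Resource)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	var prev *Resource
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			res, err := detect(ctx)
			if err != nil {
				continue // a real implementation would surface the error
			}
			if !reflect.DeepEqual(res, prev) {
				prev = res
				onChange(res)
			}
		}
	}
}
```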

Member

Health and lifetime management are scenarios that might require a more immediate update of properties.

Contributor

@SergeyKanzhelev I would like to understand the "health and lifetime management" scenarios. Can you give me more details about what you have in mind?

My personal opinion is to keep simplicity and actually not support runtime config for these in the library at all, unless we have a clear use-case which I miss here, but I can be convinced if you have one in mind.

Member

The key to health and lifetime management is that different modules may contribute to a common state and make decisions based on it. Like, if CPU is too high, make sampling more aggressive and report a yellow state of the app on this node. Or if the exporter failed to upload telemetry after many retries, switch the app to a red state and potentially take the node out of rotation. Or when the app was swapped into production from staging, start collecting more metrics.

I agree with you that the resource concept as proposed can be implemented if there is a way to set global attributes as proposed in #167. Maybe make it more elaborate and implement it via callbacks to resource auto-discovery modules (without calling them that).

So static and almost-static information is not that hard to implement without introducing new concepts (except global attributes or attribute callbacks).

```

### Updates
Over the runtime of an application, resource information may change in some cases. To
Member

Some words on how often this should happen are required. Should health change or config update notifications be using this API?

Contributor Author

Per above, I'd see whether users report use cases.

Prometheus has a similar resource info model (called target info there) – different integrations update via watches/notifications or just via periodic refreshes. I cannot recall a single use case where users complained about periodic refreshes being too laggy.
It may also be interesting to see what (meta)data we added to different integrations over time based on user demand.

Most non-identifying information was added a) for filtering or b) because people wanted to attach it to the signal directly.
While b) should be discouraged, I think OpenCensus must allow for odd one-off use cases since it becomes most valuable when users can use it for everything.

vendors, MUST be implemented outside of the core libraries in third party or
[census-ecosystem][census-ecosystem] repositories.

### Merging
Member

How will the type property be merged?

Contributor Author

Already set labels or type fields MUST NOT be overwritten.

@fabxc
Contributor Author

fabxc commented Sep 24, 2018

Thanks for the detailed feedback @SergeyKanzhelev. I tried to answer all your questions and concerns with some context.

@SergeyKanzhelev
Member

@fabxc thank you! I understand it much better now. Since it's all in-proc, I believe a more responsive update notification mechanism may be beneficial for health and lifetime tracking of the resource. Also, some strongly typed fields might be helpful to simplify writing exporters. Do you think this is something this spec can accommodate?

@bogdandrutu
Contributor

I personally struggle a bit on this so I would like to clarify the overall picture as I understand it. I think we need two different concepts (we may have overlap between them, that's probably the cause of confusion for me at least):

  • A notion of a Node (windows/linux process) that tries to uniquely identify the process instrumented with OpenCensus that produces signals. A Node is not required to be human readable (it can contain ip address, pid, etc.).
  • A notion of a Resource that tries to describe the "owner" of the signal. To clarify this we can say that the producer is always a process (windows/linux) but the owner can be a container/vm/service etc (e.g. vm_cpu_usage metric, this metric is produced by a process but it belongs to the vm resource). A Resource is human readable.

I think we can conclude that one process can produce signals that belong to different resources, so the resource is a property of the signal not a property of the producer task.

There are different use-cases for Resource and Node, here are some examples:

  1. Signal aggregation by Resource. We believe that it is very useful to aggregate signals by the Resource (e.g. show me all traces for this k8s_container, or show me vm_cpu_usage breakdown by vm_instance, or show me all logs for this k8s_container). Conclusion: this information is useful for the back-end to know about every exported signal.
  2. Use Node to uniquely identify signals that belong to the same Resource. An example can be a k8s_container that runs multiple processes and they export the same signal (e.g. http_server_latency metric). Conclusion: this information is required when the Resource does not uniquely identify a signal.
  3. Use Node for configuration push. If we develop a central service that allows users to configure all running processes instrumented with OpenCensus then Node can be used to identify all these tasks. I think here is where my confusion is: if the Node has only host, pid, etc. it is not easy to be consumed by the users of this service; it would be nice to allow them to say "set this sampling probability for this service". Conclusion: We may be tempted to say that Node also belongs to a Resource.

Questions:

  • Should we have these as separate concepts? Based on the use-cases that I thought of they are always used together.
  • If they are separate concepts (which kind of I like because they describe two different things about the signal), should the Node include a Resource to facilitate use-case 3?

I really hope that I did not confuse people more, and I would like to hear others' opinions about this.

@SergeyKanzhelev
Member

@bogdandrutu it makes sense. If you think of resources as a way to design this type of hierarchy, there are also layers like role (a set of endpoints, like "health", "admin") and the endpoint itself.

It also calls for associating an individual span with the resource, not for the exporter to pull data from resources. Is that what you are thinking about?

Fabian Reinartz added 4 commits October 9, 2018 13:44
Signed-off-by: Fabian Reinartz <[email protected]>
Signed-off-by: Fabian Reinartz <[email protected]>
Signed-off-by: Fabian Reinartz <[email protected]>
@fabxc
Contributor Author

fabxc commented Oct 9, 2018

Added some clarifications around the allowed character set (it's more restrictive now and matches tags).

@SergeyKanzhelev I changed the section on updates following a discussion with @mayurkale22 and @bogdandrutu to discourage usage of any mutable attributes in resource labels. On the one hand this is definitely the safe route and we can always lift restrictions based on future feedback. On the other hand, it seems like mutable runtime information like health serves quite a different purpose from the resource feature discussed here.
Thus it's probably a good idea to think about it as an orthogonal feature in OC as well. Would you like to create a separate feature request issue so we can discuss use cases and potential implementation there?

This documentation serves to document the "look and feel" of the open source resource package.
It describes they key types and overall behavior.

The primary purpose of resources as a first-class concept in the core library is decoupling
Contributor

Please resolve this. I think it is important to specify what a "resource" is, maybe include examples.

`Resource` MUST have getters for retrieving all the information used in `Resource` definition.

Example in Go:
```go
Contributor

Maybe add a TODO to add a link to the proto definition.

type Detector func(context.Context) (*Resource, error)

// Returns a detector that always returns a specific resource.
func NewDetectorFromResource(*Resource) Detector
Contributor

I think this is more like a helper method. Not necessary to have it in the specs.

func NewDetectorFromResource(*Resource) Detector

// Returns a detector that runs all input detectors sequentially and merges their results.
func ChainedDetector(...Detector) Detector
Contributor

Chained does not sound very good to me. Maybe ResolveResourceFromDetector or maybe you have better suggestions.

Contributor Author

Mh, it doesn't resolve a resource though but rather just turns a list of Detectors into a single Detector again that has to be called first.

Best alternative I can think of is CombinedDetector.

```

### Updates
Over the runtime of an application, attributes of a resource may change in some cases.
Contributor

Here this sentence implies we support dynamic changes of the resource fields. Or maybe I am just wrong in understanding this.

Contributor

@bogdandrutu bogdandrutu left a comment

@SergeyKanzhelev what is your opinion on this?

LGTM for me with small comments sent earlier.

Signed-off-by: Fabian Reinartz <[email protected]>
@bogdandrutu
Contributor

@SergeyKanzhelev I would like you to take a final look at this and also here where we define some examples:
https://github.com/census-ecosystem/opencensus-go-resource/pull/1/files#diff-53eec5f6ec725746ddb5c8b1c394865d

@fabxc
Contributor Author

fabxc commented Oct 23, 2018

@SergeyKanzhelev did you have a chance to take a look at this?

@SergeyKanzhelev
Member

@fabxc thank you for addressing feedback. Looks good to me. Sorry for delay

@fabxc fabxc merged commit ed99dcb into census-instrumentation:master Oct 30, 2018
@fabxc fabxc deleted the resource branch October 30, 2018 10:45