Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should file configuration merge environment variable configuration? #3752

Closed
jack-berg opened this issue Oct 31, 2023 · 126 comments · Fixed by #3948
Closed

Should file configuration merge environment variable configuration? #3752

jack-berg opened this issue Oct 31, 2023 · 126 comments · Fixed by #3948
Assignees
Labels
[label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR spec:miscellaneous For issues that don't match any other spec label

Comments

@jack-berg
Copy link
Member

jack-berg commented Oct 31, 2023

The conversation about whether file configuration should completely ignore the sdk environment variable scheme came up in #3744, but that PR doesn't actually contain any language related to this.

The original file configuration OTEP stated:

Interpret the configuration model and return SDK TracerProvider, MeterProvider, LoggerProvider which strictly reflect the configuration object's details and ignores the opentelemetry environment variable configuration scheme.

As mentioned here, file configuration doesn't actually contain language describing this behavior. It was included originally included in #3437 but was lost in the PR review shuffle - accidentally, not in response to feedback.

@tedsuo argues in favor file configuration respecting env vars with:

The common expectation among developers is that env vars will automatically overwrite config parameters. If we do it the other way, I am concerned that until the end of time we will have a steady stream of users lodging issues about this and then becoming very frustrated when we explain that they need to modify the config file to use an env var.

The reason that these users will be upset is that they are in a situation where having to define the env vars in the config file is a non-starter. Their use case is that for some reason they either can't get at or are not allowed to modify the config file template, but they really need to override a parameter.

For reference, this is an issue that has come up on many OSS projects I have been involved with where it is common to have both operators and application developers wanting to configure the same thing. Often it's some kind of emergency situation where the person with the rights to change the config file is unavailable.

I should note that of course the opposite situation could be true, where for some reason you want to disable an env var but don't have access to it. But it's probably easier to give users the ability to disable an env var via the config file than it is to give them the ability to disable a config parameter via an env var.

@MrAlias argues in favor of ignoring env vars with:

To be fair, from experience, you're going to get people complaining either way. If you choose environment variable priority over a configuration users will complain that their deployment was altered and failed when an environment variable was set that took precedence.

However, if you make environment variables take precedence you will also need to make some pretty sever and subjective choices on how they are mapped to a config. Do the BSP environment variables apply to all batch span processors or just one? Is the sampler environment variable use for all tracer providers in the config, even ones that specify alternate samplers? Should propagators be merged or overridden? If an exporter is defined by environment, does that stop the console logging exporter used for debugging as well as all the other exporters defined in configuration?

Ultimately, I think the current changes are going to be the most appropriate. They allow users to make their own choice in precedence without making subjective choices for them on how to map things. If a user want environment variables to take precedence, all they need to do is use the OTel environment keys in the related parts of their configuration. In doing so they will answer each of the above questions their own way.

@trask supports the feeling of users expecting env vars to override file configuration, but also says merging configuration from multiple sources is hard:

This is my feeling as well, especially for things like:

OTEL_SDK_DISABLED
OTEL_RESOURCE_ATTRIBUTES
OTEL_SERVICE_NAME
OTEL_LOG_LEVEL
OTEL_PROPAGATORS
OTEL_TRACES_SAMPLER_ARG
I totally understand the nightmare that is merging configuration from multiple sources though.

I wonder if we would have created many of the other env vars (e.g. OTEL_BSP_, OTEL_BLRP_) if we had configuration file support from the beginning? And if so, maybe we can deprecate those other env vars in favor of configuration file?

This topic came up several times during the lengthly review of the file configuration OTEP. Below are links to a number of and relevant points:

open-telemetry/oteps#225 (comment)

Layering of config as described below would make it more difficult to reason about what my program is actually being run with.

Additionally, perhaps give them a helper config file that uses env var substitution in the right places so that they can migrate easily (and still get a warning until they get rid of env variables and move everything to the config file).

open-telemetry/oteps#225 (comment)

What about 'Solution 3: fail when both environment configuration and file configuration are present'?

I think we could log a warning when we detect this, but failing is too strong. Consider the implications if a user is operating in an environment they don't fully control (i.e. where an ops team configures environment variables by default which they extend / layer on top of).

open-telemetry/oteps#225 (comment)

I have a (rather strong) opinion that setting set via env vars has higher priority (takes precedence) over a setting set via configiuration file and it should be marked as a goal.

Any scheme where environment variables have priority over a config file will require some sort of standard mapping between the environment variable schema and file config scheme. IMO, its impossible to define such a mapping which is intuitive in all cases, so better not to try.

Nothing is forcing users to use file based config - its opt in. If they do opt in, they're opting into the documented behavior in which the config file represents the source of truth for configuration. If they wish to customize the experience with additional layers / overrides, they have a couple of tools:

  1. They can use the fact the Configure(config) API accepts a config model as an argument, and provide their own customizations to the model after the initial parsing of the file via Parse(file). An example of such a customization would be to interpret environment variables and apply them to the model in a way they decide makes sense.
  2. They can use environment variable substitution to reference environment variables directly in a configuration file.

Update 3/15/2024

The current state of this issue is:

  • The configuration working group proposed and executed a process to move forward after several months of failing to reach consensus
  • As a part of that process, the configuration maintainers reviewed proposals and have recommended moving forward with option 2.b as described in this comment
  • As a part of that process, people who disagree with the recommendation can / should escalate to a TC decision

Update 3/28/2024

Please see this comment updating the status of the issue:

Per @tedsuo’s request, we discussed this issue in the 3/24/27 TC meeting and have made a decision:
Generally, we will follow @trask's comment, proceeding with this PR with a few changes:

  • Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics around how the value of env var are subject to breaking changes as the file configuration spec and schema continue to evolve.
  • Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. Deprecate environment variables which do not interoperate with file config when ready for stabilization #3967
  • Ensure that file config has an interop where platforms (i.e. Azure functions, otel operator, etc) contribute to config. We should proceed with #3948 without being prescriptive about how that mechanism works. In the TC meeting, 4 distinct solutions were discussed which had different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). Ensure that file config has solution for platforms which contribute to config #3966
@jack-berg jack-berg added the spec:miscellaneous For issues that don't match any other spec label label Oct 31, 2023
@jack-berg
Copy link
Member Author

My point of view is that while I understand that there is precedence in other systems to have environment variables override other configuration sources, we shouldn't do it. The environment variable configuration scheme does not map cleanly to file configuration, leading to unintuitive behavior when trying to merge. A partial mapping of some of the environment variables is also unintuitive. If we give support environment variable substitution and default values (#3744) that will allow us to provide a file configuration "template" to users as a starting point, complete with comments and env var substitution references to the environment variable scheme with default fallbacks. Note this was the conclusion of OTEP conversation open-telemetry/oteps#225 (comment).

What's not to like? Users start out with the template that essentially is performing the merging of the environment variable scheme with the file configuration scheme (where it makes sense). The template has comments which reinforce that if they delete the env var substitution references, those environment variables will not be considered during parse / create. This should make migrating from environment variable schema to file configuration smooth, while also allowing the implementation to have simple intuitive rules that are easy to explain and which are logically consistent. While some users will inevitably ignore the documentation and expect environment variables to trump file configuration anyway, they'll surely be able to understand the motivation when pointed to the docs.

@yurishkuro
Copy link
Member

yurishkuro commented Oct 31, 2023

I agree with @jack-berg (#3752 (comment)). It's better to have a clean break in both user interface and SDK components interface: components should no longer support env vars directly, they should always take parameters from a config object, and the config object can support env vars as placeholders, but without any semantic meaning attached to their names (it's up to the user to pick those names). It is minimalistic design with clean encapsulation and separation of responsibilities. In that sense env vars do override config file, but only because user explicitly writes them this way, not because of some magic mapping between env var name and the exact position in the config.

Having said that, before we go this way we need to have a very clear migration strategy to avoid introducing breaking changes. The default config template that pulls in all the env vars already defined in the spec is a good approach. What's not clear to me there is whether it requires support for conditionals, because, for instance, in order to have a place in the config to refer to some var related to OTLP exporter, we need to know that OTLP exporter is what the user actually wants to use. It may be possible to accommodate using the declare/use separation used in the Collector config, i.e. the template config will always have a section for OTLP exporter (and all its env vars), but the tracer config portion may not reference the exporter as "I want to use it".

So basically I think we need a working prototype of the config to validate that this approach is workable.

Another migration concern is whether it can be incremental - there are many components in the SDK and requiring they all to be upgraded to config object style of initialization (vs env vars) before anything can be released is going to be a problem. The above approach with a default config template seems like it could be incremental.

One more open question - how such default config template will play with the ability to minimize runtime dependencies of the SDK?

@jsuereth jsuereth added the [label deprecated] triaged-accepted [label deprecated] Issue triaged and accepted by OTel community, can proceed with creating a PR label Nov 15, 2023
@jsuereth jsuereth assigned jack-berg and unassigned yurishkuro Nov 15, 2023
@brunobat
Copy link
Contributor

brunobat commented Jan 18, 2024

Please beware that popular frameworks have their config systems in place for years, creating yet another config file might be nice from the OTel point of view but not for the frameworks that already instaciate an OTel SDK of their own.

Placing all possible configs in a OTel config will require frameworks to scan for services there, as example, if we want them to work on native mode.
Frameworks already do many other things like, providing defaults, validations and value transformations.
All this is already done and would be bypassed by the new OTel Config file.

From my point of view, the properties supplier from the SDK must have the highest priority, higher then the OTel config file.
More, there shouldn't be any configuration of the SDK that cannot be performed by using the properties supplier.

In relation to the env. vars., in all frameworks I can remember, they take precedence over any other kind of configuration. OTel shouldn't be special.

@jack-berg
Copy link
Member Author

From my point of view, the properties supplier from the SDK must have the highest priority, higher then the OTel config file.
More, there shouldn't be any configuration of the SDK that cannot be performed by using the properties supplier.

These are java specific concepts @brunobat. Can you generalize this feedback to the spec level?

@jack-berg
Copy link
Member Author

Please beware that popular frameworks have their config systems in place for years, creating yet another config file might be nice from the OTel point of view but not for the frameworks that already instaciate an OTel SDK of their own.
Frameworks already do many other things like, providing defaults, validations and value transformations.
All this is already done and would be bypassed by the new OTel Config file.

Popular frameworks have their own config systems. That's great - if those frameworks want to make opentelemetry a first class citizen, they can evolve those frameworks to be able to able to configure the things opentelemetry users expect (the flat scheme provided by environment variables very quickly runs into problems expressing very real config scenarios). But not all users buy into these frameworks - do we leave these users behind, only providing a suboptimal environment variable schema?

File configuration isn't and doesn't need to be the only configuration option. It doesn't erase the environment variable scheme, or programatic configuration which enables all sorts of alternative configuration mechanisms like those provided by frameworks. But we do need a language agnostic way to fully express the desired configuration of an SDK.

Is the request to do something different with file configuration or to not have a file configuration option at all?

Placing all possible configs in a OTel config will require frameworks to scan for services there, as example, if we want them to work on native mode.

This appears to be no different than the problem we face today in opentelemetry-java with the SPIs to load custom exporters. With opentelemetry-sdk-extension-autoconfigure, you can set OTEL_TRACES_EXPORTER=foo, and if there is a ConfigurableSpanExporterProvider implementation on the classpath corresponding to foo, autoconfigure will use it to create the corresponding SpanExporter.

In relation to the env. vars., in all frameworks I can remember, they take precedence over any other kind of configuration. OTel shouldn't be special.

Below I've listed several examples where trying to merge environment variables with file configuration yields unintuitive / unexpected results. For me to take this request seriously, I need to see proposals on how these situations would be resolved:

Simple Priority

We have the same options in env vars and config file, but different values. Which wins? Easy enough - environment variables always win. Although, we'll no doubt have users complaining that what they see in their config file isn't reflected in reality.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

File config

tracer_provider:
  processors:
    - batch:
         exporter:
           otlp: 
             endpoint: http://some-other-endpoint:4317

Conflict example 1

We have env variable information which can be merged with file configuration such that both are true. In this case, env variables specify to use the OTLP exporter, while file configuration says zipkin. One person might expect to see the OTLP exporter overwrite zipkin. Another might expect to see spans exported to both OTLP and zipkin. Yet another might expect to see only zipkin.

Env vars

OTEL_TRACES_EXPORTER=otlp

File config

tracer_provider:
  processors:
    - batch:
         exporter:
           zipkin: http://localhost:9411/api/v2/spans

Conflict example 2

In this case we have configuration which can be merged, but doing so will almost certainly yield the wrong result. The environment variables specify the OTLP trace endpoint, assuming the http/protobuf default protocol, which includes the path. The config file specifies grpc protocol. If the environment variable takes priority over file configuration, http/protobuf variant of the endpoint specified via environment variable will be used to configure the OTLP grpc exporter specified in file config, causing an error.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://some-other-endpoint:4318/foo/bar/v1/traces

File config

tracer_provider:
  processors:
    - batch:
         otlp:
             endpoint: http://some-endpoint:4317
             protocol: grpc

Conflict example 3

In this case we see how the flat nature of the environment variable scheme falls short when trying to merge with the structure of a config file. The config file specifies that the sdk should export spans to two different endpoints, but an environment variable is specified which sets the OTLP endpoint to something else. If you override both from the config file you're almost certainly not doing what the user would want. And if you change the effective config to result in a single OTLP exporter with the endpoint in the environment variable, you're also almost surely not doing what the user wants. Presumably, they want to override one of the endpoints in the config file, but its impossible to know that they really want this, or which in the config file to override.

Env vars

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://yet-another-endpoint:4318

File config

tracer_provider:
  processors:
    - batch:
         otlp:
             endpoint: http://some-endpoint:4318
    - batch:
         otlp:
             endpoint: http://some-other-endpoint:4318

I can keep listing examples, but the point is that I don't know of any strategy for overlaying environment variables on top of a configuration file that will make sense to most users (yet alone all) in most cases. Instead, we should aim to give simple primitives which can be combined to accommodate simple and complex use cases:

  • Provide an in memory representation of a config model
  • Provide an operation to parse a config file to an in memory config model
  • Provide an operation to create a configured SDK from an in memory config model
  • Provide a way to perform environment variable substitution in a config file
  • Provide an environment variable with a value pointing to a config file, which triggers file configuration

Most uses will will be fine just saying OTEL_CONFIG_FILE=/path/to/sdk-config.yaml, and be content with the simply ability to do environment variable substitution in a config file.

Users with complex requirements can combine these primitives in all sorts of interesting way:

  • produce an in memory config model from some alternative source
  • define merge logic for combining multiple in memory config models
  • define logic for overlaying the environment variable scheme on top of a in memory config model in a way that makes sense for them

@brunobat
Copy link
Contributor

These are java specific concepts @brunobat. Can you generalize this feedback to the spec level?

Yes, will send a PR to the sdk-configuration file.

The spec actually doesn't define any priority over config methods, just states "The SDK MUST provide a programmatic interface for all configuration". I agree with the statement.
This can be interpreted as the file config is just another SDK configuration method and shouldn't take precedence over the base generic programatic interface, which in the Java case is represented by the properties supplier.

The OTel file config must use the base generic programatic interface and shouldn't prevent the use of other existing or future configuration systems of the SDK... Which seems totally reasonable.

@brunobat
Copy link
Contributor

Simple Priority

We have the same options in env vars and config file, but different values. Which wins? Easy enough - environment variables always win. Although, we'll no doubt have users complaining that what they see in their config file isn't reflected in reality.

Sure someone will complain, but overriding configs defined in files with env. vars. is widely accepted and common place in the container world.
It can be decided either way if properly documented, however not giving higher priority to env. vars. nowadays seems counter intuitive and agains the industry practice.

Conflict example 1

In this case the exporter should be OTLP and the endpoint the one specified in the file. Note that the endpoint shouldn't be specific to zipkin, but owned by the exporter.

These overrides happen all the time in a microservice. Files define the base config and env. vars. define the exceptions to particular attributes. I acknowledge that there can be many overrides.

Conflict example 2

Yes, the resulting merged config doesn't make sense, but the error can also happen with the file based config. Validation will be always needed and should be implemented to save people's time.

Conflict example 3

This is an excellent example of things that cannot be properly configured by env. vars. with the current abstraction, which leads me to the main point....

I think we are mixing the definition of a "programmatic interface for all configuration" with the definition of the file based configuration. This makes the programatic interface effectively useless for other configuration systems because the file config takes precedence and will include things that cannot be done in any other way.
The programmatic interface should be independent if we want to properly configure things with files, env. vars. and other config systems, as discussed above.

We should have a generic reusable programmatic interface for the config, a builder pattern for it, and only then the file format configuration for it. Merging and validating the config attributes should also be a task for the builder.

@jack-berg
Copy link
Member Author

I think we are mixing the definition of a "programmatic interface for all configuration" with the definition of the file based configuration. This makes the programatic interface effectively useless for other configuration systems because the file config takes precedence and will include things that cannot be done in any other way.

Why does file configuration make the programmatic interface effectively useless?

File configuration is an abstraction that is built on top of the programmatic configuration interface. It will be natural to package file configuration in a separate artifact than the core SDK components it configures. By definition, it can not be more expressive than the programmatic interface since ultimately all options need to be translated to programmatic equivalents. The proposal in #3805 in which file configuration takes priority over the environment variable scheme is only true if the user opts into it by: 1. Including the necessary file configuration artifact. 2. Specifying OTEL_CONFIG_FILE=...

A framework that has its own configuration system can avoid file configuration entirely by not including the artifact, and / or by providing equivalent functionality. I.e. the framework provides its own format and schema for specifying configuration, and provides logic which parses, validates that configuration and uses the programmatic configuration interface to produce an SDK according to the configuration.

We should have a generic reusable programmatic interface for the config, a builder pattern for it, and only then the file format configuration for it.

💯 That's exactly what's happening.

@brunobat
Copy link
Contributor

brunobat commented Jan 22, 2024

💯 That's exactly what's happening.

That's excellent.

Why does file configuration make the programmatic interface effectively useless?
From what's written in the spec, nothing. However, when discussing the particular java implementation of the file configuration it was mentioned that some configs would only be available through the file config.

If all configurations will be available programmatic interface, which in the Java implementation case is represented by the properties supplier, I'm ok.

@jackshirazi
Copy link

If I have 2 different envs, say test and prod, does the file spec allow for me to specify an env var for that and effectively create 2 different configs from the one file, or are there restrictions which would mean for some things I would have to have 2 different files?

Obviously for the existing description I can provide 2 different files and select one in OTEL_CONFIG_FILE, but my use case here is where I hold a single central template file (thinking opamp in the future) and I want to setup config from that one template based on which env the agent runs

@jackshirazi
Copy link

jackshirazi commented Jan 22, 2024

As for the overall question, I've seen this page reading like this for a long time now.

So it would be surprising to me that suddenly OTEL_SERVICE_NAME (to choose just the top one there) no longer works when I - or anyone else in the chain of operators who could be involved in setting up my application+agent - set a file config. Yes, I agree it's opt-in so I should know that using the file means the env var is then ignored. It's still surprising. And we generally prefer the principle of least surprise. I would tend towards merging

For the conflict objections, I would do the simplest merge, and with an option to output all values, it's straightforward to debug. So yes there would be conflicts, but they would mostly be easily caught in dev/test and resolved.

What we see from our customers is that they like to use different configuration capabilities for different things. File config as a base that they can distribute easily. Environment to customize that. Central config for ease of changing config across many different applications (especially for dynamically adjustable options). The current proposal is to implement the file config so it accepts environment, but not the ones already supported, which means that for the existing deployments they already have configured, the file config would need to explicitly include each of those. That is, somewhere in my file config I'll need to have the service name defined and to use ${OTEL_SERVICE_NAME} - otherwise I either can't use file config or I have a painful adjustment in my systems to define a bunch of new things.

As I write this, I'm thinking maybe a reasonable compromise is to provide a file config with all those variables already defined in the file as a template. Of course that doesn't cover all situations and effectively adds boilerplate which is another anti-pattern, but it would be something worth doing if we stay with no merging

@trask
Copy link
Member

trask commented Jan 22, 2024

I think it's going to be continually surprising to users that all of the standard OTEL_* environment variables are ignored as soon as you introduce a yaml configuration file (e.g. to configure metric views).

At least in the Java world, it's super standard (and people expect) for env vars to override configuration files (and not in an all-or-nothing way).

I appreciate that this problem may not have a perfect solution, but I'd like to explore a compromise that could be less surprising to users.

For each env var we could define the minimal affect it has on the config file (as opposed to an all-or-nothing approach).

OTEL_TRACES_EXPORTER is probably the most complex, and so let's see what we could do for it first.

We could say that OTEL_TRACES_EXPORTER=abc means drop other exporters besides abc, while adding abc exporter with default configuration if there is not already at least one abc exporter present. (Another option is to drop all exporters including abc, and add abc exporter with the default configuration, but I think that would be more surprising because it drops all existing configuration of the abc exporter).

OTEL_EXPORTER_OTLP_TRACES_ENDPOINT is simpler, and we could say it overrides every instance in the config file of tracer_provider.processors.batch.exporter.otlp.endpoint.

Applying these rules to @jack-berg's examples:

Conflict example 1

OTEL_TRACES_EXPORTER=otlp

and

tracer_provider:
  processors:
    - batch:
        exporter:
          zipkin:
            endpoint: http://localhost:9411/api/v2/spans

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:

Conflict example 2

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://some-other-endpoint:4318/foo/bar/v1/traces

and

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4317
            protocol: grpc

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4318/foo/bar/v1/traces
            protocol: grpc

Note: this is probably an incorrect configuration (using an http/protobuf endpoint and grpc protocol), but I think it's probably the least surprising merge given the user only used env var to override the endpoint and not the protocol.

Conflict example 3

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://yet-another-endpoint:4318

and

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://some-endpoint:4318
    - batch:
        exporter:
          otlp:
            endpoint: http://some-other-endpoint:4318

would result in

tracer_provider:
  processors:
    - batch:
        exporter:
          otlp:
            endpoint: http://yet-another-endpoint:4318
    - batch:
        exporter:
          otlp:
            endpoint: http://yet-another-endpoint:4318

Note: this is probably an incorrect configuration (having two otlp exporters pointing to the same endpoint), but again I think it's probably the least surprising merge, and therefore probably the best.

As @jackshirazi says above:

For the conflict objections, I would do the simplest merge, and with an option to output all values, it's straightforward to debug. So yes there would be conflicts, but they would mostly be easily caught in dev/test and resolved.

@jack-berg
Copy link
Member Author

Its not perfect but I could get on board with that merge logic. If we were to do this, we should give the user a way to understand what the resolved configuration model looks like. The natural thing to do is to log out the resolved model after applying environment variable overlays, but considering it may have secrets in it, we wouldn't always be able to do that. Instead, we could:

  • Detect if any environment variables were present and if so, log a warning message indicating the resolved configuration is not what was specified in the configuration file.
  • Provide details in the log message on how to opt-in to printing the full resolved configuration (akin to otel.javaagent.debug=true) where the user assumes the risk of any secrets that may be printed

@ocelotl
Copy link
Contributor

ocelotl commented Jan 23, 2024

There was something else we mentioned in the SIG meeting. Let's think about what happens right now for any implementation that does not have the configuration prototype:

Let's say that these environment variables are defined (I'll use here the same example @jack-berg used during the SIG meeting):

OTEL_TRACES_EXPORTER=otlp
OTEL_EXPORTER_OTLP_ENDPOINT=http://yet-another-endpoint:4318

And the user has this code in the application:

SdkTracerProvider.builder()

        .addSpanProcessor(BatchSpanProcessor.create(OtlpGrpcSpanExporter.create("http://some-endpoint:4318")))

        .addSpanProcessor(BatchSpanProcessor.create(OtlpGrpcSpanExporter.create("http://some-other-endpoint:4318")))

When that combination of environment variables and application code are executed something happens, let's give this something a name: X.

Now, the code above is equivalent to some configuration file, because when a certain configuration file is used, we also get X.

If we think about this project for a moment, we can also see it not as "configuration of OpenTelemetry" but as "configuration of a script that creates some certain OpenTelemetry objects before the application is run". Every configuration file makes this script be execute in a certain way whose results could also be achieved by executing some certain code directly in the application.

So, we can see this problem in this way:

What happens when OTel is executed when there are environment variables set and also a configuration file?

And the answer to that question could be:

The same that would happen if the equivalent objects that are created with the configuration file would have been instantiated in the application with the same environment variables set.

I think users could understand this configuration project better if we present it not as configuration of OpenTelemetry in the way that environment variables configure OpenTelemetry, but as a way for them to define which and how certain components are instantiated before the rest of the application runs.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 23, 2024

I gave this issue more thought and now I think that this could be not an issue at all. Instead of trying to find an algorithm to either merge or override or something else, let's just tell the users, this is what the configuration file component would end up instantiating if you run OpenTelemetry with the environment variables that are set right now. To do this, I propose we add to the configuration file component an option (maybe named dry_run or something similar) that instead of running anything would just print the code that would instantiate the same objects that would be instantiated by the configuration file component if it was executed normally. In this way, the user can adjust their environment variables and find out the result that changing them would have on the instantiated objects and we don't have to figure out any algorithm to solve conflicts between the environment variables and configuration files.

I implemented an example that partially prints the code that would correspond to an instantiation of a tracer provider, running this produces this:

TracerProvider(
    Sampler(
        always_off=None,
        always_on=None,
        jaeger_remote=None,
        parent_based=ParentBasedSampler(
            local_parent_not_sampled=LocalParentNotSampledSampler(
                always_off=None,
                always_on=None,
                jaeger_remote=None,
                parent_based=ParentBasedSampler(
                    local_parent_not_sampled=None,
                    local_parent_sampled=None,
                    remote_parent_not_sampled=RemoteParentNotSampledSampler(
                        always_off=None,
                        always_on=None,
                        jaeger_remote=None,
                        parent_based=None,
                        trace_id_ratio_based=ParentBasedTraceIdRatio(
                            TraceIdRatioBased(
                                0.0001
                            ),
                            StaticSampler(
                                Decision(
                                    2
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    0
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    2
                                )
                            ),
                            StaticSampler(
                                Decision(
                                    0
                                )
                            ),
                        ),
                    ),
                    remote_parent_sampled=None,
                    root=None,
                ),
                trace_id_ratio_based=None,
            ),
            local_parent_sampled=LocalParentSampledSampler(
                always_off=None,
                always_on=StaticSampler(
                    Decision(
                        2
                    )
                ),
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            remote_parent_not_sampled=RemoteParentNotSampledSampler(
                always_off=StaticSampler(
                    Decision(
                        0
                    )
                ),
                always_on=None,
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            remote_parent_sampled=RemoteParentSampledSampler(
                always_off=None,
                always_on=StaticSampler(
                    Decision(
                        2
                    )
                ),
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=None,
            ),
            root=RootSampler(
                always_off=None,
                always_on=None,
                jaeger_remote=None,
                parent_based=None,
                trace_id_ratio_based=ParentBasedTraceIdRatio(
                    TraceIdRatioBased(
                        0.0001
                    ),
                    StaticSampler(
                        Decision(
                            2
                        )
                    ),
                    StaticSampler(
                        Decision(
                            0
                        )
                    ),
                    StaticSampler(
                        Decision(
                            2
                        )
                    ),
                    StaticSampler(
                        Decision(
                            0
                        )
                    ),
                ),
            ),
        ),
        trace_id_ratio_based=None,
    ),
    Resource(
        reprOrderedDict(
            [
                (
                    "telemetry.sdk.language",
                    "python",
                ),
                (
                    "telemetry.sdk.name",
                    "opentelemetry",
                ),
                (
                    "telemetry.sdk.version",
                    "1.23.0.dev0",
                ),
                (
                    "service.name",
                    "unknown_service",
                ),
            ]
        ),
        schema_url="https://opentelemetry.io/schemas/1.16.0",
    ),
)

@Emily-Jiang
Copy link

Emily-Jiang commented Jan 23, 2024

My big concern with having the yaml file overrule all of the configuration:

  1. Yaml is not as easy as defining an environment variable.
  2. If you use the environment variables as well as the yaml file, you have not duplicate the environment variables in the yaml file. Otherwise, the variables will be get ignored.

With this, I support to have the file configuration automatically include the environment variables. If the same configuration defines in the file and also as an environment variable, the value from the file rules.
By the way, in MicroProfile Config, the environment variable and system properties are also opt in to provide configurations by default.

@yurishkuro
Copy link
Member

I think it's going to be continually surprising to users that all of the standard OTEL_* environment variables are ignored as soon as you introduce a yaml configuration file (e.g. to configure metric views).

@trask I don't know how surprising it would be (after all they are making a decision to use a config), but overall I would strive for less complexity rather than more. The existing situation with env vars is already pretty complex, and devising overlaying rules with config adds even more complexity (the content of your comment is a perfect illustration of that complexity). I don't feel that this complexity is warranted, given that standard variable substitution in the config provides the same customization capabilities, it's a well-understood solution without additional mental overhead.

@jack-berg
Copy link
Member Author

I propose we add to the configuration file component an option (maybe named dry_run or something similar) that instead of running anything would just print the code that would instantiate the same objects that would be instantiated by the configuration file component if it was executed normally. In this way, the user can adjust their environment variables and find out the result that changing them would have on the instantiated objects and we don't have to figure out any algorithm to solve conflicts between the environment variables and configuration files.

@ocelotl The idea of having code that prints code seems brittle and hard to maintain.

But I'm also having a little trouble understanding what you are suggesting, but let me to restate my understanding:

  • You're coming from a standpoint where a language, like python, implements the interpretation of the environment variable scheme directly in components rather than in a separate artifact. I.e. the Otlp exporters directly interpret OTEL_EXPORTER_OTLP_* environment variables.
  • File configuration strongly suggests decoupling interpreting the configuration model from the SDK components, but ultimately, the Create operation still has to instantiate components like the OTLP exporter, which in some languages may be performing additional interpretation of environment variables.
  • You're saying that we don't need to solve the merge problem because languages that interpret environment variables inside of components already have merge semantics. So instead, just give tools for printing out how how these things interact.

I don't think I like this. It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

If I'm misinterpreting the proposal please let me know.

@yurishkuro
Copy link
Member

It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

I think this is a great argument against overlaying! We already have a mess of compatibility matrices with varied support for different env vars. Saying No to overlaying just removes this problem altogether - languages only need to implement one generic variable substitution, not a mish-mash of implementations in each and every component.

@ocelotl
Copy link
Contributor

ocelotl commented Jan 23, 2024

@ocelotl The idea of having code that prints code seems brittle and hard to maintain.

Maybe it is different for other languages but in Python it's quite simple, we only need to add a well-known method to every class.

  • You're coming from a standpoint where a language, like python, implements the interpretation of the environment variable scheme directly in components rather than in a separate artifact. I.e. the Otlp exporters directly interpret OTEL_EXPORTER_OTLP_* environment variables.

Correct. That's what we currently do in Python. This is an important point, more about this later.

  • File configuration strongly suggests decoupling interpreting the configuration model from the SDK components, but ultimately, the Create operation still has to instantiate components like the OTLP exporter, which in some languages may be performing additional interpretation of environment variables.

Ok, I think I agree with this...

  • You're saying that we don't need to solve the merge problem because languages that interpret environment variables inside of components already have merge semantics. So instead, just give tools for printing out how how these things interact.

Mostly correct, more about this later too.

I don't think I like this. It essentially leaves the merge semantics to be a language level decision, which reduces portability of configuration.

The merge semantics are already a language decision. If they are currently a problem, we have to fix that problem where it currently is, not by adding the configuration file component. The file configuration component cannot be the solution for this problem because:

  1. Even after the file configuration component is added, users will still be able to run OTel without using it.
  2. The file configuration component trying to merge or override the environment variables would be an additional problem because there is no right way of doing this (or maybe it is (Dynaconf may have an algorithm, more below)? 🤔) kind of operation.

@jack-berg I think that one of your goals for the file configuration component is to provide developers with a clean, decoupled mechanism to obtain configuration values (something like configuration = Configuration(); configuration.otel_exporter_timeout == 10). That would be great (in fact I even tried to do the same thing long time ago). While working on that I realized there already existed a Python project that did the same thing, Dynaconf.

Now, 20 seconds ago, while writing this I looked into this project again and found that they may have an algorithm that we could use (hope this helps!).

If I understand @yurishkuro comments right, I think @yurishkuro and I agree (@yurishkuro please correct me if I am wrong). We should not try to add an arbitrary or not-so-well defined algorithm to merge the environment variables and configuration files.

So, to summarize a few things:

  1. Regardless of what we decide on this merging/overriding/etc issue, I see value in allowing the user to know what objects are to be instantiated by the configuration file component. This can help users clearly understand what the file configuration component is doing which is essential for debugging. For Python I have implemented this feature by printing the equivalent code, maybe for other languages there are better approaches.
  2. I am not opposed to merging environment variables with configuration values per se. I am opposed to merging environment variables with configuration values using an arbitrary or not-so-well defined approach. If we can't find a clean way of doing this, we are better not doing it at all.
  3. It would be great to have a "configuration" object that we can use in our code that would abstract developers from having to use the "raw" values of environment variables (or maybe configuration files too). From my past experience implementing that I remember it was nice to have something that would automatically transform an enviromnent value string "true" into a boolean value True that we could use in the SDK. Nevertheless, if we want to have this feature work with environment variables and configuration files too, we first need to find a clean, proper algorithm to do the merging/overriding, if not, we are better without this feature as well.

@brunobat
Copy link
Contributor

Now, 20 seconds ago, while writing this I looked into this project again and found that they may have an algorithm that we could use (hope this helps!).

In Java there is Microprofile Config. It's a spec with many implementations to manage configurations: https://github.com/eclipse/microprofile-config

It would be great to have a "configuration" object that we can use in our code that would abstract developers from having to use the "raw" values of environment variables (or maybe configuration files too).

I agree.

I think it makes sense to discuss about configuration sources. We now have file and env./sys. vars. (there might be more in the future?) and I think it make sense to decide if we are going to have a hierarchy of sources or if they will be flat and it's up to the implementations to figure out the merge.
I think leaving the merge behaviour to the language specific implementations would lead to inconsistencies and mess up the config of large microservice deployments, with services implemented in many languages.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 18, 2024

And allowing platforms to set resources via env vars makes a lot of sense: developers do not necessarily know which resources a platform has available to it.

Platforms should set platform specific resource attributes via resource detectors. OTEL_RESOURCE_ATTRIBUTES is the wrong tool for the job.

It would mean that every platform would need to implement a detector in every language and ship a package containing this detector.
I suspect they won't, at least smaller ones (from Azure perspective, I doubt that we will be able do it for every lang).
It would make user experience worse - there won't be a consistent way for cloud providers/etc, no consistent way for users across languages, and there would be more packages, version conflicts, dependencies, etc.

If we expect everyone who wants to integrate with otel do something, we should provide them a convenience. Existing resource env vars do and are already used for this purpose, so why is it a wrong tool?

@lmolkova
Copy link
Contributor

lmolkova commented Mar 18, 2024

Thinking more on this. The issue with supporting the old env vars is that how they should be applied is ambiguous. Except, for env vars that set resources it is actually unambiguous. The resource section is always at the same path in the config file, correct? So there's no potential confusion as to how they would be applied.

I do support this proposal. I still think we might need a few extra env vars for exporter and propagators

  • AWS lambda and xray wants to give priority to xray propagator
  • I assume providers might want to provide default exporter choice (otlp and the endpoint)

We can either investigate if more env vars should interop with the config or start with resource ones and leave a room for more env vars to be added in the future to the 'interop' list if we'll find them necessary.

I also support @trask proposal to deprecate some env vars. I suggest starting with those which don't have a good interop story with the config.

@tedsuo
Copy link
Contributor

tedsuo commented Mar 19, 2024

Platforms should set platform specific resource attributes via resource detectors. OTEL_RESOURCE_ATTRIBUTES is the wrong tool for the job.

Resource detectors are for handling platforms which are not opentelemetry-aware, as we don't have any other choice. It is much better for platforms to set their resources directly in a language-independent manner, free of maintenance overhead and version mismatches.

@jack-berg
Copy link
Member Author

Existing resource env vars do and are already used for this purpose, so why is it a wrong tool?

I regret bringing this up because I think its besides the point of this conversation. With so much debate on this particular issue, its especially important to not get sidetracked.

The ideas that have been discussed most recently in this thread are NOT compatible with the configuration working group's recommendation. The main idea seems to be to have the existing env vars override the contents of a config file where a clear mapping is possible, and to deprecate the existing env vars which do not have a clear mapping. If this were the direction, we would not want to have starter templates which reference all the existing env vars and their defaults using the substitution syntax, since doing so would muddy the waters even more. And without this requirement, we probably would think twice about introducing the env var substitution default syntax. Additionally, the config working group recommendation only needs to define the behavior of ignoring existing env vars when OTEL_CONFIG_FILE is specified, but this idea would need to define whether or not existing env vars take precedence when parse and / orcreate are called, or only when OTEL_CONFIG_FILE is specified.

This is to say that the recent ideas in the thread are an entirely new proposal, rather than a small modification to the recommendation. As I've mentioned several times, I'm unsure how to move on from here. There were 6 proposals considered by the SIG. This set of ideas represents a 7th. It would be incorrect for us to go and change the recommendation after concluding the process, and furthermore I personally wouldn't want to. The courses to take from here appear to be:

@yurishkuro
Copy link
Member

yurishkuro commented Mar 19, 2024

@jack-berg I think the issue with the evaluation document is that it did not provide an evaluation framework, and thus the recommendation is difficult to justify (see my rant on "pros & cons"). There is rarely an option that beats all other options on all decision dimensions, but when those criteria are not even defined clearly then the comparison will always look not convincing. On the other hand, if the decision was documented with a decision framework, then adding proposal #7 to it would be relatively straightforward and a new (or same) recommendation can be issued.

One very useful decision framework is Traffic Lights (based on quick googling this blog post does a decent job explaining it). It's not difficult to transform the existing pros & cons already collected in the doc and in this thread into a traffic lights matrix. Some of the criteria that I would consider important are:

  • now much extra work on maintainers a proposal creates
  • if a given proposal is chosen, whether it prevents other proposals from being implemented

The important aspect of traffic lights method is to make sure that people's concerns are heard and reflected in the evaluation. For example, @lmolkova is worried about existing documentation and backwards compatibility - so we should add that to the matrix and compare different options on this axis (e.g. 2b would be yellow, Ted's proposal of resource-only env vars would also be yellow).

@jack-berg
Copy link
Member Author

The document summarized the options, but was paired with a comment which did describe an evaluation framework. It was not a traffic lights framework, but included evaluation criteria with an emphasis on the high priority criteria, a defense of the recommendation including explanation of weaknesses, and summary of why the others were rejected.

Maybe otel could / should adopt a decision making framework like traffic lights, but no such precedent exists. One problem I can think without having used traffic lights in anger is disagreement over the color of the light, and over the relative significance / weight of each of the evaluation criteria. In the process for making this recommendation, the configuration-maintainers voted repeatedly until a winner was selected - similar to what the TC might do. How do you describe each individual's process for deciding how to vote in a summary of a recommendation?

There's an additional meta-issue with this: We described a process to collect and summarize all the different points of view, and make a decision. We advertised that process and executed it in good faith. It was a significant investment in time. We drew a line in the sand, and the process wasn't perfect but it was pretty good and organized compared to what I've encountered elsewhere in this project. Re-initializing the conversation sends a signal that processes like this have no teeth, and hurts decision making since there's no penalty for not participating (i.e. the conversation is never really over). (Note: This is not targeted at anyone, or even at this particular issue. I've noticed this with a number of issues over the years and it seems relevant now.)

@lmolkova
Copy link
Contributor

lmolkova commented Mar 20, 2024

I find the process you outline here #3752 (comment) to be awesome.

I agree with @yurishkuro that there could be more evaluation criterias. I'd add something around these lines:

  • existing experience with env vars should not become worse. It should not break existing integrations. (you can read it as backward compatibility or user experience, both apply)
  • we found a reasonable set of scenarios and prototyped them and here're the results

I think user experience is more important than "how much extra work on maintainers a proposal creates", but both have a good place in the evaluation criteria list.

If we had something like this as prereqs for the significant changes (does not degrade existing features, prototyped and evaluated, not too complex), there would be less last-minute feedback.

@jackshirazi
Copy link

If we had something like this as prereqs for the significant changes (does not degrade existing features, prototyped and evaluated, not too complex), there would be less last-minute feedback.

The equivalent of this was made clear in the discussions, the higher support cost from this and equivalent proposals was highlighted in the discussion agenda. The TC decided to not prioritize that sufficiently, we have to accept that. I thought my subsequent proposal to make the default support env vars using a merge of yaml was consistent with the chosen proposal, but @jack-berg points out it isn't because the proposal wording includes templates for these. We have to accept that too, (though I think this is a good compromise and we should be a little flexible about process vs best outcomes). So it's clear that to add this we need to escalate for a TC decision. For me, I feel like the people who would decide this have viewed the points here in the discussion and are not supportive of adding these defaults, so I don't see the point in such an escalation, though I'll happily support anyone who decides the effort is worthwhile

@lmolkova
Copy link
Contributor

I want to emphasize, if a new feature degrades/breaks existing stable experience, it should not happen. No amount of "keep things simple" or "decision has already been made" can justify breaking/degrading user experience.

@tedsuo
Copy link
Contributor

tedsuo commented Mar 22, 2024

@jack-berg having been on both sides of this process multiple times, I agree that it feels broken when a SIG ends up having to rehash everything when a proposal is brought to the community, and essentially having the entire debate over again so that the proposal can be approved.

Your (very helpful) design breakdown does a good job of listing the criteria that the design proposals are evaluated against. Because of the work done by you and the SIG, I actually think the debate happening now is helpful, but could be structured better. Let me explain why.

I've been involved in many of the major design decisions in Otel (tracing, context propagation, error handling, etc, etc). For major design decisions, it is often the case that when the proposals go public, requirements that the designers miss are brought in by community members who were not part of the internal design process. This is normal; ensuring that our designs meet all requirements is one of the reasons we have a public review process. OTel must work across many languages, runtimes, and platforms, and it must be careful about breaking compatibility. Metrics is a major example of a design that took three complete rewrites before meeting all requirements. That was unfortunate, but if we had refused to honor those late-breaking requirements, our metrics solution would have been a failure. Honestly, with something as major as a new configuration model, requirements and feedback from the wider community should be expected. Especially if the design proposes a perceived break in compatibility!

Anyways, my point is that I don't actually think that the debate we are having here is unnecessary. Right now a new requirement is being proposed – namely, that our definition of compatibility should be stricter than the definition that was used to drive the current design. Compatibility is very important, it's reasonable that we would spend more time gathering real-world examples and finding clarity on what we actually need here.

What is perhaps making this conversation difficult is that we are mixing requirement gathering with designing. Proposed requirements are getting glued together with particular solutions. That's crazy making. I agree with @yurishkuro's comments, and I recommend that we back off from talking about solutions for a bit. Let's first get agreement on what our compatibility requirements actually are. Once we actually agree on requirements, I suspect that the design solution will be fairly obvious.

If we want to elevate this decision to the TC, that's fine. But we need the TC to decide on the requirements, not the solution. Let's spend this next week getting our requirements gathering into a clean document, so that we can actually see what we are talking about. For debated requirements, we can record the different opinions on what the requirement actually is, along with real world examples. No mention of solutions until we finish this work. Perhaps the result of this process will improve how we make proposals in the future.

@tigrannajaryan
Copy link
Member

Let's first get agreement on what our compatibility requirements actually are.

I am going to look into this and if necessary will take to the TC for clarification. I see confusion and variety in opinions, which need to be clarified regardless of what we decide for config. I will post back when I have an update.

@pyohannes
Copy link
Contributor

I think the approach proposed by the workgroup is a reasonable choice. It brings a welcomed conceptual simplification, and sets us up well for things like remote SDK configuration.

Below I collected some points about features that the proposed solution will not support, as compared to our current solution. I don't see any of these as a blocker, but I want to list them here as part of this discussion, as it seems some people consider them as "implicit requirements" in the context of backward compatibility.

  • It's not possible anymore to override implicit defaults.

    The loss of this ability surprised some users to whom I talked about the proposed configuration approach. Currently, in .NET for example, users can add exporters with default values, which then are overwritten by environment variables. For example, .AddOtlpExporter() is called on the tracer provider, and then OTEL_EXPORTER_OTLP_ENDPOINT is set. As far as I can see, this is a use case that will not be supported by the proposed model: all defaults that should be overwritten must be specified explicitly. While I don't think this is necessarily a bad change, it is a behavioral change that will surprise some users.

  • I'm not sure if the template approach can cover all currently supported environment variables.

    Is it possible to have a template that honors all existing environment variables? Or will this be a best-effort approach? It seems to me that environment variables using key-value pairs and lists cannot be used with this approach. I'm also not sure how variables like OTEL_TRACES_EXPORTER can be supported.

  • We don't have guaranteed consistency anymore in environment variable overrides.

    Although we'll provide templates, nothing will stop users from using ${OTLP_ENDPOINT} in one place, and ${OTEL_OTLP_ENDPOINT} in another. The consistent set of environment variables we have now is hard to maintain, however, it's a nice-to-have from a user's point of view: it gives a consistent experience across languages, and it avoids unpleasant surprises.

@jack-berg
Copy link
Member Author

The loss of this ability surprised some users to whom I talked about the proposed configuration approach. Currently, in .NET for example, users can add exporters with default values, which then are overwritten by environment variables. For example, .AddOtlpExporter() is called on the tracer provider, and then OTEL_EXPORTER_OTLP_ENDPOINT is set. As far as I can see, this is a use case that will not be supported by the proposed model: all defaults that should be overwritten must be specified explicitly.

I'm not sure how this maps to the normal .NET programmatic config process. I know in .NET the pattern involves a combination of programmatic config with elements that implicitly read from env vars. File config implies more of a one-liner config process, where the user calls something like OpenTelemetrySdk.initialize() (or equivalent) which detects that OTEL_CONFIG_FILE is set, parses it, creates SDK components from the model, and returns those SDK components to the caller. I'm sure OpenTelemetry .NET will find a way to balance the API so that a user can start with a config file and optionally layer on additional config programmatically, but its really a different paradigm. Where the current env var based config almost necessitates some programmatic config because so many options are missing, the file config model aims to be a near exhaustive representation of what can be done programmatically. Maybe some users will still want to combine programmatic config with file config, but that's kind of missing the point.

Is it possible to have a template that honors all existing environment variables?

No, it would be a best effort approach. #3948 represents the recommendation, and has an associated PR in opentelemetry-configuration: https://github.com/open-telemetry/opentelemetry-configuration/pull/76/files. See the proposed starter template comments for a list of env vars which don't map well and would be ignored.

We don't have guaranteed consistency anymore in environment variable overrides.

Yes, this is correct.

@jack-berg
Copy link
Member Author

A point I haven't heard made yet with respect to compatibility and the env vars:

The goal of the env var spec is to standardize names of env vars where there is commonality between implementations. SDKs are not required to implement the env vars, and how they are implemented is left intentionally open ended:

The goal of this specification is to unify the environment variable names between different OpenTelemetry implementations.

Implementations MAY choose to allow configuration via the environment variables in this specification, but are not required to.

Environment variables MAY be handled (implemented) directly by a component, in the SDK, or in a separate component (e.g. environment-based autoconfiguration component).

This certainly leaves room for the introduction of a more prescriptive (and possibly net-new) component for handling file based configuration. I'm not sure how to read the intentionally loose language and conclude that we are restricted from introducing something new, quite different, and opinionated.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 22, 2024

Environment variables are marked as stable. SDKs that implement them (virtually all) are mostly marked stable. SDKs can't stop supporting env vars without major version update.

Users that set env vars should be able to keep setting them and if we deprecate some, we'd need to provide at least some back-compat support for them anyway.

Please let's stop prioritizing new non-existent features over existing and popular ones.

@jack-berg
Copy link
Member Author

jack-berg commented Mar 23, 2024

SDKs can't stop supporting env vars without major version update.

I'm not advocating for stopping supporting env vars. Just providing an alternative door (door b). I see no reason to not support door a indefinitely - at least in opentelemetry-java where we strongly oppose revving the major version.

Please let's stop prioritizing new non-existent features over existing and popular ones.

Popular due to lack of alternative. The single most common type of issue I respond to in opentelemetry-java is asking for things not supported by the env vars. My response is always that new env vars need to be added to the spec, but that there is a moratorium in place making it difficult to add new ones. Just look how many times #2891 has been linked to. The user story is painful when env vars don't exist to express what you want, which occurs quite often. The single most popular issue (61 up votes at time of writing) in java instrumentation is about a lack of expressiveness in the env var syntax for describing non-trivial sampling situations. We wrote a dedicated view file config tool in opentelemetry-java to stop the bleeding and provide desperately needed configuration of things like explicit bucket boundaries, reducing cardinality, and dropping unneeded metrics. Based on the number of times I refer people to it answering issues, its quite popular.

If file config turns out to not be popular / useful, then people won't set OTEL_CONFIG_FILE and there's no problem with conflicting env vars to worry about 😁.

@lmolkova
Copy link
Contributor

lmolkova commented Mar 23, 2024

If we stabilized something imperfect in the past, we can't just say we do something else now.
We have to find a way to update gracefully and keep all the good things that we had.

So I'm suggesting to keep both doors open at the same time in #3948 (comment).

We can and should review existing env vars and fix/deprecate some of them. In the current proposal there is no attempt to fix env vars, there is an implicit attempt to drop them eventually (or make sure nobody uses them) in ungraceful manner.

@trask
Copy link
Member

trask commented Mar 26, 2024

Without taking a position on whether the existing env vars should or should not interop:

I think we should deprecate any env vars that do not interop with yaml config.

Why?

I think OpenTelemetry should have a clear recommendation for users on how to configure SDK + Instrumentation.

We know that yaml config is required to support some popular user requests, namely metric views and attribute-based sampling (the 4th most upvoted issue across all of OpenTelemetry).

So, if our recommendation is to "start with env vars", then we know that we are steering a lot of users down a one-way path.

I think it will be a much better user experience for everyone to start directly with yaml and not need to rewind and go down a different path later.

SDKs of course will have to support the deprecated env vars (without yaml interop) at least until their next major bump.

But by deprecating the env vars that do not interop with yaml config, we give a clear signal to users about the path they should take when onboarding to OpenTelemetry.

@jack-berg
Copy link
Member Author

I think we should deprecate any env vars that do not interop with yaml config.

I'm not opposed to that, but we should consider the timing:

File config is still a very experimental idea. This is in large part because how contentious the PRs have been (note that these PRs were for the most part just restating things that had already been approved in the original otep):

There's still a lot of work to be done, especially:

  • Building of prototypes in a variety of languages to prove concepts
  • Build out the configuration model schema
  • Figure out how stability guarantees work with the schema. I.e. what types of things can be changed in patch, minor, major releases.

Based on the history of getting things done for file config, and of getting similar schemas like opentelemetry-proto from experimental to stable, I'm disappointed to say that I don't see a stable file config spec as realistic in the short / medium term future.

Deprecating env vars significantly before file config is set to be stable sends a bad signal to the user: env vars are stable but deprecated, but the replacement is experimental without a target date for stability. I'd like to see env vars and file config coexist without deprecating env vars until there's a realistic prospect of marking file config stable.

@tigrannajaryan
Copy link
Member

+1 to eventually deprecating some or all old env vars, ideally to have only one way of configuring the SDK. This of course can only happen sometime after the new way of configuring has a stable spec and is widely implemented by SDKs.

@jack-berg
Copy link
Member Author

Hi All - Please see this comment updating the status of the issue:

Per @tedsuo’s request, we discussed this issue in the 3/24/27 TC meeting and have made a decision:
Generally, we will follow @trask's comment, proceeding with this PR with a few changes:

  • Rename OTEL_CONFIG_FILE to OTEL_EXPERIMENTAL_CONFIG_FILE, reflecting the fact that the semantics around how the value of env var are subject to breaking changes as the file configuration spec and schema continue to evolve.
  • Ensure that env vars which don’t interop with file config are deprecated when file config is ready for stabilization, reflecting that we do not want to recommend multiple competing configuration stories. This could be ensured via an explicit note in the markdown, or a blocking issue - both achieve the same effect. Deprecate environment variables which do not interoperate with file config when ready for stabilization #3967
  • Ensure that file config has an interop where platforms (i.e. Azure functions, otel operator, etc) contribute to config. We should proceed with #3948 without being prescriptive about how that mechanism works. In the TC meeting, 4 distinct solutions were discussed which had different tradeoffs and limitations. It is clear that we still need to learn more about the requirements and constraints of this use case and let the findings inform the solution. The config working group should prioritize this discussion, but an answer shouldn’t block this PR. We should open a new issue to track the requirements and discuss solutions, and ensure that we treat that issue as blocking for any sort of stabilization effort (although it should ideally be solved much sooner). Ensure that file config has solution for platforms which contribute to config #3966

carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
…v var substitution default syntax (open-telemetry#3948)

Fixes open-telemetry#3752.

Implementation of the configuration working group recommendation as
[described
here](open-telemetry#3752 (comment)).

- Adds definition for `OTEL_EXPERIMENTAL_CONFIG_FILE `, which ignores
other env vars when evaluating, except env var substitution references
- Adds new env var substitution default syntax `${ENVVAR:-defaultValue}`
- Paired with
open-telemetry/opentelemetry-configuration#76,
which defines a new
[sdk-config.yaml](https://github.com/jack-berg/opentelemetry-configuration/blob/starter-template/examples/sdk-config.yaml)
starter template referencing all env vars that map cleanly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment