Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Decision to be made] Approach to handle experimental types of telemetry #4705

Closed
dmitryax opened this issue Jan 18, 2022 · 18 comments
Closed
Labels
release:required-for-ga Must be resolved before GA release

Comments

@dmitryax
Copy link
Member

dmitryax commented Jan 18, 2022

Before declaring Collector GA we need to decide on how we handle new experimental telemetry types. Currently we have logs signal which is in Beta, and pdata parts related to logs can be changed.

We need to decide if we can:

  • Mark unstable parts of the public API as Experimental. This is the easiest approach and allow us to proceed with no changes to existing public API just marking API related to log as experimental. The same approach can be applied to any other signals that can be added in future.

  • Split all the existing packages based on telemetry type: pdata, consumer, component. This seems to be the the most disruptive way and requires many changes. Another downside of this approach is that we will not be able add support of experimental data type in stable components like batch processor, memorylimiter processor, otlp receiver/collector with the current design. It means that those components cannot be marked as GA while logs signal is still in Beta unless we want rework them as well or separate out the logs support.

@Aneurysm9
Copy link
Member

  • Mark unstable parts of the public API as Experimental. This is the easiest approach and allow us to proceed with no changes to existing public API just marking API related to log as experimental. The same approach can be applied to any other signals that can be added in future.

Would this be done in conjunction with a change to the version of the collector and pdata modules to v1.0.0 or not? I do not believe that it is appropriate to use a v1.0.0 module version but say "some parts of the public API of this module may change at any time". The spec for Go module versions is rather clear:

Each part of a version indicates whether the version is stable and whether it is compatible with previous versions.

  • The major version must be incremented and the minor and patch versions must be set to zero after a backwards incompatible change is made to the module’s public interface or documented functionality, for example, after a package is removed.

I would prefer that we take the approach of separating stable modules from unstable modules. We have already introduced the use of the multimod tool that enabled us to manage this split in the Go SDK, so I think that all of the tooling needed to make this workable already exists and is integrated.

Another downside of this approach is that we will not be able add support of experimental data type in stable components like batch processor, memorylimiter processor, otlp receiver/collector with the current design. It means that those components cannot be marked as GA while logs signal is still in Beta unless we want rework them as well or separate out the logs support.

This part is less clear to me. If we can stabilize the public API and are willing to allow stable modules to depend on unstable modules (this would require some change to the multimod tool, as it currently enforces dependency flow only in the direction of stability) then we can have those stable components provide support for experimental signals while abstracting away the volatile details.

For instance, do we expect that there will be any changes to the consumer.Logs interface? If not, and I would be shocked if there were any reason for it to change, then it becomes a question of whether we need the pdata.Logs structure to be stable or simply to have an interface that will always demand that type.

@bogdandrutu
Copy link
Member

For instance, do we expect that there will be any changes to the consumer.Logs interface? If not, and I would be shocked if there were any reason for it to change, then it becomes a question of whether we need the pdata.Logs structure to be stable or simply to have an interface that will always demand that type.

What is the advantage of doing this? If users are using the "unstable" part we have the same problems.

@Aneurysm9
Copy link
Member

Aneurysm9 commented Jan 19, 2022

What is the advantage of doing this?

If the OTLP receiver is stabilized but uses an unstable data model it would still be possible for users constructing collector instances to know that the OTLP receiver has a stable public API and to use it without concern that it will suddenly require that they change the way they use it. This seems similar to how the Java SIG currently has a stable Meter API but not a stable SDK implementing it. Users can start to rely on the API knowing that it will be stable even if the implementation they use may change.

It also conforms to the Go module versioning rules, while the other approach does not.

If users are using the "unstable" part we have the same problems.

I'm not sure what problems, specifically, you are referring to. Do you have an enumeration?

@dmitryax
Copy link
Member Author

Would this be done in conjunction with a change to the version of the collector and pdata modules to v1.0.0 or not?

Yes, this is the idea. This issue is supposed to be a prerequisite for go.opentelemetry.io/collector/model (or at least stable parts of it) 1.0 release plan.

For instance, do we expect that there will be any changes to the consumer.Logs interface? If not, and I would be shocked if there were any reason for it to change, then it becomes a question of whether we need the pdata.Logs structure to be stable or simply to have an interface that will always demand that type.

This would add a requirement that any new experimental data model has to be defined in stable API, at least its name and presence. But an experimental data type possibly can be removed or renamed after. If we allow stable component like OTLP receiver/exporter to depend on an unstable pdata modules, maybe we should still split all other packages (consumer, component, helper etc) by pdata type? That's a lot of changes.

@tigrannajaryan
Copy link
Member

How does Kubernetes Go API do this? Kubernetes marks individual featues as Stable/Beta/Alpha. What do they do with the corresponding APIs in Go client?

@tigrannajaryan
Copy link
Member

From k8s client-go docs:

Please note that alpha APIs may vanish or change significantly in a single release.

@mx-psi
Copy link
Member

mx-psi commented Jan 19, 2022

How does Kubernetes Go API do this? Kubernetes marks individual featues as Stable/Beta/Alpha. What do they do with the corresponding APIs in Go client?

Kubernetes, when used as a library, uses v0.x versions, so I don't think it is a good example (we want to mark at least parts of go.opentelemetry.io/collector as stable, right?). I think the stability of user-facing features is different from the stability of the Collector as a library.

@bogdandrutu
Copy link
Member

@mx-psi grpc is 1.0 and it does that, grpc is not the only one that does it.

I think the stability of user-facing features is different from the stability of the Collector as a library.

Unfortunately people just ask for > 1.0, not sure how to solve that without having that API stability.

@mx-psi
Copy link
Member

mx-psi commented Jan 19, 2022

grpc is 1.0 and it does that, grpc is not the only one that does it.

grpc-go's approach to versioning is not compliant with the Go module spec, which, since it is a widely used dependency, has caused important parts of the Go ecosystem to be broken, because eventually someone forgets to read the docs and see that something is marked as "experimental".

@Aneurysm9
Copy link
Member

It is probably worth reviewing the guidance given to OTel client libraries regarding versioning.

@mx-psi
Copy link
Member

mx-psi commented Jan 19, 2022

etcd-io/etcd#12124 is a good example of the pain that grpc-go's approach to semver has caused

@bogdandrutu
Copy link
Member

@mx-psi let's assume we put the experimental "API" in a separate module, and people start using that, you have the same problem. Once something is public and experimental independent of "godoc" comment in a stable module, or an experimental module that ends up in the final build the problem is the same that you are mentioning and none of the proposals that I've heard will fix that.

The design problem that we have in this case is that for testing/developing we need to be able to start components (receivers/processors/exporters) which means we need to depend on the new signal in the top level module (the one that includes the current "service" package) so we will always have a transitive dependency on that "experimental" signal, hence if anyone uses that signal you will have the same problem. One argument can be made that "service" depends only on the factory of that signal so can do some "replace in go mod tricks", indeed that works if that is the only use, but for sure we will want to have things like the "exporter helper" support for that, or like "batching" support for that, or like "memory usage processors" which need to use some parts at least of the new signal pdata definition.

So I think none of the proposed solutions will work. The only option that I can think of for the moment is simply to trust users to not build components that depend on a "experimental" signal outside of core/contrib.

@mx-psi
Copy link
Member

mx-psi commented Jan 20, 2022

@bogdandrutu I think that there are three problems to solve here:

  1. Approach for breaking Go API changes on go.opentelemetry.io/collector/model.
  2. Approach for breaking Go API changes on go.opentelemetry.io/collector (and any eventual modules under this namespace other than model).
  3. Approach for breaking user-facing changes on the OpenTelemetry Collector.

Problem (1) is the one that e.g. @dashpole is affected by and the first problem that needs to be solved. For this particular problem, I think "split packages based on telemetry type" would work: the version on each separate module would reflect the API stability, since this module is designed in a way that allows for easy splitting based on signal type. It is also the part of the codebase that I would expect is used the most as a library, so it is the place where it is the most important to have 'true' (Go module spec compliant) API stability.

For problem (2), I agree that "none of the proposed solutions will work" (or at least, none of the proposed solutions are sufficiently detailed to convince me that they will work). I think reaching Go API stability on go.opentelemetry.io/collector is less urgent, since it is less likely that go.opentelemetry.io/collector is used as a library, other than in Collector distros/other telemetry daemons. We don't have to solve Problem (2) on this issue: If we follow the "split" approach for problem (1), we don't limit ourselves in what way we will solve it in the future: splitting or not splitting could be done.

Problem (3) is related but not necessarily coupled with problems (1) and (2). I think the Kubernetes approach proves that Go API stability can be dealt separately from end-user behavior stability, and that the feature gate tooling is a good solution here.

@Aneurysm9
Copy link
Member

I think there is a path to a solution for problem (2) that entails inverting control of signal handling in the collector service. If we have the ability to register with it the means to build a component for a signal type rather than a fixed list of types a component can be built for, we can remove the need for the service to import handlers for specific types and pull that up into the layer responsible for creating a service.

That would then enable the use of feature gates to control the use of new signal types and put the dependence on experimental modules for new signal types in the application space rather than the framework space. I think we are all agreed that a stable application can depend on experimental modules if it does not break the behavior or usage patterns of stable modules. This makes problem (2) into problem (3), at least for the collector service. There may be other instances of problem (2), but I suspect they would be susceptible to similar solutions.

@dmitryax
Copy link
Member Author

dmitryax commented Feb 2, 2022

It was discussed during SIG meeting on Feb 2 that we are going to split the pdata into different modules at least to address Problem (1).

@punya
Copy link
Member

punya commented Feb 11, 2022

Now that #4832 has been carved out, can this broader issue be removed from the pdata stabilization milestone?

@dmitryax
Copy link
Member Author

Makes sense, removed from the milestone

@tigrannajaryan tigrannajaryan changed the title [Decision to be made] Approach to handle experimental types of temelemetry [Decision to be made] Approach to handle experimental types of telemetry Feb 14, 2022
@atoulme atoulme added the release:required-for-ga Must be resolved before GA release label Dec 19, 2023
@mx-psi mx-psi moved this to Todo in Collector: v1 Apr 18, 2024
@mx-psi
Copy link
Member

mx-psi commented Apr 19, 2024

I think we can close this:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release:required-for-ga Must be resolved before GA release
Projects
Archived in project
Development

No branches or pull requests

7 participants