Defining transition period for pre-stability HTTP semconv breaking changes #3362
Comments
@vishweshbankwar FYI. |
This seems like a good plan. However, it is quite labor intensive, and I hope we don't feel like this type of process is necessary to make changes to any experimental semantic conventions. Hopefully http semantic conventions are the exception due to their wide use. |
Should the instrumentations emit exclusively |
I would recommend NOT doing this and instead introducing the changes as a new major version (with breaking changes), or as a new package while deprecating the previous (current) packages. My reasons are several-fold:
|
I think this plan is too disruptive and does not give enough time for users and vendors to prepare for the change. I would prefer to be more careful and instead do this:
|
Is there a list of what exactly the breaking changes are? Transition period is most influenced by the scope of these changes. I see this table but it's unclear to me what is committed as a breaking change vs. what's just going to continue on with the existing name. |
the prototyped ECS onboarding changes are in this draft and the full list is in the PR description #3355 |
@tigrannajaryan do you see a path forward that would allow us to declare HTTP semantic conventions stable at the start of the transition period instead of at the end of the transition period? (this would allow people who are currently waiting on stable HTTP semantic conventions to move forward without needing to wait for the transition period to end) |
@trask yes, I think we can declare them stable starting from version
I think with my plan they do not need to wait. We can immediately merge the PR that declares the HTTP semantic conventions stable and in effect from 1.26.0, and then implement the new conventions in the instrumentation libraries. Anyone who wants to use the new conventions right away can enable the flag to opt in early, without waiting for the new conventions to become the default. To be clear: I don't insist on my particular plan, it is just one possible option. I am happy with any other approach that gives a similar amount of time for people who still use the old (current) conventions to prepare for the change. |
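For illustration only, the opt-in flag described above could look roughly like the sketch below. The environment variable name `EXAMPLE_HTTP_SEMCONV_OPT_IN`, its `new` value, and the helper `http_attributes` are hypothetical and not part of any proposal in this thread. The attribute name pairs are shown only as examples of a rename; the full list of changes is in the description of #3355.

```python
import os

# Hypothetical variable name, chosen only for this sketch.
OPT_IN_ENV_VAR = "EXAMPLE_HTTP_SEMCONV_OPT_IN"


def http_attributes(method, status_code, environ=None):
    """Return HTTP span attributes under the old or the new naming.

    The old names stay the default; the new names are emitted only when
    the (hypothetical) opt-in variable is set to "new".
    """
    environ = os.environ if environ is None else environ
    opted_in = environ.get(OPT_IN_ENV_VAR, "").strip().lower() == "new"
    if opted_in:
        # Example new names; see #3355 for the actual list of changes.
        return {
            "http.request.method": method,
            "http.response.status_code": status_code,
        }
    # Default during the transition period: keep emitting the old names.
    return {"http.method": method, "http.status_code": status_code}
```

With this shape, a user who does nothing keeps the current attributes, and a user who wants to move early sets the variable before the default changes.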
I think there are two(?) reasons for the transition period:
To satisfy (1), it seems we need to have a time period (e.g. X months) between when the new HTTP semconv changes are merged and when instrumentations are allowed to emit them by default (an opt-in flag to emit them could be ok). To satisfy (2), it seems we need to have an additional time period (e.g. Y months) after any given instrumentation initially starts emitting the new HTTP semconv during which there’s also a supported version of the instrumentation which emits the old HTTP semconv. I think there are a couple of ways instrumentations can satisfy (2):
For Java we can commit to supporting a flag, but I think it would be good to have a lighter-weight option for languages that do not have full-time instrumentation maintainers.
A couple extra thoughts/questions:
|
I may be wrong about this, but I like @lmolkova's suggestion in #3355 to implement a |
I think what I was suggesting earlier was a slightly more formal version of this. The I think it is important to have this information recorded somewhere. If we don't want it to be formalized and added to yaml files I am fine with it (a text will probably do too). To re-iterate what I am looking for:
I think the answer should be yes. Backends typically don't try to change the behavior based on the source language or the popularity of the framework used. |
In the PR I just sent, #3381, I put X=3 months for vendors to support the new conventions, and Y=3 months on top of that for users to migrate. This gives a minimum of 6 months before users MUST migrate. I'd prefer to increase Y over increasing X if possible, because the larger X is, the more first-time users will onboard to the old semconv and will then need to migrate. |
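To make the X and Y arithmetic concrete, here is a small sketch that turns a merge date into the two resulting milestones. The helper names and the example date are made up for illustration, and the day of the month is clamped to 28 to keep the month arithmetic simple.

```python
from datetime import date


def add_months(start, months):
    """Return the date `months` later, clamping the day to 28 for simplicity."""
    index = start.year * 12 + (start.month - 1) + months
    return date(index // 12, index % 12 + 1, min(start.day, 28))


def transition_timeline(merged_on, x_months=3, y_months=3):
    """Compute the earliest milestones implied by X and Y.

    X: time vendors get before instrumentations may emit the new
       conventions by default.
    Y: additional time during which the old conventions remain supported.
    """
    new_default_allowed = add_months(merged_on, x_months)
    old_support_may_end = add_months(new_default_allowed, y_months)
    return {
        "new_default_allowed": new_default_allowed,
        "old_support_may_end": old_support_may_end,
    }


# Hypothetical merge date, used only to show the calculation.
timeline = transition_timeline(date(2023, 4, 1))
print(timeline["new_default_allowed"].isoformat())
print(timeline["old_support_may_end"].isoformat())
```

Because Y starts when a given instrumentation begins emitting the new conventions, the second date is a lower bound: an instrumentation that switches later pushes its own end date out accordingly.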
@tedsuo proposed another alternative that I think is worth getting feedback on: Whenever an HTTP instrumentation authored by OpenTelemetry adopts the new HTTP semconv, it SHOULD bump its major version. The goal of this is to prevent auto-updates from users who have pinned to a major version, e.g. In practice, this would mean:
Then as far as transition plan goes
|
Not to complexify an already complex situation, but something that I think is missing here is some degree of accounting on the impact for tools/vendors that accept this data. For example, some backends can handle this change pretty smoothly by "double writing" events -- when the backend detects one name, it also writes the other -- which keeps all existing queries, alerts, SLOs, etc. working when they're based on older attributes and allows them to work in the face of mixed data. And as instrumentation gradually moves to be all or nearly all based on the new version, data retention can kick in by aging out the old stuff and then there's no "mess" to go clean up later. But I doubt it will be that smooth for every backend. Are there categories of backend that are more/less impacted by this change? Is there any general guidance we can offer, or is that getting too out of scope?

It would also be good to get some degree of accounting on the impact this has for tools that sit in someone's observability pipeline. For example, a sampling proxy that lets you configure keys for sampling purposes will be impacted by this. Would an end-user need to make sure they upgrade all instrumentation across all services if they're doing tail sampling? Or is it more that end-users would need to write rules (assuming it's possible) that check for the existence of either key? Or could a tool "just handle it" somehow?

I realize the scope of OTel is explicitly just on the instrumentation front, but if we don't want to put the burden of migration on every end-user of OTel, I think we need to elaborate a lot more on how tools in the ecosystem can and should generally respond to these changes. |
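As a rough sketch of the two coping strategies mentioned above, "double writing" on ingest and rules that check for either key, something like the following could work. The `RENAMES` table and both helper names are assumptions for illustration; a real backend or sampling proxy would load the full mapping from wherever the renames end up being recorded.

```python
# Illustrative old -> new pairs only; not the complete list of changes.
RENAMES = {
    "http.method": "http.request.method",
    "http.status_code": "http.response.status_code",
}


def double_write(attributes, renames=RENAMES):
    """Return a copy of `attributes` with both the old and new name present.

    Whichever name arrives, the other is filled in with the same value, so
    queries written against either convention keep matching mixed data.
    """
    out = dict(attributes)
    for old, new in renames.items():
        if old in attributes and new not in attributes:
            out[new] = attributes[old]
        elif new in attributes and old not in attributes:
            out[old] = attributes[new]
    return out


def get_either(attributes, old, renames=RENAMES, default=None):
    """Look up a value by its old name, preferring the new name if present.

    This is the "check for the existence of either key" rule a sampling
    proxy could apply without rewriting the data.
    """
    new = renames[old]
    if new in attributes:
        return attributes[new]
    return attributes.get(old, default)
```

The first helper changes stored data so existing dashboards and alerts keep working; the second leaves the data alone and pushes the compatibility logic into each rule, which may suit pipeline tools that should not mutate spans.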
I fear this is going to cause issues for users, especially those in a complex microservices environment. Several SDKs have been stable with HTTP auto-instrumentation for quite some time. A microservice that isn't frequently updated has an SDK in place and will continue to work as is. Why fix it if it isn't broken? With this change we will have one set of microservices using the new attribute names and others still on the old ones. This will break the attributes users are querying and displaying on dashboards. |
Given the widespread adoption of the OpenTelemetry HTTP semantic conventions, and the extensive pre-stability breaking changes we are planning as part of ECS alignment, we plan to provide a transition period to give users and vendors time to adapt to these changes.
Here is an initial proposal from the HTTP semconv stability WG: