-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ASP.NET Core] GRPC instrumentation support #1726
Comments
Will this be added back in the next non-stable release? Or as experimental feature in stable? |
The current plan is to bring back in the next non-stable release (Post GA). |
Why not just wait that extra month? There's no rush to push out all the (massively) breaking changes in HTTP instrumentation. I know of zero end-users who even know about this, let alone those who only just read the blog post about HTTP semconv stabilizing in November. And that blog post also only has a tiny number of views despite us trying to get the word out that things were going to change. |
What benefit does waiting a month gives? gRPC conventions won't still be stable in a month, right? Maybe we should make an opt-in experimental feature to re-enable the gRPC part, and make it available in the first stable release itself? |
Sorry, I crossed some streams here -- yes, gRPC isn't going stable in a month. But I guess I don't understand what's going on anymore. If the goal is to stabilize the aspnetcore package, then adding gRPC support back in later makes it not stable anymore? What about new users who want the latest bug fixes (if they exist in the future) for aspnetcore instrumentation, and also want to use gRPC? |
I'll also reiterate that I don't think end-users have any hope at understanding what's happening here |
gRPC support will be under experimental feature flag only. The package will be stable, and offer aspnetcore instrumentation by default. Users who want to get gRPC feature, can opt-in for it, and it can remain opt-in until semantic conventions stabilize. I think the first stable release should contain the exprimental+opt-in capability to support grpc attributes.
Update to the newer versions having the bug fixes. For gRPC, same as my above comment : users can keep the opt-in feature for that. |
Can't really see how comments like this are useful to anyone! |
My specific feedback is just hold off on releasing for a moment. If the plan is to reintroduce gRPC instrumentation, why not just build what you're planning to build and release that? There's three points of churn here:
Why do three when you can do two? |
How would that help anyone? The 1st stable release of instrumentations are expected in under 2 weeks, so we should move forward and fix gRPC issue either by putting it under experimental feature flag in the stable release, or in the pre-release to be released on same time as stable. My personal preference is to put gRPC support back under experimental feature flag, and release it as part of the 1st stable release. |
I agree with this approach if gRPC instrumentation is what's needed then we can introduce an experimental flag Regarding
This is an expected churn and was communicated well in advance. We have followed the formal transition plan defined in semantic conventions. All the languages are expected to go through it. |
@vishweshbankwar Could you start/prepare a PR doing |
@cartermp Assume the above goes through, does it mitigate some/most/all concerns? |
Yes, thanks! I think that does help alleviate some concerns. |
Who is that you're expecting to have seen that issue and catered for it? The thousands of people affected by the issue who are downloading the package from Nuget after watching .NET Conf or one of the numerous talks about OpenTelemetry? |
Who is promising that this package is stable, and there won't be breaking change? The nuget is marked pre-release, and the readme.md clearly states its non-stable status. If someone else is falsely advertising the package as stable, what can this repo do? Did .NET Conf announced that OTel Instrumentation libraries are stable? OR are there talks which are falsely announcing OTel Instrumentation as stable? Its totally understandable that the users would be affected by the breaking changes to this package, but it is inevitable for any non-stable package. If you want to help, please suggest specific, actionable suggestion. Happy to hear more ideas on how to communicate breaking changes more effectively, if you believe people won't read docs or look at changelogs. A few ideas were discussed in SIG meeting today, but they were relatively minor/small changes, except clearly documenting website docs: https://opentelemetry.io/docs/instrumentation/net/libraries/ (that is a separate issue that the docs from this repo and website are not yet merged :( ) |
I think you are still seeing more concerns which are not addressed. How can we help bridge the gap? Today's SIG meeting discussed some ideas to better communicate breaking changes from this repo, pasting below for easy reference. Please add your feedback to the list as well. (This is separate from the overall opentelemetry.io blogs about semantic conventions changes etc., and is focused on what OTel .NET group can do)
|
I think you're fundamentally misunderstanding users here. When something is announced, and demoed at a major Microsoft conference, no amount of "It's in the README on Github" makes it visible to users that you could break their telemetry at any point. As I said, this is IMPLICIT, at no point during the Demos, or in the docs, does it say that we will remove functionality at any time, as we choose, and if you want to know about that you'll need to be reading the Github issues constantly. Unless I missed that part of the talks that were on .NET Conf? Maybe look here? https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-configuration?tabs=aspnetcore (which is Azure Monitor specific, but I don't think anyone is naïve enough to suggest that Azure docs aren't used by users to understand Microsoft support). You seem to be suggesting that noone should be using this library in production, and using the fact that it isn't "Stable" as a catchall for breaking things. That isn't OK. If we don't want anyone using this in production, noone should be demoing it in talks, let alone key Microsoft people in a flagship conference, atleast with explicit caveats that it will break, and that they need to watch the github issues to stay up to date. I get why it wasn't mentioned, because it's not a good look for Microsoft to be bigging it up like that, when it's not supported. I'm not a maintainer here, I'm the voice of the hundreds, if not thousands, of users of these libraries in production today that I speak to every day. I can help with messaging, I can help you understand the frustrations of users. I can't help if your answer is that this is not stable therefore we can break it any time. If the answer is that anything, at any time, can break, we need more explicit and loud warnings from Microsoft who are demoing this. Perhaps make every method as Experimental in the all the libraries, since that makes it explicit, and forces people to add flags to every library (we could actually then release everything as stable). To be clear, in other languages, there is no concept of "prerelease" in the way that nuget does it. Therefore upgrades of libraries in the 0.x range can be scoped to minor/hotfix only, which we can't do in .NET. That's the reason this is a more specific problem for us. |
@cijothomas I do, but it's not specific to this repo / it's not specifically a .NET SIG problem. I do think that bringing the gRPC instrumentation back in (behind a flag) at least does reduce some of the coming churn -- so thank you for doing that. The HTTP semconv work is massively breaking for a large portion of our customers in several ways and represents an enormous amount of work for them (and us) to migrate. For some customers this is on the order of needing to update thousands of alerts, SLOs, and dashboards -- most people don't use automation like terraform to set these things up, and in several cases this can't even follow an automated process such as terraform due to things like a And so there's several bad things that can quite easily happen here:
We've done what we can to tell customers what's coming with the information we had, as I'm sure other vendors have, but the reality with enterprise software is that most people don't listen unless (a) something is horribly broken, or (b) they need to renew a contract. And while the OTel community has attempted to communicate things a bit, the reality is that it's been unactionable for end-users all year, there's only been two blog posts about this topic (each with only a few hundred views):
We have also struggled to understand the full impact and when it lands across all of OTel. The work was open, yes, but not terribly discoverable. It's been hard to also audit all language contrib repos and instrumentation libraries to understand what's affected and when we can reasonably expect to see breaking changes land. |
Thank you for sharing your perspective, which aligns well with my understanding of the issue. If users are dependent on specific telemetry attributes, such as the attribute names etc. in their dashboards and alerts, migration becomes an inevitable and necessary step. Although it's a painful and costly process, it's something we must accept. Semantic Conventions Working Group did offer a migration plan in place to smoothen the process. (Given conventions were experimental, they could have simply made breaking changes, but they still took the effort to offer migration plans, which is really appreciated.) Will do our best to help further. There is already an issue discussing some options, hopefully to further smoothen the process. Given the challenging nature of preparing a perfect specification or convention on the first attempt, it's inevitable that OpenTelemetry components undergo the cycle of alpha/beta/stable/deprecated. Users who take the risk of using components before they reach stable status, and provide feedback, are making significant contributions to the greater good. If nobody used the components until they were stable, we would never receive any feedback, nor would we be able to make the necessary improvements to reach stability. It's essential to find the right balance between introducing new capabilities and being mindful of not disrupting existing users. Happy to hear more thoughts on improving the current process in this repo (and others). |
I'd like to emphasize that we all share the commitment to understanding and prioritizing OpenTelemetry users' needs. It's a team effort, and each perspective is valuable. If you have any specific feedback or insights regarding this repository, please do share them.
As I mentioned earlier, if Microsoft (or any company) falsely promised that OTel semantic conventions are stable, when they were not, then I don't think this repo can do anything about it. I do see such warnings already in their docs. If you have other feedback on Azure/Microsoft docs, please report it at their channels.
I never said anything like that. The component's status is mentioned in https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry.Instrumentation.AspNetCore#aspnet-core-instrumentation-for-opentelemetry-net. It says breaking changes are possible, and if users are willing to accept that, then nothing wrong with using in production! |
I'll also have to deal with frustration related to this breaking change, however, it's inevitable. For me, a main point is that after this change nothing will break (without a main version increase), so I'm actually looking forward to this happening. Also, I think focusing on what was demoed on .NET Conf is a bit of a red herring, as the main problem lies with users who are using this packages for a longer time and rely on monitoring and alerting workflows that were built around it. The one thing I'd suggest to make users aware that they'll get breaking changes is to change the main version number, and maybe release a 2.0 of those libraries. Although it's not strictly necessary according to versioning guidelines, it might help to warn users that something major is changing. I'm not sure though about all the implications, likely one would also need to increment the SDK version to keep things in sync. |
Yes! We should be following this guidance, if we ever make breaking changes post the stable release. |
@pyohannes On reading again.. were you referring to releasing the 1st stable version of instrumentations as 2.0, instead of the current plan of 1.6.0? |
As this is the first stable release, going with 2.0 isn't a hard requirement for me, I just wanted to put out a proposal to possibly mitigate issues brought up here. However, the logic of strict coupling versions of the SDK and of instrumentations seems a bit problematic, as the question will arise again once there are breaking changes in a stable instrumentation library that don't affect the SDK (hopefully not anytime soon) or vice versa. You'll be forced to either have an awkward main version increase without compatibility break, or you'll have decoupled versions. |
Issue
open-telemetry/opentelemetry-dotnet#5097 removes support for grpc instrumentation. This is done to unblock stable release of instrumentation library which will contain support for http server instrumentation. Semantic conventions for grpc is still in experimental state and hence the grpc instrumentation cannot be marked stable. https://github.com/open-telemetry/semantic-conventions/blob/main/docs/rpc/rpc-spans.md
Workaround
Going, forward the grpc instrumentation will be supported under an experimental feature flag. https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Instrumentation.AspNetCore/CHANGELOG.md#:~:text=Re%2Dintroduced%20support,enabled%20by%20default.
The text was updated successfully, but these errors were encountered: