Do not merge more metrics #459
Closed
gregschohn wants to merge 12 commits into opensearch-project:main from gregschohn:DoNotMerge_MoreMetrics
…thod for others to leverage. Signed-off-by: Greg Schohn <[email protected]>
Most of this is just playing, but making the StreamManager implement AutoCloseable gives a place to end spans to show how long a serializer/connection factory was relevant for. Signed-off-by: Greg Schohn <[email protected]>
…to the collector to prometheus, zipkin, etc Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
…e config hierarchy. This was broken from the merge https://github.com/opensearch-project/opensearch-migrations/pull/376/files#diff-430f89dc33402ecf692b9a8372f66e585bb2f9215596433216580efc2a56795c. Signed-off-by: Greg Schohn <[email protected]>
…lotted within the same graph in prometheus. Signed-off-by: Greg Schohn <[email protected]>
…ics into an optional. Dropping the optionals makes the code simpler and if we don't want to do logging, we can just not fill in the configuration for the SDK. Signed-off-by: Greg Schohn <[email protected]>
…troduce some more typesafe wrappers for contexts. Lots more to come. Signed-off-by: Greg Schohn <[email protected]>
…explicitly passing strongly typed context objects. Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
ed6abcd to 3746a8e
Make sure that the context is using the right requestKey, which also will have the appropriate indices as per the test context. Signed-off-by: Greg Schohn <[email protected]>
See #460
Description
This change will break the flow of messages emitted by the existing MetricsLogger class into OpenSearch. The biggest reason is that I'm now using the otel/opentelemetry-collector:latest image rather than a bespoke one.
Aside from the breakage, there's a lot being added in the form of Metrics and Traces. We're using OpenTelemetry to send both, currently to a collector (just as we were doing with logs). The collector sends data to two new trace collection containers (Jaeger & Zipkin) and has a Prometheus container pulling metrics from it.
The Java application code itself eschews some of the typical OpenTelemetry techniques for instrumentation. Instead of using ThreadLocals to pass maybe-present values around within contexts, leaving each instrumentation point to determine how to use them, custom context classes for Connections, Requests, KafkaRecords, etc. are constructed and explicitly passed into functions and into callbacks. Those classes implement IWithAttributes and the fillAttributes() function to select which fields should be included within the instruments that are being emitted.
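To make the explicit-passing idea concrete, here is a minimal sketch. It is not the code from this PR: the fillAttributes() signature, the SketchConnectionContext and SketchRequestHandler classes, and the use of a plain Map in place of OpenTelemetry's Attributes are all assumptions made so the example has no dependencies.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's IWithAttributes; the real signature may differ.
interface IWithAttributes {
    // Each context decides which of its fields become instrument attributes.
    void fillAttributes(Map<String, Object> attributes);
}

// Hypothetical context describing a single connection.
final class SketchConnectionContext implements IWithAttributes {
    final String connectionId;
    final String nodeId;

    SketchConnectionContext(String connectionId, String nodeId) {
        this.connectionId = connectionId;
        this.nodeId = nodeId;
    }

    @Override
    public void fillAttributes(Map<String, Object> attributes) {
        // Only connectionId is selected; nodeId is deliberately not emitted.
        attributes.put("connectionId", connectionId);
    }
}

final class SketchRequestHandler {
    // The context arrives as an explicit, strongly typed parameter
    // instead of being looked up from a ThreadLocal.
    static Map<String, Object> attributesFor(SketchConnectionContext context) {
        Map<String, Object> attributes = new LinkedHashMap<>();
        context.fillAttributes(attributes);
        return attributes;
    }
}
```

Because the handler cannot be called without a SketchConnectionContext, there is no "maybe-present" case for the instrumentation point to handle.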
The contexts themselves are tightly related to Spans. Usually a new context will have a new span, and a new span will always require a new context. The context classes themselves have the ability to chain back to a parent scope. When the context is converted into an Attributes object for the instruments, the attributes from parent contexts will also be included (with key-values from subclasses overwriting their parent's values in cases of conflict).
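The parent chaining and override behavior described above could look roughly like the following. This is a sketch under assumptions: ChainedContext, ChannelScope, and HttpRequestScope are hypothetical names, and a plain Map stands in for OpenTelemetry's Attributes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical base class: a context optionally chained to a parent scope.
abstract class ChainedContext {
    private final ChainedContext parent; // null for a root context

    ChainedContext(ChainedContext parent) {
        this.parent = parent;
    }

    // Each context adds only its own key-values.
    protected abstract void fillAttributes(Map<String, Object> attributes);

    // Walks the chain from the root down to this context.
    final Map<String, Object> toAttributes() {
        Map<String, Object> attributes = new LinkedHashMap<>();
        collect(attributes);
        return attributes;
    }

    private void collect(Map<String, Object> attributes) {
        // Parents are filled first, so a child's put() wins on a key conflict.
        if (parent != null) {
            parent.collect(attributes);
        }
        fillAttributes(attributes);
    }
}

final class ChannelScope extends ChainedContext {
    private final String connectionId;

    ChannelScope(String connectionId) {
        super(null);
        this.connectionId = connectionId;
    }

    @Override
    protected void fillAttributes(Map<String, Object> attributes) {
        attributes.put("connectionId", connectionId);
        attributes.put("stage", "connection");
    }
}

final class HttpRequestScope extends ChainedContext {
    private final int requestIndex;

    HttpRequestScope(ChannelScope parent, int requestIndex) {
        super(parent);
        this.requestIndex = requestIndex;
    }

    @Override
    protected void fillAttributes(Map<String, Object> attributes) {
        attributes.put("requestIndex", requestIndex);
        attributes.put("stage", "request"); // overwrites the parent's "stage"
    }
}
```

Filling from the root downward keeps the override rule in one place, so individual contexts never need to know which keys their ancestors set.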
There are some judicious uses of generic wildcard constraints to make it quicker and more foolproof to create spans so that they're appropriately associated as children with the containing context's span. There's also support to make it easier to store start timestamps to emit duration metrics.
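One way such wildcard-bounded helpers might be shaped is sketched below. None of these names come from the PR: SketchSpan is a dependency-free stand-in for an OpenTelemetry span, and TimedContext, ChildFactory, and SpanHelpers are hypothetical. The point is only that a bounded generic factory can tie a child's span to the containing context's span, and that a stored start timestamp makes a duration easy to compute.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical stand-in for a span; it only remembers its name and parent.
final class SketchSpan {
    final String name;
    final SketchSpan parent;

    SketchSpan(String name, SketchSpan parent) {
        this.name = name;
        this.parent = parent;
    }
}

// Hypothetical base context that owns a span and a start timestamp.
abstract class TimedContext {
    private final SketchSpan span;
    private final Instant startTime;

    TimedContext(SketchSpan span, Instant startTime) {
        this.span = span;
        this.startTime = startTime;
    }

    SketchSpan span() {
        return span;
    }

    // The stored start time is what a duration metric would be computed from.
    Duration durationUntil(Instant end) {
        return Duration.between(startTime, end);
    }
}

final class ConnectionScope extends TimedContext {
    ConnectionScope(Instant startTime) {
        super(new SketchSpan("connection", null), startTime);
    }
}

final class RequestScope extends TimedContext {
    final ConnectionScope enclosing;

    RequestScope(ConnectionScope enclosing, SketchSpan span, Instant startTime) {
        super(span, startTime);
        this.enclosing = enclosing;
    }
}

@FunctionalInterface
interface ChildFactory<P, C> {
    C create(P parent, SketchSpan span, Instant startTime);
}

final class SpanHelpers {
    // The wildcard bounds accept any factory that can take this parent type
    // and produce some context type; the child span is always parented to
    // the containing context's span, so callers cannot forget to link them.
    static <P extends TimedContext, C extends TimedContext> C startChild(
            P parent, String spanName, Instant startTime,
            ChildFactory<? super P, ? extends C> factory) {
        SketchSpan childSpan = new SketchSpan(spanName, parent.span());
        return factory.create(parent, childSpan, startTime);
    }
}
```

A call such as SpanHelpers.startChild(connection, "request", now, RequestScope::new) then returns a RequestScope whose span is already a child of the connection's span.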
There's a LOT left to do here, but there's a lot that's done and I'd like to get feedback on the patterns that are emerging. Some of the top remaining items:
More metrics, more traces
Getting separate namespaces for capture and replayer working for metrics
Figuring out our cloud story. Should we deploy some more ECS containers for the AWS CDK or should we use AWS cloud native stuff, AWS hosted stuff?
Tests, Tests, Tests - Otel has some easy to use facilities to simplify checking within tests.
Hardening the interfaces more. There are lots of inline strings. These should be managed from more centralized places and we should have tests to do double-entry book-keeping so that they don't change without warnings.
Moving the existing MetricsLogger code out.
Category - Enhancement
Why these changes are required? Visibility into what our services are doing.
Issues Resolved
Part of Improve Metrics Explanations
Testing
Lots of manual testing for now
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.