[ENH] Latency histograms for get/insert/remove/clear of cache. #3018

Merged (7 commits) on Oct 29, 2024

Conversation

rescrv
Contributor

@rescrv rescrv commented Oct 28, 2024

Description of changes

  • This wires up the cache to give us latency metrics for all operations.

Test plan

I manually verified that the traces are exported if I change the otel_endpoint to otel-collector:4317 instead of jaeger:4317

  • [x] Tests pass locally with pytest for python, yarn test for js, cargo test for rust

@rescrv rescrv requested a review from codetheweb October 28, 2024 23:35

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@codetheweb
Contributor

codetheweb commented Oct 28, 2024

> I manually verified that the traces are exported if I change the otel_endpoint to otel-collector:4317 instead of jaeger:4317

I think metrics should automatically appear in Grafana with no config changes, let me know if that isn't the case.

edit: I think that actually might just be set up for Go services? but should be easy to update the env var for the Rust services too

Comment on lines 357 to +359
    #[tracing::instrument(skip(self, key, value))]
    async fn insert(&self, key: K, value: V) {
        let _stopwatch = Stopwatch::new(&self.insert_latency);
Contributor

curious why this is measured separately instead of looking at the span duration?
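
For readers unfamiliar with the pattern in the snippet above: a stopwatch like this is commonly an RAII guard that notes the start time when constructed and records the elapsed time into a histogram when it is dropped. The following is a minimal sketch of that idea using only the standard library. `LatencyHistogram`, `timed_insert`, and this `Stopwatch` are hypothetical stand-ins for illustration; the types in this PR may differ.

```rust
use std::sync::Mutex;
use std::time::Instant;

/// Hypothetical stand-in for a metrics histogram: it only stores raw samples.
#[derive(Default)]
struct LatencyHistogram {
    samples_us: Mutex<Vec<u64>>,
}

impl LatencyHistogram {
    fn record(&self, micros: u64) {
        self.samples_us.lock().unwrap().push(micros);
    }

    fn count(&self) -> usize {
        self.samples_us.lock().unwrap().len()
    }
}

/// Records the time between construction and drop into the histogram.
struct Stopwatch<'a> {
    histogram: &'a LatencyHistogram,
    started: Instant,
}

impl<'a> Stopwatch<'a> {
    fn new(histogram: &'a LatencyHistogram) -> Self {
        Stopwatch { histogram, started: Instant::now() }
    }
}

impl Drop for Stopwatch<'_> {
    fn drop(&mut self) {
        // Runs on every exit path of the enclosing scope, including early returns.
        let elapsed = self.started.elapsed().as_micros() as u64;
        self.histogram.record(elapsed);
    }
}

/// Hypothetical timed operation standing in for a cache insert.
fn timed_insert(
    histogram: &LatencyHistogram,
    store: &mut Vec<(String, u32)>,
    key: &str,
    value: u32,
) {
    // Bind the guard to a named variable: `let _ = ...` would drop it immediately.
    let _stopwatch = Stopwatch::new(histogram);
    store.push((key.to_string(), value));
}

fn main() {
    let histogram = LatencyHistogram::default();
    let mut store = Vec::new();
    timed_insert(&histogram, &mut store, "a", 1);
    timed_insert(&histogram, &mut store, "b", 2);
    println!("recorded {} samples", histogram.count());
}
```

Because the measurement happens in `Drop`, one line at the top of each method is enough to time it, which is presumably why the guard is bound to `_stopwatch` rather than `_`.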

@rescrv
Contributor Author

rescrv commented Oct 29, 2024

It's not the case. You can set the endpoint to either jaeger or otel-collector, but not both. Because having two endpoints only matters for tilt, I'm fine not having these metrics show up locally. I confirmed that they do show up locally, though, if the address is right.

In production, we have honeycomb, which seems to have just one endpoint for both.

@rescrv rescrv requested a review from codetheweb October 29, 2024 00:04
@@ -96,4 +161,13 @@ pub(crate) fn init_otel_tracing(service_name: &String, otel_endpoint: &String) {

prev_hook(panic_info);
}));
let exporter = opentelemetry_otlp::new_exporter()
Collaborator

For my learning, what are this new exporter and provider for?

Contributor Author

Metrics. They get installed as the exporter and provider for the global metrics. Do you see a way to reuse the other exporter? I'd like that, too.
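
To illustrate what "installed as the exporter and provider for the global metrics" means in general terms, here is a minimal sketch of the pattern with the standard library only. It is not the `opentelemetry` API: `Exporter`, `InMemoryExporter`, `MeterProvider`, `set_global_provider`, and `record_global` are all hypothetical names. The idea is that a provider backed by one exporter is placed in a process-wide slot at startup, and instrumented code records through that slot without having the provider passed to it.

```rust
use std::sync::{Arc, Mutex, OnceLock};

/// Hypothetical sink that receives measurements (the "exporter" role).
trait Exporter: Send + Sync {
    fn export(&self, name: &str, value: f64);
}

/// Keeps measurements in memory instead of sending them over the network.
#[derive(Default)]
struct InMemoryExporter {
    seen: Mutex<Vec<(String, f64)>>,
}

impl Exporter for InMemoryExporter {
    fn export(&self, name: &str, value: f64) {
        self.seen.lock().unwrap().push((name.to_string(), value));
    }
}

/// Routes recorded values to one exporter (the "provider" role).
struct MeterProvider {
    exporter: Arc<dyn Exporter>,
}

impl MeterProvider {
    fn record(&self, name: &str, value: f64) {
        self.exporter.export(name, value);
    }
}

/// Process-wide slot, intended to be set once at startup.
static GLOBAL_PROVIDER: OnceLock<MeterProvider> = OnceLock::new();

/// Installs the provider; returns false if one was already installed.
fn set_global_provider(provider: MeterProvider) -> bool {
    GLOBAL_PROVIDER.set(provider).is_ok()
}

/// Called from instrumented code; does nothing if no provider is installed.
fn record_global(name: &str, value: f64) {
    if let Some(provider) = GLOBAL_PROVIDER.get() {
        provider.record(name, value);
    }
}

fn main() {
    let exporter = Arc::new(InMemoryExporter::default());
    let installed = set_global_provider(MeterProvider { exporter: exporter.clone() });
    record_global("cache_insert_latency_ms", 0.42);
    let exported = exporter.seen.lock().unwrap().len();
    println!("installed={installed}, exported={exported}");
}
```

In this sketch the tracing pipeline would need its own slot and exporter, which mirrors the situation described in the reply: the metrics exporter is separate from the one already used for traces.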

@codetheweb
Contributor

> It's not the case. You can set the endpoint to either jaeger or otel-collector, but not both. Because having two endpoints only matters for tilt, I'm fine not having these metrics show up locally. I confirmed that they do show up locally, though, if the address is right.

hmm we should be able to just set CHROMA_OTEL_COLLECTION_ENDPOINT to otel-collector:4317 in values.yaml and have it forward to both Grafana and Jaeger
but nbd, out of scope

@rescrv rescrv merged commit a64633a into main Oct 29, 2024
72 checks passed
@rescrv rescrv deleted the rescrv/foyer-metrics branch October 29, 2024 21:11
codetheweb pushed a commit that referenced this pull request Nov 5, 2024
HammadB added a commit that referenced this pull request Nov 12, 2024
## Description of changes

*Summarize the changes made by this PR.*
 - Improvements & Bug fixes
	 - Send telemetry to otel_collector, not jaeger. This fixes the leftover from
	   #3018 deemed out of scope by @codetheweb in #3018 (comment)
 - New functionality
	 - None

## Test plan
*How are these changes tested?*
- [x] Tests pass locally with `pytest` for python, `yarn test` for js,
`cargo test` for rust

## Documentation Changes
None