Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] OTel tracing throughout the codebase #1238

Merged
merged 21 commits into from
Oct 18, 2023
Merged

[ENH] OTel tracing throughout the codebase #1238

merged 21 commits into from
Oct 18, 2023

Conversation

beggers
Copy link
Contributor

@beggers beggers commented Oct 12, 2023

Description of changes

This PR adds OpenTelemetry tracing to ~all major methods throughout our codebase. It also adds configuration to specify where these traces should be sent. I focused more on laying the groundwork for tracing than on collecting all the data we need everywhere.

Default behavior is unchanged: no tracing, no printing.

Summarize the changes made by this PR.

  • New functionality
    • OpenTelemetryClient with relevant config.
    • Wrap most of our code in tracing.

The only major design decision I made was to fully separate OpenTelemetry stuff and Posthog (product telemetry) stuff.

Justification:

It's tempting to combine OTel and product telemetry behind a single internal interface. I don't think this coupling is worth it. Product telemetry cares about a small and relatively static set of uses, whereas tracing by nature should be very deep in our codebase. I see two ways to couple them and problems with each:

  • Have a unified Telemetry interface only for the events our product telemetry cares about and use raw OTel for the rest. In other words, use this telemetry interface only for collection.add()s, collection.delete()s, etc. This seems weird to me: tracing code would be implicit in some cases but explicit in others, making the codebase less easily comprehensible. Also if an engineer later decides to add product telemetry to a codepath that already has tracing, they need to know to remove existing tracing. This increases the cognitive overhead required to work on Chroma, reducing the readability and maintainability of our codebase.
  • Have a unified Telemetry interface which does everything. In this case, it has the above behavior but also wraps all other OTel behavior we want. This seems weird to me because we're basically writing a wrapper around the complete set of OpenTelemetry behavior which only modifies a small part of it. This increases our maintenance burden for very little value.

Instead we have two well-encapsulated telemetry modules which we can modify and use without worrying about the other telemetry. The OTel module provides some lightweight helpers to make OTel a little easier to use, but we can put raw OTel code throughout our codebase and it'll play nicely.

Test plan

How are these changes tested?

  • Tests pass locally with pytest for python, yarn test for js

Manual testing:

  • Set environment variables to export traces to Honeycomb at
    various granularities.
  • Went through various basic Chroma flows and checked that traces show up in Honeycomb as expected.

Screenshot 2023-10-12 at 10 39 11 AM

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs repository?

Docs PR to land before this does.

@github-actions
Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readbility, Modularity, Intuitiveness)

@beggers beggers added the enhancement New feature or request label Oct 13, 2023
@beggers beggers changed the title OTel tracing throughout the codebase [ENH] OTel tracing throughout the codebase Oct 13, 2023
return decorator


def add_attributes_to_current_span(attributes: Dict[str, Any]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline - make this strongly typed and put onus of conversion on caller, error internally if not expected types

otel_collection_headers: Optional[Dict[str, str]],
otel_granularity: OpenTelemetryGranularity,
) -> None:
"""Initializes module-level state for OpenTelemetry."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Document params

@beggers beggers merged commit 99c8a99 into main Oct 18, 2023
85 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants