Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

otel: instrument observability (metrics + traces) on the database #2285

Closed
deniseli opened this issue Aug 7, 2024 · 4 comments · Fixed by #2305 or #2373
Closed

otel: instrument observability (metrics + traces) on the database #2285

deniseli opened this issue Aug 7, 2024 · 4 comments · Fixed by #2305 or #2373
Assignees

Comments

@deniseli
Copy link
Contributor

deniseli commented Aug 7, 2024

From #2194 (comment):

Immediate Needs

For metrics in DD, the well-lit path appears to be DD's official Postgres integration. That comes with not only all the core metrics we want, but also a nice dashboard. For immediate use cases, this is definitely the best solution. Next step on that is talking to @bradleydwyer

If we cannot do that^, then we'll need to instrument custom metrics right now, which is possible but not ideal, both because of the implementation time and also because of the added complexity to the sql code (much of which is generated by sqlc, so we can't just do whatever we want to make the interface nice/intuitive).

Long Term

That said, FTL is not tied to DD, so in the long run, we’ll likely want some integration for this outside of the DD world as well. The first reasonable avenue for this was otel's zero code instrumentation for Go. Unfortunately, that integration only supports traces right now, not metrics. The whole project is marked as a work in progress, but there doesn't appear to be any immediate plans to add metric support.

The urgency for supporting non-DD database metrics is fairly low. Our current customer team and next customer team are both internal, so they should both use DD. Most likely, the next one after that will be as well. We will want to lift this instrumentation out of the DD-only world before significant amounts of open source development, but perhaps otel will also provide better built-ins for instrumenting database metrics by then. This seems like something we can afford to wait for.

Parent doc link for zero-code: https://opentelemetry.io/docs/concepts/instrumentation/zero-code/

@github-actions github-actions bot added the triage Issue needs triaging label Aug 7, 2024
@deniseli deniseli mentioned this issue Aug 7, 2024
15 tasks
@deniseli deniseli self-assigned this Aug 7, 2024
@github-actions github-actions bot removed the triage Issue needs triaging label Aug 7, 2024
@ftl-robot ftl-robot mentioned this issue Aug 7, 2024
@deniseli deniseli changed the title Instrument observability (metrics + traces) on the database otel: instrument observability (metrics + traces) on the database Aug 7, 2024
@alecthomas
Copy link
Collaborator

A simpler path is probably to switch to using the standard library's sql API and use OTEL's instrumentation for that.

We're currently using sqlc's pgx5 driver because it supports bulk inserts (IIRC?), but we can likely work around that.

@deniseli
Copy link
Contributor Author

deniseli commented Aug 7, 2024

Niiiiice that looks fantastic! From some brief grepping, I couldn't find any INSERTs in the current code that would cause issues, and it looks like there are plenty of ways to COPY without relying on sqlc's :copyfrom (which we don't even use now).

It seems like the biggest thing we'd be losing is the codegen. What do you think would be the best way to move forward with the existing queries.sql files, if we went down that route? Start manually modifying the files that were generated by sqlc, and delete the queries.sql files they were generated from?

@deniseli
Copy link
Contributor Author

deniseli commented Aug 7, 2024

Notes from standup: we don't need to remove sqlc, just switch out the driver. :)

github-merge-queue bot pushed a commit that referenced this issue Aug 13, 2024
Fixes #2285

Register the standard otel database metrics using:
https://github.com/XSAM/otelsql
Found call sites by grepping for `sql.Open` and skipping all test code.

Sample:
```
ScopeMetrics #2
ScopeMetrics SchemaURL: 
InstrumentationScope github.com/XSAM/otelsql 0.32.0

Metric #0
Descriptor:
     -> Name: db.sql.latency
     -> Description: The latency of calls in milliseconds
     -> Unit: ms
     -> DataType: Histogram
     -> AggregationTemporality: Cumulative

HistogramDataPoints #0
Data point attributes:
     -> method: Str(sql.conn.begin_tx)
     -> status: Str(ok)
StartTimestamp: 2024-08-08 23:18:01.011762 +0000 UTC
Timestamp: 2024-08-08 23:18:11.012177 +0000 UTC
Count: 23
Sum: 17.082795
Min: 0.137584
Max: 3.233917
```

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@deniseli deniseli reopened this Aug 14, 2024
@deniseli
Copy link
Contributor Author

reopening until we can unrevert the revert

github-merge-queue bot pushed a commit that referenced this issue Aug 20, 2024
Fixes #2285

Unreverts #2305 with these
changes:

* Reverts
[`copyAny`](https://github.com/TBD54566975/ftl/blob/main/internal/reflect/reflect.go#L149)
changes except for the `list.List` handling.
* The original PR caused a panic when deploying to prod: `reflect:
internal error: must be a Pointer or Ptr; got unsafe.Pointer`
* #2376 removed the DB handles
from the module context Database struct, so we no longer need to handle
all these types in `DeepCopy`.
* Updates all `reflect.Ptr` refs to `reflect.Pointer`. Not a big deal,
but `Ptr` is the [old
name](https://cs.opensource.google/go/go/+/refs/tags/go1.23.0:src/reflect/type.go;l=303),
and since we're using `reflect.Pointer` elsewhere in this file, it's
better to stay consistent. Accordingly, this also fixes the error
message that we saw in the prod deployment, which previously implied
`Pointer` and `Ptr` were two _separate_ types.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants