-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Database metric semantic conventions #1076
Database metric semantic conventions #1076
Conversation
…mantic conventions
Please add an entry in CHANGELOG.md |
| `db.connections.new` | Counter | {connections} | The number of new connections created. | | ||
| `db.connections.taken` | Counter | {connections} | The number of connections taken from the connection pool. | | ||
| `db.connections.returned` | Counter | {connections} | The number of connections returned to the connection pool. | | ||
| `db.connections.reused` | Counter | {connections} | The number of connections reused. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the criteria for knowing it's an existing connection that's being re-used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good point. I had used this PR as reference, and had assumed that an instrumentation author would know best how to collect these numbers, but now I'm thinking we need definitions of exactly what value needs to be recorded here, or we run the risk of misunderstanding and inconsistency across instrumentation sources.
I'll give this some thought about how to describe this.
|---------------------------|------------|---------------|-------------| | ||
| `db.connections.new` | Counter | {connections} | The number of new connections created. | | ||
| `db.connections.taken` | Counter | {connections} | The number of connections taken from the connection pool. | | ||
| `db.connections.returned` | Counter | {connections} | The number of connections returned to the connection pool. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it anticipated that db.connection_pool.usage
is equal to db.connections.taken
minus db.connections.returned
?
If so, what is the benefit in having the additional instruments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I expect db.connection_pool.usage
= db.connections.taken
- db .connections.returned
. But the breakdown metrics can give much more detail than the use of the connection pool over time.
The usage/limit metrics use a LastValue aggregator by default, meaning you get a periodic snapshot of the number of connections currently in use.
The more detailed metrics allow you to see things like connection churn. With them, a user could differentiate between an application making millions of tiny database operations (and ending a harvest period with 4/5 connections in use) and an application with four very long and slow queries in progress (also ending the harvest period with 4 connections in use).
The connection_pool usage instruments are asynchronous -- they're intended to be polled on a set interval. The more detailed metrics are recording all the actions related to the connection pool in real time.
...but
Having said all of that, I don't have specific use cases to back up the inclusion of these extra instruments.
But, they're already being recorded by go-redis, and I can see the same sort of instruments being possible in a number of other frameworks. And if there's a desire to record them, I think we should spec them so we get consistent names and labels across implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried a few different ways of explaining this. I ultimately ended up with the pretty simple approach in ec58de4.
I'm certainly open to suggestions about how to clarify what these instruments are all about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Thanks for the examples, this is great.)
|------------------|--------|--------------|----------|----------| | ||
| `db.name` | string | If no [tech-specific label](#call-level-labels-for-specific-technologies) is defined, this attribute is used to report the name of the database being accessed. For commands that switch the database, this should be set to the target database (even if the command fails). [1] | `customers`<br>`main` | Required if applicable. | | ||
| `db.operation` | string | The name of the operation being executed, e.g. the [MongoDB command name](https://docs.mongodb.com/manual/reference/command/#database-operations) such as `findAndModify`. [4][5] | `findAndModify`<br>`HMSET`<br>`SELECT`<br>`CONNECT` | Required if applicable. | | ||
| `db.table` | string | The name of the primary table, collection, segment, etc... that the operation is acting upon. | `user_table` | Required if applicable. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be in favor of introducing this in the (more elaborate) tracing spec first, as you proposed on #1104, and then just refer to that definition from here to avoid duplication and potentially going out of sync. Once both kinds of semconvs are auto-generated, it should be fine again, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💯 I've created that PR now (here: #1141). After it has some reviews, I'll update this PR like you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR #1141 has all the approvals it needs, so I've updated this one to match it.
Just that little bit of manual work to keep these two PRs has reminded me that we really need the semantic convention table generator to work for metrics.
Co-authored-by: Armin Ruech <[email protected]>
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
This is not stale. |
at the metrics sig mtg today talked about this issue's staleness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 Would be good to add a link to database.md
from README.md
too
| `db.system` | An identifier for the database management system (DBMS) product being used. [1] | `other_sql` | Yes | | ||
| `db.connection_string` | The connection string used to connect to the database. It is recommended to remove embedded credentials. | `Server=(localdb)\v11.0;Integrated Security=true;` | No | | ||
| `db.user` | Username for accessing the database. | `readonly_user`<br>`reporting_user` | No | | ||
| `net.transport` | Transport protocol used. See note below. See [general network connection attributes](../../trace/semantic_conventions/span-general.md#general-network-connection-attributes). | `IP.TCP`<br>`Unix` | Conditional [2] | | ||
| `net.peer.ip` | Remote address of the peer (dotted decimal for IPv4 or [RFC5952](https://tools.ietf.org/html/rfc5952) for IPv6) | `127.0.0.1` | No | | ||
| `net.peer.port` | Remote port number. | `80`<br>`8080`<br>`443` | No | | ||
| `net.peer.name` | Remote hostname or similar, see note below. | `example.com` | No | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit these are in a different order than https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/database.md#connection-level-attributes (maybe the tracing spec was updated since)
|----------------------|---------------|--------------|-------------| | ||
| `db.client.duration` | ValueRecorder | milliseconds | The duration of the database operation. | | ||
|
||
Database operations SHOULD include execution of queries, including DDL, DML, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Database operations SHOULD include execution of queries, including DDL, DML, | |
Measured database operations SHOULD include execution of queries, including DDL, DML, |
**[5]:** To reduce cardinality, the value for `db.operation` should have parameters | ||
removed or substituted. The resulting value should be a low-cardinality value | ||
represeting the statement or operation being executed on the database. It may be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this is supposed to pertain to a db.statement
parameter, which is missing in the table.
**[5]:** To reduce cardinality, the value for `db.operation` should have parameters | |
removed or substituted. The resulting value should be a low-cardinality value | |
represeting the statement or operation being executed on the database. It may be | |
**[5]:** To reduce cardinality, the value for `db.operation` should leave in placeholders | |
and not include any actual parameter values. The resulting value should be low-cardinality | |
and represent the statement or operation being executed on the database. It may be |
Might also be worth adding "it is not recommended to attempt any client-side parsing of db.statement
to remove parameters", although users shouldn't be adding formatting them in anyway. The trace conventions say "the value may be sanitized to exclude sensitive information," do you think that is mostly just about not substitution values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this. I've updated it to be more like these words. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The db.operation
label wouldn't have any placeholders I thought, since its just SELECT
, UPDATE
, etc. I was more thinking that advice would apply to db.statement
, unless it was purposefully left out for cardinality reasons?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without db.statement
, is there enough granularity? In the MySQL example, all SELECT
queries to orders
would be aggregated into that same metric
{
"name": "db.client.duration",
"labels": {
"db.operation": "SELECT",
"db.name": "ShopDb",
"db.system": "mysql",
"db.connection_string": "Server=shopdb.example.com;Database=ShopDb;Uid=billing_user;TableCache=true;UseCompression=True;MinimumPoolSize=10;MaximumPoolSize=50;",
"db.user": "billing_user",
"db.sql.table": "orders",
"net.peer.name": "shopdb.example.com",
"net.peer.ip": "192.0.2.12",
"net.peer.port": "3306",
"net.transport": "IP.TCP"
}
}
| `db.connection_pool.usage` | ValueObserver | {connections} | The number of database connections _in use_. | | ||
|
||
If the following detailed information is available, the following metric | ||
instruments MAY be collected. They SHOULD have all [common](#common) labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instruments MAY be collected. They SHOULD have all [common](#common) labels | |
instruments MAY be used. They SHOULD have all [common](#common) labels |
|
||
| Name | Instrument | Units | Description | | ||
|---------------------------|------------|---------------|-------------| | ||
| `db.connections.taken` | Counter | {connections} | Incremented when a connection is requested and returned from the pool. | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These remind me of the MemStats variables, which are difficult to reason about. This makes me think they should be database-specific, e.g., db.redis.connections.taken
, maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I think I'm on board. As I said before, I don't have strong use cases for these metrics, aside from the fact that go-redis is already collecting them.
My only hesitation is: if other database libraries decide to pick up these metrics, and they become somewhat common, I wonder if it diminishes their value to have them namespaced separately.
As a user, If I have redis and postgres connections, will I be frustrated that they report separately as db.redis.connections.taken
and db.postgres.connections.taken
?
Now that I've asked that, though, it doesn't seem very useful to aggregate connection pool information across multiple databases, so maybe it's fine to separate them.
The other option (that I'd be definitely in favor of) is to just drop these metrics entirely, so we'd only specify db.connection_pool.limit
and db.connection_pool.usage
.
Which of those would you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the metrics SIG today: we're going to give guidance to use db.{db.system}.*
for technology-specific instrument names.
And I'll include some examples using the names from go-redis.
Reviewers, please resolve comments that are addressed. |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
Fixes open-telemetry/semantic-conventions#707
This PR attempts to specify semanatic conventions for database metric instruments.
There are a few spicy parts that I'd love feedback about:
1.
db.operation
anddb.table
We'd like to correlate the golden signal metrics to the spans that represent the same
operation. The easiest way to do that may have been to include the
db.statement
as alabel, but this has the potential to drastically increase cardinality.
The alternative that I see is to include
db.operation
anddb.table
as labels.This has clearly been discussed already, and the trace semantic convention includes
this line:
But in my experience, these values are frequently available from an ORM library without
requiring SQL string parsing, and if this instrumentation may be eventually written into
those ORM libraries, I'd like to encourage them to provide this information on the
metric instruments.
2. Connection pool metrics
There is some prior art in go-redis of some very detailed metrics describing database
connection pool behavior. While this is pretty interesting, I think these values will not
be widely available to all database client libraries. So I've recommended that we record
a simpler
db.connection_pool.limit
anddb.connection_pool.usage
.I've also included a MAY to allow instrumentation to collect more detailed information if
it is available. I'm thinking that if these concepts will be used by more than just
go-redis, it would be ideal if they looked the same.
Changes
This PR adds a new semantic conventions at specification/metrics/semantic_conventions/database.md