Database metric semantic conventions #1076

justinfoote · 2020-10-08T21:50:15Z

Fixes open-telemetry/semantic-conventions#707

This PR attempts to specify semanatic conventions for database metric instruments.

There are a few spicy parts that I'd love feedback about:

1. `db.operation` and `db.table`

We'd like to correlate the golden signal metrics to the spans that represent the same
operation. The easiest way to do that may have been to include the db.statement as a
label, but this has the potential to drastically increase cardinality.
The alternative that I see is to include db.operation and db.table as labels.

This has clearly been discussed already, and the trace semantic convention includes
this line:

While it would semantically make sense to set this, e.g., to a SQL keyword like SELECT or INSERT, it is not recommended to attempt any client-side parsing of db.statement just to get this property (the back end can do that if required).

But in my experience, these values are frequently available from an ORM library without
requiring SQL string parsing, and if this instrumentation may be eventually written into
those ORM libraries, I'd like to encourage them to provide this information on the
metric instruments.

2. Connection pool metrics

There is some prior art in go-redis of some very detailed metrics describing database
connection pool behavior. While this is pretty interesting, I think these values will not
be widely available to all database client libraries. So I've recommended that we record
a simpler db.connection_pool.limit and db.connection_pool.usage.

I've also included a MAY to allow instrumentation to collect more detailed information if
it is available. I'm thinking that if these concepts will be used by more than just
go-redis, it would be ideal if they looked the same.

Changes

This PR adds a new semantic conventions at specification/metrics/semantic_conventions/database.md

…mantic conventions

…nventions

specification/metrics/semantic_conventions/database.md

arminru · 2020-10-09T11:48:53Z

Please add an entry in CHANGELOG.md

specification/metrics/semantic_conventions/database.md

kenfinnigan · 2020-10-09T13:15:09Z

specification/metrics/semantic_conventions/database.md

+| `db.connections.new`      | Counter   | {connections} | The number of new connections created. |
+| `db.connections.taken`    | Counter   | {connections} | The number of connections taken from the connection pool. |
+| `db.connections.returned` | Counter   | {connections} | The number of connections returned to the connection pool. |
+| `db.connections.reused`   | Counter   | {connections} | The number of connections reused. |


What are the criteria for knowing it's an existing connection that's being re-used?

This is a good point. I had used this PR as reference, and had assumed that an instrumentation author would know best how to collect these numbers, but now I'm thinking we need definitions of exactly what value needs to be recorded here, or we run the risk of misunderstanding and inconsistency across instrumentation sources.

I'll give this some thought about how to describe this.

kenfinnigan · 2020-10-09T13:16:28Z

specification/metrics/semantic_conventions/database.md

+|---------------------------|------------|---------------|-------------|
+| `db.connections.new`      | Counter   | {connections} | The number of new connections created. |
+| `db.connections.taken`    | Counter   | {connections} | The number of connections taken from the connection pool. |
+| `db.connections.returned` | Counter   | {connections} | The number of connections returned to the connection pool. |


Is it anticipated that db.connection_pool.usage is equal to db.connections.taken minus db.connections.returned?

If so, what is the benefit in having the additional instruments?

Yes, I expect db.connection_pool.usage = db.connections.taken - db .connections.returned. But the breakdown metrics can give much more detail than the use of the connection pool over time.
The usage/limit metrics use a LastValue aggregator by default, meaning you get a periodic snapshot of the number of connections currently in use.
The more detailed metrics allow you to see things like connection churn. With them, a user could differentiate between an application making millions of tiny database operations (and ending a harvest period with 4/5 connections in use) and an application with four very long and slow queries in progress (also ending the harvest period with 4 connections in use).

The connection_pool usage instruments are asynchronous -- they're intended to be polled on a set interval. The more detailed metrics are recording all the actions related to the connection pool in real time.

...but
Having said all of that, I don't have specific use cases to back up the inclusion of these extra instruments.
But, they're already being recorded by go-redis, and I can see the same sort of instruments being possible in a number of other frameworks. And if there's a desire to record them, I think we should spec them so we get consistent names and labels across implementations.

I tried a few different ways of explaining this. I ultimately ended up with the pretty simple approach in ec58de4.
I'm certainly open to suggestions about how to clarify what these instruments are all about.

Co-authored-by: Armin Ruech <[email protected]>

jmacd

(Thanks for the examples, this is great.)

CHANGELOG.md

specification/metrics/semantic_conventions/database.md

arminru · 2020-10-20T11:40:54Z

specification/metrics/semantic_conventions/database.md

+|------------------|--------|--------------|----------|----------|
+| `db.name`        | string | If no [tech-specific label](#call-level-labels-for-specific-technologies) is defined, this attribute is used to report the name of the database being accessed. For commands that switch the database, this should be set to the target database (even if the command fails). [1] | `customers`<br>`main` | Required if applicable. |
+| `db.operation`   | string | The name of the operation being executed, e.g. the [MongoDB command name](https://docs.mongodb.com/manual/reference/command/#database-operations) such as `findAndModify`. [4][5] | `findAndModify`<br>`HMSET`<br>`SELECT`<br>`CONNECT` | Required if applicable. |
+| `db.table`       | string | The name of the primary table, collection, segment, etc... that the operation is acting upon. | `user_table` | Required if applicable. |


I'd be in favor of introducing this in the (more elaborate) tracing spec first, as you proposed on #1104, and then just refer to that definition from here to avoid duplication and potentially going out of sync. Once both kinds of semconvs are auto-generated, it should be fine again, however.

💯 I've created that PR now (here: #1141). After it has some reviews, I'll update this PR like you suggest.

PR #1141 has all the approvals it needs, so I've updated this one to match it.
Just that little bit of manual work to keep these two PRs has reminded me that we really need the semantic convention table generator to work for metrics.

specification/metrics/semantic_conventions/database.md

Co-authored-by: Armin Ruech <[email protected]>

github-actions · 2020-10-31T03:16:21Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

jmacd · 2020-11-06T00:19:52Z

This is not stale.

andrewhsu · 2020-11-06T00:20:56Z

at the metrics sig mtg today talked about this issue's staleness

aabmass

🔥 Would be good to add a link to database.md from README.md too

aabmass · 2020-11-05T23:42:23Z

specification/metrics/semantic_conventions/database.md

+| `db.system`            | An identifier for the database management system (DBMS) product being used. [1] | `other_sql` | Yes |
+| `db.connection_string` | The connection string used to connect to the database. It is recommended to remove embedded credentials. | `Server=(localdb)\v11.0;Integrated Security=true;` | No |
+| `db.user`              | Username for accessing the database. | `readonly_user`<br>`reporting_user` | No |
+| `net.transport`        | Transport protocol used. See note below. See [general network connection attributes](../../trace/semantic_conventions/span-general.md#general-network-connection-attributes). | `IP.TCP`<br>`Unix` | Conditional [2] |
+| `net.peer.ip`          | Remote address of the peer (dotted decimal for IPv4 or [RFC5952](https://tools.ietf.org/html/rfc5952) for IPv6) | `127.0.0.1` | No |
+| `net.peer.port`        | Remote port number. | `80`<br>`8080`<br>`443` | No |
+| `net.peer.name`        | Remote hostname or similar, see note below. | `example.com` | No |


nit these are in a different order than https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/database.md#connection-level-attributes (maybe the tracing spec was updated since)

aabmass · 2020-11-05T23:47:53Z

specification/metrics/semantic_conventions/database.md

+|----------------------|---------------|--------------|-------------|
+| `db.client.duration` | ValueRecorder | milliseconds | The duration of the database operation. |
+
+Database operations SHOULD include execution of queries, including DDL, DML,


Suggested change

Database operations SHOULD include execution of queries, including DDL, DML,

Measured database operations SHOULD include execution of queries, including DDL, DML,

aabmass · 2020-11-06T00:11:17Z

specification/metrics/semantic_conventions/database.md

+**[5]:** To reduce cardinality, the value for `db.operation` should have parameters
+removed or substituted. The resulting value should be a low-cardinality value
+represeting the statement or operation being executed on the database. It may be


Seems like this is supposed to pertain to a db.statement parameter, which is missing in the table.

Suggested change

**[5]:** To reduce cardinality, the value for `db.operation` should have parameters

removed or substituted. The resulting value should be a low-cardinality value

represeting the statement or operation being executed on the database. It may be

**[5]:** To reduce cardinality, the value for `db.operation` should leave in placeholders

and not include any actual parameter values. The resulting value should be low-cardinality

and represent the statement or operation being executed on the database. It may be

Might also be worth adding "it is not recommended to attempt any client-side parsing of db.statement to remove parameters", although users shouldn't be adding formatting them in anyway. The trace conventions say "the value may be sanitized to exclude sensitive information," do you think that is mostly just about not substitution values?

I like this. I've updated it to be more like these words. What do you think?

The db.operation label wouldn't have any placeholders I thought, since its just SELECT, UPDATE, etc. I was more thinking that advice would apply to db.statement, unless it was purposefully left out for cardinality reasons?

Without db.statement, is there enough granularity? In the MySQL example, all SELECT queries to orders would be aggregated into that same metric

{ "name": "db.client.duration", "labels": { "db.operation": "SELECT", "db.name": "ShopDb", "db.system": "mysql", "db.connection_string": "Server=shopdb.example.com;Database=ShopDb;Uid=billing_user;TableCache=true;UseCompression=True;MinimumPoolSize=10;MaximumPoolSize=50;", "db.user": "billing_user", "db.sql.table": "orders", "net.peer.name": "shopdb.example.com", "net.peer.ip": "192.0.2.12", "net.peer.port": "3306", "net.transport": "IP.TCP" } }

specification/metrics/semantic_conventions/database.md

aabmass · 2020-11-06T00:34:08Z

specification/metrics/semantic_conventions/database.md

+| `db.connection_pool.usage` | ValueObserver | {connections} | The number of database connections _in use_. |
+
+If the following detailed information is available, the following metric
+instruments MAY be collected. They SHOULD have all [common](#common) labels


Suggested change

instruments MAY be collected. They SHOULD have all [common](#common) labels

instruments MAY be used. They SHOULD have all [common](#common) labels

jmacd · 2020-11-10T08:28:58Z

specification/metrics/semantic_conventions/database.md

+
+| Name                      | Instrument | Units         | Description |
+|---------------------------|------------|---------------|-------------|
+| `db.connections.taken`    | Counter    | {connections} | Incremented when a connection is requested and returned from the pool.  |


These remind me of the MemStats variables, which are difficult to reason about. This makes me think they should be database-specific, e.g., db.redis.connections.taken, maybe?

OK, I think I'm on board. As I said before, I don't have strong use cases for these metrics, aside from the fact that go-redis is already collecting them.

My only hesitation is: if other database libraries decide to pick up these metrics, and they become somewhat common, I wonder if it diminishes their value to have them namespaced separately.
As a user, If I have redis and postgres connections, will I be frustrated that they report separately as db.redis.connections.taken and db.postgres.connections.taken?
Now that I've asked that, though, it doesn't seem very useful to aggregate connection pool information across multiple databases, so maybe it's fine to separate them.

The other option (that I'd be definitely in favor of) is to just drop these metrics entirely, so we'd only specify db.connection_pool.limit and db.connection_pool.usage.

Which of those would you prefer?

From the metrics SIG today: we're going to give guidance to use db.{db.system}.* for technology-specific instrument names.

And I'll include some examples using the names from go-redis.

tigrannajaryan · 2020-11-26T16:30:15Z

Reviewers, please resolve comments that are addressed.

github-actions · 2020-12-07T03:27:36Z

This PR was marked stale due to lack of activity. It will be closed in 7 days.

github-actions · 2020-12-14T03:29:35Z

Closed as inactive. Feel free to reopen if this PR is still being worked on.

justinfoote added 10 commits October 8, 2020 14:39

Add database metric semantic conventions

9196ab8

Add detailed connection pooling semantic conventions for metrics

330e728

Add exception.type to database metrics semantic conventions

4fbd26f

Clean up grammar in database semantic conventions

2ae2bbb

Add net.peer labels to database metrics semantic conventions

1c2b9e9

Link to trace semantic conventions list of known db systems

c220dfd

Replace database statement with operation and table name in metric se…

9924cff

…mantic conventions

Replace Attribute name with Label name in database metric semantic co…

6b66fd3

…nventions

Reorder connection pooling metric instruments in semantic conventions

10a3ef4

Add duration database metric examples

e9b0bdb

justinfoote requested review from a team October 8, 2020 21:50

github-actions bot assigned tigrannajaryan Oct 8, 2020

Fix markdown-lint errors

506fe6e

arminru reviewed Oct 9, 2020

View reviewed changes

specification/metrics/semantic_conventions/database.md Show resolved Hide resolved

specification/metrics/semantic_conventions/database.md Outdated Show resolved Hide resolved

kenfinnigan reviewed Oct 9, 2020

View reviewed changes

justinfoote and others added 2 commits October 9, 2020 11:32

Add markdown TOC to database metric semantic conventions

b16d67d

Co-authored-by: Armin Ruech <[email protected]>

Update database semantic convention; respond to PR feedback

deea045

justinfoote mentioned this pull request Oct 9, 2020

Update Metrics Semantic Conventions README #1084

Open

justinfoote added 2 commits October 9, 2020 14:23

Update changelog

4d96aab

Fix markdown-lint errors in database metric semantic conventions

dc92381

jmacd approved these changes Oct 15, 2020

View reviewed changes

jamal mentioned this pull request Oct 17, 2020

Database table / collection semantic conventions #1104

Closed

arminru reviewed Oct 20, 2020

View reviewed changes

justinfoote and others added 2 commits October 23, 2020 11:49

Update CHANGELOG.md

88c6e1c

Co-authored-by: Armin Ruech <[email protected]>

Actually add the TOC to database metric semantic conventions

7bdc85d

github-actions bot added the Stale label Oct 31, 2020

justinfoote added 2 commits November 2, 2020 11:37

Add db.sql.table to db metric semantic conventions

cf83046

Add more explanation of db connection pool metrics

ec58de4

justinfoote mentioned this pull request Nov 3, 2020

SMI Metrics and OpenTelemetry servicemeshinterface/smi-spec#199

Closed

MrAlias approved these changes Nov 5, 2020

View reviewed changes

aabmass reviewed Nov 6, 2020

View reviewed changes

justinfoote mentioned this pull request Nov 6, 2020

Database semantic conventions: Add db.sql.table, allow SQL keyword as db.operation #1141

Merged

github-actions bot removed the Stale label Nov 6, 2020

jmacd reviewed Nov 10, 2020

View reviewed changes

justinfoote added 2 commits November 11, 2020 11:05

Update database metric semantic convention in response to PR feedback

9f95457

Add technology-specific db metric instruments guidelines

b2bebfc

justinfoote added 2 commits November 29, 2020 12:41

Fix markdown table layout for db metric labels

e6acc6a

Replace "mongo" with "mongodb" in metrics database semantic conventions

efba901

github-actions bot added the Stale label Dec 7, 2020

github-actions bot closed this Dec 14, 2020

aabmass mentioned this pull request Apr 16, 2021

Skeleton for the remaining metrics instruments #1617

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Database metric semantic conventions #1076

Database metric semantic conventions #1076

justinfoote commented Oct 8, 2020

arminru commented Oct 9, 2020

kenfinnigan Oct 9, 2020

justinfoote Oct 9, 2020

kenfinnigan Oct 9, 2020

justinfoote Oct 19, 2020

justinfoote Nov 2, 2020 •

edited

Loading

jmacd left a comment

arminru Oct 20, 2020

justinfoote Oct 23, 2020

justinfoote Nov 2, 2020

github-actions bot commented Oct 31, 2020

jmacd commented Nov 6, 2020

andrewhsu commented Nov 6, 2020

aabmass left a comment

aabmass Nov 5, 2020

aabmass Nov 5, 2020

aabmass Nov 6, 2020

justinfoote Nov 11, 2020

aabmass Nov 11, 2020

aabmass Nov 11, 2020

aabmass Nov 6, 2020

jmacd Nov 10, 2020

justinfoote Nov 11, 2020

justinfoote Nov 12, 2020

tigrannajaryan commented Nov 26, 2020

github-actions bot commented Dec 7, 2020

github-actions bot commented Dec 14, 2020

	Database operations SHOULD include execution of queries, including DDL, DML,
	Measured database operations SHOULD include execution of queries, including DDL, DML,

	instruments MAY be collected. They SHOULD have all [common](#common) labels
	instruments MAY be used. They SHOULD have all [common](#common) labels

Database metric semantic conventions #1076

Database metric semantic conventions #1076

Conversation

justinfoote commented Oct 8, 2020

1. db.operation and db.table

2. Connection pool metrics

Changes

arminru commented Oct 9, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

justinfoote Nov 2, 2020 • edited Loading

Choose a reason for hiding this comment

jmacd left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

github-actions bot commented Oct 31, 2020

jmacd commented Nov 6, 2020

andrewhsu commented Nov 6, 2020

aabmass left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tigrannajaryan commented Nov 26, 2020

github-actions bot commented Dec 7, 2020

github-actions bot commented Dec 14, 2020

1. `db.operation` and `db.table`

justinfoote Nov 2, 2020 •

edited

Loading