Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it okay to use SQL statement type (e.g. just "SELECT") as span name? #1139

Closed
mateuszrzeszutek opened this issue Oct 23, 2020 · 9 comments · Fixed by #1219
Closed

Is it okay to use SQL statement type (e.g. just "SELECT") as span name? #1139

mateuszrzeszutek opened this issue Oct 23, 2020 · 9 comments · Fixed by #1219
Labels
area:semantic-conventions Related to semantic conventions priority:p3 Lowest priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory

Comments

@mateuszrzeszutek
Copy link
Member

What are you trying to achieve?

Spec states that "SQL statement without variable arguments" are a good choice for DB span name, but sometimes SQL queries (especially ones generated by ORM frameworks like Hibernate) get terribly long and may have very high cardinality. E.g.

SELECT t.col1 as column1, t.col2 as column2, t.col3 as column3, t.col4 as column4, t.col5 as column5 from table t where t.id = ?

just for selecting a single entry. Imagine a query that eagerly fetches 3 tables joined together, each one having 15-20 columns, with some complex conditions -- and everything generated completely dynamically using session.createCriteria() (Hibernate again).

Instead, would it be fine to use just SELECT (or INSERT or ...) as span name? Full query should be available as db.statement anyway.

@mateuszrzeszutek mateuszrzeszutek added the spec:trace Related to the specification/trace directory label Oct 23, 2020
@Oberon00 Oberon00 added the area:semantic-conventions Related to semantic conventions label Oct 24, 2020
@iNikem
Copy link
Contributor

iNikem commented Oct 26, 2020

First, how do you get SELECT or INSERT from the query without parsing it? Spec advises strongly against query parsing and I agree.

I very much prefer the John's idea from open-telemetry/opentelemetry-java-instrumentation#1409 (comment)

@trask
Copy link
Member

trask commented Oct 26, 2020

I very much prefer the John's idea from open-telemetry/opentelemetry-java-instrumentation#1409 (comment)

👍

Full query should be available as db.statement anyway.

👍

@andrewhsu andrewhsu added priority:p3 Lowest priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs labels Oct 27, 2020
@mateuszrzeszutek
Copy link
Member Author

Alternative proposition: what about statement type + main table name?

SELECT FROM <table>
UPDATE <table>
INSERT INTO <table>
DELETE FROM <table>
MERGE INTO <table>
...

It's possible to implement that using a simple lexer (and the implementation won't be very complicated, I think). The downside of this solution is that it would cover only the most common/simple SELECT queries, tables used in joins & subqueries wouldn't be visible in the span name (of course, there's still the db.statement attribute that contains the original query).

What do you think about that?

CC @bogdandrutu @tigrannajaryan

@Oberon00
Copy link
Member

Oberon00 commented Oct 30, 2020

I think with the current spec, this is all allowed, as it is not very strict:

The span name SHOULD be set to a low cardinality value representing the statement executed on the database. It may be a stored procedure name (without arguments), SQL statement without variable arguments, operation name, etc. When it's otherwise impossible to get any meaningful span name, db.name or the tech-specific database name MAY be used.

-- https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/trace/semantic_conventions/database.md

I think we wanted to avoid requiring or suggesting to parse anything though. If you have a very long statement, I think the db.name suggestion applies.

@mateuszrzeszutek
Copy link
Member Author

I think we wanted to avoid requiring or suggesting to parse anything though. If you have a very long statement, I think the db.name suggestion applies.

You still need to do simple parsing if you want to remove (potentially sensitive) arguments from a SQL query.
Anyway, I wanted to find some middle ground between db.name (which provides very little information) and the full query (which provides way too much information), so I'm glad that spec allows it.
Do you think we could mention it in the spec?

@Oberon00
Copy link
Member

Oberon00 commented Oct 30, 2020

Sure, the spec already has a few examples listed with etc. at the end, adding the suggestion to use the SQL keyword seems OK. EDIT: But probably db.name + keyword, otherwise we would get too few distinct span names. I think just "SELECT" does not describe an "interesting class of spans" as the general span name spec puts it.

@mateuszrzeszutek
Copy link
Member Author

I think just "SELECT" does not describe an "interesting class of spans" as the general span name spec puts it.

Oh, I meant SELECT FROM <table>, SELECT by itself really is not descriptive enough.

@Oberon00
Copy link
Member

Oberon00 commented Oct 30, 2020

Hmm, what if you have more than one table? or what if you SELECT FROM (SELECT ...)? This might be too complex for a general recommendation. Maybe <keyword> <db.name>.<db.table>? (see #1141 for db.table)

@mateuszrzeszutek
Copy link
Member Author

Maybe <keyword> <db.name>.<db.table>

That actually sounds great. In case of nested queries or multiple tables <db.table> could be omitted, same as what #1141 suggests for that attribute:

If the operation is acting upon an anonymous table, or more than one table, this value may be omitted.

So for SELECT FROM (SELECT ...) the span name would be SELECT db_name without any table. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions priority:p3 Lowest priority level release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:trace Related to the specification/trace directory
Projects
None yet
5 participants