Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add sqlcommenter comment with trace context to queries in pg instrumentation #1286

Merged
merged 19 commits into from
Nov 24, 2022

Conversation

henrinormak
Copy link
Contributor

Which problem is this PR solving?

In order to get end-to-end telemetry, Google proposed a standard for sending trace context to DB engines - sqlcommenter. This is now part of the OTel family of packages (the Node variants have not been published under @opentelemetry scope AFAIK).

With this in mind, it makes sense to bring this capability directly to the instrumentation, as that way users without ORMs can still benefit from the ability to propagate a tracing context to the DB engine.

Short description of the changes

Added a new config flag to the pg instrumentation addSqlCommenterCommentToQueries (name up for discussion as it might end up being used by other DB driver instrumentations as well). If this is set to true, the instrumentation will modify the query (according to the sqlcommenter spec) before sending it onwards to the underlying library.

My end goal is to test this out with Postgres running in CloudSQL, which should allow me to get much deeper detail into the DB related Spans - https://cloud.google.com/blog/products/databases/sqlcommenter-merges-with-opentelemetry

Currently this PR is a draft, as I feel it is missing tests to validate that the query is actually correctly augmented with the tracing comment - the utility function is tested, but would be nice to have an integration test for this. I might be able to figure something out based on DB logs, but not 100% sure yet.

@@ -48,6 +48,8 @@ PostgreSQL instrumentation has few options available to choose from. You can set
| ------- | ---- | ----------- |
| [`enhancedDatabaseReporting`](./src/types.ts#L30) | `boolean` | If true, additional information about query parameters and results will be attached (as `attributes`) to spans representing database operations |
| `responseHook` | `PgInstrumentationExecutionResponseHook` (function) | Function for adding custom attributes from db response |
| `requireParentSpan` | `boolean` | If true, requires a parent span to create new spans (default false) |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was previously added, but README was out of date.

@codecov
Copy link

codecov bot commented Nov 14, 2022

Codecov Report

Merging #1286 (70ac1cf) into main (fef44f4) will decrease coverage by 0.85%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1286      +/-   ##
==========================================
- Coverage   96.08%   95.22%   -0.86%     
==========================================
  Files          14        4      -10     
  Lines         893      314     -579     
  Branches      191       51     -140     
==========================================
- Hits          858      299     -559     
+ Misses         35       15      -20     
Impacted Files Coverage Δ
...elemetry-instrumentation-pg/src/instrumentation.ts 90.47% <100.00%> (ø)
...node/opentelemetry-instrumentation-pg/src/utils.ts 98.42% <100.00%> (ø)
...lugin-react-load/src/BaseOpenTelemetryComponent.ts
...metry-propagator-ot-trace/src/OTTracePropagator.ts
...ation-user-interaction/src/enums/AttributeNames.ts
...emetry-propagator-instana/src/InstanaPropagator.ts
...etapackages/auto-instrumentations-web/src/utils.ts
...trumentation-document-load/src/enums/EventNames.ts
...umentation-user-interaction/src/instrumentation.ts
...metry-propagator-aws-xray/src/AWSXRayPropagator.ts
... and 10 more

@weyert
Copy link

weyert commented Nov 14, 2022

Awesome, great PR. Thank you. I had a similar query on the sqlcommenter repo many moons ago:
open-telemetry/opentelemetry-sqlcommenter#9

I think it would be nice if this gets integrated directly in otel-js

@henrinormak henrinormak marked this pull request as ready for review November 18, 2022 14:19
@henrinormak henrinormak requested a review from a team November 18, 2022 14:19
@henrinormak
Copy link
Contributor Author

I added some tests that assert the query is augmented, not super happy with the way I had to hack into the query queue of pg, but I could not figure out a simpler way. Luckily this is limited to the tests and does not alter the actual instrumentation code.

@henrinormak
Copy link
Contributor Author

@Flarna @weyert any further comments? Could we get this merged? Looking forward to trying it out with CloudSQL and pg, thanks a bunch! 🙇

Copy link
Member

@Flarna Flarna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some minor comments. Actual merging is up to component owner/repo maintainers.

Copy link
Member

@blumamir blumamir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution
Added few comments

@aabmass
Copy link
Member

aabmass commented Nov 23, 2022

@sjs994 can you take a look at this PR?

  • did we decide if we are migrating SQLCommenter instrumentation into the individual contrib repos as an overall strategy?
  • is there some code we should be re-using from the SQLCommenter repo here?

@weyert
Copy link

weyert commented Nov 23, 2022

@sjs994 can you take a look at this PR?

  • did we decide if we are migrating SQLCommenter instrumentation into the individual contrib repos as an overall strategy?
  • is there some code we should be re-using from the SQLCommenter repo here?

Originally, the idea was to wait for SQLCommenter to be specced out before this. But I this could be a nice test case for it. The final dream state was to move them to instrumentations instead of a separate package.

Copy link
Member

@blumamir blumamir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing everything.
LGTM

We talked about it yesterday at the SIG. @aabmass and @sjs994 are working on this at google and offered to review it as well

Copy link

@sjs994 sjs994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have one comment (this may be in general to all SQLCommenter library). Otherwise LGTM.

@@ -289,3 +292,65 @@ export function patchClientConnectCallback(
cb.call(this, err);
};
}

function hasValidSqlComment(query: string): boolean {
const indexOpeningDashDashComment = query.indexOf('--');
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query might contain "--" as part of parameter. In that case, this method will return true that the query contains a comment.

For Ex:

SELECT c1 FROM t1 WHERE c2 = '--<search_text>';

I am not sure how to detect correctly that a query contains a comment without using a parser.
Should we add a comment at-least for now that this may not be completely correct and revisit once we have a proper solution or a change in the spec ?

Copy link
Contributor Author

@henrinormak henrinormak Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the same also apply for the block comment? i.e if it is inside of a string literal, it is not a comment? I took the current code almost verbatim from the existing sqlcommenter Node implementation (the knex one for example), so the fault is present there as well.

Adding a parser would not really be a problem, I think, although not sure what the overhead would be for each query execution, perhaps a comment in the source code + a OpenTelemetry diagnostic log line for queries that are ignored is a nice starting point?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it might affect a block comment also. I agree the issue originates in the original SQLCommenter libraries. That's why i am inclined towards adding a comment regarding this issue here (and if possible callout in docs).

Adding a parser might affect the performance as well as will be an additional dependency in this library (which databases are anyway doing).

We are thinking if the specs regarding "do not add a comment if there is already a comment" could be modified. Should it just be "add a new comment at the end of the statement" ?.
But, for that we need to understand whether different databases (at-least the supported ones Postgres, MySQL, SQLite, etc) have any constraints on comments and whether any popular "hint"(Ex pg_hint_plan, MySQL query hints) features will get affected if multiple comment sections are present.

So, maybe till that is finalised, we can add a comment and tech debt maybe ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment and a short note to the README for the flag.

@blumamir blumamir merged commit a0003e7 into open-telemetry:main Nov 24, 2022
@dyladan dyladan mentioned this pull request Nov 24, 2022
@joelmukuthu
Copy link

Hi @henrinormak, thanks for working on this feature! Do you have thoughts on how other/custom attributes could be added to the SQL comments e.g. action, controller etc?

On another note, I currently use the following code to get the trace context (inspired by https://github.com/open-telemetry/opentelemetry-sqlcommenter/tree/2f8841add68358069ebf1c0ee560ab3e98a59aa9/nodejs/sqlcommenter-nodejs):

import { context, propagation } from '@opentelemetry/api';

type TraceContext = {
  traceparent?: string;
  tracestate?: string;
};

export const getActiveTraceContext = (): TraceContext => {
  const traceContext: TraceContext = {};

  propagation.inject(context.active(), traceContext);

  return traceContext;
};

I use this function in some other code where all SQL goes through before being passed on the node-postgres module.

It works, somewhat (I never get tracestate), but it's quite different from what you did here and I'm fairly certain I'm doing something wrong. Would you be able to suggest the best way to get the trace context in my case? For full context, I'm not able to take advantage of addSqlCommenterCommentToQueries because I need to add action and controller to my SQL comments. Thanks in advance!

@henrinormak
Copy link
Contributor Author

Regarding the action, controller, framework. I suppose the correct approach would be to allow the instrumentation to be configured somehow to extract these from the context that is active when the query is being made. Query + the active context should be enough to know those values, right? I think that addition might be simple enough, but not sure whether I'll have time for this in the near future (PRs are welcome though as always :) )

As for the issue with the code snippet, I don't quite understand what you suspect being wrong, it does seem like one way of getting the context to be injected into the query (assuming you have configured the W3C propagator). It of course does not cover the actual injection of the context into the query as a comment, but I assume that is also handled. As for why the instrumentation implementation may differ - simply put, as the instrumentation does not know which of the propagators you have configured on OTel API, we use the propagator directly. Or did you mean some other difference?

@joelmukuthu
Copy link

I suppose the correct approach would be to allow the instrumentation to be configured somehow to extract these from the context that is active when the query is being made

Sounds good. I can find time to work on it in the coming weeks; I'll probably start with an issue to verify the approach and ask for pointers on where to get started in the code.

As for my getActiveTraceContext, you've answered the question! I looked at the implementation here and thought that I should also be creating a propagator and maybe getting the active context some other way. Thanks for responding!

@withnale
Copy link

@joelmukuthu - have you got an example of your code that is injecting the route and others into the comments. We're stuck with the same problem

@joelmukuthu
Copy link

Hi @withnale. My implementation boils down to something like this:

import { context, propagation } from '@opentelemetry/api';
import { Pool, type QueryConfig } from 'pg';

const getActiveTraceContext = (): { traceparent?: string; tracestate?: string; } => {
  const traceContext: TraceContext = {};
  propagation.inject(context.active(), traceContext);
  return traceContext;
};

// This is my own implementation of the spec. You can also borrow the one implemented
// in this PR.
const formatSqlComments = (comments: Record<string, string>): string => {
  // Based on this spec: https://google.github.io/sqlcommenter/spec/
  const keyValuePairs = Object.entries(comments).map(([key, value]) => {
    key = encodeURIComponent(key);
    value = encodeURIComponent((value || '').replace(/'/g, "\\'"));
    return `${key}='${value}'`;
  });

  if (!keyValuePairs.length) {
    return '';
  }

  return `/* ${keyValuePairs.sort().join(', ')} */`;
};

const pool = new Pool();

const runQuery = async (sql: string | QueryConfig): ReturnType<typeof pool.query> => {
  // According to the spec, you should first check that the query has no comments before 
  // appending any. You can borrow the implementation of `hasComments` from this PR.
  if (hasComments(sql)) {
    return pool.query(sql);
  }

  const activeTraceContext = getActiveTraceContext();
  const comments = {
    ...activeTraceContext,
    // Your custom attributes. These need to come from some kind of context that is unique to a request
    action: 'foo', // or something like `req.action`
    controller: 'bar',
    route: 'baz',
    // etc ...
  };

  const formattedComments = formatSqlComments(comments);

  if (typeof sql === 'string') {
    sql = `${options.sql} ${formattedComments}`;
  } else {
    sql.text = `${options.sql.text} ${formattedComments}`;
  }

  return pool.query(sql);
};

There's nothing special here, the main thing you'd need to make sure of is that all your database calls go through a central point e.g. the runQuery function above. Also this is not the best place to offer support so if you need more help, see here for more info: https://github.com/open-telemetry/opentelemetry-js-contrib#useful-links

But thanks for asking, I'll take a stab at #1353 which would ideally solve your issue

@withnale
Copy link

Ah. Thanks. I wrote something very similar. Initially I thought you'd worked out a way to patch instrumentation-pg, since that generates additional spans at a lower level. In the end, I decided the same as you that the sql comments were more important than these autogenerated spans.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants