Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add botocore db.statement sanitization for dynamoDB #1662

Open
wants to merge 41 commits into
base: main
Choose a base branch
from

Conversation

nemoshlag
Copy link
Member

Description

Added an optional query sanitizer to the botocore dynamoDB instrumentation.
Usage
BotocoreInstrumentor().instrument(sanitize_query=True)

This will affect the DB_STATEMENT value to contain the original query or sanitized one.

Fixes open-telemetry/opentelemetry-specification#1544
Following the specification discussion here

Type of change

Please delete options that are not relevant.

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • Test A db.statement has been sanitized
  • Test B no side affects occurred on a query without a db.statement attribute

Does This PR Require a Core Repo Change?

  • No.

Checklist:

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@nemoshlag nemoshlag marked this pull request as ready for review February 12, 2023 15:27
@nemoshlag nemoshlag requested a review from a team February 12, 2023 15:27
@nemoshlag nemoshlag changed the title Add/botocore sanitization Add botocore db.statement sanitization for dynamoDB Feb 12, 2023
@@ -354,6 +374,12 @@ def __init__(self, call_context: _AwsSdkCallContext):
def extract_attributes(self, attributes: _AttributeMapT):
attributes[SpanAttributes.DB_SYSTEM] = DbSystemValues.DYNAMODB.value
attributes[SpanAttributes.DB_OPERATION] = self._call_context.operation
attributes[SpanAttributes.DB_STATEMENT] = str(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder if capturing the db.statement for DynamoDB is really applicable in terms of semantic conventions, at least in the way how it is captured here.
right now this would capture (more or less) a serialized JSON string consisting of all the parameters that were passed to the DynamoDb SDK function (e.g. client.get_item()). This doesn't seem to look like defined format and backends will probably have a hard time interpreting it.

The instrumentation is also already capturing some of the parameters (e.g. TableName, ConsistentRead, ConsumedCapacity, ...) in other span attributes. So there will probably be some redundant/unrelated parts in the db.statement.
Additionally, boto3 provides at least two ways to call certain APIs. E.g. for the get_item (and most other) functions there exists a client API and a convenience table API. They are mostly the sames but slightly differ in e.g. how the Key parameter needs to be passed. For a captured unsanitized db.statement attribute this would probably mean that the statement would differ depending on which API is used.

Furthermore, there exists also the PartiQL query language that is supported by certain DynamoDB APIs e.g. the execute_statement. Currently the instrumentation doesn't capture any additional span attributes for any of these PartiQL specific APIs, but it might lead to inconsistencies in case support for it is added in the future, where then one API captures the db.statement as serialized JSON and another API captures PartiQL query string.

…add/botocore-sanitization

# Conflicts:
#	instrumentation/opentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py
#	instrumentation/opentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/extensions/dynamodb.py
#	instrumentation/opentelemetry-instrumentation-botocore/tests/test_botocore_dynamodb.py
@@ -161,7 +178,9 @@ def _patched_api_call(self, original_func, instance, args, kwargs):
if context_api.get_value(_SUPPRESS_INSTRUMENTATION_KEY):
return original_func(*args, **kwargs)

call_context = _determine_call_context(instance, args)
call_context = _determine_call_context(
instance, args + (self.configuration,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args is what the botocore library passes to botocore.client.BaseClient._make_api_call (the instrumented function). i think currently it is the name of the operation (e.g. Lambda.ListLayers) and the input parameters (e.g. client.list_layers(<input>)) as dictionary.
This might potentially change in the future, so i think it would be better to pass the configuration as separate parameter to _determine_call_context (and in turn to _AwsSdkCallContext) instead of adding it to the args. This would also avoid issues when e.g. the type of args changes from tuple to list.

if self._configuration.get("sanitize_query"):
attributes[
SpanAttributes.DB_STATEMENT
] = _conv_params_to_sanitized_str(self._call_context.params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik there isn't a specification yet how the db.statement should look like for DynamoDB.
i think this would be defined first to so that all techs (Js, Ruby, .NET, ....) use a commonly understood format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're saying. However, there are several similar merged PRs (#1545, #1547, #1548), and an open discussion in specs. Your call :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it then even make sens to add statement sanitization if the format isn't specced yet? Also when considering that there doesn't seem to be a common query language for DynamoDB (e.g SDK input parameters vs PartiQL).

Personally i don't have a strong opinion about statement capturing/sanitization and i'm just pointing out that it might lead to breaking changes in the future.
If you think this is ok feel free and continue with merging this PR :)

@srikanthccv
Copy link
Member

@mariojonke approve if your comments are addressed or request changes to prevent the accidental merge.

@dacevedo12 dacevedo12 mentioned this pull request Mar 8, 2023
10 tasks
@ocelotl
Copy link
Contributor

ocelotl commented Jul 12, 2023

@nemoshlag are there any updates on the comments from @mariojonke?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is the Included Propagators section a duplicate of Propagators Distribution?
5 participants