Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add botocore db.statement sanitization for dynamoDB #1662

Open
wants to merge 41 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 22 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
e2753f1
add sanitization for dynamoDB queries
nemoshlag Feb 12, 2023
068b8f6
lint fix
nemoshlag Feb 12, 2023
41d3724
lint fix2
nemoshlag Feb 12, 2023
6efcbb8
add CHANGELOG entry
nemoshlag Feb 12, 2023
3df3d25
CR fix
nemoshlag Feb 12, 2023
3cb7488
Merge branch 'main' into add/botocore-sanitization
srikanthccv Feb 13, 2023
721b41c
Merge branch 'main' into add/botocore-sanitization
shalevr Feb 13, 2023
b68de4d
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 13, 2023
9b17814
Merge branch 'main' into add/botocore-sanitization
shalevr Feb 13, 2023
05d23b6
Merge branch 'main' into add/botocore-sanitization
srikanthccv Feb 13, 2023
18b651d
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 13, 2023
53d7070
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 14, 2023
fa461ef
add sanitization for dynamoDB queries
nemoshlag Feb 12, 2023
c6073d2
lint fix
nemoshlag Feb 12, 2023
068d3e0
lint fix2
nemoshlag Feb 12, 2023
5e11ff5
merged
nemoshlag Feb 19, 2023
82f4ba9
CR fix
nemoshlag Feb 12, 2023
38ef502
add configuration to context
nemoshlag Feb 19, 2023
a876ad0
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 19, 2023
3b6d054
imp sanitization as external class
nemoshlag Feb 21, 2023
ceba85d
Merge remote-tracking branch 'origin/add/botocore-sanitization' into …
nemoshlag Feb 21, 2023
e08a776
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 21, 2023
aee58c8
lint fix
nemoshlag Feb 21, 2023
6fc7647
cr fix
nemoshlag Feb 22, 2023
0a6d697
lint fix
nemoshlag Feb 22, 2023
4669538
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 23, 2023
b4eca22
Merge branch 'main' into add/botocore-sanitization
shalevr Feb 26, 2023
d9beb26
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 26, 2023
d6ef436
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 27, 2023
acc8acc
Merge branch 'main' into add/botocore-sanitization
nemoshlag Feb 28, 2023
363a581
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 1, 2023
cd8baff
use json for db.statement
nemoshlag Mar 2, 2023
28d4176
simpler implementation
nemoshlag Mar 5, 2023
541e6d7
lint fix
nemoshlag Mar 5, 2023
6fb5545
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 7, 2023
787ac27
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 8, 2023
a8b61b8
Merge branch 'main' into add/botocore-sanitization
shalevr Mar 8, 2023
f2a4623
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 13, 2023
e4ca88f
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 16, 2023
5281c38
Merge branch 'main' into add/botocore-sanitization
nemoshlag Mar 20, 2023
e9a11db
Merge branch 'main' into add/botocore-sanitization
nemoshlag Apr 4, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- `opentelemetry-instrumentation-botocore` Add optional db.statement query sanitization.
([#1662](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1662))
- Support `aio_pika` 9.x (([#1670](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1670])
- `opentelemetry-instrumentation-redis` Add `sanitize_query` config option to allow query sanitization. ([#1572](https://github.com/open-telemetry/opentelemetry-python-contrib/pull/1572))
- `opentelemetry-instrumentation-elasticsearch` Add optional db.statement query sanitization.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,8 @@ def request_hook(span, service_name, operation_name, api_params):
def response_hook(span, service_name, operation_name, result):
# response hook logic

dynamodb_sanitize_query (bool) - an optional dynamodb query sanitization flag, default is False

# Instrument Botocore with hooks
BotocoreInstrumentor().instrument(request_hook=request_hook, response_hook=response_hook)

Expand All @@ -76,6 +78,9 @@ def response_hook(span, service_name, operation_name, result):
)
ec2 = self.session.create_client("ec2", region_name="us-west-2")
ec2.describe_instances()

# Instrument Botocore with dynamoDB sanitization enabled - default is False
BotocoreInstrumentor().instrument(dynamodb_sanitize_query=True)
"""

import logging
Expand Down Expand Up @@ -117,6 +122,17 @@ def _patched_endpoint_prepare_request(wrapped, instance, args, kwargs):
return wrapped(*args, **kwargs)


class ExtensionsConfiguration:
nemoshlag marked this conversation as resolved.
Show resolved Hide resolved
def __init__(self, **kwargs):
dynamodb_sanitize_query = kwargs.get("dynamodb_sanitize_query", False)
self.dynamodb_configuration = {
"sanitize_query": dynamodb_sanitize_query
}

def get_dynamodb_configuration(self):
return self.dynamodb_configuration


class BotocoreInstrumentor(BaseInstrumentor):
"""An instrumentor for Botocore.

Expand All @@ -139,6 +155,7 @@ def _instrument(self, **kwargs):

self.request_hook = kwargs.get("request_hook")
self.response_hook = kwargs.get("response_hook")
self.configuration = ExtensionsConfiguration(**kwargs)

wrap_function_wrapper(
"botocore.client",
Expand All @@ -161,7 +178,9 @@ def _patched_api_call(self, original_func, instance, args, kwargs):
if context_api.get_value(_SUPPRESS_INSTRUMENTATION_KEY):
return original_func(*args, **kwargs)

call_context = _determine_call_context(instance, args)
call_context = _determine_call_context(
instance, args + (self.configuration,)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args is what the botocore library passes to botocore.client.BaseClient._make_api_call (the instrumented function). i think currently it is the name of the operation (e.g. Lambda.ListLayers) and the input parameters (e.g. client.list_layers(<input>)) as dictionary.
This might potentially change in the future, so i think it would be better to pass the configuration as separate parameter to _determine_call_context (and in turn to _AwsSdkCallContext) instead of adding it to the args. This would also avoid issues when e.g. the type of args changes from tuple to list.

)
if call_context is None:
return original_func(*args, **kwargs)

Expand Down Expand Up @@ -243,9 +262,9 @@ def _apply_response_attributes(span: Span, result):
headers = metadata.get("HTTPHeaders")
if headers is not None:
request_id = (
headers.get("x-amzn-RequestId")
or headers.get("x-amz-request-id")
or headers.get("x-amz-id-2")
headers.get("x-amzn-RequestId")
or headers.get("x-amz-request-id")
or headers.get("x-amz-id-2")
)
if request_id:
# TODO: update when semantic conventions exist
Expand All @@ -262,7 +281,7 @@ def _apply_response_attributes(span: Span, result):


def _determine_call_context(
client: BaseClient, args: Tuple[str, Dict[str, Any]]
client: BaseClient, args: Tuple[str, Dict[str, Any]]
) -> Optional[_AwsSdkCallContext]:
try:
call_context = _AwsSdkCallContext(client, args)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -62,14 +62,44 @@ def _conv_val_to_len(value) -> Optional[int]:
return len(value) if value is not None else None


_db_statement_operations = {
"BatchGetItem",
"BatchWriteItem",
"DeleteItem",
"GetItem",
"PutItem",
"Query",
"Scan",
"UpdateItem",
}

_sanitized_keys = (
"Item",
"Key",
"KeyConditionExpression",
"ExpressionAttributeValues",
"FilterExpression",
)
_sanitized_value = {"?": "?"}


# pylint: disable=C0103
def _conv_params_to_sanitized_str(params) -> str:
p = params.copy()
for key in p:
if key in _sanitized_keys:
p[key] = _sanitized_value

return str(p)


################################################################################
# common request attributes
################################################################################

_REQ_TABLE_NAME = ("TableName", _conv_val_to_single_attr_tuple)
_REQ_REQITEMS_TABLE_NAMES = ("RequestItems", _conv_dict_to_key_tuple)


_REQ_GLOBAL_SEC_INDEXES = ("GlobalSecondaryIndexes", _conv_list_to_json_list)
_REQ_LOCAL_SEC_INDEXES = ("LocalSecondaryIndexes", _conv_list_to_json_list)

Expand Down Expand Up @@ -350,12 +380,25 @@ class _DynamoDbExtension(_AwsSdkExtension):
def __init__(self, call_context: _AwsSdkCallContext):
super().__init__(call_context)
self._op = _OPERATION_MAPPING.get(call_context.operation)
self._configuration = (
call_context.configuration.get_dynamodb_configuration()
)

def extract_attributes(self, attributes: _AttributeMapT):
attributes[SpanAttributes.DB_SYSTEM] = DbSystemValues.DYNAMODB.value
attributes[SpanAttributes.DB_OPERATION] = self._call_context.operation
attributes[SpanAttributes.NET_PEER_NAME] = self._get_peer_name()

if self._call_context.operation in _db_statement_operations:
attributes[SpanAttributes.DB_STATEMENT] = str(
self._call_context.params
nemoshlag marked this conversation as resolved.
Show resolved Hide resolved
)

if self._configuration.get("sanitize_query"):
attributes[
SpanAttributes.DB_STATEMENT
] = _conv_params_to_sanitized_str(self._call_context.params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

afaik there isn't a specification yet how the db.statement should look like for DynamoDB.
i think this would be defined first to so that all techs (Js, Ruby, .NET, ....) use a commonly understood format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're saying. However, there are several similar merged PRs (#1545, #1547, #1548), and an open discussion in specs. Your call :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it then even make sens to add statement sanitization if the format isn't specced yet? Also when considering that there doesn't seem to be a common query language for DynamoDB (e.g SDK input parameters vs PartiQL).

Personally i don't have a strong opinion about statement capturing/sanitization and i'm just pointing out that it might lead to breaking changes in the future.
If you think this is ok feel free and continue with merging this PR :)


if self._op is None:
return

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,8 +24,6 @@
_BotoClientT = "botocore.client.BaseClient"
_BotoResultT = Dict[str, Any]
_BotoClientErrorT = "botocore.exceptions.ClientError"

_OperationParamsT = Dict[str, Any]
nemoshlag marked this conversation as resolved.
Show resolved Hide resolved
_AttributeMapT = Dict[str, AttributeValue]


Expand All @@ -44,6 +42,7 @@ class _AwsSdkCallContext:
api_version: the API version of the called AWS service.
span_name: the name used to create the span.
span_kind: the kind used to create the span.
configuration: a class represents additional configuration for botocore extensions.
"""

def __init__(self, client: _BotoClientT, args: Tuple[str, Dict[str, Any]]):
Expand All @@ -54,12 +53,19 @@ def __init__(self, client: _BotoClientT, args: Tuple[str, Dict[str, Any]]):
_logger.warning("Could not get request params.")
params = {}

try:
configurations = args[2]
except (IndexError, TypeError):
_logger.warning("Could not get request configurations.")
configurations = None

boto_meta = client.meta
service_model = boto_meta.service_model

self.service = service_model.service_name.lower() # type: str
self.operation = operation # type: str
self.params = params # type: Dict[str, Any]
self.configuration = configurations

# 'operation' and 'service' are essential for instrumentation.
# for all other attributes we extract them defensively. All of them should
Expand Down
Loading