-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create the structure of a redaction processor #5274
Create the structure of a redaction processor #5274
Conversation
@dmitryax @Aneurysm9 Is there something I can do to make reviewing easier? I don't see this PR on the project board yet, but I don't seem to have permissions to edit the projects field |
@dmitryax @Aneurysm9 Do you think it would help if I streamlined this PR further? |
9e39445
to
2895668
Compare
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
Bump |
This PR was marked stale due to lack of activity. It will be closed in 7 days. |
- group | ||
- name | ||
# Verbosity: debug vs info vs silent | ||
summary: debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we use service::telemetry::logs
config instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax Is there a way to access that value from a processor? The CreateTracesProcessor definition only provides config.Processor
and not config.Service
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no way to access it, but we can write logs in this processor with different levels and that verbosity of this output can be controlled with service::telemetry::logs
1507f52
to
143bae0
Compare
815eaf7
to
63daf82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few more comments. I don't think you can access service::telemetry::logs
from the processor. I would suggest keeping the summary/dryRun functionality to a future PR
// DryRun mode does not remove any span attributes or mask any values. | ||
// Instead, changes are documented in new summary span attributes | ||
DryRun bool `mapstructure:"dry_run"` | ||
// MetricTags is list of span attribute keys that should used as dynamic | ||
// tags on metrics. This allows redaction metrics to be filtered based | ||
// on domain-specific dimensions. | ||
MetricTags []string `mapstructure:"metric_tags"` | ||
// Summary is the verbosity level for the new summary span attributes. | ||
// `debug` lists all redacted span attribute keys, `info` only provides | ||
// a count, and `silent` provides no summary. | ||
Summary string `mapstructure:"summary"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what the additional value of summary: debug
is when compared to using the logging/file exporter to check the output traces (maybe because I am not seeing the implementation). Maybe we should leave Summary
and DryRun
for a future PR so that everything is in context?
63daf82
to
6ac15ea
Compare
Thanks @mx-psi! I've simplified the PR (removed DryRun and Summary, unexported/removed some constants) and made it work properly on 0.37 |
@codeboten lint is failing on this PR because of #5775. This is a new component and thus it's not yet added to the main go.mod file (as the contributing guidelines state). What should we do to make linting pass? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR looks good, but the issue above needs to be fixed for this to mergeable
|
||
// makeAllowList sets up a lookup table of allowed span attribute keys | ||
func makeAllowList(c *Config) map[string]string { | ||
redactionKeys := []string{redactedKeys, redactedKeyCount, maskedValues, maskedValueCount} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are redactedKeys
and maskedValues
supposed to be used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax I've added a comment:
// redactionKeys are additional span attributes created by the processor to
// summarize the changes it made to a span. If the processor removes
// 2 attributes from a span (e.g. `birth_date`, `mothers_maiden_name`),
// then it will list them in the `redactedKeys` span attribute and set the
// `redactedKeysCount` attribute to 2
//
// If the processor finds and masks values matching a blocked regex in 2
// span attributes (e.g. `notes`, `description`), then it will those
// attribute keys in `maskedValues` and set the `maskedValueCount` to 2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok sounds good. Why redactedKeyCount
and maskedValueCount
needed then? They seem redundant if we have redactedKeys
and maskedValue
with array values...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax Using this processor in production, I've found that the counts are convenient for querying through traces for redaction. The lists of attributes are nice for diagnostics, but being able to query for redactedKeysCount > 0
is convenient
The design reason for the counts is that they leak less information than the lists of span attribute keys in case the actual key names need to be kept confidential
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @leonsp-ai, sorry for the delayed response. Thanks for explaining the attributes purpose. It looks good to me.
The only concern is that attribute names in this PR don't fit established attribute naming conventions. Please take a look at this specification and think about names that would better fit the tracing model. Once the processor is implemented (or closer to final PRs), they should be added to the semantic conventions in the spec. Not necessary to change it in this PR, can be changed later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax Thanks for the link to the spec! I've pencilled in names that match a little better
processor/redactionprocessor/go.mod
Outdated
go.opentelemetry.io/collector v0.37.0 | ||
go.opentelemetry.io/collector/model v0.37.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, sorry for the delay. Please update the version to 0.38.0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated for 0.39.0
@codeboten What do you think about #5274 (comment) ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR looks good to me.
We need someone from @open-telemetry/collector-contrib-maintainer to merge this
* `num_redacted_keys` represents the number of span attribute keys redacted by processor | ||
* `num_masked_values` represents the number of span attribute values masked by processor |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new names look better please update them here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmitryax Those are metric names, so I'll move those to the follow-up metrics PR
@leonsp-ai can we try and add it to the |
Once the build passes, ping me and I'll merge this. |
82dbb3b
to
d41df1d
Compare
d41df1d
to
fd0a4ec
Compare
@dmitryax Could you please approve the workflow? I've rebased which I think should fix the error in the last unittest/lint runs |
processor/redactionprocessor/go.mod
Outdated
require ( | ||
github.com/stretchr/testify v1.7.0 | ||
go.opentelemetry.io/collector v0.39.0 | ||
go.opentelemetry.io/collector/model v0.39.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be in sync with what we have in the main branch at the moment of the merge. This is currently v0.39.1-0.20211117203239-e23c9d0a0183
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling Updated to v0.39.1-0.20211117203239-e23c9d0a0183
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, looks like this got updated once again to v0.39.1-0.20211119172502-53d057f9c0e7 by #6390.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jpkrohling Updated to v0.39.1-0.20211119172502-53d057f9c0e7
The `resource_to_telemetry_conversion` setting was listed in the readme, but it doesn't correspond to a setting for this exporter. It looks like individual exporters in the contrib repo have it. The readme also had a dangling link for more settings. I've removed the link altogether since, as far as I can tell, there are no longer any settings not listed in the readme. Please kindly verify that this is true.
Description: Implement a redaction processor to prevent leaking sensitive or private span attributes or span attribute values for privacy, security, and compliance purposes
Link to tracking Issue: #4131
Testing: This is the first PR creating the structure of the new processor per guidelines. This PR has coverage for creating, starting, and stopping the processor. Test coverage for the actual redaction functionality will go in a second PR along with the implementation.
Documentation: Readme for the new redaction processor