Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create the structure of a redaction processor #5274

Conversation

leonsp-ai
Copy link
Contributor

Description: Implement a redaction processor to prevent leaking sensitive or private span attributes or span attribute values for privacy, security, and compliance purposes

Link to tracking Issue: #4131

Testing: This is the first PR creating the structure of the new processor per guidelines. This PR has coverage for creating, starting, and stopping the processor. Test coverage for the actual redaction functionality will go in a second PR along with the implementation.

Documentation: Readme for the new redaction processor

@leonsp-ai leonsp-ai requested review from a team and dmitryax September 17, 2021 19:35
@leonsp-ai
Copy link
Contributor Author

@dmitryax @Aneurysm9 Is there something I can do to make reviewing easier?

I don't see this PR on the project board yet, but I don't seem to have permissions to edit the projects field

@leonsp-ai
Copy link
Contributor Author

@dmitryax @Aneurysm9 Do you think it would help if I streamlined this PR further?

@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch from 9e39445 to 2895668 Compare October 1, 2021 15:18
@github-actions
Copy link
Contributor

github-actions bot commented Oct 9, 2021

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 9, 2021
@leonsp-ai
Copy link
Contributor Author

Bump

@github-actions github-actions bot removed the Stale label Oct 13, 2021
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Oct 20, 2021
@dmitryax dmitryax removed the Stale label Oct 20, 2021
- group
- name
# Verbosity: debug vs info vs silent
summary: debug
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use service::telemetry::logs config instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax Is there a way to access that value from a processor? The CreateTracesProcessor definition only provides config.Processor and not config.Service

Copy link
Member

@dmitryax dmitryax Oct 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no way to access it, but we can write logs in this processor with different levels and that verbosity of this output can be controlled with service::telemetry::logs

@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch from 1507f52 to 143bae0 Compare October 21, 2021 21:06
@leonsp-ai
Copy link
Contributor Author

@mx-psi @dmitryax Thank you for the in-depth review! I've added more documentation and addressed the feedback

The one thing I'm fuzzy on is how to access the configuration value for service::telemetry::logs, but I think it would be good to go over this again now that there's more documentation

@leonsp-ai leonsp-ai requested review from mx-psi and dmitryax October 21, 2021 21:11
@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch 2 times, most recently from 815eaf7 to 63daf82 Compare October 21, 2021 21:14
Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few more comments. I don't think you can access service::telemetry::logs from the processor. I would suggest keeping the summary/dryRun functionality to a future PR

Comment on lines 30 to 40
// DryRun mode does not remove any span attributes or mask any values.
// Instead, changes are documented in new summary span attributes
DryRun bool `mapstructure:"dry_run"`
// MetricTags is list of span attribute keys that should used as dynamic
// tags on metrics. This allows redaction metrics to be filtered based
// on domain-specific dimensions.
MetricTags []string `mapstructure:"metric_tags"`
// Summary is the verbosity level for the new summary span attributes.
// `debug` lists all redacted span attribute keys, `info` only provides
// a count, and `silent` provides no summary.
Summary string `mapstructure:"summary"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what the additional value of summary: debug is when compared to using the logging/file exporter to check the output traces (maybe because I am not seeing the implementation). Maybe we should leave Summary and DryRun for a future PR so that everything is in context?

@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch from 63daf82 to 6ac15ea Compare October 25, 2021 19:03
@leonsp-ai
Copy link
Contributor Author

Thanks @mx-psi! I've simplified the PR (removed DryRun and Summary, unexported/removed some constants) and made it work properly on 0.37

@leonsp-ai leonsp-ai requested a review from mx-psi October 25, 2021 19:06
@mx-psi
Copy link
Member

mx-psi commented Oct 26, 2021

@codeboten lint is failing on this PR because of #5775. This is a new component and thus it's not yet added to the main go.mod file (as the contributing guidelines state). What should we do to make linting pass?

Copy link
Member

@mx-psi mx-psi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR looks good, but the issue above needs to be fixed for this to mergeable

@leonsp-ai leonsp-ai requested a review from dmitryax October 27, 2021 21:16

// makeAllowList sets up a lookup table of allowed span attribute keys
func makeAllowList(c *Config) map[string]string {
redactionKeys := []string{redactedKeys, redactedKeyCount, maskedValues, maskedValueCount}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are redactedKeys and maskedValues supposed to be used for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax I've added a comment:

	// redactionKeys are additional span attributes created by the processor to
	// summarize the changes it made to a span. If the processor removes
	// 2 attributes from a span (e.g. `birth_date`, `mothers_maiden_name`),
	// then it will list them in the `redactedKeys` span attribute and set the
	// `redactedKeysCount` attribute to 2
	//
	// If the processor finds and masks values matching a blocked regex in 2
	// span attributes (e.g. `notes`, `description`), then it will those
	// attribute keys in `maskedValues` and set the `maskedValueCount` to 2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sounds good. Why redactedKeyCount and maskedValueCount needed then? They seem redundant if we have redactedKeys and maskedValue with array values...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax Using this processor in production, I've found that the counts are convenient for querying through traces for redaction. The lists of attributes are nice for diagnostics, but being able to query for redactedKeysCount > 0 is convenient

The design reason for the counts is that they leak less information than the lists of span attribute keys in case the actual key names need to be kept confidential

Copy link
Member

@dmitryax dmitryax Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @leonsp-ai, sorry for the delayed response. Thanks for explaining the attributes purpose. It looks good to me.

The only concern is that attribute names in this PR don't fit established attribute naming conventions. Please take a look at this specification and think about names that would better fit the tracing model. Once the processor is implemented (or closer to final PRs), they should be added to the semantic conventions in the spec. Not necessary to change it in this PR, can be changed later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax Thanks for the link to the spec! I've pencilled in names that match a little better

@leonsp-ai leonsp-ai requested a review from dmitryax October 28, 2021 19:23
Comment on lines 8 to 9
go.opentelemetry.io/collector v0.37.0
go.opentelemetry.io/collector/model v0.37.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, sorry for the delay. Please update the version to 0.38.0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for 0.39.0

@leonsp-ai
Copy link
Contributor Author

@codeboten What do you think about #5274 (comment) ?

Can we go ahead and merge regardless? cc @dmitryax @mx-psi

Copy link
Member

@dmitryax dmitryax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good to me.

We need someone from @open-telemetry/collector-contrib-maintainer to merge this

Comment on lines 64 to 65
* `num_redacted_keys` represents the number of span attribute keys redacted by processor
* `num_masked_values` represents the number of span attribute values masked by processor
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new names look better please update them here as well

Copy link
Contributor Author

@leonsp-ai leonsp-ai Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmitryax Those are metric names, so I'll move those to the follow-up metrics PR

@mx-psi
Copy link
Member

mx-psi commented Nov 17, 2021

@leonsp-ai can we try and add it to the versions.yaml list? https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/versions.yaml#L18 I think that may make the lint job pass

@jpkrohling
Copy link
Member

Once the build passes, ping me and I'll merge this.

@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch from 82dbb3b to d41df1d Compare November 17, 2021 21:37
@leonsp-ai leonsp-ai force-pushed the lp/add-redaction-processor-structure branch from d41df1d to fd0a4ec Compare November 18, 2021 18:26
@leonsp-ai
Copy link
Contributor Author

@dmitryax Could you please approve the workflow? I've rebased which I think should fix the error in the last unittest/lint runs

require (
github.com/stretchr/testify v1.7.0
go.opentelemetry.io/collector v0.39.0
go.opentelemetry.io/collector/model v0.39.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be in sync with what we have in the main branch at the moment of the merge. This is currently v0.39.1-0.20211117203239-e23c9d0a0183

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling Updated to v0.39.1-0.20211117203239-e23c9d0a0183

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, looks like this got updated once again to v0.39.1-0.20211119172502-53d057f9c0e7 by #6390.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jpkrohling Updated to v0.39.1-0.20211119172502-53d057f9c0e7

@jpkrohling jpkrohling merged commit 207b7f2 into open-telemetry:main Nov 23, 2021
@leonsp-ai leonsp-ai deleted the lp/add-redaction-processor-structure branch November 25, 2021 21:10
hex1848 pushed a commit to hex1848/opentelemetry-collector-contrib that referenced this pull request Jun 2, 2022
The `resource_to_telemetry_conversion` setting was listed in the readme, but it doesn't correspond to a setting for this exporter. It looks like individual exporters in the contrib repo have it.

The readme also had a dangling link for more settings. I've removed the link altogether since, as far as I can tell, there are no longer any settings not listed in the readme. Please kindly verify that this is true.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants