Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GoogleCloudSpannerReceiver: Mask lock stats PII #14607

Closed
wants to merge 3 commits into from

Conversation

architjugran
Copy link
Contributor

Description: Cloud Spanner receiver should have the capability to mask sensitive information. This PR adds a configurable option to mask the PII in lock stats metrics for customers.

For the metric "top minute lock stats", the label "row_range_start_key" (https://cloud.google.com/spanner/docs/introspection/lock-statistics#explain-row-range) has PII information of table key values. We have now given the ability to hash the keys individually such that
table_name(key1,key2) becomes table_name(hash1,hash2).
Note that even though key values are hashed, a user can identify if sets of keys share common prefix like table1(key1,key2) and table1(key1,key3)

Link to tracking Issue: Google-Internal customer-request.

Testing: Unit tests added. Manual end-to-end testing done by running the service locally and verifying the hashed row_range_start_key values.

Documentation: Added the configuration option and the corresponding info in the README

@architjugran architjugran requested review from a team and dmitryax September 30, 2022 13:23
@architjugran architjugran changed the title GoogleCloudSpannerReceiver: Hide lock stats PII GoogleCloudSpannerReceiver: Mask lock stats PII Sep 30, 2022
@mx-psi
Copy link
Member

mx-psi commented Sep 30, 2022

Would using the redaction processor work for you?

@architjugran
Copy link
Contributor Author

Would using the redaction processor work for you?

@mx-psi Hey! I don't think redaction processor will work for my use-case.
Reason being I want to modify the value of the metric label in a certain manner.

Example: if value is "Table_name(key1, key2, key1...)" , i will modify it to be "Table_name(hash1, hash2, hash1...)" if the user wants to. As you can see, each key will correspond to a hashed value, and table_name will not be hashed.
Hence I don't think above functionality is possible via redaction processor.

@mx-psi
Copy link
Member

mx-psi commented Oct 5, 2022

Example: if value is "Table_name(key1, key2, key1...)" , i will modify it to be "Table_name(hash1, hash2, hash1...)" if the user wants to. As you can see, each key will correspond to a hashed value, and table_name will not be hashed.

Right, then I think the transform processor should work if you use replace_all_patterns, but you won't get the hashes, just a dummy value. It would be nice to get the CODEOWNERs opinion on whether the transform processor should support something like this (support redacting PII while still being able to tell that two values were the same) cc @TylerHelmuth @kentquirk @bogdandrutu.

Either way, I think the status quo is to accept overlapping functionality if it makes it simpler to configure a pipeline, so no objection from my side to merging this after a review.

@architjugran
Copy link
Contributor Author

architjugran commented Oct 7, 2022

Thanks @mx-psi for the explanaiton.
Could someone please review @mx-psi @dmitryax


func (mdp *MetricsDataPoint) HideLockStatsRowrangestartkeyPII() {
for index, labelValue := range mdp.labelValues {
if labelValue.Metadata().Name() == "row_range_start_key" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this label available on a single metric or on multiple metrics? (Trying to figure out how the configuration should look like)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This label is available on a single metric.

func parseAndHashRowrangestartkey(key string) string {
startIndexKeys := strings.Index(key, "(")
substring := key[startIndexKeys+1 : len(key)-1]
hashedKey := key[:startIndexKeys+1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest using a strings.Builder here for performance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done . Updated PR is at #16343 because this one got closed because of inactivity

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 22, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Nov 6, 2022

Closed as inactive. Feel free to reopen if this PR is still being worked on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants