Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LG-14334 | Don't log redacted values #11158

Merged
merged 2 commits into from
Sep 3, 2024
Merged

Conversation

n1zyy
Copy link
Member

@n1zyy n1zyy commented Aug 28, 2024

🎫 Ticket

Link to the relevant ticket:
LG-14334

🛠 Summary of changes

Note: This is a discussion about the literal string [redacted] being logged. Its use in the following paragraphs is about the literal word "redacted" in brackets, not a representation that information has been redacted.

We log an analytics event with the ThreatMetrix response as a hash, which includes several hundred fields—so many that CloudWatch stops parsing them. (It has a limit of 200 fields that it will parse out.) This means that sometimes, important attributes cannot be queried through CloudWatch because they were the >200th field.

A large number of these logged parameters have the literal value "[redacted]", so they are effectively junk. These come from the ResponseRedacter class, which replaces any non-allowlisted attribute with "[redacted]" to guard against accidentally logging PII.

Because the sheer volume of unusable attributes is breaking things, this PR changes the ResponseRedacter to only return the non-redacted attributes. Fields which previously would have had their value replaced with "[redacted]" are now not logged at all. This means that:

  • PII safeguards remain in place (and the test asserts this).
  • We should now have <200 attributes (see ALLOWED_RESPONSE_FIELDS in the redacter class)
  • It is impossible that any of the fields we are removing are being used for anything, because the value would always have been the literal string "[redacted]".

changelog: Internal, Logging, Don't log "[REDACTED]" fields in TMX response since they are useless
@n1zyy
Copy link
Member Author

n1zyy commented Aug 28, 2024

There are actually 185 attributes allowed. It's possible that other analytics parameters might still push us to a bit over 200. We could probably pare this list down further, but I would advocate that any such work be done in a followup PR.

@@ -197,12 +197,7 @@ class ResponseRedacter
def self.redact(hash)
return { error: 'TMx response body was empty' } if hash.nil?
return { error: 'TMx response body was malformed' } unless hash.is_a? Hash
filtered_response_h = hash.slice(*ALLOWED_RESPONSE_FIELDS)
unfiltered_keys = hash.keys - filtered_response_h.keys
Copy link
Member Author

@n1zyy n1zyy Aug 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I briefly had the thought that we could keep this line, and then do something like:

filtered_response_h['redacted_keys'] = unfiltered_keys.join(", ")

I don't see a need for this, though.

filtered_response_h[key] = '[redacted]'
end
filtered_response_h
hash.slice(*ALLOWED_RESPONSE_FIELDS)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can read this PR as lines 201-204 being deleted, and then me removing the variable assignment on line 200.

@n1zyy n1zyy merged commit c1d5d0b into main Sep 3, 2024
2 checks passed
@n1zyy n1zyy deleted the mattw/LG-14334-redacted-to-baleeted branch September 3, 2024 19:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants