Serialize vars early to avoid living references #3409
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## master #3409 +/- ##
==========================================
- Coverage 79.78% 79.78% -0.01%
==========================================
Files 133 133
Lines 14418 14418
Branches 3036 3036
==========================================
- Hits 11504 11503 -1
+ Misses 2083 2082 -1
- Partials 831 833 +2
Updated from ee83663 to cbbab06.
Updated from cbbab06 to c02a8ff.
(I'm the person who originally reported the bug to Sentry.) I'm worried that this approach is going to severely restrict our ability to scrub PII from the data we send to Sentry. By serialising the frame locals before the scrubber runs, we lose context about the Python type of the variables. This means we can no longer identify sensitive data by object type and scrub it out. For example, with this proposed change, I think a Django model instance would be converted to a string that contained PII before it got to the scrubber (for example, if the model …)

To give some context, we're currently using the EventScrubber to convert variables with potential PII into something that we can use to identify the content later, without sending that content to Sentry (the content of a variable that contains PII is often critical to identifying and reproducing an issue).

Am I correct here, or is there another way that we can achieve PII scrubbing based on data type?
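To make the concern concrete, here is a minimal sketch of why a type-based scrubber stops working once values are serialized first. It assumes, for illustration only, that early serialization turns an object into its repr string; User, scrub_by_type and serialize_early are hypothetical names, not SDK or Django APIs, and the real serializer keeps primitives such as ints rather than calling repr on everything.

```python
from typing import Any, Dict


class User:
    """Hypothetical stand-in for a Django model instance."""

    def __init__(self, email: str) -> None:
        self.email = email

    def __repr__(self) -> str:
        # Like many model reprs, this one embeds PII.
        return f"<User: {self.email}>"


def scrub_by_type(frame_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Type-aware scrubber: replaces any User instance with a placeholder."""
    return {
        name: "[Filtered]" if isinstance(value, User) else value
        for name, value in frame_vars.items()
    }


def serialize_early(frame_vars: Dict[str, Any]) -> Dict[str, Any]:
    """Crude stand-in for early serialization: every value becomes its repr."""
    return {name: repr(value) for name, value in frame_vars.items()}


frame_vars = {"user": User("alice@example.com"), "count": 3}

# Scrubbing live objects: the isinstance check matches and the email is hidden.
print(scrub_by_type(frame_vars))
# {'user': '[Filtered]', 'count': 3}

# Scrubbing after serialization: "user" is now a plain str, so the type
# check no longer matches and the email reaches the event.
print(scrub_by_type(serialize_early(frame_vars)))
# {'user': '<User: alice@example.com>', 'count': '3'}
```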
LGTM, see comment
sentry_sdk/serializer.py (outdated)
return None

return False
return is_vars
Can't comment on the specific lines because they're not part of the diff, but:
L123: the return type is now always bool.
Please also update the docstring.
Removed the method and documented serialize instead.
Hmm, @philipstarkey, I see; that's a valid concern. The reason I looked at this again was a report where we patch the original …

Further, the scrubber was meant to be a simpler …

I'm going to go ahead with merging this PR, but at the same time I will expose another …
Updated from e4f30c3 to bd67ea4.
@sl0thentr0py Thanks! I appreciate the consideration. I've only had a very cursory look at the serialization code, but it occurred to me that being able to provide a custom implementation for …
@philipstarkey We actually already have that, but we haven't documented it openly since it's a footgun. You can set … (see sentry-python/sentry_sdk/serializer.py, line 271 in bd67ea4).
Could you try that and let me know if it works for you?
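The idea being discussed, a hook that sees the original object during serialization, can be sketched as follows. This is a hypothetical illustration, not the SDK's API: add_repr_processor, serialize_vars and redact_users are invented names, and the existing hook at serializer.py line 271 is not reproduced here. The point is that processors run before the generic repr, so they can still redact by type and return an identifier that is useful for debugging.

```python
from typing import Any, Callable, Dict, List

# A processor receives the live object and returns a replacement value,
# or NotImplemented to defer to the next processor / the default handling.
ReprProcessor = Callable[[Any], Any]

repr_processors: List[ReprProcessor] = []


def add_repr_processor(processor: ReprProcessor) -> None:
    repr_processors.append(processor)


def serialize_value(value: Any) -> Any:
    # Processors see the original object, so they can decide by type.
    for processor in repr_processors:
        result = processor(value)
        if result is not NotImplemented:
            return result
    if isinstance(value, (int, float, bool, type(None))):
        return value
    return repr(value)


def serialize_vars(frame_vars: Dict[str, Any]) -> Dict[str, Any]:
    return {name: serialize_value(value) for name, value in frame_vars.items()}


class User:
    """Hypothetical model class whose repr contains PII."""

    def __init__(self, pk: int, email: str) -> None:
        self.pk = pk
        self.email = email

    def __repr__(self) -> str:
        return f"<User: {self.email}>"


def redact_users(value: Any) -> Any:
    # Keep an identifier that helps reproduce issues, drop the PII.
    if isinstance(value, User):
        return f"<User pk={value.pk}>"
    return NotImplemented


add_repr_processor(redact_users)
print(serialize_vars({"user": User(42, "alice@example.com"), "count": 3}))
# {'user': '<User pk=42>', 'count': 3}
```

Running the hook inside serialization, rather than in a scrubber afterwards, is what preserves access to the Python type.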
@sl0thentr0py I did happen to see that, actually! It's where I got the idea from 🙂 Unfortunately …
Now that this is merged, I'd also be happy to continue the conversation in a new issue if you'd like.
Aha, okay, so we have two options. I'm fine with both, so I'll let you choose which one you prefer:
@sl0thentr0py Thanks, yes, I think we prefer option 2: the option for a custom …

Thanks for giving us the choice 🙂
Since we added the recursive option for the EventScrubber in #2755, values nested inside vars could be replaced in place with AnnotatedValue, potentially breaking userland code. We really shouldn't be holding references to frame.f_locals throughout our SDK; this has all sorts of breakage potential.

This is a bit hacky, but we'll simply call serialize early on vars. I tried patching deepcopy in #3392 to work with our requirements, but I was simply reimplementing most of this serialize function. Turns out @sentrivana did the same in #2117.
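A simplified model of the failure and the fix: a recursive scrubber that rewrites values in place will mutate user objects if it is handed live references, because copying only the outer mapping still shares the nested dicts. Serializing first gives the scrubber a detached snapshot. This is a sketch under assumptions: scrub_recursive, snapshot and this AnnotatedValue class are hypothetical stand-ins for the SDK's EventScrubber, serialize and AnnotatedValue, not their actual implementations.

```python
from typing import Any, Dict

SENSITIVE_KEYS = {"password", "api_key"}


class AnnotatedValue:
    """Hypothetical, simplified stand-in for the SDK's AnnotatedValue."""

    def __init__(self, value: Any, reason: str) -> None:
        self.value = value
        self.reason = reason


def scrub_recursive(data: Dict[str, Any]) -> None:
    """In-place recursive scrubber: descends into nested dicts."""
    for key, value in data.items():
        if key in SENSITIVE_KEYS:
            # Reassigning an existing key while iterating is allowed in Python.
            data[key] = AnnotatedValue("[Filtered]", "sensitive key")
        elif isinstance(value, dict):
            scrub_recursive(value)


def snapshot(value: Any) -> Any:
    """Stand-in for early serialization: builds a detached copy of plain data."""
    if isinstance(value, dict):
        return {str(k): snapshot(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [snapshot(v) for v in value]
    if isinstance(value, (str, int, float, bool, type(None))):
        return value
    return repr(value)


# Before: the scrubber works on live references. Copying only the outer
# mapping still shares the nested dict that user code holds.
config = {"db": {"password": "hunter2"}}
frame_vars = {"config": config}
scrub_recursive(dict(frame_vars))
print(type(config["db"]["password"]).__name__)  # AnnotatedValue

# After: serialize first, then scrub the snapshot; user data is untouched.
config = {"db": {"password": "hunter2"}}
event_vars = snapshot({"config": config})
scrub_recursive(event_vars)
print(config["db"]["password"])  # hunter2
```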