feat(perf): Extract scrubbed IP addresses into the span.domain tag #3383
- extract new functions
- rename unclear variables
@gggritso thank you for picking this up and for improving separation of concerns!
That would be OK with me! As long as we follow up with some cardinality analysis.
That would be ideal, but IMO the PR is still small enough to keep them both in one.
See PR comment.
relay-event-normalization/src/normalize/span/description/mod.rs
Any IP that isn't loopback is fully scrubbed out.
Use pattern-matching on the tuple instead of long conditionals.
@jjbayer I made a few major changes:
Thanks for taking an early look!
Instead of allocating strings (especially since it's just for localhost), use the provided constants.
We're looking for a defined host and undefined port, rather than an `Option` host.
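A minimal sketch of the two review suggestions above, assuming the host has already been parsed into std's `IpAddr`. The function name `is_bare_localhost` is hypothetical and not taken from the PR. It matches on the `(host, port)` tuple for a defined host with no port, and compares against `Ipv4Addr::LOCALHOST` / `Ipv6Addr::LOCALHOST` instead of building `"127.0.0.1"` strings.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

/// Hypothetical check: is this a loopback host written without an explicit port?
/// Compares against std's `LOCALHOST` constants instead of allocating strings.
fn is_bare_localhost(host: Option<IpAddr>, port: Option<u16>) -> bool {
    match (host, port) {
        // A defined host and an undefined port.
        (Some(IpAddr::V4(ip)), None) => ip == Ipv4Addr::LOCALHOST,
        (Some(IpAddr::V6(ip)), None) => ip == Ipv6Addr::LOCALHOST,
        // No host at all, or an explicit port.
        _ => false,
    }
}
```

Matching `(Some(..), None)` states the intent directly; a looser `(host, None)` arm would also accept a missing host.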
@jjbayer made another round of changes! Should be pretty close now 🙌🏻
Co-authored-by: David Herberth <[email protected]>
- accept `&str`
- return `Cow`
Don't bother with one-time-long allow lists. Use a `match` and don't allocate anything.
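The `&str`-in, `Cow`-out suggestion and the `match`-instead-of-allow-list suggestion could look roughly like this. This is a sketch under assumptions: the signature of `concatenate_host_and_port` shown here is a guess, and `is_kept_verbatim` is a hypothetical helper whose single `localhost` entry stands in for whatever the allow list actually held.

```rust
use std::borrow::Cow;

/// Hypothetical: hosts that bypass scrubbing. A `match` on string literals
/// replaces a single-entry allow list and allocates nothing.
fn is_kept_verbatim(host: &str) -> bool {
    matches!(host, "localhost")
}

/// Joins a host and an optional port. Borrows the input when there is no
/// port, and only allocates a new `String` when one is actually needed.
fn concatenate_host_and_port(host: &str, port: Option<u16>) -> Cow<'_, str> {
    match port {
        Some(port) => Cow::Owned(format!("{host}:{port}")),
        None => Cow::Borrowed(host),
    }
}
```

Returning `Cow` also speaks to the `&str` vs `String` question: callers receive owned data only when a new string had to be built.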
@Dav1dde @jjbayer thanks for bearing with me, and for the thorough review 🙏🏻 I tried to implement all your suggestions, but I might have missed something, so please let me know! The major changes:
A bunch of docs nits and I don't know why they weren't caught by CI :(
…-domain-tags-ii' into fix/perf/extract-ip-domain-tags-ii
@Dav1dde updated! Added/fixed docstrings and committed your
tl;dr: URLs with IP domains like `http://8.8.8.8/data` get the `span.domain` tag of `*.*.*.*`. Also a small refactor to support this.

Right now, if a span's description looks like `GET http://8.8.8.8/data`, the scrubbed `span.description` is `GET http://*.*.8.8/` and `span.domain` is `None`. This is inconsistent, and also makes for a bad user experience because people cannot find requests to certain IPs by the domain.

There are two causes:

1. `8.8.8.8` is scrubbed into `*.*.8.8`, because the scrubber thinks the last two 8s are a TLD
2. … `URL` and are invalid. The parsing fails, and no `domain` tag is produced.

This PR makes some refactors, and turns off scrubbing of IP host strings.
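The first cause, and the dispatch-by-host-kind fix that the Changes section describes, can be sketched with std types only. Assumptions: `Host` is a local mirror of `url::Host<&str>` so no external crate is needed, the label-keeping rule in `scrub_domain_name` is a simplified stand-in rather than Relay's real scrubber, and the IPv6 placeholder string is invented for illustration.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical local mirror of `url::Host<&str>`.
enum Host<'a> {
    Domain(&'a str),
    Ipv4(Ipv4Addr),
    Ipv6(Ipv6Addr),
}

/// Simplified stand-in for the domain-name scrubber (an assumption): keep the
/// last two labels as "domain + TLD" and replace every earlier label with `*`.
/// Fed an IP string, this reproduces the `*.*.8.8` bug.
fn scrub_domain_name(domain: &str) -> String {
    let labels: Vec<&str> = domain.split('.').collect();
    let keep_from = labels.len().saturating_sub(2);
    labels
        .iter()
        .enumerate()
        .map(|(i, label)| if i < keep_from { "*" } else { *label })
        .collect::<Vec<_>>()
        .join(".")
}

/// Loopback addresses stay readable; every other IPv4 address is fully scrubbed.
fn scrub_ipv4(ip: Ipv4Addr) -> String {
    if ip.is_loopback() {
        ip.to_string()
    } else {
        "*.*.*.*".to_owned()
    }
}

/// Same rule for IPv6; the `*:*:...` placeholder is an assumption.
fn scrub_ipv6(ip: Ipv6Addr) -> String {
    if ip.is_loopback() {
        ip.to_string()
    } else {
        "*:*:*:*:*:*:*:*".to_owned()
    }
}

/// Dispatches on the kind of host, so IP addresses never reach the
/// domain-name scrubber.
fn scrub_host(host: Host<'_>) -> String {
    match host {
        Host::Domain(domain) => scrub_domain_name(domain),
        Host::Ipv4(ip) => scrub_ipv4(ip),
        Host::Ipv6(ip) => scrub_ipv6(ip),
    }
}
```

With the dispatch in place, `8.8.8.8` becomes `*.*.*.*` instead of `*.*.8.8`, while domain names still go through the asterisking.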
Changes

- `normalize_domain` --> `scrub_host`. Instead of a function that takes a domain string and a port and does the asterisking, the function accepts a `Url::Host` and does different things based on whether it's an IP or a qualified domain. This is nicer because the name is clearer and the function does fewer things.
- `scrub_domain_name` is a new function that only replaces a string with asterisks. This makes it possible to call this function independently of other scrubbing logic (e.g., to skip calling it for IP addresses).
- `concatenate_host_and_port` is a simple concatenator, since I saw this logic in a few places.
- `scrub_ipv4` and `scrub_ipv6` are new functions that fully scrub out IP addresses.

Questions
This will increase the cardinality by the number of ports used with IP addresses in the wild. I think we'll probably want to do some BigQuery research to see how many unique IPs are in the dataset, and maybe only partially scrub the IP (e.g., `*.*.*.67`).

Otherwise, there are code-level questions:

1. Should I separate this PR out into the refactor and a separate PR to disable IP scrubbing?
2. I had a hard time deciding when to return/accept `&str` and `String`. I ended up returning `String` in most places, since I think the functions that get the object back should take ownership?
3. Overall, I'm not confident about Rust style here, you tell me!