-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(tx-processor): Apply transaction rules after UUID scrubbing #1964
Conversation
1c5efd0
to
abc0d69
Compare
.get_or_insert_with(Default::default) | ||
.source; | ||
source.set_value(Some(TransactionSource::Sanitized)); | ||
if self.name_config.scrub_identifiers { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should we rename the scrub_identifiers
flag in config to normalize
?
I've been thinking about naming for a while, I think we need unambiguous names to refer to the different parts of transaction name handling but haven't been able to come up with good ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree the names could be better, but I don't have any suggestions (besides being very explicit and setting extremely long names, which doesn't really improve things I guess). Since I expect to remove these feature flags soonish, I didn't rename them. That said, happy to refactor the names if you come up with a good suggestion.
if self.name_config.mark_scrubbed_as_sanitized { | ||
// TODO(iker): we also mark transaction as sanitized after | ||
// applying renaming rules. Once we remove this feature | ||
// flag, relay should only mark transactions as sanitized | ||
// once. | ||
event | ||
.transaction_info | ||
.get_or_insert_with(Default::default) | ||
.source | ||
.set_value(Some(TransactionSource::Sanitized)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might not matter in practice, but I would put this block after apply_transaction_rename_rule
. Basically
- Apply sanitation step 1
- Apply sanitation step 2
- Mark as sanitized
Right now the order is
- Apply sanitation step 1
- Mark as sanitized
- Apply sanitation step 2
which feels unnatural to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs of mark_scrubbed_as_sanitized
are as follows:
True if transaction names scrubbed by regex patterns should be marked as
TransactionSource::Sanitized
.Transaction names modified by clusterer rules are always marked as such.
So the code follows these steps:
- Apply sanitation step 1.
- Mark as sanitized, resulting from step 1.
- Apply sanitation step 2 (which also marks as sanitized, so no
4
needed).
I think 1 and 2 belong together. I agree that this could be improved, and that's why I added the TODO
, but I wanted to minimize refactoring as much as possible until we have more information on how this looks and what we do with the feature flags..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True if transaction names scrubbed by regex patterns should be marked as TransactionSource::Sanitized.
That doc was written before you removed the if changed
condition though (by which it actually belonged to pattern-based matching). By removing that condition, the real consequence of mark_scrubbed_as_sanitized
is that every url
transaction will be marked as sanitized as long as the normalize-transaction-names
feature flag is on.
if meta.original_value().is_none() { | ||
meta.set_original_value(Some(transaction.clone())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do this with get_or_insert_with
, but I couldn't figure out how. Happy to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is nice and explicit.
@@ -1675,7 +1684,7 @@ mod tests { | |||
let json = r#" | |||
{ | |||
"type": "transaction", | |||
"transaction": "/foo/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12/user/123/0/", | |||
"transaction": "/foo/rule-target/user/123/0/", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for these changes is to avoid UUID scrubbing rules running first in all fields. They are already running for 123
so that is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #1964 (comment).
if meta.original_value().is_none() { | ||
meta.set_original_value(Some(transaction.clone())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is nice and explicit.
if matches!(event.get_transaction_source(), &TransactionSource::Url) | ||
&& self.name_config.mark_scrubbed_as_sanitized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we might want to add an an additional condition which is scrub_identifiers_changed_something OR rename_rules.len() > 0
, to prevent projects that do not have any rules yet from marking everything as sanitized. But that is actually out of scope for this PR.
* master: ref(tx-processor): Apply transaction rules after UUID scrubbing (#1964)
Applying transaction rules before UUID scrubbing makes it impossible to track what rules were applied. It's interesting to know what rules are useful to reduce the high cardinality, so that we can use them again over other rules that are less interesting or provide no value. Applying these rules after UUID scrubbing leaves no room for further modification of the transaction name, so the resulting transaction name is guaranteed to be the direct output of a rule.
This is the first step in the process of discarding the rules that provide no value and prioritizing those that do.
Ref: https://github.com/getsentry/team-ingest/issues/14.