-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(normalization): Optionally mark scrubbed transactions as sanitized #1917
Conversation
@@ -124,8 +124,7 @@ pub unsafe extern "C" fn relay_store_normalizer_normalize_event( | |||
measurements_config: None, // only supported in relay | |||
breakdowns_config: None, // only supported in relay | |||
normalize_user_agent: config.normalize_user_agent, | |||
normalize_transaction_name: false, // only supported in relay | |||
tx_name_rules: &[], // only supported in relay | |||
transaction_name_config: Default::default(), // only supported in relay |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a third transaction name related parameter to LightNormalizationConfig
, I decided to put it in its own struct.
.transaction_info | ||
.get_or_insert_with(Default::default) | ||
.source; | ||
source.set_value(Some(TransactionSource::Sanitized)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual relevant change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall LGTM
pub fn new(name_config: &'r TransactionNameConfig<'r>) -> Self { | ||
Self { name_config } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just suggestion, so if instead of providing reference here and implementing default with static name config, we could accept here a reference to LightNormalizationConfig
and then build the config inside of the new? Like following:
pub fn new(name_config: &'r TransactionNameConfig<'r>) -> Self { | |
Self { name_config } | |
pub fn new(config: &'r LightNormalizationConfig) -> Self { | |
let name_config = TransactionNameConfig { | |
scrub_identifiers: config.transaction_name_config.scrub_identifiers, | |
mark_scrubbed_as_sanitized: config.transaction_name_config.mark_scrubbed_as_sanitized, | |
rules: config.transaction_name_config.rules, | |
}; | |
Self { name_config } | |
} | |
This way you could still #[derive(Default)]
on TransactionsProcessor
and in normalize.rs
you could just do
let mut transactions_processor =
transactions::TransactionsProcessor::new(config);
And of course update tests to use new new
function 😄
Right now I'm not sure what would be better, maybe having this the way it is now it's a bit clearer and have better separation of concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a better idea now, let's make light_normalize_event
accept not a reference but a a moved variable of LightNormalizationConfig
and then we could easily move transaction_name_config
out of it and move it into new
function.
This way I think the code will code much cleaner and also there won't be additional jumps through the reference when that not really needed.
@jjbayer Let me know what you think and we could pair on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done in e2633be.
Pass a feature flag to Relay to indicate that all normalized transaction names should be marked as `sanitized`. See getsentry/relay#1917. getsentry/team-ingest#69
…or-slug-parameter * master: (21 commits) ci: switch to using GitHub artefacts instead of ghcr for PR testing (#1918) ref: Add new mobile vitals (#1943) release: 23.3.0 ref(profiling): Remove platform validation (#1933) ref(relay_cache): Inject upstream realy into the service (#1932) feat(csp): Add the request's origin to CSP events (#1934) chore(common): Remove unused tryf macro (#1931) ref(test): Remove actix test utilities (#1930) ref(actix): Refactor shutdown without actix (#1912) ref(outcomes): Do not use registry in outcome producer (#1927) ref: Inject services in project cache and project upstream (#1926) feat(config): Add links to docs in YAML config file (#1923) feat(mobile): Move device.class from contexts to tags (#1911) fix(ci): Use correct Rust toolchain in Beta CI (#1925) ci(gha): Remove unmaintained actions-rs actions (#1922) feat(normalization): Optionally mark scrubbed transactions as sanitized (#1917) chore: Apply import grouping from coding style (#1921) feat(general): Fix leaked ip address (#1892) fix(normalize): No original_value for untouched transactions (#1916) chore(build): Update rust toolchain to 1.68.0 (#1914) ...
If the corresponding feature flag is set, mark transactions that were scrubbed by regex patterns as
sanitized
instead ofurl
.Motivation
If we discovered identifiers in a URL transaction by pattern matching, we previously did not risk marking it as
sanitized
, because it could theoretically still contain user names or other high-cardinality identifiers. But in practice this seems to conservative, and we'd rather assume that a URL scrubbed by pattern matching is low-cardinality enough.Risks
If there is a second identifier in the URL that neither the pattern-matching nor the clusterer discovered, we mark something high-cardinality as low-cardinality. This might add noise to the product and an increase in transaction metrics being rate-limited by the cardinality limiter.
https://github.com/getsentry/team-ingest/issues/69