Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(normalization): Optionally mark scrubbed transactions as sanitized #1917

Merged
merged 11 commits into from
Mar 13, 2023

Conversation

jjbayer
Copy link
Member

@jjbayer jjbayer commented Mar 10, 2023

If the corresponding feature flag is set, mark transactions that were scrubbed by regex patterns as sanitized instead of url.

Motivation

If we discovered identifiers in a URL transaction by pattern matching, we previously did not risk marking it as sanitized, because it could theoretically still contain user names or other high-cardinality identifiers. But in practice this seems to conservative, and we'd rather assume that a URL scrubbed by pattern matching is low-cardinality enough.

Risks

If there is a second identifier in the URL that neither the pattern-matching nor the clusterer discovered, we mark something high-cardinality as low-cardinality. This might add noise to the product and an increase in transaction metrics being rate-limited by the cardinality limiter.

https://github.com/getsentry/team-ingest/issues/69

@jjbayer jjbayer changed the title Feat/tx mark as sanitized feat(normalization): Optionally mark scrubbed transactions as sanitized Mar 10, 2023
@@ -124,8 +124,7 @@ pub unsafe extern "C" fn relay_store_normalizer_normalize_event(
measurements_config: None, // only supported in relay
breakdowns_config: None, // only supported in relay
normalize_user_agent: config.normalize_user_agent,
normalize_transaction_name: false, // only supported in relay
tx_name_rules: &[], // only supported in relay
transaction_name_config: Default::default(), // only supported in relay
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a third transaction name related parameter to LightNormalizationConfig, I decided to put it in its own struct.

.transaction_info
.get_or_insert_with(Default::default)
.source;
source.set_value(Some(TransactionSource::Sanitized));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual relevant change.

@jjbayer jjbayer marked this pull request as ready for review March 10, 2023 13:36
@jjbayer jjbayer requested a review from a team March 10, 2023 13:36
Copy link
Contributor

@olksdr olksdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall LGTM

Comment on lines 33 to 34
pub fn new(name_config: &'r TransactionNameConfig<'r>) -> Self {
Self { name_config }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just suggestion, so if instead of providing reference here and implementing default with static name config, we could accept here a reference to LightNormalizationConfig and then build the config inside of the new? Like following:

Suggested change
pub fn new(name_config: &'r TransactionNameConfig<'r>) -> Self {
Self { name_config }
pub fn new(config: &'r LightNormalizationConfig) -> Self {
let name_config = TransactionNameConfig {
scrub_identifiers: config.transaction_name_config.scrub_identifiers,
mark_scrubbed_as_sanitized: config.transaction_name_config.mark_scrubbed_as_sanitized,
rules: config.transaction_name_config.rules,
};
Self { name_config }
}

This way you could still #[derive(Default)] on TransactionsProcessor and in normalize.rs you could just do

let mut transactions_processor =
            transactions::TransactionsProcessor::new(config);

And of course update tests to use new new function 😄

Right now I'm not sure what would be better, maybe having this the way it is now it's a bit clearer and have better separation of concerns.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a better idea now, let's make light_normalize_event accept not a reference but a a moved variable of LightNormalizationConfig and then we could easily move transaction_name_config out of it and move it into new function.
This way I think the code will code much cleaner and also there won't be additional jumps through the reference when that not really needed.

@jjbayer Let me know what you think and we could pair on this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done in e2633be.

jjbayer added a commit to getsentry/sentry that referenced this pull request Mar 13, 2023
Pass a feature flag to Relay to indicate that all normalized transaction
names should be marked as `sanitized`. See
getsentry/relay#1917.

getsentry/team-ingest#69
@jjbayer jjbayer enabled auto-merge (squash) March 13, 2023 10:44
@jjbayer jjbayer merged commit 1c26d49 into master Mar 13, 2023
@jjbayer jjbayer deleted the feat/tx-mark-as-sanitized branch March 13, 2023 11:02
jan-auer added a commit that referenced this pull request Mar 17, 2023
…or-slug-parameter

* master: (21 commits)
  ci: switch to using GitHub artefacts instead of ghcr for PR testing (#1918)
  ref: Add new mobile vitals (#1943)
  release: 23.3.0
  ref(profiling): Remove platform validation (#1933)
  ref(relay_cache): Inject upstream realy into the service (#1932)
  feat(csp): Add the request's origin to CSP events (#1934)
  chore(common): Remove unused tryf macro (#1931)
  ref(test): Remove actix test utilities (#1930)
  ref(actix): Refactor shutdown without actix (#1912)
  ref(outcomes): Do not use registry in outcome producer (#1927)
  ref: Inject services in project cache and project upstream (#1926)
  feat(config): Add links to docs in YAML config file (#1923)
  feat(mobile): Move device.class from contexts to tags (#1911)
  fix(ci): Use correct Rust toolchain in Beta CI (#1925)
  ci(gha): Remove unmaintained actions-rs actions (#1922)
  feat(normalization): Optionally mark scrubbed transactions as sanitized (#1917)
  chore: Apply import grouping from coding style (#1921)
  feat(general): Fix leaked ip address (#1892)
  fix(normalize): No original_value for untouched transactions (#1916)
  chore(build): Update rust toolchain to 1.68.0 (#1914)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants