Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ref(sampling): Refactor dynamic sampling #2514

Merged
merged 63 commits into from
Sep 26, 2023
Merged

ref(sampling): Refactor dynamic sampling #2514

merged 63 commits into from
Sep 26, 2023

Conversation

TBS1996
Copy link
Contributor

@TBS1996 TBS1996 commented Sep 14, 2023

  • Got rid of the old style of SamplingResult, now we save the entire sampling match along with a flag of whether it's kept or not.
  • SamplingResult now has a Pending variant which suggests that dynamic sampling has yet to be run. The alternative, Option, was too vague for my taste.
  • Introduced SamplingEvaluator, a state machine for matching sampling rules.
  • Removing a lot of responsibilities of the sampling crate, in order to make it more generic and reusable.
  • More general refactoring, including a lot of simplified API for functions.
  • Moved dynamic sampling tests from lib.rs to wherever the functionality they test resides.
  • Deleted many tests that relied on the old APIs, or tested responsibilities that the sampling crate no longer has.
  • Created many new tests.

#skip-changelog

TBS1996 added a commit that referenced this pull request Sep 14, 2023
1. We are refactoring dynamic sampling in #2514, making these tests
outdated.
2. it references a dynamic sampling function id like to make private

let service = create_test_processor(config);

#[test]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing the signature of compute_sampling_decision made this test a lot simpler

Comment on lines +2374 to +2378
if (sampling_config.is_none() || event.is_none())
&& (root_sampling_config.is_none() || dsc.is_none())
{
return SamplingResult::NoMatch;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early return.

I don't like that the type system doesn't represent how sampling_config and event is related. and how root_sampling_config and dsc is connected.

Perhaps some new struct which encapsulates both the configuration and rules, and the instance to match on. Would be a separate PR of course.

state.envelope().dsc(),
state.event.value(),
);
fn compute_sampling_decision(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the &self because it was only used to check if processing is enabled, which also made it way easier to unit test.

Not sure if it should still be under EnvelopeProcessorService though, how about we move it to utils like the utils::get_sampling_result?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to revert this change for three reasons:

  1. The parameter list of this function is long. That was also the reason why ProcessEnvelopeState was introduced in the first place, to pass processing information together.
  2. We will soon need more fields from the state, such as access to Redis as well as the received timestamp (it is currently a bug that this function uses Utc::now()).
  3. Associated functions like this one that neither work with the type that they are declared on, nor return it, are unidiomatic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, i'll revert it, although personally I prefer it like this. My thinking was that we would clearly delineate the process_state function which runs entirely on side effects, and it would in turn call pure functions, meaning we would avoid passing in state any further from processing_state.

I'd say the long parameter list might tell us something about this function that simply passing in state would mask. Passing in state here feels a lot like using global variables.

in the function (as expressed by the first early return) you can see how 'event' and 'sampling_config' need each other, same with dsc and root_sampling_config. Perhaps we could explore a newtype that encapsulates the "Getter" and a sampling config in a later PR?

3: I agree its not idiomatic, so i think it would be good in utils of the same crate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so, reverting it leads to complicating many tests I wrote that relied on its functional approach, where you have to start the service, create a mock processenvelopestate and projectstate to check if a field on processenvelopestate has changed.

could you verify again if you think I should revert this? my suggestion would be to keep it functional, move it to utils, and let the long param list stand as a sign for us to think about refactoring in the future (rather than hiding the params in state).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Given this is an internal function of the processor, we can easily revisit this at a later point. There's limited exposure of this API. Let's move forward.

@TBS1996 TBS1996 requested a review from a team September 24, 2023 11:37
Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though a refactor like this makes it hard to catch logical errors. I'll leave final review to @jan-auer.

Comment on lines +43 to +46
/// Returns true if no rule have matched.
pub fn is_no_match(&self) -> bool {
!self.is_match()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO there's no benefit in writing if foo.is_no_match() over ! foo.is_match().

Suggested change
/// Returns true if no rule have matched.
pub fn is_no_match(&self) -> bool {
!self.is_match()
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, I took inspiration from is_some()/is_none() from std, id say it makes it a little more readable but i see your point too


use crate::dsc::TraceUserContext;
use crate::tests::{and, eq, glob, not, or};
use crate::DynamicSamplingContext;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: it's great you're moving these tests now. I hadn't moved them previously since they depend on the DSC, and these rules should be independent of implementations of the Getter trait.

In a follow-up, we'll move this module into another crate and then also update the tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was debating the idea of making a dummy Getter struct, just to make it more clear that this crate shouldn't care about implementations, but since we want to keep DynamicSamplingContext in this crate then it's simpler just to use that. All the tests that used Event I've replaced with dsc.

state.envelope().dsc(),
state.event.value(),
);
fn compute_sampling_decision(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to revert this change for three reasons:

  1. The parameter list of this function is long. That was also the reason why ProcessEnvelopeState was introduced in the first place, to pass processing information together.
  2. We will soon need more fields from the state, such as access to Redis as well as the received timestamp (it is currently a bug that this function uses Utc::now()).
  3. Associated functions like this one that neither work with the type that they are declared on, nor return it, are unidiomatic.

Comment on lines 95 to 96
let sampling_result: SamplingResult = evaluator.match_rules(dsc.trace_id, dsc, rules).into();
Some(sampling_result.should_keep())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of converting to SamplingResult, consider to match the RuleMatchingState directly.

Suggested change
let sampling_result: SamplingResult = evaluator.match_rules(dsc.trace_id, dsc, rules).into();
Some(sampling_result.should_keep())
Some(match evaluator.match_rules(dsc.trace_id, dsc, rules) {
RuleMatchingState::SamplingMatch(m) => m.should_keep(),
_ => false,
})

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, im not sure if it's obvious that a lack of match should result in true here, case in point: in your example you return false whereas it should be true. Maybe it's better to use the implementation in SamplingResult for that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we return the bool manually we're essentially re-implementing this behaviour:

impl From<Evaluation> for SamplingResult {
    fn from(value: Evaluation) -> Self {
        match value {
            Evaluation::Matched(sampling_match) => Self::Match(sampling_match),
            Evaluation::Continue(_) => Self::NoMatch,
        }
    }
}
    /// Returns `true` if the event should be kept.
    pub fn should_keep(&self) -> bool {
        match self {
            SamplingResult::Match(sampling_match) => sampling_match.should_keep(),
            // If no rules matched on an event, we want to keep it.
            SamplingResult::NoMatch => true,
            SamplingResult::Pending => true,
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the original implementation, it turns into Keep if there's no match in fn determine_from_sampling_match, which becomes 'true' here:

    let sampled = match sampling_result {
        SamplingResult::Keep => true,
        SamplingResult::Drop(_) => false,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, the false was an outright typo in my suggestion :) I was slightly concerned with the additional conversion into a sampling result, however my own typo is a good example of why this sort of deduplication should exist.

Let's keep it. A suggestion on how to format it, though:

let evaluation = evaluator.match_rules(dsc.trace_id, dsc, rules);
Some(SamplingResult::from(evaluation).should_keep())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, thanks! that looks a lot nicer than what I wrote

@TBS1996 TBS1996 requested a review from jan-auer September 26, 2023 11:40
Copy link
Contributor

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have we considered splitting this refactor into smaller ones so that it's easier to review and identify regressions?

@jan-auer
Copy link
Member

@iker-barriocanal next time we should try to make some of these changes in smaller increments. Now that review has almost concluded, we can keep the PR.

@TBS1996
Copy link
Contributor Author

TBS1996 commented Sep 26, 2023

Have we considered splitting this refactor into smaller ones so that it's easier to review and identify regressions?

I regret not having a separate PR for moving the tests, but other than that, the reason it's so huge is because the new implementation changed the API used for tests, and there was a lottt of tests here.

at this point it's easier to just go forward with it.

@TBS1996 TBS1996 merged commit 7bb785a into master Sep 26, 2023
@TBS1996 TBS1996 deleted the tor/ref_ds branch September 26, 2023 12:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants