-
Notifications
You must be signed in to change notification settings - Fork 94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ref(sampling): Refactor dynamic sampling #2514
Conversation
1. We are refactoring dynamic sampling in #2514, making these tests outdated. 2. it references a dynamic sampling function id like to make private
|
||
let service = create_test_processor(config); | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing the signature of compute_sampling_decision
made this test a lot simpler
if (sampling_config.is_none() || event.is_none()) | ||
&& (root_sampling_config.is_none() || dsc.is_none()) | ||
{ | ||
return SamplingResult::NoMatch; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Early return.
I don't like that the type system doesn't represent how sampling_config
and event
is related. and how root_sampling_config
and dsc
is connected.
Perhaps some new struct which encapsulates both the configuration and rules, and the instance to match on. Would be a separate PR of course.
state.envelope().dsc(), | ||
state.event.value(), | ||
); | ||
fn compute_sampling_decision( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed the &self
because it was only used to check if processing is enabled, which also made it way easier to unit test.
Not sure if it should still be under EnvelopeProcessorService
though, how about we move it to utils
like the utils::get_sampling_result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to revert this change for three reasons:
- The parameter list of this function is long. That was also the reason why
ProcessEnvelopeState
was introduced in the first place, to pass processing information together. - We will soon need more fields from the state, such as access to Redis as well as the
received
timestamp (it is currently a bug that this function usesUtc::now()
). - Associated functions like this one that neither work with the type that they are declared on, nor return it, are unidiomatic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright, i'll revert it, although personally I prefer it like this. My thinking was that we would clearly delineate the process_state
function which runs entirely on side effects, and it would in turn call pure functions, meaning we would avoid passing in state
any further from processing_state
.
I'd say the long parameter list might tell us something about this function that simply passing in state would mask. Passing in state
here feels a lot like using global variables.
in the function (as expressed by the first early return) you can see how 'event' and 'sampling_config' need each other, same with dsc and root_sampling_config. Perhaps we could explore a newtype that encapsulates the "Getter" and a sampling config in a later PR?
3: I agree its not idiomatic, so i think it would be good in utils
of the same crate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok so, reverting it leads to complicating many tests I wrote that relied on its functional approach, where you have to start the service, create a mock processenvelopestate
and projectstate
to check if a field on processenvelopestate has changed.
could you verify again if you think I should revert this? my suggestion would be to keep it functional, move it to utils, and let the long param list stand as a sign for us to think about refactoring in the future (rather than hiding the params in state).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Given this is an internal function of the processor, we can easily revisit this at a later point. There's limited exposure of this API. Let's move forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though a refactor like this makes it hard to catch logical errors. I'll leave final review to @jan-auer.
/// Returns true if no rule have matched. | ||
pub fn is_no_match(&self) -> bool { | ||
!self.is_match() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO there's no benefit in writing if foo.is_no_match()
over ! foo.is_match()
.
/// Returns true if no rule have matched. | |
pub fn is_no_match(&self) -> bool { | |
!self.is_match() | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I took inspiration from is_some()
/is_none()
from std, id say it makes it a little more readable but i see your point too
|
||
use crate::dsc::TraceUserContext; | ||
use crate::tests::{and, eq, glob, not, or}; | ||
use crate::DynamicSamplingContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB: it's great you're moving these tests now. I hadn't moved them previously since they depend on the DSC, and these rules should be independent of implementations of the Getter
trait.
In a follow-up, we'll move this module into another crate and then also update the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was debating the idea of making a dummy Getter struct, just to make it more clear that this crate shouldn't care about implementations, but since we want to keep DynamicSamplingContext
in this crate then it's simpler just to use that. All the tests that used Event
I've replaced with dsc.
state.envelope().dsc(), | ||
state.event.value(), | ||
); | ||
fn compute_sampling_decision( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest to revert this change for three reasons:
- The parameter list of this function is long. That was also the reason why
ProcessEnvelopeState
was introduced in the first place, to pass processing information together. - We will soon need more fields from the state, such as access to Redis as well as the
received
timestamp (it is currently a bug that this function usesUtc::now()
). - Associated functions like this one that neither work with the type that they are declared on, nor return it, are unidiomatic.
let sampling_result: SamplingResult = evaluator.match_rules(dsc.trace_id, dsc, rules).into(); | ||
Some(sampling_result.should_keep()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of converting to SamplingResult
, consider to match the RuleMatchingState
directly.
let sampling_result: SamplingResult = evaluator.match_rules(dsc.trace_id, dsc, rules).into(); | |
Some(sampling_result.should_keep()) | |
Some(match evaluator.match_rules(dsc.trace_id, dsc, rules) { | |
RuleMatchingState::SamplingMatch(m) => m.should_keep(), | |
_ => false, | |
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, im not sure if it's obvious that a lack of match should result in true
here, case in point: in your example you return false
whereas it should be true
. Maybe it's better to use the implementation in SamplingResult
for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we return the bool manually we're essentially re-implementing this behaviour:
impl From<Evaluation> for SamplingResult {
fn from(value: Evaluation) -> Self {
match value {
Evaluation::Matched(sampling_match) => Self::Match(sampling_match),
Evaluation::Continue(_) => Self::NoMatch,
}
}
}
/// Returns `true` if the event should be kept.
pub fn should_keep(&self) -> bool {
match self {
SamplingResult::Match(sampling_match) => sampling_match.should_keep(),
// If no rules matched on an event, we want to keep it.
SamplingResult::NoMatch => true,
SamplingResult::Pending => true,
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the original implementation, it turns into Keep
if there's no match in fn determine_from_sampling_match
, which becomes 'true' here:
let sampled = match sampling_result {
SamplingResult::Keep => true,
SamplingResult::Drop(_) => false,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right, the false
was an outright typo in my suggestion :) I was slightly concerned with the additional conversion into a sampling result, however my own typo is a good example of why this sort of deduplication should exist.
Let's keep it. A suggestion on how to format it, though:
let evaluation = evaluator.match_rules(dsc.trace_id, dsc, rules);
Some(SamplingResult::from(evaluation).should_keep())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, thanks! that looks a lot nicer than what I wrote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have we considered splitting this refactor into smaller ones so that it's easier to review and identify regressions?
@iker-barriocanal next time we should try to make some of these changes in smaller increments. Now that review has almost concluded, we can keep the PR. |
I regret not having a separate PR for moving the tests, but other than that, the reason it's so huge is because the new implementation changed the API used for tests, and there was a lottt of tests here. at this point it's easier to just go forward with it. |
Pending
variant which suggests that dynamic sampling has yet to be run. The alternative, Option, was too vague for my taste.SamplingEvaluator
, a state machine for matching sampling rules.lib.rs
to wherever the functionality they test resides.#skip-changelog