[CONTINT-3920][fakeintake] Make the store more generic #24063

AliDatadog · 2024-03-25T13:28:13Z

What does this PR do?

This PR makes the store more generic by replacing the concrete store with an interface.
Because GetJSONPayloads only retrieves raw payloads and marshals them, we also replace it with a more generic function.
We add unit tests where needed.
We also improve error handling and make sure the store is stopped where we can.

Motivation

This is a first step towards a fakeintake backed by a persistent database.

Additional Notes

Possible Drawbacks / Trade-offs

None

Describe how to test/QA your changes

…place

pr-commenter · 2024-03-25T14:47:38Z

Test changes on VM

Use this command from test-infra-definitions to manually test this PR's changes on a VM:

inv create-vm --pipeline-id=30848496 --os-family=ubuntu

KevinFairise2

Some comments, but that's a nice improvement to be able to use other stores! 🎉

KevinFairise2 · 2024-03-26T10:28:11Z

test/fakeintake/server/server.go

@@ -87,11 +87,15 @@ func NewServer(options ...func(*Server)) *Server {

 	registry := prometheus.NewRegistry()

+	storeMetrics := fi.store.GetMetrics()


Is it really useful? Do we have some case where a newly instantiated serverstore already contains metrics?

I think the store must register its own metrics, and the metrics will depend on the store implementation. For example, with a SQL store we might want SQL metrics.

Oh, I missed the line fi.store.NbPayloads in the diff. Looks good to me then. Maybe we could make the name a bit more explicit, to indicate that we actually retrieve the number of payloads in the store as a Prometheus metric. Otherwise it can be a bit misleading, since the store can also be used to store metrics sent from the agent.

Do you have any suggestions for a better name? GetMetrics() can return any metric we define for the given store. For example, with a SQL store it should return latencies and SQL metrics.
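
To make the naming concern concrete, here is a rough sketch in which each store reports its own internal metrics under a name that cannot be confused with agent metrics held in the store. `Metric` is a hypothetical stand-in for a Prometheus collector so the example runs with the standard library only; `GetInternalMetrics`, `sqlStore` and the metric names are assumptions for illustration:

```go
package main

import "fmt"

// Metric is a hypothetical stand-in for a Prometheus collector.
type Metric struct {
	Name  string
	Value float64
}

// MetricsProvider is the hypothetical part of the store interface under discussion.
type MetricsProvider interface {
	// GetInternalMetrics returns metrics about the store itself,
	// not the metrics payloads the agent sent to it.
	GetInternalMetrics() []Metric
}

// inMemoryStore only knows how many payloads it holds.
type inMemoryStore struct{ nbPayloads int }

func (s *inMemoryStore) GetInternalMetrics() []Metric {
	return []Metric{{Name: "payloads", Value: float64(s.nbPayloads)}}
}

// sqlStore is a hypothetical persistent store that also reports a latency.
type sqlStore struct {
	nbPayloads        int
	lastInsertSeconds float64
}

func (s *sqlStore) GetInternalMetrics() []Metric {
	return []Metric{
		{Name: "payloads", Value: float64(s.nbPayloads)},
		{Name: "insert_latency_seconds", Value: s.lastInsertSeconds},
	}
}

// registeredNames mimics what the server does at startup: it takes whatever
// the store exposes and registers it, without knowing the store type.
func registeredNames(p MetricsProvider) []string {
	names := []string{}
	for _, m := range p.GetInternalMetrics() {
		names = append(names, m.Name)
	}
	return names
}

func main() {
	fmt.Println(registeredNames(&inMemoryStore{nbPayloads: 2}))
	fmt.Println(registeredNames(&sqlStore{nbPayloads: 2, lastInsertSeconds: 0.003}))
}
```

The server side stays identical for both stores, which is the argument for letting the store decide which metrics it exposes.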

test/fakeintake/server/server.go

test/fakeintake/server/serverstore/store.go

AliDatadog · 2024-03-26T16:32:36Z

/merge

dd-devflow · 2024-03-26T16:32:40Z

🚂 MergeQueue

This merge request is not mergeable yet, because of pending checks/missing approvals. It will be added to the queue as soon as checks pass and/or get approvals.
Note: if you pushed new commits since the last approval, you may need additional approval.
You can remove it from the waiting list with /remove command.

Use /merge -c to cancel this operation!

pr-commenter · 2024-03-26T18:37:20Z

Regression Detector

Regression Detector Results

Run ID: 5b2d7b3-66c5-4b16-8b12-f1a928bdb3f8
Baseline: 53b2451
Comparison: c8d3903

Performance changes are noted in the perf column of each table:

✅ = significantly better comparison variant performance
❌ = significantly worse comparison variant performance
➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+2.35	[-4.02, +8.72]

Fine details of change detection per experiment

perf	experiment	goal	Δ mean %	Δ mean % CI
➖	file_to_blackhole	% cpu utilization	+2.35	[-4.02, +8.72]
➖	process_agent_standard_check	memory utilization	+0.70	[+0.65, +0.75]
➖	file_tree	memory utilization	+0.44	[+0.33, +0.55]
➖	basic_py_check	% cpu utilization	+0.43	[-2.10, +2.96]
➖	otel_to_otel_logs	ingress throughput	+0.24	[-0.19, +0.67]
➖	uds_dogstatsd_to_api_cpu	% cpu utilization	+0.09	[-2.70, +2.89]
➖	trace_agent_msgpack	ingress throughput	+0.02	[+0.01, +0.03]
➖	tcp_dd_logs_filter_exclude	ingress throughput	+0.00	[-0.03, +0.04]
➖	uds_dogstatsd_to_api	ingress throughput	-0.01	[-0.21, +0.19]
➖	trace_agent_json	ingress throughput	-0.02	[-0.04, +0.00]
➖	process_agent_standard_check_with_stats	memory utilization	-0.10	[-0.13, -0.06]
➖	tcp_syslog_to_blackhole	ingress throughput	-0.41	[-0.50, -0.32]
➖	idle	memory utilization	-0.44	[-0.48, -0.40]
➖	process_agent_real_time_mode	memory utilization	-0.50	[-0.54, -0.45]
➖	pycheck_1000_100byte_tags	% cpu utilization	-2.81	[-7.63, +2.01]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.
Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.
Its configuration does not mark it "erratic".
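
The three criteria above can be sketched as a small predicate. This is an illustrative reading of the explanation, not the detector's actual code; the `Experiment` type and `IsRegression` function are hypothetical names, and the sample rows are copied from the table in this report:

```go
package main

import (
	"fmt"
	"math"
)

// Experiment is a hypothetical row of the report table.
type Experiment struct {
	Name          string
	DeltaMeanPct  float64 // estimated Δ mean %
	CILow, CIHigh float64 // bounds of the 90% confidence interval
	Erratic       bool    // true if the experiment is configured as erratic
}

// effectSizeTolerance is the |Δ mean %| threshold stated in the report.
const effectSizeTolerance = 5.0

// IsRegression applies the three criteria: large enough effect,
// confidence interval excluding zero, and not marked erratic.
func IsRegression(e Experiment) bool {
	bigEnough := math.Abs(e.DeltaMeanPct) >= effectSizeTolerance
	ciExcludesZero := e.CILow > 0 || e.CIHigh < 0
	return bigEnough && ciExcludesZero && !e.Erratic
}

func main() {
	rows := []Experiment{
		{"file_to_blackhole", 2.35, -4.02, 8.72, true},
		{"process_agent_standard_check", 0.70, 0.65, 0.75, false},
		{"pycheck_1000_100byte_tags", -2.81, -7.63, 2.01, false},
	}
	for _, e := range rows {
		fmt.Printf("%-30s regression=%t\n", e.Name, IsRegression(e))
	}
}
```

For example, process_agent_standard_check has an interval that excludes zero, but its +0.70 change is below the 5.00% tolerance, so it is not flagged.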

dd-devflow · 2024-03-26T19:27:49Z

🚂 MergeQueue

Added to the queue.

This build is going to start soon! (estimated merge in less than 28m)

Use /merge -c to cancel this operation!

* Replace in memory store by an interface
* Make sure we close the store in tests and in the server at the right place
* rename as get internal metrics

AliDatadog added changelog/no-changelog qa/no-code-change No code change in Agent code requiring validation labels Mar 25, 2024

AliDatadog added this to the 7.53.0 milestone Mar 25, 2024

AliDatadog changed the title ~~[fakeintake] Make the store more generic~~ [CONTINT-3920][fakeintake] Make the store more generic Mar 25, 2024

AliDatadog added 2 commits March 25, 2024 14:40

Replace in memory store by an interface

8216622

Make sure we close the store in tests and in the server at the right …

d6c794b

…place

AliDatadog force-pushed the ali/generic-store branch from abce287 to d6c794b Compare March 25, 2024 13:43

AliDatadog marked this pull request as ready for review March 26, 2024 10:13

AliDatadog requested review from a team as code owners March 26, 2024 10:13

KevinFairise2 reviewed Mar 26, 2024

View reviewed changes

AliDatadog requested a review from KevinFairise2 March 26, 2024 11:47

KevinFairise2 approved these changes Mar 26, 2024

View reviewed changes

rename as get internal metrics

c8d3903

dd-mergequeue bot merged commit 6dd02e4 into main Mar 26, 2024
170 checks passed

dd-mergequeue bot deleted the ali/generic-store branch March 26, 2024 19:58
