Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(metricprovider): support Datadog v2 API Fixes #2813 #2997

Merged
merged 16 commits into from
Oct 14, 2023

Conversation

meeech
Copy link
Contributor

@meeech meeech commented Aug 30, 2023

fixes #2813 #2819

Add support for Datadog v2 api.
Add new properties queries and formula.
Update the docs with some more examples, more details around v2.

Update CRDs to express validation requirements, allowing us to remove some guardrails in the code.

Rationale:
given the validation at the heart of k8s objects is held to a high standard
given the context the analysis Runs exec is k8s
given the the Analysis Template has a defined spec

eg: we can then make some assumptions
apiVersion is empty, v1, v2
set some defaults.

(server dry run)
The AnalysisTemplate "scratch.example-v2-template" is invalid: spec.metrics[0].provider.datadog.apiVersion: Unsupported value: "v3": supported values: "v1", "v2"

Why not use the DD go client?

I explored it. There was no obvious benefit vs making the requests how we do now, with the drawback of having a new external dep. Which is in an unstable state - this is emitted when using the DD client api.
WARNING: Using unstable operation 'v2.QueryTimeseriesData'"

v1 vs v2
That being said, I do have a request out to Datadog to clarify the future of v1 api so we can definitively clear up this business where we recommend moving to v2 because of impending deprecation of v1, and more guidance around the rate limits.

(as of sept 20, 2023)

From DD re: v1 deprecation: " I received confirmation from our engineering team that the api endpoint, /api/v1/query, is not going to be deprecated in the near future."

From DD re: Rate limits: "increasing the rate limit of /api/v1/query api endpoint is possible and reviewed on a case-to-case basis."

Why scalar and not timeseries?
Looking into timeseries vs scalar. I think since we only need 1 value, we do better to hit the v2/query/scalar endpoint instead of v2/query/timeseries. A lot more data is being transferred for timeseries with no benefit- we throw it away.
Also confirmed with DD: From support ticket with DD: "...I have also tested the scalar api endpoint with the last aggregator as well as the timeseries api endpoint. They do indeed return the same values when retrieving the values via the api endpoints. Observing the time it takes to retrieve the values, they remain relatively the same..."

I have done similar tests and can confirm values returning remain relatively the same.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 30, 2023

Go Published Test Results

2 062 tests   2 062 ✔️  2m 43s ⏱️
   118 suites         0 💤
       1 files           0

Results for commit c9e2560.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 30, 2023

E2E Tests Published Test Results

    4 files      4 suites   3h 39m 11s ⏱️
102 tests   87 ✔️   6 💤   9
434 runs  384 ✔️ 24 💤 26

For more details on these failures, see this check.

Results for commit c9e2560.

♻️ This comment has been updated with latest results.

@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (1d89c33) 81.78% compared to head (c9e2560) 81.82%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2997      +/-   ##
==========================================
+ Coverage   81.78%   81.82%   +0.03%     
==========================================
  Files         134      134              
  Lines       20459    20507      +48     
==========================================
+ Hits        16733    16780      +47     
- Misses       2863     2864       +1     
  Partials      863      863              
Files Coverage Δ
metricproviders/datadog/datadog.go 79.85% <88.13%> (+3.76%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@meeech meeech force-pushed the 2813-datadog-v2-api branch 2 times, most recently from 41ae1e2 to 417866a Compare September 5, 2023 23:57
@meeech meeech changed the title 2813 datadog v2 api fix(metricprovider) support Datadog v2 API Fixes #2813 Sep 6, 2023
@meeech meeech changed the title fix(metricprovider) support Datadog v2 API Fixes #2813 fix(metricprovider): support Datadog v2 API Fixes #2813 Sep 6, 2023
@meeech meeech force-pushed the 2813-datadog-v2-api branch 3 times, most recently from 56d2a66 to 9701e9e Compare September 12, 2023 00:41
@meeech meeech force-pushed the 2813-datadog-v2-api branch 3 times, most recently from 77ed492 to 128e409 Compare September 20, 2023 18:08
@meeech meeech marked this pull request as ready for review September 20, 2023 18:09
@meeech meeech marked this pull request as draft September 21, 2023 00:48
@meeech meeech force-pushed the 2813-datadog-v2-api branch 3 times, most recently from 46ca1a9 to fdb3a3a Compare September 22, 2023 02:45
@meeech meeech marked this pull request as ready for review September 22, 2023 12:53
@zachaller zachaller self-requested a review September 24, 2023 02:06
@zachaller zachaller added this to the v1.7 milestone Sep 24, 2023
@meeech meeech force-pushed the 2813-datadog-v2-api branch 2 times, most recently from 988cbf1 to 7ba20d7 Compare September 27, 2023 22:12
@meeech meeech force-pushed the 2813-datadog-v2-api branch 2 times, most recently from b11893c to 2f2e291 Compare October 7, 2023 20:51
* Expand working with v2 information
* Contacted Datadog to get latest info re: v1 deprecation, api limits.
* Add tips about rate limits, using helm for templates
* Add more example templates

Signed-off-by: mitchell amihod <[email protected]>
* Make Query omitempty since now possible it won't exist
* Add some descriptions
* Add new properties we need for v2

Queries: We can pass in key:query for queries
Formula: Makes formulas using the keys from queries

* Defaults!
Use annotations to declare defaults for some fields. This lets us remove some guard rails from the code itself

Interval: 5m - Move this from code to here
ApiVersion: v1 - Move this from code to here

* Enums!
Much like defaults, having enums lets us make assumptions about the incoming metric so we dont need as many guardrails.

ApiVersion: Enum to restrict to v1 or v2
Signed-off-by: mitchell amihod <[email protected]>
Everything looks ok.

Signed-off-by: mitchell amihod <[email protected]>
Validating the metric on initialization, rather than spread out throughout. You get earlier feedback if you have a bad metric defined. (Not perfect, but there's limitations with our annotation generator for the rules in the crd. eg: If we could use oneOf, we wouldn't need a lot of this validation)

We check all the mutually exclusive props.
The props where one requires another.
We don't have to check for defaults and set them anymore, since they are guaranteed by the crd.

rules:

- ensure we have only query OR queries
- restrict v1 to query only
- make sure you only provide a formula with queries
- make sure multiple queries are accompanied by a formula

Signed-off-by: mitchell amihod <[email protected]>
ApiVersion is guaranteed to have value, and the enum ensures its v1/v2 when user provided.

Updated v1 tests to reflect some of these new realities

Signed-off-by: mitchell amihod <[email protected]>
run was getting a bit long according to the checks

Signed-off-by: mitchell amihod <[email protected]>
It is a straightline to initialize since default is set to 5m for incoming metrics where it is not set.

Signed-off-by: mitchell amihod <[email protected]>
Split into createRequest v1/v2
v1 : pretty much unchanged. just extracted
v2: support for v2/query/scalar

We don't need all the timeseries. I did some testing fetching both scalar and timeseries, and they pretty much lined up.

Also confirmed with DD: From support ticket with DD: "...I have also tested the scalar api endpoint with the last aggregator as well as the timeseries api endpoint. They do indeed return the same values when retrieving the values via the api endpoints. Observing the time it takes to retrieve the values, they remain relatively the same..."

re: query + v2: Keep backwards compat. if we get in a query, we turn it into a queries object to pass on to requestv2 queries into the QueriesPayload.
Signed-off-by: mitchell amihod <[email protected]>
* update the datadogResponseV2 for scalar values
* handle no results so it has parity with v1 - empty will now usually result in `[]` unless something goes very wrong on dd side

Signed-off-by: mitchell amihod <[email protected]>
* add some new test cases
* update mock server to handle queries / formulas validation
* update no data tests to reflect reality
* stop all values being the same. it makes it difficult to know find which test case failed. move meaning from comments into the metric name.
* stop using deprecated ioutil

Signed-off-by: mitchell amihod <[email protected]>
@meeech
Copy link
Contributor Author

meeech commented Oct 13, 2023

@zachaller What do we do re: conflict with generated?
Any way we can get this reviewed anytime soon so a) don't have to keep rebasing and b) before the context around choices leaves my head :)
I am happy to do a paired pr review if that is helpful/makes it quicker.

Signed-off-by: zachaller <[email protected]>
@zachaller
Copy link
Collaborator

@meeech I had reviewed this and it was good I see you made the only change with the error let me give it one quick go over again and merge it.

@sonarcloud
Copy link

sonarcloud bot commented Oct 13, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 2 Code Smells

No Coverage information No Coverage information
12.7% 12.7% Duplication

@zachaller zachaller merged commit ebf1a3e into argoproj:master Oct 14, 2023
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Datadog v2 API fails when using formulas in query
2 participants