Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Usage collection] refactor cloud detector collector #110439

Merged
merged 7 commits into from
Sep 10, 2021

Conversation

Bamieh
Copy link
Member

@Bamieh Bamieh commented Aug 30, 2021

Summary

  • getting rid of the constructor args which are only there for testing purposes (request, fs), and mocking those in the tests
  • getting rid of the use of the request lib in favor of something like node-fetch which we use more widely in kibana (request looks like it's mostly used in tests atm, not many other places)
  • adding some retry logic to the detector in the event of a retryable failure. Decided not to add any retry logic to the detectors:
    • Adding a generic retry logic will only increase the time for detection since we are expecting errors to be returned from providers if the deployment is not hosted there.
    • Retrying on the documented codes (404, 403, or 401) is unncessary since replaying requests will result in the same error codes.
    • Retrying on timeouts will only increase time for detection since pinging an invalid host will timeout in some cases.
  • updating GCP logic to still return values even if a request for one of the fields fails, instead of using an all-or-nothing approach

Closes #109202
Closes #96711

Testing

Using https://github.com/lukeelmers/terraform-kibana-dev to manually test the changes are working on each cloud provider:

  • Tested an AWS deployment
    image

  • Tested an Azure deployment
    image

  • Tested a GCP deployment
    image

@Bamieh Bamieh added v8.0.0 release_note:skip Skip the PR/issue when compiling release notes auto-backport Deprecated - use backport:version if exact versions are needed v7.16.0 labels Aug 30, 2021
@Bamieh Bamieh marked this pull request as ready for review August 30, 2021 11:13
@Bamieh Bamieh requested a review from a team as a code owner August 30, 2021 11:13
const response = await awsCheckedFileSystem._checkIfService(request);
const response = await awsService['_checkIfService']();
expect(readFile).toBeCalledTimes(0);
expect(fetchMock).toBeCalledTimes(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is asserting against the number of times fetchMock is called not redundant considering that we assert against what the mock is called with?
In other words, are there specific situations that might cause fetch to be called multiple times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have no way of detecting if fetch was called multiple times by any reason without this expect. Checking what the fetech is being called with wouldnt detect how many times the fetch was called. Having this check helps a lot when we're refactoring code in the future as it might catch unexpected changes.

imageId: 'ami-6df1e514',
},
});
expect(response.toJSON()).toMatchInlineSnapshot(`
Copy link
Contributor

@TinaHeiligers TinaHeiligers Aug 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to change from asserting equality to asserting against a snapshot? Snapshots tend to get stale and I don't see anything obvious that prevents us using toEqual.
The same question applies to the other changes from

expect(response.toJSON()).toEqual({...}) -> expect(response.toJSON()).toMatchInlineSnapshot(...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both are equavilant in what they are checking. using inline snapshots saves us time manually changing the values inside the object rather than just updating the snapshots and verifying they adhere to the expected changes. I dont like using variables inside the toEqual from outside the thing that is being tested as it makes the test less verbose and might allow some bugs to pass.

zone: undefined,
metadata: undefined,
});
expect(response.toJSON()).toMatchInlineSnapshot(`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -37,9 +37,9 @@ export class CloudDetector {
/**
* Get any cloud details that we have detected.
*/
getCloudDetails() {
public getCloudDetails = () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to look up the subtle binding differences again and cross check the change against how we're class 😅 .

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for tackling this!
Overall, the changes LGTM, although we could improve on where we declare and use the required cgp header.

@lukeelmers
Copy link
Member

adding some retry logic to the detector in the event of a retryable failure. Decided not to add any retry logic to the detectors:

  • Adding a generic retry logic will only increase the time for detection since we are expecting errors to be returned from providers if the deployment is not hosted there.
  • Retrying on the documented codes (404, 403, or 401) is unncessary since replaying requests will result in the same error codes.
  • Retrying on timeouts will only increase time for detection since pinging an invalid host will timeout in some cases.

These reasons make sense, but since the usage collector is running this in the background, I'm wondering if we really care about the length of time for detection? It seems we could still retry w/ backoff on 5xx errors if we wanted.

I don't have particularly strong feelings on this as I don't think retries are critical for this case, but just wanted to mention it.

@Bamieh Bamieh enabled auto-merge (squash) September 10, 2021 17:22
@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

@Bamieh Bamieh deleted the telemetry/remove_request_package branch September 10, 2021 20:27
kibanamachine added a commit that referenced this pull request Sep 10, 2021
jloleysens added a commit to jloleysens/kibana that referenced this pull request Sep 13, 2021
…-link-to-kibana-app

* 'master' of github.com:elastic/kibana: (120 commits)
  [TSVB] Support custom field format (elastic#101245)
  [VisEditors] Add code ownership to the functional tests (elastic#111680)
  [Lens] Make Lens saved object share-capable (elastic#111403)
  [Graph] Make Graph saved object share-capable (elastic#111404)
  [Stack Monitoring] Add breadcrumb support (elastic#111850)
  Update Jira Cloud to use OAuth2.0 (elastic#111493)
  Show warning message when attempting to create an APM alert in stack management (elastic#111781)
  Skip suite blocking ES snapshot promotion (elastic#111907)
  Respect `auth_provider_hint` if session is not authenticated. (elastic#111521)
  Added in 'Responses' field in alert telemetry & updated test (elastic#111892)
  [Usage collection] refactor cloud detector collector (elastic#110439)
  Make classnames a shared dep (elastic#111636)
  Fix link to e2e tests in APM testing.md (elastic#111869)
  [Security Solution] Add host.os.name.caseless mapping and runtime field (elastic#111455)
  [APM] Removes the beta label from APM tutorial (elastic#111499) (elastic#111828)
  [RAC] [Observability] Expand Observability alerts page functional tests (elastic#111297)
  Fix extra white space on the alert table whe page size is 50 or 100 (elastic#111568)
  [Metrics UI] Add Inventory Timeline open/close state to context and URL state (elastic#111034)
  [Graph] Switch to SavedObjectClient.resolve  (elastic#109617)
  [APM] Adding lambda icon (elastic#111834)
  ...

# Conflicts:
#	x-pack/plugins/reporting/public/management/__snapshots__/report_listing.test.tsx.snap
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed release_note:skip Skip the PR/issue when compiling release notes v7.16.0 v8.0.0
Projects
None yet
4 participants