Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cloud Security][Telemetry] add error handling incase an individual collector fails #165918

Conversation

Omolola-Akinleye
Copy link
Contributor

Summary

Summarize your PR. If it involves visual changes include a screenshot or gif.
The kibana cloud security usage collector payload fails when one feature collector fails. I applied error handling and enabled the feature collector to fail gracefully with a default value. We are now using Promise.allSettled vs Promse.all to avoid all collectors failing in case one collector fails.

@Omolola-Akinleye Omolola-Akinleye added bug Fixes for quality problems that affect the customer experience release_note:skip Skip the PR/issue when compiling release notes Team:Cloud Security Cloud Security team related 8.11 candidate labels Sep 6, 2023
@Omolola-Akinleye Omolola-Akinleye self-assigned this Sep 6, 2023
@Omolola-Akinleye Omolola-Akinleye requested a review from a team as a code owner September 6, 2023 21:54
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-cloud-security-posture (Team:Cloud Security)

@@ -61,6 +65,13 @@ export function registerCspmUsageCollector(
getAlertsStats(collectorFetchContext.esClient, logger),
]);

const indicesStats = handleResult('Indices', results[0]);
const accountsStats = handleResult('Accounts', results[1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here if there's an error you would do something like:

const accountsStats = handleResult('Accounts', results[1]) || []

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Promise.allSettled() always resolves in a truthy value - an object so I would never render a [].

 // [
//   { status: 'fulfilled', value: 33 },
//   { status: 'fulfilled', value: 66 },
//   { status: 'fulfilled', value: 99 },
//   { status: 'rejected', reason: Error: an error }
// ] ```

installationStats,
alertsStats,
] = await Promise.all([
const handleResult = <T>(taskName: string, result: PromiseSettledResult<T>) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we are not handling the error here (I don't see logging as handling it) so I'd suggest name that suggests the actual action when the happy flow happens like extractPromiseValue

@@ -8,7 +8,7 @@
import { CspStatusCode } from '../../../../common/types';

export interface CspmUsage {
indices: CspmIndicesStats;
indices: CspmIndicesStats | never[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I've read about never it doesn't appear to be related here

my reference: https://www.typescriptlang.org/docs/handbook/basic-types.html#never

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, good point. I mistakenly thought we wanted to handle the error with an empty state so applying never fixed the issue type error but I'll change the default value to undefined.

Copy link
Contributor

@kfirpeled kfirpeled left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - added a nit refactoring suggestion

installationStats,
alertsStats,
] = await Promise.all([
const getPromiseValue = <T>(
Copy link
Contributor

@kfirpeled kfirpeled Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit you could improve it a bit by not tangling yourself with results[X] and working with it by using the following changing getPromiseValue to receive a promise and wait for it
So when forming the allSettled request, the naming of the request is right next to it.
And now getPromiseValue name also describes better its logic.
And maybe awaitPromiseSafe describes it even better. It is safe because it won't throw any errors

      const awaitPromiseSafe = async <T>(
        taskName: CloudSecurityUsageCollectorType,
        promise: Promise<T>
      ) => {
        try {
          const val = await promise;
          return val;
        } catch (error) {
          logger.error(`${taskName} task failed: ${error.message}`);
          logger.error(error.stack);
          return undefined;
        }
      };
      const esClient = collectorFetchContext.esClient;
      const soClient = collectorFetchContext.soClient;
      const [
        indices,
        // eslint-disable-next-line @typescript-eslint/naming-convention
        accounts_stats,
        // eslint-disable-next-line @typescript-eslint/naming-convention
        resources_stats,
        // eslint-disable-next-line @typescript-eslint/naming-convention
        rules_stats,
        // eslint-disable-next-line @typescript-eslint/naming-convention
        installation_stats,
        // eslint-disable-next-line @typescript-eslint/naming-convention
        alerts_stats,
      ] = await Promise.all([
        awaitPromiseSafe('Indices', getIndicesStats(esClient, soClient, coreServices, logger)),
        awaitPromiseSafe('Accounts', getAccountsStats(esClient, logger)),
        awaitPromiseSafe('Resources', getResourcesStats(esClient, logger)),
        awaitPromiseSafe('Rules', getRulesStats(esClient, logger)),
        awaitPromiseSafe(
          'Installation',
          getInstallationStats(esClient, soClient, coreServices, logger)
        ),
        awaitPromiseSafe('Alerts', getAlertsStats(esClient, logger)),
      ]);

      return {
        indices,
        accounts_stats,
        resources_stats,
        rules_stats,
        installation_stats,
        alerts_stats,
      };

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and we can replace allSettled with all, because we wrap it with a handler that keeps it safe

@Omolola-Akinleye Omolola-Akinleye enabled auto-merge (squash) September 13, 2023 10:29
@Omolola-Akinleye Omolola-Akinleye merged commit 0e2775f into elastic:main Sep 13, 2023
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @Omolola-Akinleye

@kibanamachine kibanamachine added v8.11.0 backport:skip This commit does not require backporting labels Sep 13, 2023
@kfirpeled kfirpeled removed 8.11 candidate bug Fixes for quality problems that affect the customer experience labels Sep 13, 2023
@@ -7,6 +7,14 @@

import { CspStatusCode } from '../../../../common/types';

export type CloudSecurityUsageCollectorType =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:skip Skip the PR/issue when compiling release notes Team:Cloud Security Cloud Security team related v8.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Cloud Security][Bug][Telemetry] Telemetry data not showing up on Big Query
6 participants