Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Security Solution][Detections] Fix GET /api/detection_engine/rules?id={ruleId} endpoint #122024

Merged
merged 3 commits into from
Jan 5, 2022

Conversation

banderror
Copy link
Contributor

@banderror banderror commented Dec 27, 2021

Ticket: #120872 (this is a quick fix that partially addresses issues described in there)

Summary

When a Security rule fails on the Alerting Framework level (but not on the Detection Engine level), we return 500 error result from the Detections API which breaks the Rule Details page. This PR fixes that: instead, the API now returns 200 with the rule and its failed execution status.

Details

When a rule fails on the Alerting Framework level, its rule.executionStatus might look something like that:

{
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
}

We merge the Framework's rule.executionStatus with our custom status based on the legacy siem-detection-engine-rule-status saved object, and return the resulting status from multiple endpoints, like /api/detection_engine/rules/_find, /internal/detection_engine/rules/_find_status and /api/detection_engine/rules?id={ruleId}.

The /api/detection_engine/rules?id={ruleId} route handler contained incorrect merging logic which, in the case of rule.executionStatus.status === 'error', has been leading to an exception and 500 error result returned from it. This logic has been removed:

if (currentStatus != null && rule.executionStatus.status === 'error') {
  currentStatus.attributes.lastFailureMessage = `Reason: ${rule.executionStatus.error?.reason} Message: ${rule.executionStatus.error?.message}`;
  currentStatus.attributes.lastFailureAt =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.statusDate =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.status = RuleExecutionStatus.failed;
}

The proper logic of merging rule statuses is still there. Check transform -> transformAlertToRule -> internalRuleToAPIResponse -> mergeAlertWithSidecarStatus.

Screenshots

Before: (this is how the page looks like if /api/detection_engine/rules?id={ruleId} returns 500)

After:

How to test

I wasn't able to reproduce the original error described in #120872:

security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""

One way of getting it or a similar error from the Framework is by simulating it in the code.

Add this to x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/read_rules_route.ts:

import { AlertExecutionStatusErrorReasons } from '../../../../../../alerting/common';

rule.executionStatus = {
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
};

Modify getFailingRules in x-pack/plugins/security_solution/server/lib/detection_engine/routes/utils.ts:

export const getFailingRules = async (
  ids: string[],
  rulesClient: RulesClient
): Promise<GetFailingRulesResult> => {
  try {
    const errorRules = await Promise.all(
      ids.map(async (id) =>
        rulesClient.resolve({
          id,
        })
      )
    );
    return errorRules
      .map((rule) => {
        rule.executionStatus = {
          status: 'error',
          lastExecutionDate: new Date('2021-12-25T10:44:44.961Z'),
          error: {
            reason: AlertExecutionStatusErrorReasons.Read,
            message:
              'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
          },
        };
        return rule;
      })
      .filter((rule) => rule.executionStatus.status === 'error')
      .reduce<GetFailingRulesResult>((acc, failingRule) => {
        return {
          [failingRule.id]: {
            ...failingRule,
          },
          ...acc,
        };
      }, {});
  } catch (exc) {
    if (exc instanceof CustomHttpRequestError) {
      throw exc;
    }
    throw new Error(`Failed to get executionStatus with RulesClient: ${(exc as Error).message}`);
  }
};

Please note that lastExecutionDate should be greater than the date of the current legacy rule status SO.

Checklist

@banderror banderror added bug Fixes for quality problems that affect the customer experience v8.0.0 impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. Feature:Rule Management Security Solution Detection Rule Management area v8.1.0 Team:Detection Rule Management Security Detection Rule Management Team labels Dec 27, 2021
@banderror banderror self-assigned this Dec 27, 2021
@banderror banderror requested a review from a team as a code owner December 27, 2021 14:31
@elasticmachine
Copy link
Contributor

Pinging @elastic/security-detections-response (Team:Detections and Resp)

@elasticmachine
Copy link
Contributor

Pinging @elastic/security-solution (Team: SecuritySolution)

Copy link
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@banderror it looks like the 500 is coming from a bug introduced in #115574, where find was replaced with getCurrentStatus, but the code (that's being deleted here) was not updated to account for the new return type.

This route looks much better; getting rid of that side effect is a great improvement here. However, with that side effect now gone, it's probably also worth verifying:

  1. that mergeAlertWithSidecarStatus is called on all relevant routes
  2. that no other code is similarly expecting an attributes property from getCurrentStatus

Related to 2, I was able to remove the any that lead to this bug with the following change:

-// eslint-disable-next-line @typescript-eslint/no-explicit-any
-export interface IRuleStatusSOAttributes extends Record<string, any> {
+export interface IRuleStatusSOAttributes extends SavedObjectAttributes {

but I'd appreciate another once-over since typescript may have missed something.

@banderror banderror force-pushed the fix-get-rule-by-id-endpoint branch from 25a8a01 to eb69bde Compare December 30, 2021 04:00
@banderror
Copy link
Contributor Author

it looks like the 500 is coming from a bug introduced in #115574, where find was replaced with getCurrentStatus, but the code (that's being deleted here) was not updated to account for the new return type.

@rylnd oh sorry, my bad. This one was a messy PR. Thanks for finding this out!

I fixed IRuleStatusSOAttributes according to your suggestion and double-checked all routes that call getCurrentStatus. Most of them call functions like transformValidate that call mergeAlertWithSidecarStatus under the hood, the only exception is findRuleStatusInternalRoute which calls it directly. I think we should be good here.

Thank you!

@spong
Copy link
Member

spong commented Jan 5, 2022

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled line counts

id before after diff
securitySolution 455 454 -1

Total ESLint disabled count

id before after diff
securitySolution 522 521 -1

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @banderror

Copy link
Contributor

@rylnd rylnd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM, thanks for the quick turnaround and cleanup here 👍

@spong spong added the auto-backport Deprecated - use backport:version if exact versions are needed label Jan 5, 2022
@spong spong merged commit 43202f6 into elastic:main Jan 5, 2022
@kibanamachine
Copy link
Contributor

💔 Backport failed

The backport operation could not be completed due to the following error:
You must specify a valid Github repository

You can specify it via either:

The backport PRs will be merged automatically after passing CI.

To backport manually run:
node scripts/backport --pr 122024

spong pushed a commit to spong/kibana that referenced this pull request Jan 6, 2022
…d={ruleId} endpoint (elastic#122024)

**Ticket:** elastic#120872 (this is a quick fix that [partially](elastic#120872 (comment)) addresses issues described in there)

## Summary

When a Security rule fails on the Alerting Framework level (but not on the Detection Engine level), we return 500 error result from the Detections API which breaks the Rule Details page. This PR fixes that: instead, the API now returns 200 with the rule and its failed execution status.

## Details

When a rule fails on the Alerting Framework level, its `rule.executionStatus` might look something like that:

```
{
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
}
```

We merge the Framework's `rule.executionStatus` with our custom status based on the legacy `siem-detection-engine-rule-status` saved object, and return the resulting status from multiple endpoints, like `/api/detection_engine/rules/_find`, `/internal/detection_engine/rules/_find_status` and `/api/detection_engine/rules?id={ruleId}`.

The `/api/detection_engine/rules?id={ruleId}` route handler contained incorrect merging logic which, in the case of `rule.executionStatus.status === 'error'`, has been leading to an exception and 500 error result returned from it. This logic has been removed:

```ts
if (currentStatus != null && rule.executionStatus.status === 'error') {
  currentStatus.attributes.lastFailureMessage = `Reason: ${rule.executionStatus.error?.reason} Message: ${rule.executionStatus.error?.message}`;
  currentStatus.attributes.lastFailureAt =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.statusDate =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.status = RuleExecutionStatus.failed;
}
```

The proper logic of merging rule statuses is still there. Check `transform` -> `transformAlertToRule` -> `internalRuleToAPIResponse` -> `mergeAlertWithSidecarStatus`.

## Screenshots

**Before:** (this is how the page looks like if `/api/detection_engine/rules?id={ruleId}` returns 500)

![](https://puu.sh/IyOuI/878484c991.png)

**After:**

![](https://puu.sh/IyOtY/3db04cecd7.png)

## How to test

I wasn't able to reproduce the original error described in elastic#120872:

```
security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""
```

One way of getting it or a similar error from the Framework is by simulating it in the code.

Add this to `x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/read_rules_route.ts`:

```ts
import { AlertExecutionStatusErrorReasons } from '../../../../../../alerting/common';

rule.executionStatus = {
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
};
```

Modify `getFailingRules` in `x-pack/plugins/security_solution/server/lib/detection_engine/routes/utils.ts`:

```ts
export const getFailingRules = async (
  ids: string[],
  rulesClient: RulesClient
): Promise<GetFailingRulesResult> => {
  try {
    const errorRules = await Promise.all(
      ids.map(async (id) =>
        rulesClient.resolve({
          id,
        })
      )
    );
    return errorRules
      .map((rule) => {
        rule.executionStatus = {
          status: 'error',
          lastExecutionDate: new Date('2021-12-25T10:44:44.961Z'),
          error: {
            reason: AlertExecutionStatusErrorReasons.Read,
            message:
              'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
          },
        };
        return rule;
      })
      .filter((rule) => rule.executionStatus.status === 'error')
      .reduce<GetFailingRulesResult>((acc, failingRule) => {
        return {
          [failingRule.id]: {
            ...failingRule,
          },
          ...acc,
        };
      }, {});
  } catch (exc) {
    if (exc instanceof CustomHttpRequestError) {
      throw exc;
    }
    throw new Error(`Failed to get executionStatus with RulesClient: ${(exc as Error).message}`);
  }
};
```

Please note that `lastExecutionDate` should be greater than the date of the current legacy rule status SO.

### Checklist

- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios
spong added a commit that referenced this pull request Jan 6, 2022
…d={ruleId} endpoint (#122024) (#122406)

**Ticket:** #120872 (this is a quick fix that [partially](#120872 (comment)) addresses issues described in there)

## Summary

When a Security rule fails on the Alerting Framework level (but not on the Detection Engine level), we return 500 error result from the Detections API which breaks the Rule Details page. This PR fixes that: instead, the API now returns 200 with the rule and its failed execution status.

## Details

When a rule fails on the Alerting Framework level, its `rule.executionStatus` might look something like that:

```
{
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
}
```

We merge the Framework's `rule.executionStatus` with our custom status based on the legacy `siem-detection-engine-rule-status` saved object, and return the resulting status from multiple endpoints, like `/api/detection_engine/rules/_find`, `/internal/detection_engine/rules/_find_status` and `/api/detection_engine/rules?id={ruleId}`.

The `/api/detection_engine/rules?id={ruleId}` route handler contained incorrect merging logic which, in the case of `rule.executionStatus.status === 'error'`, has been leading to an exception and 500 error result returned from it. This logic has been removed:

```ts
if (currentStatus != null && rule.executionStatus.status === 'error') {
  currentStatus.attributes.lastFailureMessage = `Reason: ${rule.executionStatus.error?.reason} Message: ${rule.executionStatus.error?.message}`;
  currentStatus.attributes.lastFailureAt =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.statusDate =
    rule.executionStatus.lastExecutionDate.toISOString();
  currentStatus.attributes.status = RuleExecutionStatus.failed;
}
```

The proper logic of merging rule statuses is still there. Check `transform` -> `transformAlertToRule` -> `internalRuleToAPIResponse` -> `mergeAlertWithSidecarStatus`.

## Screenshots

**Before:** (this is how the page looks like if `/api/detection_engine/rules?id={ruleId}` returns 500)

![](https://puu.sh/IyOuI/878484c991.png)

**After:**

![](https://puu.sh/IyOtY/3db04cecd7.png)

## How to test

I wasn't able to reproduce the original error described in #120872:

```
security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""
```

One way of getting it or a similar error from the Framework is by simulating it in the code.

Add this to `x-pack/plugins/security_solution/server/lib/detection_engine/routes/rules/read_rules_route.ts`:

```ts
import { AlertExecutionStatusErrorReasons } from '../../../../../../alerting/common';

rule.executionStatus = {
  status: 'error',
  lastExecutionDate: new Date('2021-12-24T04:44:44.961Z'),
  error: {
    reason: AlertExecutionStatusErrorReasons.Read,
    message:
      'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
  },
};
```

Modify `getFailingRules` in `x-pack/plugins/security_solution/server/lib/detection_engine/routes/utils.ts`:

```ts
export const getFailingRules = async (
  ids: string[],
  rulesClient: RulesClient
): Promise<GetFailingRulesResult> => {
  try {
    const errorRules = await Promise.all(
      ids.map(async (id) =>
        rulesClient.resolve({
          id,
        })
      )
    );
    return errorRules
      .map((rule) => {
        rule.executionStatus = {
          status: 'error',
          lastExecutionDate: new Date('2021-12-25T10:44:44.961Z'),
          error: {
            reason: AlertExecutionStatusErrorReasons.Read,
            message:
              'security_exception: [security_exception] Reason: missing authentication credentials for REST request [/_security/user/_has_privileges], caused by: ""',
          },
        };
        return rule;
      })
      .filter((rule) => rule.executionStatus.status === 'error')
      .reduce<GetFailingRulesResult>((acc, failingRule) => {
        return {
          [failingRule.id]: {
            ...failingRule,
          },
          ...acc,
        };
      }, {});
  } catch (exc) {
    if (exc instanceof CustomHttpRequestError) {
      throw exc;
    }
    throw new Error(`Failed to get executionStatus with RulesClient: ${(exc as Error).message}`);
  }
};
```

Please note that `lastExecutionDate` should be greater than the date of the current legacy rule status SO.

### Checklist

- [ ] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

Co-authored-by: Georgii Gorbachev <[email protected]>
@banderror banderror deleted the fix-get-rule-by-id-endpoint branch January 10, 2022 10:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed bug Fixes for quality problems that affect the customer experience Feature:Rule Management Security Solution Detection Rule Management area impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. release_note:fix Team:Detection Rule Management Security Detection Rule Management Team Team:Detections and Resp Security Detection Response Team Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. v8.0.0 v8.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants