[Rules migration] Adding initial implementation of integration RAG (elastic#200922)

## Summary

This is part 1 of a 2-3 part PR. It covers the initial implementation
of the integration RAG. Part 2 focuses on prompt tuning, and a
potential part 3 would switch from the local JSON file to the EPR client
and further tune the prompts and search result ranking.

The change introduces a new feature to the rule migration, enriching the
current graph implementation with metadata from available integrations,
so that the correct index patterns can be chosen when a relevant
integration is found.
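
As a rough illustration of the enrichment step, the sketch below collects integration IDs and index patterns from a list of matched integrations. The `IntegrationMatch` shape and the `summarizeIntegrations` helper are hypothetical and are not taken from the PR; only the output field names `integration_ids` and `index_patterns` mirror the finished task example.

```typescript
// Hypothetical shape of a matched integration; field names are assumptions.
interface IntegrationMatch {
  id: string;
  indexPatterns: string[];
}

interface IntegrationSummary {
  integration_ids: string[];
  index_patterns: string[];
}

// Collects integration IDs and index patterns, dropping duplicates
// while keeping the order in which they were first seen.
function summarizeIntegrations(matches: IntegrationMatch[]): IntegrationSummary {
  const ids: string[] = [];
  const patterns: string[] = [];
  for (const match of matches) {
    if (ids.indexOf(match.id) === -1) {
      ids.push(match.id);
    }
    for (const pattern of match.indexPatterns) {
      if (patterns.indexOf(pattern) === -1) {
        patterns.push(pattern);
      }
    }
  }
  return { integration_ids: ids, index_patterns: patterns };
}
```

Deduplicating here keeps the resulting rule free of repeated patterns when several search hits point at the same integration.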

Changes in the PR:

- Introduction of the `integration data client`, which might later be
moved under resource.
- Moving the `translate_rule` node to its own subgraph, which is then
divided into multiple nodes to support the RAG search step.
- The creation and population of the index used to store the integration
metadata, together with the `semantic_text` mapping used by the default
included ELSER model.
- Updates to the `elastic_rule` type to include the integration IDs and
index patterns.
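
To make the `semantic_text` item more concrete, here is a minimal sketch of what an index mapping and a matching `semantic` query body could look like. The field names (`title`, `description`, `elser_embedding`) and both helper functions are assumptions for illustration and may not match the PR; only the `semantic_text` field type and its `inference_id` parameter are standard Elasticsearch features.

```typescript
interface FieldMapping {
  type: string;
  inference_id?: string;
}

interface IndexMappings {
  properties: Record<string, FieldMapping>;
}

// Builds a mapping with one semantic_text field (hypothetical field names).
// The inference endpoint ID is passed in because it depends on the deployment.
function buildIntegrationMappings(inferenceId: string): IndexMappings {
  return {
    properties: {
      title: { type: 'text' },
      description: { type: 'text' },
      elser_embedding: { type: 'semantic_text', inference_id: inferenceId },
    },
  };
}

// Builds a search body that runs a `semantic` query against the given field.
function buildSemanticQuery(field: string, text: string, size: number) {
  return {
    size,
    query: {
      semantic: { field, query: text },
    },
  };
}
```

The search body would typically be built from the original rule's title and description, and the top hits handed to the translation prompt.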


## Example finished task:
```json
[
    {
        "migration_id": "3d4cae35-eb8d-49fe-960a-2ef17bc026c6",
        "original_rule": {
            "id": "f8c325ea-506e-4105-8ccf-da1492e90115",
            "vendor": "splunk",
            "title": "Linux Auditd Add User Account Type",
            "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
            "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *",
            "query_language": "spl",
            "mitre_attack_ids": [
                "T1136"
            ]
        },
        "@timestamp": "2024-11-21T11:37:10.548Z",
        "status": "completed",
        "created_by": "elastic",
        "updated_at": "2024-11-21T11:38:01.397Z",
        "updated_by": "elastic",
        "comments": [
            "## Migration Summary\n\n1. Source selection:\n   - The original SPL query used `sourcetype=\"linux:audit\"`. In the ES|QL query, we're using the provided index patterns that include logs related to system audit and auditd.\n\n2. Event type filtering:\n   - The SPL query filtered for `type=ADD_USER`. In ES|QL, we use `WHERE event.type == \"ADD_USER\"` to achieve the same filtering.\n\n3. Field renaming:\n   - The SPL query renamed `hostname` to `dest`. In ES|QL, we use the `RENAME` command to rename `host.hostname` to `dest`, assuming the ECS field mapping.\n\n4. Statistics calculation:\n   - The SPL query used `stats` to calculate count, min(_time), and max(_time). In ES|QL, we use the `STATS` command with `COUNT(*)`, `MIN(@timestamp)`, and `MAX(@timestamp)` to achieve similar results.\n   - The grouping fields are adjusted to match ECS field names:\n     - `exe` -> `process.executable`\n     - `pid` -> `process.pid`\n     - `dest` remains the same (after renaming)\n     - `res` -> `event.outcome`\n     - `UID` -> `user.id`\n     - `type` -> `event.type`\n\n5. Time conversion:\n   - The SPL query used `security_content_ctime` function for time conversion. In ES|QL, we use the `EVAL` command with `TO_DATETIME` function to convert the `firstTime` and `lastTime` fields to datetime format.\n\n6. Additional notes:\n   - The `search *` at the end of the SPL query is not necessary in ES|QL as it doesn't change the result set.\n   - The ES|QL query assumes that the `@timestamp` field is used for event timestamps, which is standard in ECS.\n\nThis ES|QL query should provide equivalent functionality to the original Splunk query, adapted for Elastic Security and using ECS field names where appropriate."
        ],
        "translation_result": "full",
        "elastic_rule": {
            "severity": "low",
            "query": "FROM logs-system_audit.package-* logs-endpoint.events.api-* logs-endpoint.events.file-* logs-endpoint.events.library-* logs-endpoint.events.network-* logs-endpoint.events.process-* logs-endpoint.events.registry-* logs-endpoint.events.security-* logs-auditd.log-*\n| WHERE event.type == \"ADD_USER\"\n| RENAME host.hostname AS dest\n| STATS count = COUNT(*), firstTime = MIN(@timestamp), lastTime = MAX(@timestamp) BY process.executable, process.pid, dest, event.outcome, user.id, event.type\n| EVAL firstTime = TO_DATETIME(firstTime), lastTime = TO_DATETIME(lastTime)",
            "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
            "index_patterns": [
                "logs-system_audit.package-*",
                "logs-endpoint.events.api-*",
                "logs-endpoint.events.file-*",
                "logs-endpoint.events.library-*",
                "logs-endpoint.events.network-*",
                "logs-endpoint.events.process-*",
                "logs-endpoint.events.registry-*",
                "logs-endpoint.events.security-*",
                "logs-auditd.log-*"
            ],
            "query_language": "esql",
            "title": "Linux Auditd Add User Account Type",
            "integration_ids": [
                "system_audit",
                "endpoint",
                "auditd"
            ]
        },
        "_id": "8eKDTpMBwtRPKDL_CLKW"
    },
    {
        "migration_id": "3d4cae35-eb8d-49fe-960a-2ef17bc026c6",
        "original_rule": {
            "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90",
            "vendor": "splunk",
            "title": "Linux Auditd Change File Owner To Root",
            "description": "The following analytic detects the use of the 'chown' command to change a file owner to 'root' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
            "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`",
            "query_language": "spl",
            "mitre_attack_ids": [
                "T1222"
            ]
        },
        "@timestamp": "2024-11-21T11:37:10.548Z",
        "status": "completed",
        "created_by": "elastic",
        "updated_at": "2024-11-21T11:38:04.527Z",
        "updated_by": "elastic",
        "comments": [
            "## Migration Summary\n\n1. Source selection:\n   - The original SPL query used `linux_auditd` and `linux_auditd_normalized_proctitle_process` macros. In ES|QL, we're using the recommended index patterns for Elastic Endpoint Security events.\n\n2. Field mapping:\n   - `host` was renamed to `dest` in the original query. In ECS, we use `host.name`.\n   - `process_exec` is mapped to `process.executable` in ECS.\n   - `proctitle` and `normalized_proctitle_delimiter` are combined into `process.command_line` in ECS.\n   - `_time` is replaced with `@timestamp` in ECS.\n\n3. Filtering:\n   - The `LIKE` function is used in ES|QL, which is equivalent to the SPL `LIKE` function.\n   - We're checking for \"chown\" in the process executable and \"root\" in the process arguments.\n\n4. Statistics:\n   - The `STATS` command in ES|QL is similar to SPL's `stats` command.\n   - We use `COUNT(*)`, `MIN(@timestamp)`, and `MAX(@timestamp)` for the count, firstTime, and lastTime respectively.\n   - The `BY` clause groups the results by the relevant fields.\n\n5. Sorting:\n   - Added a `SORT` command to order the results by count in descending order, which wasn't in the original query but is useful for identifying the most frequent occurrences.\n\n6. Removed:\n   - The `security_content_ctime` function calls were removed as they are specific to Splunk. ES|QL uses the native datetime format.\n   - The `linux_auditd_change_file_owner_to_root_filter` macro was not included as we don't have an equivalent in ES|QL. If there are specific filters in this macro, they would need to be added explicitly to the query.\n\nThis ES|QL query will detect the use of the 'chown' command to change a file owner to 'root' on a Linux system, using Elastic Endpoint Security events. It groups the results by the process details and host name, providing a count of occurrences and the first and last times the event was observed."
        ],
        "translation_result": "full",
        "elastic_rule": {
            "severity": "low",
            "query": "FROM logs-endpoint.events.api-* logs-endpoint.events.file-* logs-endpoint.events.library-* logs-endpoint.events.network-* logs-endpoint.events.process-* logs-endpoint.events.registry-* logs-endpoint.events.security-*\n| WHERE process.executable LIKE \"%chown%\" AND process.args LIKE \"%root%\"\n| STATS count = COUNT(*), firstTime = MIN(@timestamp), lastTime = MAX(@timestamp) BY process.executable, \n    process.command_line, \n    process.args, \n    host.name\n| SORT count DESC",
            "description": "The following analytic detects the use of the 'chown' command to change a file owner to 'root' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
            "index_patterns": [
                "logs-endpoint.events.api-*",
                "logs-endpoint.events.file-*",
                "logs-endpoint.events.library-*",
                "logs-endpoint.events.network-*",
                "logs-endpoint.events.process-*",
                "logs-endpoint.events.registry-*",
                "logs-endpoint.events.security-*"
            ],
            "query_language": "esql",
            "title": "Linux Auditd Change File Owner To Root",
            "integration_ids": [
                "endpoint"
            ]
        },
        "_id": "8uKDTpMBwtRPKDL_CLKW"
    }
]
```
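
The new `integration_ids` field makes it easy to see which integrations a batch of migrated rules depends on. The sketch below groups rule titles by integration ID; the `MigratedRule` type is a trimmed-down assumption that only models the fields used here, and `groupTitlesByIntegration` is a hypothetical helper.

```typescript
// Minimal view of a finished task; only the fields used below are modeled.
interface MigratedRule {
  elastic_rule?: {
    title: string;
    integration_ids?: string[];
  };
}

// Maps each integration ID to the titles of the rules that reference it.
function groupTitlesByIntegration(tasks: MigratedRule[]): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const task of tasks) {
    const rule = task.elastic_rule;
    if (!rule) {
      continue; // task has no translated rule yet
    }
    for (const id of rule.integration_ids ?? []) {
      const titles = groups[id] || [];
      titles.push(rule.title);
      groups[id] = titles;
    }
  }
  return groups;
}
```

Applied to the two tasks above, `endpoint` would map to both rule titles, while `system_audit` and `auditd` would map only to the first.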


## Testing locally

Enable the experimental flag:
```
xpack.securitySolution.enableExperimental: ['siemMigrationsEnabled']
```

Create the rule migration, add relevant macro resources and initiate the
task.

cURL request examples:

<details>
  <summary>Rules migration `create` POST request</summary>

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
    {
        "id": "f8c325ea-506e-4105-8ccf-da1492e90115",
        "vendor": "splunk",
        "title": "Linux Auditd Add User Account Type",
        "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
        "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *",
        "query_language":"spl",
        "mitre_attack_ids": [
            "T1136"
        ]
    },
    {
        "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90",
        "vendor": "splunk",
        "title": "Linux Auditd Change File Owner To Root",
        "description": "The following analytic detects the use of the '\''chown'\'' command to change a file owner to '\''root'\'' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
        "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`",
        "query_language": "spl",
        "mitre_attack_ids": [
            "T1222"
        ]
    }
]'
```
</details>

<details>
  <summary>Add resources to migration ID</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the
local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/resources' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
    {
        "type": "macro",
        "name": "security_content_ctime",
        "content": "convert timeformat=\"%Y-%m-%dT%H:%M:%S\" ctime($field$)"
    },
    {
        "type": "macro",
        "name": "linux_auditd_add_user_account_type_filter",
        "content": "search *"
    },
    {
        "type": "macro",
        "name": "linux_auditd",
        "content": "sourcetype=\"linux:audit\""
    },
    {
        "type": "macro",
        "name": "linux_auditd_change_file_owner_to_root_filter",
        "content": "search *"
    }
]'
```
</details>

<details>
  <summary>Rules migration `start` task request</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the
local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/start' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '{
    "connectorId": "azureOpenAiGPT4o"
}'
```
</details>
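
For scripting the same flow, the three requests can be described as plain objects before being sent. This is a sketch, not part of the PR: the `RequestSpec` type and the builder functions are hypothetical, the `kbn-xsrf` value of `'true'` is an assumption (the cURL examples send the header empty), and authentication is left to the caller.

```typescript
interface RequestSpec {
  method: 'POST' | 'PUT';
  url: string;
  headers: Record<string, string>;
  body: string;
}

const RULES_PATH = '/internal/siem_migrations/rules';

// Headers shared by all three cURL examples.
function commonHeaders(): Record<string, string> {
  return {
    'kbn-xsrf': 'true', // assumption: the cURL examples send this header empty
    'x-elastic-internal-origin': 'security-solution',
    'elastic-api-version': '1',
    'Content-Type': 'application/json',
  };
}

// Step 1: create the rule migration from a list of original rules.
function createRequest(baseUrl: string, rules: unknown[]): RequestSpec {
  return { method: 'POST', url: baseUrl + RULES_PATH, headers: commonHeaders(), body: JSON.stringify(rules) };
}

// Step 2: attach macro resources to an existing migration.
function resourcesRequest(baseUrl: string, migrationId: string, resources: unknown[]): RequestSpec {
  const url = baseUrl + RULES_PATH + '/' + encodeURIComponent(migrationId) + '/resources';
  return { method: 'POST', url, headers: commonHeaders(), body: JSON.stringify(resources) };
}

// Step 3: start the migration task with the chosen connector.
function startRequest(baseUrl: string, migrationId: string, connectorId: string): RequestSpec {
  const url = baseUrl + RULES_PATH + '/' + encodeURIComponent(migrationId) + '/start';
  return { method: 'PUT', url, headers: commonHeaders(), body: JSON.stringify({ connectorId }) };
}
```

Each spec can be passed to an HTTP client of choice, for example `fetch(spec.url, { method: spec.method, headers: spec.headers, body: spec.body })`, with credentials added separately. The `migration_id` returned by the first response feeds the second and third requests.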
P1llus authored and paulinashakirova committed Nov 26, 2024
1 parent 03284e8 commit 288fd9c
Showing 32 changed files with 7,542 additions and 136 deletions.
@@ -88,6 +88,10 @@ export const ElasticRule = z.object({
* The Elastic prebuilt rule id matched.
*/
prebuilt_rule_id: NonEmptyString.optional(),
/**
* The Elastic integration IDs related to the rule.
*/
integration_ids: z.array(z.string()).optional(),
/**
* The Elastic rule id installed as a result.
*/
@@ -73,6 +73,11 @@ components:
prebuilt_rule_id:
description: The Elastic prebuilt rule id matched.
$ref: './common.schema.yaml#/components/schemas/NonEmptyString'
integration_ids:
type: array
items:
type: string
description: The Elastic integration IDs related to the rule.
id:
description: The Elastic rule id installed as a result.
$ref: './common.schema.yaml#/components/schemas/NonEmptyString'
@@ -8,11 +8,11 @@
import type { IKibanaResponse, Logger } from '@kbn/core/server';
import { buildRouteValidationWithZod } from '@kbn/zod-helpers';
import { v4 as uuidV4 } from 'uuid';
import { SIEM_RULE_MIGRATIONS_PATH } from '../../../../../common/siem_migrations/constants';
import {
CreateRuleMigrationRequestBody,
type CreateRuleMigrationResponse,
} from '../../../../../common/siem_migrations/model/api/rules/rule_migration.gen';
import { SIEM_RULE_MIGRATIONS_PATH } from '../../../../../common/siem_migrations/constants';
import type { SecuritySolutionPluginRouter } from '../../../../types';
import type { CreateRuleMigrationInput } from '../data/rule_migrations_data_rules_client';
import { withLicense } from './util/with_license';
@@ -47,7 +47,7 @@ export const registerSiemRuleMigrationsCreateRoute = (
migration_id: migrationId,
original_rule: originalRule,
}));

await ruleMigrationsClient.data.integrations.create();
await ruleMigrationsClient.data.rules.create(ruleMigrations);

return res.ok({ body: { migration_id: migrationId } });
@@ -32,10 +32,15 @@ export const MockRuleMigrationsDataResourcesClient = jest
.fn()
.mockImplementation(() => mockRuleMigrationsDataResourcesClient);

export const mockRuleMigrationsDataIntegrationsClient = {
retrieveIntegrations: jest.fn().mockResolvedValue([]),
};

// Rule migrations data client
export const mockRuleMigrationsDataClient = {
rules: mockRuleMigrationsDataRulesClient,
resources: mockRuleMigrationsDataResourcesClient,
integrations: mockRuleMigrationsDataIntegrationsClient,
};

export const MockRuleMigrationsDataClient = jest