[Rules migration] Adding initial implementation of integration RAG (#200922)

## Summary

This is part 1 of a 2-3 part PR. It covers the initial implementation of the integration RAG. Part 2 focuses on prompt tuning, and a potential part 3 would switch from the local JSON file to the EPR client and further tune the prompts and the search result ranking.

The change introduces a new feature to the rule migration: the current graph implementation is enriched with metadata from the available integrations, which lets us choose the correct index patterns when a relevant integration is found.

Changes in the PR:
- Introduction of the `integration data client`, which might later be moved under resource.
- Moving the `translate_rule` node to its own subgraph, which is then divided into multiple nodes to support the RAG search step.
- The creation and population of the index used to store the integration metadata, together with the `semantic_text` mapping used by the default included ELSER model.
- Updates to the `elastic_rule` type to include the integration IDs and index patterns.

## Example finished task:

```json
[ { "migration_id": "3d4cae35-eb8d-49fe-960a-2ef17bc026c6", "original_rule": { "id": "f8c325ea-506e-4105-8ccf-da1492e90115", "vendor": "splunk", "title": "Linux Auditd Add User Account Type", "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network.
Detecting and responding to these signs early is essential to prevent potential security incidents.", "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *", "query_language": "spl", "mitre_attack_ids": [ "T1136" ] }, "@timestamp": "2024-11-21T11:37:10.548Z", "status": "completed", "created_by": "elastic", "updated_at": "2024-11-21T11:38:01.397Z", "updated_by": "elastic", "comments": [ "## Migration Summary\n\n1. Source selection:\n - The original SPL query used `sourcetype=\"linux:audit\"`. In the ES|QL query, we're using the provided index patterns that include logs related to system audit and auditd.\n\n2. Event type filtering:\n - The SPL query filtered for `type=ADD_USER`. In ES|QL, we use `WHERE event.type == \"ADD_USER\"` to achieve the same filtering.\n\n3. Field renaming:\n - The SPL query renamed `hostname` to `dest`. In ES|QL, we use the `RENAME` command to rename `host.hostname` to `dest`, assuming the ECS field mapping.\n\n4. Statistics calculation:\n - The SPL query used `stats` to calculate count, min(_time), and max(_time). In ES|QL, we use the `STATS` command with `COUNT(*)`, `MIN(@timestamp)`, and `MAX(@timestamp)` to achieve similar results.\n - The grouping fields are adjusted to match ECS field names:\n - `exe` -> `process.executable`\n - `pid` -> `process.pid`\n - `dest` remains the same (after renaming)\n - `res` -> `event.outcome`\n - `UID` -> `user.id`\n - `type` -> `event.type`\n\n5. Time conversion:\n - The SPL query used `security_content_ctime` function for time conversion. In ES|QL, we use the `EVAL` command with `TO_DATETIME` function to convert the `firstTime` and `lastTime` fields to datetime format.\n\n6. 
Additional notes:\n - The `search *` at the end of the SPL query is not necessary in ES|QL as it doesn't change the result set.\n - The ES|QL query assumes that the `@timestamp` field is used for event timestamps, which is standard in ECS.\n\nThis ES|QL query should provide equivalent functionality to the original Splunk query, adapted for Elastic Security and using ECS field names where appropriate." ], "translation_result": "full", "elastic_rule": { "severity": "low", "query": "FROM logs-system_audit.package-* logs-endpoint.events.api-* logs-endpoint.events.file-* logs-endpoint.events.library-* logs-endpoint.events.network-* logs-endpoint.events.process-* logs-endpoint.events.registry-* logs-endpoint.events.security-* logs-auditd.log-*\n| WHERE event.type == \"ADD_USER\"\n| RENAME host.hostname AS dest\n| STATS count = COUNT(*), firstTime = MIN(@timestamp), lastTime = MAX(@timestamp) BY process.executable, process.pid, dest, event.outcome, user.id, event.type\n| EVAL firstTime = TO_DATETIME(firstTime), lastTime = TO_DATETIME(lastTime)", "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. 
Detecting and responding to these signs early is essential to prevent potential security incidents.", "index_patterns": [ "logs-system_audit.package-*", "logs-endpoint.events.api-*", "logs-endpoint.events.file-*", "logs-endpoint.events.library-*", "logs-endpoint.events.network-*", "logs-endpoint.events.process-*", "logs-endpoint.events.registry-*", "logs-endpoint.events.security-*", "logs-auditd.log-*" ], "query_language": "esql", "title": "Linux Auditd Add User Account Type", "integration_ids": [ "system_audit", "endpoint", "auditd" ] }, "_id": "8eKDTpMBwtRPKDL_CLKW" }, { "migration_id": "3d4cae35-eb8d-49fe-960a-2ef17bc026c6", "original_rule": { "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90", "vendor": "splunk", "title": "Linux Auditd Change File Owner To Root", "description": "The following analytic detects the use of the 'chown' command to change a file owner to 'root' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.", "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`", "query_language": "spl", "mitre_attack_ids": [ "T1222" ] }, "@timestamp": "2024-11-21T11:37:10.548Z", "status": "completed", "created_by": "elastic", "updated_at": "2024-11-21T11:38:04.527Z", "updated_by": "elastic", "comments": [ "## Migration Summary\n\n1. 
Source selection:\n - The original SPL query used `linux_auditd` and `linux_auditd_normalized_proctitle_process` macros. In ES|QL, we're using the recommended index patterns for Elastic Endpoint Security events.\n\n2. Field mapping:\n - `host` was renamed to `dest` in the original query. In ECS, we use `host.name`.\n - `process_exec` is mapped to `process.executable` in ECS.\n - `proctitle` and `normalized_proctitle_delimiter` are combined into `process.command_line` in ECS.\n - `_time` is replaced with `@timestamp` in ECS.\n\n3. Filtering:\n - The `LIKE` function is used in ES|QL, which is equivalent to the SPL `LIKE` function.\n - We're checking for \"chown\" in the process executable and \"root\" in the process arguments.\n\n4. Statistics:\n - The `STATS` command in ES|QL is similar to SPL's `stats` command.\n - We use `COUNT(*)`, `MIN(@timestamp)`, and `MAX(@timestamp)` for the count, firstTime, and lastTime respectively.\n - The `BY` clause groups the results by the relevant fields.\n\n5. Sorting:\n - Added a `SORT` command to order the results by count in descending order, which wasn't in the original query but is useful for identifying the most frequent occurrences.\n\n6. Removed:\n - The `security_content_ctime` function calls were removed as they are specific to Splunk. ES|QL uses the native datetime format.\n - The `linux_auditd_change_file_owner_to_root_filter` macro was not included as we don't have an equivalent in ES|QL. If there are specific filters in this macro, they would need to be added explicitly to the query.\n\nThis ES|QL query will detect the use of the 'chown' command to change a file owner to 'root' on a Linux system, using Elastic Endpoint Security events. It groups the results by the process details and host name, providing a count of occurrences and the first and last times the event was observed." 
], "translation_result": "full", "elastic_rule": { "severity": "low", "query": "FROM logs-endpoint.events.api-* logs-endpoint.events.file-* logs-endpoint.events.library-* logs-endpoint.events.network-* logs-endpoint.events.process-* logs-endpoint.events.registry-* logs-endpoint.events.security-*\n| WHERE process.executable LIKE \"%chown%\" AND process.args LIKE \"%root%\"\n| STATS count = COUNT(*), firstTime = MIN(@timestamp), lastTime = MAX(@timestamp) BY process.executable, \n process.command_line, \n process.args, \n host.name\n| SORT count DESC", "description": "The following analytic detects the use of the 'chown' command to change a file owner to 'root' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.", "index_patterns": [ "logs-endpoint.events.api-*", "logs-endpoint.events.file-*", "logs-endpoint.events.library-*", "logs-endpoint.events.network-*", "logs-endpoint.events.process-*", "logs-endpoint.events.registry-*", "logs-endpoint.events.security-*" ], "query_language": "esql", "title": "Linux Auditd Change File Owner To Root", "integration_ids": [ "endpoint" ] }, "_id": "8uKDTpMBwtRPKDL_CLKW" } ]
```

## Testing locally

Enable the flag

```
xpack.securitySolution.enableExperimental: ['siemMigrationsEnabled']
```

Create the rule migration, add relevant macro resources and initiate the task.
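The three steps can also be sketched as plain request descriptors, which makes it easier to see what each call needs. This is a minimal sketch: the type and function names are hypothetical, and only the paths, headers and payload shapes are taken from the cURL examples in this description. The resources path segment is assumed to be `resources`.

```typescript
// Sketch only: the names below are hypothetical and not part of the PR.
interface MigrationRequest {
  method: 'POST' | 'PUT';
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Headers shared by all three internal API calls.
// `kbn-xsrf` is sent with an empty value, mirroring `--header 'kbn-xsrf;'`.
const INTERNAL_HEADERS: Record<string, string> = {
  'kbn-xsrf': '',
  'x-elastic-internal-origin': 'security-solution',
  'elastic-api-version': '1',
  'Content-Type': 'application/json',
};

const RULES_PATH = '/internal/siem_migrations/rules';

// Step 1: create the rule migration from a list of original rules.
function createMigrationRequest(baseUrl: string, rules: object[]): MigrationRequest {
  return {
    method: 'POST',
    url: `${baseUrl}${RULES_PATH}`,
    headers: INTERNAL_HEADERS,
    body: JSON.stringify(rules),
  };
}

// Step 2: attach macro resources to the migration returned by step 1.
// Assumption: the path segment is `resources`.
function addResourcesRequest(
  baseUrl: string,
  migrationId: string,
  resources: object[]
): MigrationRequest {
  return {
    method: 'POST',
    url: `${baseUrl}${RULES_PATH}/${migrationId}/resources`,
    headers: INTERNAL_HEADERS,
    body: JSON.stringify(resources),
  };
}

// Step 3: start the migration task with an existing connector.
function startMigrationRequest(
  baseUrl: string,
  migrationId: string,
  connectorId: string
): MigrationRequest {
  return {
    method: 'PUT',
    url: `${baseUrl}${RULES_PATH}/${migrationId}/start`,
    headers: INTERNAL_HEADERS,
    body: JSON.stringify({ connectorId }),
  };
}
```

Each descriptor could then be passed to an HTTP client of choice, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with authentication added separately.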
cURL request examples:

<details>
<summary>Rules migration `create` POST request</summary>

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
  {
    "id": "f8c325ea-506e-4105-8ccf-da1492e90115",
    "vendor": "splunk",
    "title": "Linux Auditd Add User Account Type",
    "description": "The following analytic detects the suspicious add user account type. This behavior is critical for a SOC to monitor because it may indicate attempts to gain unauthorized access or maintain control over a system. Such actions could be signs of malicious activity. If confirmed, this could lead to serious consequences, including a compromised system, unauthorized access to sensitive data, or even a wider breach affecting the entire network. Detecting and responding to these signs early is essential to prevent potential security incidents.",
    "query": "sourcetype=\"linux:audit\" type=ADD_USER \n| rename hostname as dest \n| stats count min(_time) as firstTime max(_time) as lastTime by exe pid dest res UID type \n| `security_content_ctime(firstTime)` \n| `security_content_ctime(lastTime)`\n| search *",
    "query_language": "spl",
    "mitre_attack_ids": [ "T1136" ]
  },
  {
    "id": "7b87c556-0ca4-47e0-b84c-6cd62a0a3e90",
    "vendor": "splunk",
    "title": "Linux Auditd Change File Owner To Root",
    "description": "The following analytic detects the use of the '\''chown'\'' command to change a file owner to '\''root'\'' on a Linux system. It leverages Linux Auditd telemetry, specifically monitoring command-line executions and process details. This activity is significant as it may indicate an attempt to escalate privileges by adversaries, malware, or red teamers. If confirmed malicious, this action could allow an attacker to gain root-level access, leading to full control over the compromised host and potential persistence within the environment.",
    "query": "`linux_auditd` `linux_auditd_normalized_proctitle_process`\r\n| rename host as dest \r\n| where LIKE (process_exec, \"%chown %root%\") \r\n| stats count min(_time) as firstTime max(_time) as lastTime by process_exec proctitle normalized_proctitle_delimiter dest \r\n| `security_content_ctime(firstTime)` \r\n| `security_content_ctime(lastTime)`\r\n| `linux_auditd_change_file_owner_to_root_filter`",
    "query_language": "spl",
    "mitre_attack_ids": [ "T1222" ]
  }
]'
```

</details>

<details>
<summary>Add resources to migration ID</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request POST 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/resources' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '[
  { "type": "macro", "name": "security_content_ctime", "content": "convert timeformat=\"%Y-%m-%dT%H:%M:%S\" ctime($field$)" },
  { "type": "macro", "name": "linux_auditd_add_user_account_type_filter", "content": "search *" },
  { "type": "macro", "name": "linux_auditd", "content": "sourcetype=\"linux:audit\"" },
  { "type": "macro", "name": "linux_auditd_change_file_owner_to_root_filter", "content": "search *" }
]'
```

</details>

<details>
<summary>Rules migration `start` task request</summary>

- Assuming the connector `azureOpenAiGPT4o` is already created in the local environment.
- Using the {{`migration_id`}} from the first POST request response

```
curl --location --request PUT 'http://elastic:changeme@localhost:5601/internal/siem_migrations/rules/{{migration_id}}/start' \
--header 'kbn-xsrf;' \
--header 'x-elastic-internal-origin: security-solution' \
--header 'elastic-api-version: 1' \
--header 'Content-Type: application/json' \
--data '{ "connectorId": "azureOpenAiGPT4o" }'
```

</details>
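
Once the task has completed, one quick sanity check on a finished rule is that its `integration_ids` line up with its `index_patterns`. The sketch below is illustrative only and is not how the PR computes these fields: the function names are hypothetical, and it assumes the index patterns follow the `<type>-<integration>.<stream>-<namespace>` shape seen in the example finished task.

```typescript
// Sketch only: hypothetical helpers, not part of the PR.
// Assumption: patterns look like `logs-endpoint.events.api-*`, where the
// token between the first `-` and the first `.` is the integration ID.
function integrationIdFromPattern(pattern: string): string | undefined {
  const match = /^[^-]+-([^.-]+)/.exec(pattern);
  return match ? match[1] : undefined;
}

// Collects the distinct integration IDs in first-seen order.
function deriveIntegrationIds(indexPatterns: string[]): string[] {
  const ids = new Set<string>();
  for (const pattern of indexPatterns) {
    const id = integrationIdFromPattern(pattern);
    if (id !== undefined) {
      ids.add(id);
    }
  }
  return Array.from(ids);
}

// Returns true when both fields of a finished rule name the same integrations.
function integrationIdsMatch(rule: {
  index_patterns: string[];
  integration_ids: string[];
}): boolean {
  const derived = deriveIntegrationIds(rule.index_patterns).sort();
  const declared = [...rule.integration_ids].sort();
  return JSON.stringify(derived) === JSON.stringify(declared);
}
```

Applied to the first rule in the example finished task, the derived IDs are `system_audit`, `endpoint` and `auditd`, which is the same set the task reported.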