From bf5cea1aa92aa54456be575a2ab15badc6b2e95c Mon Sep 17 00:00:00 2001
From: Hai Yan <8153134+oeyh@users.noreply.github.com>
Date: Mon, 8 Apr 2024 14:11:06 -0500
Subject: [PATCH] Fix a few issues with Data Prepper common use cases - log enrichment doc (#6916)

* Fix a few issues

Signed-off-by: Hai Yan

* Update _data-prepper/common-use-cases/log-enrichment.md

Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>

---------

Signed-off-by: Hai Yan
Signed-off-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
Co-authored-by: Naarcha-AWS <97990722+Naarcha-AWS@users.noreply.github.com>
---
 .../common-use-cases/log-enrichment.md | 130 +++++++++---------
 1 file changed, 67 insertions(+), 63 deletions(-)

diff --git a/_data-prepper/common-use-cases/log-enrichment.md b/_data-prepper/common-use-cases/log-enrichment.md
index b4004251c6..0d8ce4ab7d 100644
--- a/_data-prepper/common-use-cases/log-enrichment.md
+++ b/_data-prepper/common-use-cases/log-enrichment.md
@@ -20,7 +20,7 @@ You can perform different types of log enrichment with Data Prepper, including:

 Use the [`drop_events`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/drop-events/) processor to filter out specific log events before sending them to a sink. For example, if you're collecting web request logs and only want to store unsuccessful requests, you can create the following pipeline, which drops any requests for which the response is less than 400 so that only log events with HTTP status codes of 400 and higher remain.

-```json
+```yaml
 log-pipeline:
 source:
 ...
@@ -52,22 +52,22 @@ To perform analysis using the search terms, you can extract the value of `q` fro

 The following example combines the `split_string` and `key_value` processors to extract query parameters from an Apache log line:

-```json
-pipeline
+```yaml
+pipeline:
 ...
 processor:
 - grok:
- match:
- message: [ "%{COMMONAPACHELOG_DATATYPED}" ]
+ match:
+ message: [ "%{COMMONAPACHELOG_DATATYPED}" ]
 - split_string:
- entries:
- - source: request
- delimiter: "?"
+ entries:
+ - source: request
+ delimiter: "?"
 - key_value:
- source: "/request/1"
- field_split_characters: "&"
- value_split_characters: "="
- destination: query_params
+ source: "/request/1"
+ field_split_characters: "&"
+ value_split_characters: "="
+ destination: query_params
 ```
 {% include copy-curl.html %}

@@ -77,20 +77,20 @@ The different [mutate event]({{site.url}}{{site.baseurl}}/data-prepper/pipelines

 In this example, the first processor sets the value of the `debug` key to `true` if the key already exists in the event. The second processor only sets the `debug` key to `true` if the key doesn't exist in the event because `overwrite_if_key_exists` is set to `true`.

-```json
+```yaml
 ...
 processor:
 - add_entries:
- entries:
- - key: "debug"
- value: true
+ entries:
+ - key: "debug"
+ value: true
 ...
 processor:
 - add_entries:
- entries:
- - key: "debug"
- value: true
- overwrite_if_key_exists: true
+ entries:
+ - key: "debug"
+ value: true
+ overwrite_if_key_exists: true
 ...
 ```
 {% include copy-curl.html %}
@@ -99,12 +99,12 @@ You can also use a format string to construct new entries from existing events.

 For example, the following pipeline adds new event entries dynamically from existing events:

-```json
+```yaml
 processor:
 - add_entries:
- entries:
- - key: "key_three"
- format: "${key_one}-${key_two}
+ entries:
+ - key: "key_three"
+ format: "${key_one}-${key_two}
 ```
 {% include copy-curl.html %}

@@ -133,13 +133,13 @@ The processor transforms it into an event with a new key named `key_three`, whic

 The various [mutate string]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/mutate-string/) processors offer tools that you can use to manipulate strings in incoming data.
For example, if you need to split a string into an array, you can use the `split_string` processor:

-```json
+```yaml
 ...
 processor:
 - split_string:
- entries:
- - source: "message"
- delimiter: "&"
+ entries:
+ - source: "message"
+ delimiter: "&"
 ...
 ```
 {% include copy-curl.html %}
@@ -152,15 +152,15 @@ The [`list_to_map`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configur

 For example, consider the following processor configuration:

-```json
+```yaml
 ...
 processor:
- - list_to_map:
- key: "name"
- source: "A-car-as-list"
- target: "A-car-as-map"
- value_key: "value"
- flatten: true
+ - list_to_map:
+ key: "name"
+ source: "A-car-as-list"
+ target: "A-car-as-map"
+ value_key: "value"
+ flatten: true
 ...
 ```
 {% include copy-curl.html %}
@@ -230,20 +230,19 @@ As another example, consider an incoming event with the following structure:

 You can define the following options in the processor configuration:

-```json
+```yaml
 ...
 processor:
- - list_to_map:
- key: "somekey"
- source: "mylist"
- target: "myobject"
- value_key: "value"
- flatten: true
+ - list_to_map:
+ key: "somekey"
+ source: "mylist"
+ target: "myobject"
+ flatten: true
 ...
 ```
 {% include copy-curl.html %}

-The processor modifies the event by removing `mylist` and adding the new `myobject` object:
+The processor modifies the event by adding the new `myobject` object:

 ```json
 {
@@ -266,6 +265,7 @@ The processor modifies the event by removing `mylist` and adding the new `myobje
 "somevalue" : "val-b3",
 "anothervalue" : "val-b4"
 }
+ ]
 "c" : [
 {
 "somekey" : "c",
@@ -278,21 +278,22 @@ The processor modifies the event by removing `mylist` and adding the new `myobje
 ```
 {% include copy-curl.html %}

-In many cases, you may want to flatten the array for each key. In these situations, you must choose only one object to retain. The processor offers a choice of either first or last. For example, consider the following:
+In many cases, you may want to flatten the array for each key.
In these situations, you can choose which object to retain. The processor offers a choice of either first or last. For example, consider the following:

-```json
+```yaml
 ...
 processor:
- - list_to_map:
- key: "somekey"
- source: "mylist"
- target: "myobject"
- flatten: true
+ - list_to_map:
+ key: "somekey"
+ source: "mylist"
+ target: "myobject"
+ flatten: true
+ flattened_element: first
 ...
 ```
 {% include copy-curl.html %}

-The incoming event structure is then flattened accordingly:
+The fields in the newly created `myobject` are then flattened accordingly:

 ```json
 {
@@ -321,7 +322,7 @@ The incoming event structure is then flattened accordingly:

 The [`date`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/date/) processor parses the `timestamp` key from incoming events by converting it to International Organization for Standardization (ISO) 8601 format:

-```json
+```yaml
 ...
 processor:
 - date:
@@ -357,9 +358,9 @@ It converts the event to the following format:

 The `date` processor can generate timestamps for incoming events if you specify `@timestamp` for the `destination` option:

-```json
+```yaml
 ...
- processor:
+ processor:
 - date:
 from_time_received: true
 destination: "@timestamp"
@@ -371,24 +372,27 @@ The `date` processor can generate timestamps for incoming events if you specify

 The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute-string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events.
In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them:

-```json
+```yaml
 processor:
- - substitute_string:
- entries:
- - source: "message"
- from: "[a-zA-Z0-9_]+"
- to:""
+ - substitute_string:
+ entries:
 - source: "message"
- from: "[ ]+"
- to: "_"
+ from: "[a-zA-Z0-9_]+"
+ to:""
+ - source: "message"
+ from: "[ ]+"
+ to: "_"
 ```
 {% include copy-curl.html %}

-The following incoming Apache HTTP log will generate a punctuation pattern:
+The following incoming Apache HTTP log:

 ```json
 [{"message":"10.10.10.11 - admin [19/Feb/2015:15:50:36 -0500] \"GET /big2.pdf HTTP/1.1\" 200 33973115 0.202 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36\""}]

+```
+Generates the following punctuation pattern:
+```json
 {"message":"..._-_[//:::_-]_\"_/._/.\"_._\"-\"_\"/._(;_)_/._(,_)_/..._/.\""}
 ```
 {% include copy-curl.html %}
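
As a rough way to sanity-check the punctuation-pattern example in the final hunk, the two `substitute_string` entries can be imitated outside Data Prepper with ordinary regular-expression substitution. The sketch below is Python, not Data Prepper code: `punctuation_pattern` and `SUBSTITUTIONS` are hypothetical names, and it assumes Python's `re.sub` treats these two simple patterns the same way the processor does.

```python
import re

# Hypothetical stand-in for the two substitute_string entries in the patch:
# each pair is (from, to), applied in order to the "message" value.
SUBSTITUTIONS = [
    (r"[a-zA-Z0-9_]+", ""),  # strip runs of word characters
    (r"[ ]+", "_"),          # collapse runs of spaces into one underscore
]


def punctuation_pattern(message: str) -> str:
    """Apply each substitution in order and return what is left."""
    for pattern, replacement in SUBSTITUTIONS:
        message = re.sub(pattern, replacement, message)
    return message


# The Apache log line from the patch, with the JSON escaping removed.
log_line = (
    '10.10.10.11 - admin [19/Feb/2015:15:50:36 -0500] '
    '"GET /big2.pdf HTTP/1.1" 200 33973115 0.202 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/40.0.2214.111 Safari/537.36"'
)

print(punctuation_pattern(log_line))
```

Under those assumptions, the printed string is the same pattern the patch shows in its final JSON block once the JSON escaping of the double quotes is removed. The order of the entries matters: stripping word characters first leaves runs of adjacent spaces, which the second entry then collapses into single underscores.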