Commit

Fix a few issues with Data Prepper common use cases - log enrichment doc (opensearch-project#6916)

* Fix a few issues

Signed-off-by: Hai Yan <[email protected]>

* Update _data-prepper/common-use-cases/log-enrichment.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Hai Yan <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Naarcha-AWS <[email protected]>
oeyh and Naarcha-AWS authored Apr 8, 2024
1 parent fb90f57 commit bf5cea1
Showing 1 changed file with 67 additions and 63 deletions.
130 changes: 67 additions & 63 deletions _data-prepper/common-use-cases/log-enrichment.md
Expand Up @@ -20,7 +20,7 @@ You can perform different types of log enrichment with Data Prepper, including:

Use the [`drop_events`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/drop-events/) processor to filter out specific log events before sending them to a sink. For example, if you're collecting web request logs and only want to store unsuccessful requests, you can create the following pipeline, which drops any request whose response status code is less than 400 so that only log events with HTTP status codes of 400 and higher remain.

```json
```yaml
log-pipeline:
source:
...
Expand Down Expand Up @@ -52,22 +52,22 @@ To perform analysis using the search terms, you can extract the value of `q` fro

The following example combines the `split_string` and `key_value` processors to extract query parameters from an Apache log line:

```json
pipeline
```yaml
pipeline:
...
processor:
- grok:
match:
message: [ "%{COMMONAPACHELOG_DATATYPED}" ]
match:
message: [ "%{COMMONAPACHELOG_DATATYPED}" ]
- split_string:
entries:
- source: request
delimiter: "?"
entries:
- source: request
delimiter: "?"
- key_value:
source: "/request/1"
field_split_characters: "&"
value_split_characters: "="
destination: query_params
source: "/request/1"
field_split_characters: "&"
value_split_characters: "="
destination: query_params
```
{% include copy-curl.html %}
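
As a rough illustration of what the `split_string` and `key_value` combination above does to a `request` value, the following Python sketch splits on `?` and then parses the second element into key-value pairs. It is not Data Prepper code: the function name `extract_query_params` and the sample request path are assumptions made for this example, and details such as URL decoding or repeated keys are ignored.

```python
def extract_query_params(request: str) -> dict:
    """Mimic split_string (delimiter "?") followed by key_value on element 1."""
    # split_string step: "request" becomes a list such as ["/search", "q=a&page=2"]
    parts = request.split("?")
    if len(parts) < 2:
        return {}  # no query string, so there is no "/request/1" to parse

    query_params = {}
    # key_value step: fields are separated by "&", keys and values by "="
    for field in parts[1].split("&"):
        if not field:
            continue  # skip empty fields such as a trailing "&"
        key, _, value = field.partition("=")
        query_params[key] = value
    return query_params


if __name__ == "__main__":
    # Hypothetical request path used only for illustration
    print(extract_query_params("/search?q=opensearch&page=2"))
```

The result corresponds to what the pipeline would write to `query_params`, under the simplifying assumptions stated above.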

Expand All @@ -77,20 +77,20 @@ The different [mutate event]({{site.url}}{{site.baseurl}}/data-prepper/pipelines

In this example, the first processor sets the value of the `debug` key to `true` only if the key doesn't already exist in the event, because `overwrite_if_key_exists` defaults to `false`. The second processor sets the `debug` key to `true` even if the key already exists, because `overwrite_if_key_exists` is set to `true`.

```json
```yaml
...
processor:
- add_entries:
entries:
- key: "debug"
value: true
entries:
- key: "debug"
value: true
...
processor:
- add_entries:
entries:
- key: "debug"
value: true
overwrite_if_key_exists: true
entries:
- key: "debug"
value: true
overwrite_if_key_exists: true
...
```
{% include copy-curl.html %}
Expand All @@ -99,12 +99,12 @@ You can also use a format string to construct new entries from existing events.

For example, the following pipeline adds new event entries dynamically from existing events:

```json
```yaml
processor:
- add_entries:
entries:
- key: "key_three"
format: "${key_one}-${key_two}"
entries:
- key: "key_three"
format: "${key_one}-${key_two}"
```
{% include copy-curl.html %}
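
To make the format string substitution concrete, here is a minimal Python sketch of how a `${key}` placeholder could be resolved against an event. The helper name `add_formatted_entry` and the sample values are assumptions for illustration only and do not reflect how Data Prepper implements `add_entries` internally.

```python
import re


def add_formatted_entry(event: dict, key: str, fmt: str) -> dict:
    """Return a copy of `event` with `key` set to `fmt` after ${name} substitution."""

    def lookup(match: re.Match) -> str:
        # Raises KeyError if the referenced key is missing from the event
        return str(event[match.group(1)])

    enriched = dict(event)  # leave the incoming event untouched
    enriched[key] = re.sub(r"\$\{([^}]+)\}", lookup, fmt)
    return enriched


if __name__ == "__main__":
    # Hypothetical event values chosen for this example
    event = {"key_one": "value_one", "key_two": "value_two"}
    print(add_formatted_entry(event, "key_three", "${key_one}-${key_two}"))
```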

Expand Down Expand Up @@ -133,13 +133,13 @@ The processor transforms it into an event with a new key named `key_three`, whic

The various [mutate string]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/mutate-string/) processors offer tools that you can use to manipulate strings in incoming data. For example, if you need to split a string into an array, you can use the `split_string` processor:

```json
```yaml
...
processor:
- split_string:
entries:
- source: "message"
delimiter: "&"
entries:
- source: "message"
delimiter: "&"
...
```
{% include copy-curl.html %}
Expand All @@ -152,15 +152,15 @@ The [`list_to_map`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configur

For example, consider the following processor configuration:

```json
```yaml
...
processor:
- list_to_map:
key: "name"
source: "A-car-as-list"
target: "A-car-as-map"
value_key: "value"
flatten: true
- list_to_map:
key: "name"
source: "A-car-as-list"
target: "A-car-as-map"
value_key: "value"
flatten: true
...
```
{% include copy-curl.html %}
Expand Down Expand Up @@ -230,20 +230,19 @@ As another example, consider an incoming event with the following structure:

You can define the following options in the processor configuration:

```json
```yaml
...
processor:
- list_to_map:
key: "somekey"
source: "mylist"
target: "myobject"
value_key: "value"
flatten: true
- list_to_map:
key: "somekey"
source: "mylist"
target: "myobject"
flatten: true
...
```
{% include copy-curl.html %}

The processor modifies the event by removing `mylist` and adding the new `myobject` object:
The processor modifies the event by adding the new `myobject` object:

```json
{
Expand All @@ -266,6 +265,7 @@ The processor modifies the event by removing `mylist` and adding the new `myobje
"somevalue" : "val-b3",
"anothervalue" : "val-b4"
}
],
"c" : [
{
"somekey" : "c",
Expand All @@ -278,21 +278,22 @@ The processor modifies the event by removing `mylist` and adding the new `myobje
```
{% include copy-curl.html %}
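
The grouping shown in the output above can be sketched in a few lines of Python. This is only an approximation for readers who want to reason about the result: the function below is hypothetical, the sample values are invented, and the optional `flatten` and `flattened_element` arguments merely mirror the processor options of the same names rather than reproduce the processor's full behavior.

```python
def list_to_map(event, key, source, target, flatten=False, flattened_element="first"):
    """Group the objects in event[source] by the value of `key` and store them in `target`."""
    grouped = {}
    for item in event[source]:
        # Each distinct key value collects every object that carries it
        grouped.setdefault(item[key], []).append(item)

    if flatten:
        # Keep a single object per key: the first or the last one seen
        index = 0 if flattened_element == "first" else -1
        grouped = {k: items[index] for k, items in grouped.items()}

    result = dict(event)      # the source list stays in the event
    result[target] = grouped  # the new map is added alongside it
    return result


if __name__ == "__main__":
    # Invented sample data for illustration
    event = {
        "mylist": [
            {"somekey": "a", "somevalue": "val-a1"},
            {"somekey": "a", "somevalue": "val-a2"},
            {"somekey": "b", "somevalue": "val-b1"},
        ]
    }
    print(list_to_map(event, "somekey", "mylist", "myobject"))
    print(list_to_map(event, "somekey", "mylist", "myobject", flatten=True, flattened_element="last"))
```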

In many cases, you may want to flatten the array for each key. In these situations, you must choose only one object to retain. The processor offers a choice of either first or last. For example, consider the following:
In many cases, you may want to flatten the array for each key. In these situations, you can choose which object to retain. The processor offers a choice of either first or last. For example, consider the following:

```json
```yaml
...
processor:
- list_to_map:
key: "somekey"
source: "mylist"
target: "myobject"
flatten: true
- list_to_map:
key: "somekey"
source: "mylist"
target: "myobject"
flatten: true
flattened_element: first
...
```
{% include copy-curl.html %}

The incoming event structure is then flattened accordingly:
The fields in the newly created `myobject` are then flattened accordingly:

```json
{
Expand Down Expand Up @@ -321,7 +322,7 @@ The incoming event structure is then flattened accordingly:

The [`date`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/date/) processor parses the `timestamp` key from incoming events by converting it to International Organization for Standardization (ISO) 8601 format:

```json
```yaml
...
processor:
- date:
Expand Down Expand Up @@ -357,9 +358,9 @@ It converts the event to the following format:

The `date` processor can generate timestamps for incoming events if you specify `@timestamp` for the `destination` option:

```json
```yaml
...
processor:
processor:
- date:
from_time_received: true
destination: "@timestamp"
Expand All @@ -371,24 +372,27 @@ The `date` processor can generate timestamps for incoming events if you specify

The [`substitute_string`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/substitute-string/) processor (which is one of the mutate string processors) lets you derive a punctuation pattern from incoming events. In the following example pipeline, the processor will scan incoming Apache log events and derive punctuation patterns from them:

```json
```yaml
processor:
- substitute_string:
entries:
- source: "message"
from: "[a-zA-Z0-9_]+"
to: ""
- substitute_string:
entries:
- source: "message"
from: "[ ]+"
to: "_"
from: "[a-zA-Z0-9_]+"
to: ""
- source: "message"
from: "[ ]+"
to: "_"
```
{% include copy-curl.html %}

The following incoming Apache HTTP log will generate a punctuation pattern:
The following incoming Apache HTTP log:

```json
[{"message":"10.10.10.11 - admin [19/Feb/2015:15:50:36 -0500] \"GET /big2.pdf HTTP/1.1\" 200 33973115 0.202 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.111 Safari/537.36\""}]
```

Generates the following punctuation pattern:
```json
{"message":"..._-_[//:::_-]_\"_/._/.\"_._\"-\"_\"/._(;_)_/._(,_)_/..._/.\""}
```
{% include copy-curl.html %}
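
The two substitutions can be tried outside of a pipeline with a short Python sketch that applies the same regular expressions in the same order. The function name `punctuation_pattern` is an assumption for this example. The sketch uses Python's `re` module, which handles these two simple patterns in the usual way, but it is not the processor's implementation.

```python
import re


def punctuation_pattern(message: str) -> str:
    """Apply the two substitute_string entries in order."""
    # First entry: strip every run of letters, digits, and underscores
    without_words = re.sub(r"[a-zA-Z0-9_]+", "", message)
    # Second entry: collapse each run of spaces into a single underscore
    return re.sub(r"[ ]+", "_", without_words)


if __name__ == "__main__":
    # A fragment of the Apache log line shown above
    print(punctuation_pattern("GET /big2.pdf HTTP/1.1"))  # prints _/._/.
```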