Add CIDR and User Agent processors (opensearch-project#4585)
* Add CIDR and User Agent processors

Signed-off-by: Naarcha-AWS <[email protected]>

* Add CIDR functions

Signed-off-by: Naarcha-AWS <[email protected]>

* Fix typos

Signed-off-by: Naarcha-AWS <[email protected]>

* Add technical feedback

Signed-off-by: Naarcha-AWS <[email protected]>

* Update user-agent.md

Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Chris Moore <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Update _data-prepper/pipelines/expression-syntax.md

Co-authored-by: Hai Yan <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Apply suggestions from code review

Co-authored-by: Nathan Bower <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>

* Update user-agent.md

Signed-off-by: Naarcha-AWS <[email protected]>

---------

Signed-off-by: Naarcha-AWS <[email protected]>
Signed-off-by: Naarcha-AWS <[email protected]>
Co-authored-by: Chris Moore <[email protected]>
Co-authored-by: Hai Yan <[email protected]>
Co-authored-by: Nathan Bower <[email protected]>
4 people authored and harshavamsi committed Oct 31, 2023
1 parent a8fb527 commit 8d1e654
Showing 2 changed files with 100 additions and 0 deletions.
63 changes: 63 additions & 0 deletions _data-prepper/pipelines/configuration/processors/user-agent.md
@@ -0,0 +1,63 @@
---
layout: default
title: user_agent
parent: Processors
grand_parent: Pipelines
nav_order: 130
---

# user_agent

The `user_agent` processor parses any user agent (UA) string in an event and then adds the parsed results to the event.

## Usage

In the following example, the `user_agent` processor reads the UA string from the source field, `ua`, and writes the parsed result to the target key, `user_agent`:

```yaml
processor:
  - user_agent:
      source: "ua"
      target: "user_agent"
```

The following example event contains the `ua` field with a string that provides information about a user:

```json
{
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```

The `user_agent` processor parses the string into a format compatible with Elastic Common Schema (ECS) and then adds the result to the specified target, as shown in the following example:

```json
{
  "user_agent": {
    "original": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1",
    "os": {
      "version": "13.5.1",
      "full": "iOS 13.5.1",
      "name": "iOS"
    },
    "name": "Mobile Safari",
    "version": "13.1.1",
    "device": {
      "name": "iPhone"
    }
  },
  "ua": "Mozilla/5.0 (iPhone; CPU iPhone OS 13_5_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1.1 Mobile/15E148 Safari/604.1"
}
```

## Configuration options

You can use the following configuration options with the `user_agent` processor.

| Option | Required | Description |
| :--- | :--- | :--- |
| `source` | Yes | The field in the event that will be parsed. |
| `target` | No | The field to which the parsed result is written. Defaults to `user_agent`. |
| `exclude_original` | No | Determines whether to exclude the original UA string from the parsing result. Defaults to `false`. |
| `cache_size` | No | The cache size of the parser in megabytes. Defaults to `1000`. |
| `tags_on_parse_failure` | No | The tag to add to an event if the `user_agent` processor fails to parse the UA string. |
37 changes: 37 additions & 0 deletions _data-prepper/pipelines/expression-syntax.md
@@ -22,6 +22,7 @@ Operators are listed in order of precedence (top to bottom, left to right).
| `and`, `or` | Conditional Expression | left-to-right |

## Reserved for possible future functionality

Reserved symbol set: `^`, `*`, `/`, `%`, `+`, `-`, `xor`, `=`, `+=`, `-=`, `*=`, `/=`, `%=`, `++`, `--`, `${<text>}`

## Set initializer
@@ -33,15 +34,19 @@ The set initializer defines a set or term and/or expressions.
The following are examples of set initializer syntax.

#### HTTP status codes

```
{200, 201, 202}
```

#### HTTP response payloads

```
{"Created", "Accepted"}
```

#### Handle multiple event types with different keys

```
{/request_payload, /request_message}
```
@@ -57,9 +62,11 @@ A priority expression identifies an expression that will be evaluated at the hig
```

## Relational operators

Relational operators are used to test the relationship of two numeric values. The operands must be numbers or JSON Pointers that resolve to numbers.

### Syntax

```
<Number | JSON Pointer> < <Number | JSON Pointer>
<Number | JSON Pointer> <= <Number | JSON Pointer>
@@ -68,11 +75,13 @@ Relational operators are used to test the relationship of two numeric values. Th
```

### Example

```
/status_code >= 200 and /status_code < 300
```

## Equality operators

Equality operators are used to test whether two values are equivalent.

### Syntax
@@ -106,6 +115,7 @@ null != /response
```

#### Conditional expression

A conditional expression is used to chain together multiple expressions and/or values.

#### Syntax
@@ -208,3 +218,30 @@ White space is **required** surrounding set initializers, priority expressions,
| `==`, `!=` | Equality operators | No | `/status == 200`<br>`/status_code==200` | |
| `and`, `or`, `not` | Conditional operators | Yes | `/a<300 and /b>200` | `/b<300and/b>200` |
| `,` | Set value delimiter | No | `/a in {200, 202}`<br>`/a in {200,202}`<br>`/a in {200 , 202}` | `/a in {200,}` |


## Functions

Data Prepper supports the following built-in functions that can be used in an expression.

### `length()`

The `length()` function takes one argument of the JSON pointer type and returns the length of the value passed. For example, `length(/message)` returns `10` when the key `message` exists in the event and has a value of `1234567890`.

### `hasTags()`

The `hasTags()` function takes one or more string type arguments and returns `true` if all the arguments passed are present in an event's tags. When an argument does not exist in the event's tags, the function returns `false`. For example, if you use the expression `hasTags("tag1")` and the event contains `tag1`, Data Prepper returns `true`. If you use the expression `hasTags("tag2")` but the event only contains a `tag1` tag, Data Prepper returns `false`.
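
The all-arguments-must-match rule described above can be modeled in a few lines of Python. This is only an illustrative sketch of the semantics, not Data Prepper's implementation; the `has_tags` helper and the use of a Python set for the event's tags are assumptions made for the example.

```python
def has_tags(event_tags, *tags):
    """Return True only if every requested tag is present in the event's tags."""
    if not tags:
        # The function is documented as taking one or more arguments.
        raise ValueError("has_tags expects at least one tag")
    return all(tag in event_tags for tag in tags)


event_tags = {"tag1"}  # hypothetical tags attached to an event
print(has_tags(event_tags, "tag1"))          # True
print(has_tags(event_tags, "tag2"))          # False
print(has_tags(event_tags, "tag1", "tag2"))  # False, because "tag2" is missing
```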

### `getMetadata()`

The `getMetadata()` function takes one literal string argument and looks up the corresponding key in an event's metadata. If the key contains a `/`, then the function looks up the metadata recursively. The function returns the value corresponding to the key, which can be of any type. For example, if the metadata contains `{"key1": "value2", "key2": 10}`, then `getMetadata("key1")` returns `value2` and `getMetadata("key2")` returns `10`.
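
One way to picture the recursive lookup is as a walk through nested maps, one `/`-separated segment at a time. The Python sketch below assumes that interpretation; the `get_metadata` helper, the `nested` and `inner` keys, and the `KeyError` on a missing segment are all assumptions for illustration, because the behavior for missing keys is not described here.

```python
def get_metadata(metadata, key):
    """Look up `key` in `metadata`, descending into nested maps at each "/"."""
    value = metadata
    for part in key.strip("/").split("/"):
        value = value[part]  # raises KeyError if a segment is missing
    return value


# "nested" and "inner" are hypothetical keys added to show a "/" lookup.
metadata = {"key1": "value2", "key2": 10, "nested": {"inner": "deep"}}
print(get_metadata(metadata, "key1"))          # value2
print(get_metadata(metadata, "key2"))          # 10
print(get_metadata(metadata, "nested/inner"))  # deep
```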

### `contains()`

The `contains()` function takes two string arguments, each of which can be either a literal string or a JSON pointer, and determines whether the second argument is contained within the first. When the second argument is a substring of the first argument, such as `contains("abcde", "abcd")`, the function returns `true`. If the second argument is not a substring of the first argument, such as `contains("abcde", "xyz")`, it returns `false`.
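
The argument order is easy to mix up, so the following Python sketch models it, assuming that the function returns `true` when the second argument is a substring of the first, as the `contains("abcde", "abcd")` example implies. The `resolve` and `contains` helpers are hypothetical, and treating any argument that starts with `/` as a JSON pointer to a top-level key is a simplification made for this example only.

```python
def resolve(event, argument):
    """Treat arguments that start with "/" as JSON pointers to top-level keys."""
    if argument.startswith("/"):
        return event[argument[1:]]
    return argument


def contains(event, first, second):
    """Return True if the second argument is a substring of the first."""
    return resolve(event, second) in resolve(event, first)


event = {"message": "abcde"}  # hypothetical event
print(contains(event, "abcde", "abcd"))    # True
print(contains(event, "abcde", "xyz"))     # False
print(contains(event, "/message", "bcd"))  # True, "/message" resolves to "abcde"
```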

### `cidrContains()`

The `cidrContains()` function takes two or more arguments. The first argument is a JSON pointer, which represents the key to the IP address that is checked. It supports both IPv4 and IPv6 addresses. Each argument after the key is a string that represents a CIDR block to check the address against.

If the IP address in the first argument is in the range of any of the given CIDR blocks, the function returns `true`. If the IP address is not in the range of the CIDR blocks, the function returns `false`. For example, `cidrContains(/sourceIp,"192.0.2.0/24","10.0.1.0/16")` will return `true` if the `sourceIp` field indicated in the JSON pointer has a value of `192.0.2.5`.
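
The membership check can be approximated with Python's standard `ipaddress` module. This is a sketch of the semantics only, not how Data Prepper implements the function; the `cidr_contains` helper and the plain dictionary used as an event are assumptions. Passing `strict=False` is also an assumption: it lets Python accept a block written with host bits set, such as `10.0.1.0/16`, by treating it as `10.0.0.0/16`.

```python
import ipaddress


def cidr_contains(event, key, *cidr_blocks):
    """Return True if the IP address stored under `key` is in any CIDR block."""
    address = ipaddress.ip_address(event[key])
    for block in cidr_blocks:
        # strict=False tolerates blocks written with host bits set.
        network = ipaddress.ip_network(block, strict=False)
        # An IPv4 address can never match an IPv6 block, and vice versa.
        if address.version == network.version and address in network:
            return True
    return False


event = {"sourceIp": "192.0.2.5"}
print(cidr_contains(event, "sourceIp", "192.0.2.0/24", "10.0.1.0/16"))  # True
print(cidr_contains(event, "sourceIp", "198.51.100.0/24"))              # False
```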
