chore(docs): Add Log Namespacing docs (#16571)

This updates documentation and adds a blog-post announcing the log namespacing feature (as a beta release).

Co-authored-by: Spencer Gilbert

Commit 7d098e4 (parent b8e3dbe): 5 changed files with 263 additions and 4 deletions.

---
title: Log Namespacing
short: Log Namespacing
description: Changing Vector's data model
authors: ["fuchsnj"]
date: "2023-06-30"
badges:
  type: announcement
  domains: ["data model"]
tags: []
---

The Vector team has been hard at work improving the data model of events in Vector. These
changes are now available as a beta for anyone who wants to try them out and give feedback.
This is an opt-in feature: nothing changes unless you explicitly enable it.

## Why

Currently, all data for an event is placed at the root of the event, regardless of where the data
came from or how it was obtained. Not only does that make it hard to tell what a given field
represents (e.g., was the `timestamp` field generated by Vector when it ingested the event, or does
it record when the source originally created the event?), but it can also easily cause data
collisions.

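The collision problem can be illustrated with a small Python model of the flat layout. This is a sketch, not Vector's implementation: `flatten_event` is a hypothetical helper, and the example data is made up so that the decoded log and the source metadata both happen to use a `timestamp` key. Whichever group is merged last silently wins.

```python
from typing import Any, Dict


def flatten_event(
    event_data: Dict[str, Any],
    source_metadata: Dict[str, Any],
    vector_metadata: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical model of the flat layout: every group is merged into the root."""
    root: Dict[str, Any] = {}
    # dict.update overwrites existing keys, so a field in a later group
    # silently replaces a same-named field from an earlier group.
    for group in (event_data, source_metadata, vector_metadata):
        root.update(group)
    return root


# Illustrative data: the decoded log and the source metadata both use "timestamp".
decoded = {"message": "user logged in", "timestamp": "2023-06-29T12:00:00Z"}
source_meta = {"hostname": "alpha", "timestamp": "2023-06-30T08:15:00Z"}
vector_meta = {"source_type": "datadog_agent"}

event = flatten_event(decoded, source_meta, vector_meta)
print(event["timestamp"])  # 2023-06-30T08:15:00Z: the log's own timestamp is lost
```

Keeping each group in its own location, as Log Namespacing does, lets both timestamps coexist.
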
Log namespacing also unblocks powerful features currently in development, such as end-to-end type
checking of events in Vector.

## How to enable

The [global config] `schema.log_namespace` can be set to `true` to enable the new
Log Namespacing feature for all components. The default is `false`.

Every source also has a `log_namespace` config option. When set, it overrides the global setting,
so you can try out Log Namespacing on individual sources.

The following example enables Log Namespacing globally, then disables it for a single source.

```toml
# Enable Log Namespacing for every source by default.
schema.log_namespace = true

# No override: inherits the global setting, so this source uses Log Namespacing.
[sources.input_with_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_with_log_namespace"]
interval = 1

# Overrides the global setting, so this source keeps the old layout.
[sources.input_without_log_namespace]
type = "demo_logs"
format = "shuffle"
lines = ["input_without_log_namespace"]
interval = 1
log_namespace = false

# Print events from both sources as JSON to compare the two layouts.
[sinks.console]
type = "console"
inputs = ["input_with_log_namespace", "input_without_log_namespace"]
encoding.codec = "json"
```

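The precedence rule is simple enough to model in a few lines of Python. This is a sketch rather than Vector's code: `effective_log_namespace` is a hypothetical helper, and `None` stands for a source that leaves `log_namespace` unset.

```python
from typing import Optional


def effective_log_namespace(global_setting: bool, source_setting: Optional[bool]) -> bool:
    """Hypothetical model: a source-level value, when present, wins over the global one."""
    if source_setting is not None:
        return source_setting
    return global_setting


# Mirrors the example config above: global `schema.log_namespace = true`.
GLOBAL = True
sources = {
    "input_with_log_namespace": None,      # no override, inherits the global value
    "input_without_log_namespace": False,  # explicit per-source override
}
resolved = {name: effective_log_namespace(GLOBAL, value) for name, value in sources.items()}
print(resolved)
```
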
## How It Works

### Data Layout

When handling log events, Vector places information into one of the following groups
(examples are from the `datadog_agent` source):

- Event Data: The decoded event data (e.g., the log itself).
- Source Metadata: Metadata provided by the source of the event (e.g., hostname, tags).
- Vector Metadata: Metadata provided by Vector (e.g., the time when Vector received the event).

#### Without Log Namespacing

All three groups are placed at the root of the event. The exact layout depends on the source:
some fields are configurable, and the [global log schema] can change the name or location of some
fields.

Example log event from the `datadog_agent` source (with the JSON decoder):

```json
{
  "ddsource": "vector",
  "ddtags": "env:prod",
  "hostname": "alpha",
  "foo": "foo field",
  "service": "cernan",
  "source_type": "datadog_agent",
  "bar": "bar field",
  "status": "warning",
  "timestamp": "1970-02-14T20:44:57.570Z"
}
```

#### With Log Namespacing

When enabled, the layout of this data is well-defined and consistent.

- Event Data (and _only_ Event Data) is placed at the root of the event (`.`).
- Source Metadata is placed in the event metadata, under a prefix named after the source
  (e.g., `%datadog_agent`).
- Vector Metadata is placed in the event metadata, under the `vector` prefix (`%vector`).

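As a rough Python model of these placement rules, the namespaced layout can be pictured as two separate trees built from the three groups. The helper `namespace_event` is hypothetical and is not how Vector stores events internally; the data is taken from the `datadog_agent` example in this post.

```python
from typing import Any, Dict, Tuple


def namespace_event(
    event_data: Any,
    source_name: str,
    source_metadata: Dict[str, Any],
    vector_metadata: Dict[str, Any],
) -> Tuple[Any, Dict[str, Any]]:
    """Hypothetical model: returns (event root `.`, event metadata `%`)."""
    root = event_data  # only the decoded data lives at `.`
    metadata = {
        source_name: dict(source_metadata),  # addressed as %<source name>
        "vector": dict(vector_metadata),     # addressed as %vector
    }
    return root, metadata


root, metadata = namespace_event(
    {"foo": "foo field", "bar": "bar field"},
    "datadog_agent",
    {"ddsource": "vector", "ddtags": "env:prod", "hostname": "alpha",
     "service": "cernan", "status": "warning", "timestamp": "1970-02-14T20:44:57.570Z"},
    {"source_type": "datadog_agent", "ingest_timestamp": "1970-02-14T20:44:58.236Z"},
)
print(root)                                    # {'foo': 'foo field', 'bar': 'bar field'}
print(metadata["vector"]["ingest_timestamp"])  # 1970-02-14T20:44:58.236Z
```

Because the source timestamp and the ingest timestamp live under different prefixes, they no longer compete for the same key.
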
Generally, sinks only send the event data. To include metadata fields, use a [remap] transform to
copy them into the event as needed.

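In Vector this copy is written in VRL inside a [remap] transform. The Python sketch below only models the effect of such a step on the two trees; `copy_metadata_field` is a hypothetical name, and it assumes the event root is an object, since a field can only be added to an object.

```python
from typing import Any, Dict


def copy_metadata_field(
    root: Any, metadata: Dict[str, Any], namespace: str, key: str, dest: str
) -> Dict[str, Any]:
    """Hypothetical model of a remap step like `.<dest> = %<namespace>.<key>`."""
    if not isinstance(root, dict):
        # A field can only be added to an object-typed root.
        raise TypeError("event root is not an object")
    updated = dict(root)  # leave the original event untouched
    updated[dest] = metadata[namespace][key]
    return updated


root = {"foo": "foo field", "bar": "bar field"}
metadata = {"datadog_agent": {"hostname": "alpha"}, "vector": {"source_type": "datadog_agent"}}
result = copy_metadata_field(root, metadata, "datadog_agent", "hostname", "hostname")
print(result)  # {'foo': 'foo field', 'bar': 'bar field', 'hostname': 'alpha'}
```
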
Note that previously the event root (`.`) was always an object with fields. With Log Namespacing,
the root can be any type, such as a string.

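For example, consider a plain-text line decoded as raw bytes. The Python sketch below contrasts the two layouts under stated assumptions: without namespacing, the line is assumed to land in a `message` field (the default message key of the [global log schema]); with namespacing, the line itself becomes the root. `decode_line` is a hypothetical helper, not a Vector API.

```python
from typing import Any


def decode_line(line: str, log_namespace: bool) -> Any:
    """Hypothetical model of where a raw text line ends up in the event root."""
    if log_namespace:
        return line  # the root `.` is the string itself
    return {"message": line}  # legacy: wrapped in an object under the message key


legacy = decode_line("hello world", log_namespace=False)
namespaced = decode_line("hello world", log_namespace=True)
print(type(legacy).__name__, type(namespaced).__name__)  # dict str
```

VRL programs that assume `.` is an object may therefore need to check its type before reading or writing fields.
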
Example log event from the `datadog_agent` source (the same data as the example above):

Event root (`.`):

```json
{
  "foo": "foo field",
  "bar": "bar field"
}
```

Source metadata fields (`%datadog_agent`):

```json
{
  "ddsource": "vector",
  "ddtags": "env:prod",
  "hostname": "alpha",
  "service": "cernan",
  "status": "warning",
  "timestamp": "1970-02-14T20:44:57.570Z"
}
```

Vector metadata fields (`%vector`):

```json
{
  "source_type": "datadog_agent",
  "ingest_timestamp": "1970-02-14T20:44:58.236Z"
}
```

Here is a sample VRL script accessing different parts of an event when log namespacing is enabled:

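```coffee
# The entire event root (only the event data).
event = .
# A single field from the event data.
field_from_event = .foo

# All event metadata.
all_metadata = %
# Source metadata from the datadog_agent source.
tags = %datadog_agent.ddtags
# Vector metadata: when Vector received the event.
timestamp = %vector.ingest_timestamp
```
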
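To make the two roots concrete, here is a simplified Python model of how the paths in that script are resolved: paths starting with `.` read from the event root, and paths starting with `%` read from the metadata. `resolve_path` is hypothetical and ignores VRL features such as quoted segments and array indices.

```python
from typing import Any, Dict


def resolve_path(path: str, root: Any, metadata: Dict[str, Any]) -> Any:
    """Hypothetical, simplified model of VRL path lookup with Log Namespacing."""
    if path.startswith("%"):
        target, rest = metadata, path[1:]
    elif path.startswith("."):
        target, rest = root, path[1:]
    else:
        raise ValueError(f"path must start with '.' or '%': {path!r}")
    # A bare "." or "%" has no segments and returns the whole tree.
    for segment in filter(None, rest.split(".")):
        target = target[segment]
    return target


root = {"foo": "foo field", "bar": "bar field"}
metadata = {
    "datadog_agent": {"ddtags": "env:prod", "hostname": "alpha"},
    "vector": {"source_type": "datadog_agent", "ingest_timestamp": "1970-02-14T20:44:58.236Z"},
}
print(resolve_path("%datadog_agent.ddtags", root, metadata))  # env:prod
```
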
### Semantic Meaning

Before Log Namespacing, Vector used the [global log schema] to keep certain types of information
at known locations. This is changing: when Log Namespacing is enabled, the [global log schema]
is no longer used. It is replaced by a new feature called "semantic meaning", which assigns a
meaning to specific fields of an event so that sinks can find the information they need, such as
the timestamp, hostname, or message.

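One way to picture semantic meaning is as a lookup table from a meaning to a location, which a sink consults instead of a fixed path. The Python sketch below is a model only: the meaning name `timestamp`, the meaning tables, and the second "custom" source are illustrative assumptions, and locations are simplified to `(tree, key)` pairs where the tree is `.` or a metadata namespace.

```python
from typing import Any, Dict, Tuple

# Hypothetical location type: ("." or a metadata namespace, field name).
Location = Tuple[str, str]


def read_by_meaning(
    meaning: str,
    meanings: Dict[str, Location],
    root: Dict[str, Any],
    metadata: Dict[str, Dict[str, Any]],
) -> Any:
    """Hypothetical model of a sink reading a field by meaning rather than by fixed path."""
    tree, key = meanings[meaning]
    container = root if tree == "." else metadata[tree]
    return container[key]


# Two sources that keep their timestamps in different places (illustrative data).
datadog_meanings = {"timestamp": ("datadog_agent", "timestamp")}
datadog_event = ({"foo": "foo field"}, {"datadog_agent": {"timestamp": "1970-02-14T20:44:57.570Z"}})

custom_meanings = {"timestamp": (".", "ts")}
custom_event = ({"ts": "2023-06-30T00:00:00Z"}, {})

# The same sink logic works for both, because it asks for the meaning, not the path.
for meanings, (root, metadata) in ((datadog_meanings, datadog_event), (custom_meanings, custom_event)):
    print(read_by_meaning("timestamp", meanings, root, metadata))
```
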
All sources assign semantic meanings automatically. On startup, each sink checks that a meaning
exists for every field it requires. If a source does not provide a required field, or a meaning
needs to be adjusted manually for any reason, use the VRL function [set_semantic_meaning].

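The startup check can be modeled as comparing the meanings a sink requires against the meanings assigned upstream. In this Python sketch, `missing_meanings`, the meaning names, and their paths are hypothetical; the manual assignment stands in for a `set_semantic_meaning` call in VRL, which assigns the meaning when Vector starts rather than per event.

```python
from typing import Dict, List


def missing_meanings(assigned: Dict[str, str], required: List[str]) -> List[str]:
    """Hypothetical model of a sink's startup check: which required meanings have no field?"""
    return [meaning for meaning in required if meaning not in assigned]


# Meanings assigned automatically by a source (illustrative names and paths).
assigned = {"timestamp": "%vector.ingest_timestamp", "host": "%datadog_agent.hostname"}
required_by_sink = ["timestamp", "host", "message"]

print(missing_meanings(assigned, required_by_sink))  # ['message']

# Models `set_semantic_meaning(.foo, "message")`: assign the meaning to a field manually.
assigned["message"] = ".foo"
print(missing_meanings(assigned, required_by_sink))  # []
```
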
[global log schema]: /docs/reference/configuration/global-options/#log_schema
[set_semantic_meaning]: /docs/reference/vrl/functions/#set_semantic_meaning
[remap]: /docs/reference/configuration/transforms/remap/
[global config]: /docs/reference/configuration/global-options/#log_namespacing

website/cue/reference/remap/functions/set_semantic_meaning.cue (44 additions, 0 deletions)

package metadata

remap: functions: set_semantic_meaning: {
	category: "Event"
	description: """
		Sets a semantic meaning for an event. Note that this function assigns
		meaning at Vector startup, and has _no_ runtime behavior. It is suggested
		to put all calls to this function at the beginning of a VRL program. The function
		cannot be conditionally called (e.g., wrapping the call in an `if` statement does not
		prevent the meaning from being assigned).
		"""

	arguments: [
		{
			name: "target"
			description: """
				The path of the value that will be assigned a meaning.
				"""
			required: true
			type: ["path"]
		},
		{
			name: "meaning"
			description: """
				The name of the meaning to assign.
				"""
			required: true
			type: ["string"]
		},
	]
	internal_failure_reasons: [
	]
	return: types: ["null"]

	examples: [
		{
			title: "Sets custom field semantic meaning"
			source: #"""
				set_semantic_meaning(.foo, "bar")
				"""#
			return: null
		},
	]
}