This RFC describes an approach for transforming one input log event into multiple log events.
In scope:
- Turning a log event into multiple log events
Out of scope:
- Turning a log event into multiple metrics
- Turning a metric into multiple metrics
- Turning a metric into multiple logs
Users have asked [1, 2] for the ability to transform an incoming event into multiple events. This is useful, for example, when parsing an incoming JSON payload that contains an array and you'd like to publish one event per element.
Currently, users must use the Lua or WASM transform for this, which introduces a substantial performance bottleneck.
The proposal is to:
- Extend remap so that if the root path (.) is an array at the end of the program, it emits one event for each element of that array
- Add an unnest VRL function that transforms an object into an array of objects using a specified field on the input object
Example input:
{ "host": "localhost", "events": [{ "message": "foo" }, { "message": "bar" }] }
Remap transform config:
[transforms.remap]
type = "remap"
source = """
. = unnest(., "events")
"""
Output:
{ "host": "localhost", "message": "foo" }
{ "host": "localhost", "message": "bar" }
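To make the intended semantics of unnest concrete, here is a minimal Python model that operates on plain dicts. It is a sketch under assumptions, not Vector's implementation. The name unnest_model is hypothetical, events are modeled as JSON-like dicts, every array element is assumed to be an object, and element keys are assumed to override top-level keys on collision (the RFC does not say).

```python
from __future__ import annotations

import json
from typing import Any


def unnest_model(event: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Hypothetical model of `. = unnest(., field)` when every element is an object.

    Each element of event[field] becomes one output event: a copy of the
    remaining top-level fields with the element's keys merged in.
    """
    items = event[field]
    if not isinstance(items, list):
        raise TypeError(f"field {field!r} must be an array")
    # Top-level fields shared by every output event; the array field itself is dropped.
    base = {k: v for k, v in event.items() if k != field}
    # Assumption: on a key collision, the element's value wins.
    return [{**base, **item} for item in items]


event = {"host": "localhost", "events": [{"message": "foo"}, {"message": "bar"}]}
for out in unnest_model(event, "events"):
    print(json.dumps(out))
# {"host": "localhost", "message": "foo"}
# {"host": "localhost", "message": "bar"}
```

Each output is a fresh dict, so the input event is left unmodified.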
Additionally, we will provide only_fields and except_fields as options on the unnest function so that users can select which fields are kept. These have semantics similar to the encoding options on sinks.
Example input:
{ "timestamp": "2020-12-09T16:09:53+00:00", "host": "localhost", "events": [{ "message": "foo" }, { "message": "bar" }] }
Remap transform config:
[transforms.remap]
type = "remap"
source = """
. = unnest(., "events", only_fields: ["host"])
"""
Output:
{ "host": "localhost", "message": "foo" }
{ "host": "localhost", "message": "bar" }
Here the timestamp field is not preserved because it is not listed in only_fields.
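The field selection can be sketched the same way. This sketch makes three assumptions that the RFC does not state:

- select_fields and unnest_model are hypothetical names.
- The filter applies only to the top-level fields of the input. Keys coming from the array elements are always kept, which matches the example output.
- Passing both options is rejected, because the RFC does not say how they would combine.

```python
from __future__ import annotations

import json
from typing import Any, Optional, Sequence


def select_fields(
    base: dict[str, Any],
    only_fields: Optional[Sequence[str]] = None,
    except_fields: Optional[Sequence[str]] = None,
) -> dict[str, Any]:
    """Hypothetical filter over the top-level fields copied into each output event."""
    if only_fields is not None and except_fields is not None:
        # Assumption: the two options are mutually exclusive.
        raise ValueError("pass only_fields or except_fields, not both")
    if only_fields is not None:
        return {k: v for k, v in base.items() if k in only_fields}
    if except_fields is not None:
        return {k: v for k, v in base.items() if k not in except_fields}
    return dict(base)


def unnest_model(
    event: dict[str, Any],
    field: str,
    only_fields: Optional[Sequence[str]] = None,
    except_fields: Optional[Sequence[str]] = None,
) -> list[dict[str, Any]]:
    """Hypothetical model of unnest(., field, only_fields: [...]) for object elements."""
    base = {k: v for k, v in event.items() if k != field}
    base = select_fields(base, only_fields, except_fields)
    # Keys from each array element are always kept, as in the example output.
    return [{**base, **item} for item in event[field]]


event = {
    "timestamp": "2020-12-09T16:09:53+00:00",
    "host": "localhost",
    "events": [{"message": "foo"}, {"message": "bar"}],
}
for out in unnest_model(event, "events", only_fields=["host"]):
    print(json.dumps(out))
# {"host": "localhost", "message": "foo"}
# {"host": "localhost", "message": "bar"}
```

For this input, except_fields: ["timestamp"] would produce the same two events.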
The remap transform can also be used to emit multiple events from a single incoming event by setting the root path (.) to an array.
For example, given an input of:
{ "host": "localhost", "events": [{ "message": "foo" }, { "message": "bar" }, 1] }
And a transform of:
[transforms.remap]
type = "remap"
source = """
. = unnest(., "events")
"""
The following events will be output:
{ "host": "localhost", "message": "foo" }
{ "host": "localhost", "message": "bar" }
{ "host": "localhost", "message": "1" }
That is, each element of the indicated array field is emitted as its own event, merged with the other fields at the top level of the event; the array field itself is not carried over.
If an element of the array is not an object, it is set as the value of the message key, as with the element 1 above.
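The following sketch models both halves of this behavior. First, unnest wraps non-object elements under a message key. Second, the extended remap emits one event per element when the final root value is an array.

Several details are assumptions rather than statements from the RFC:

- Strings are kept as-is and other scalars are JSON-encoded, so that 1 becomes "1" as in the example output.
- to_object, unnest_model, and emit_events are hypothetical names. emit_events stands for the step remap runs after the program finishes.
- A non-array root is still emitted as a single event, which matches remap's current behavior.

```python
from __future__ import annotations

import json
from typing import Any


def to_object(element: Any) -> dict[str, Any]:
    """Objects pass through; anything else is stored under "message"."""
    if isinstance(element, dict):
        return element
    # Assumption: strings are kept as-is, other scalars are JSON-encoded (1 -> "1").
    return {"message": element if isinstance(element, str) else json.dumps(element)}


def unnest_model(event: dict[str, Any], field: str) -> list[dict[str, Any]]:
    """Hypothetical model of unnest(., field), including non-object elements."""
    base = {k: v for k, v in event.items() if k != field}
    return [{**base, **to_object(item)} for item in event[field]]


def emit_events(root: Any) -> list[Any]:
    """Hypothetical post-program step of the extended remap transform."""
    if isinstance(root, list):
        # Proposed behavior: one event per array element.
        return list(root)
    # Existing behavior: the root value is emitted as a single event.
    return [root]


event = {"host": "localhost", "events": [{"message": "foo"}, {"message": "bar"}, 1]}
for out in emit_events(unnest_model(event, "events")):
    print(json.dumps(out))
# {"host": "localhost", "message": "foo"}
# {"host": "localhost", "message": "bar"}
# {"host": "localhost", "message": "1"}
```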
This enables remap to emit multiple events. Without it, users will have to continue using Lua or WASM to achieve this, which introduces a performance bottleneck compared to this proposal.
These are similar to the proposed approach.
- Adds an additional transform to be aware of.
- Less flexible than the emit_log alternative, which could emit arbitrary events from the remap transform.
- Ongoing maintenance burden should be minimal.
Alternative: explode transform (previous proposal)
We would add an explode transform that uses the Vector Remap Language (VRL) to emit a set of events from one input event by requiring the VRL program to resolve to an array. A separate event is published for each element of the array.
Example input:
{ "events": [{ "message": "foo" }, { "message": "bar" }] }
Transform config:
[transforms.explode]
type = "explode"
source = "array!(.events) ?? []" # will be typechecked at compile-time
Output:
{"message": "foo"}
{"message": "bar"}
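The explode transform's contract can be modeled in a few lines, assuming the VRL program is treated as a Python callable from the input event to its result. The contract is that the program must resolve to an array, and each element is published as its own event. run_explode is a hypothetical name, and events_or_empty is a hypothetical stand-in for array!(.events) ?? []. The RFC has VRL enforce the array result at compile time, while this model can only check it at runtime.

```python
from __future__ import annotations

from typing import Any, Callable

# A compiled program modeled as a function from the input event to its result.
Program = Callable[[dict[str, Any]], Any]


def run_explode(event: dict[str, Any], program: Program) -> list[Any]:
    """Hypothetical model of the explode transform."""
    result = program(event)
    # The RFC relies on VRL's compile-time type check; the model checks at runtime.
    if not isinstance(result, list):
        raise TypeError("explode program must resolve to an array")
    # Each element of the array becomes its own event.
    return list(result)


def events_or_empty(event: dict[str, Any]) -> list[Any]:
    """Stand-in for `array!(.events) ?? []`: the array if present, else no events."""
    value = event.get("events")
    return value if isinstance(value, list) else []


print(run_explode({"events": [{"message": "foo"}, {"message": "bar"}]}, events_or_empty))
print(run_explode({"message": "no array here"}, events_or_empty))
```

The ?? [] fallback means an event without an events array produces no output rather than an error.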
Support for iteration as part of #6031 will allow users to do things like map fields onto each element. An example might look something like:
Input:
{ "host": "foobar", "events": [{ "message": "foo" }, { "message": "bar" }] }
Transform config (actual mapping syntax TBA):
[transforms.explode]
type = "explode"
source = "map(array!(.events), |event| event.host = .host) ?? []"
Output:
{"host": "foobar", "message": "foo"}
{"host": "foobar", "message": "bar"}
This is similar to the support that the current explode transform PR has for merging in top-level fields when creating events from a subfield that contains an array.
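The mapping syntax in the config above is not final, but its intended effect can be sketched in Python: copy host from the input event into each element of events before the elements are emitted. map_host_onto_events is a hypothetical stand-in for the TBA VRL expression. Elements are assumed to be objects, and a missing host simply becomes None in this model.

```python
from __future__ import annotations

from typing import Any


def map_host_onto_events(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical equivalent of map(array!(.events), |event| event.host = .host) ?? []."""
    events = event.get("events")
    if not isinstance(events, list):
        # The `?? []` fallback: no array means no events.
        return []
    # Copy each element (assumed to be an object) and add the top-level host.
    return [{**element, "host": event.get("host")} for element in events]


event = {"host": "foobar", "events": [{"message": "foo"}, {"message": "bar"}]}
print(map_host_onto_events(event))
```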
Alternative: emit_log function (previous proposal)
This is roughly the same as the approach described in a comment on vectordotdev#6330, with some slight tweaks.
A new emit_log function will be added to the VRL stdlib:
emit_log(value: Object)
This function will cause the object passed as value to be emitted at that point and flushed downstream. The emitted log will have its metadata copied from the input event.
Additionally, an emit_root config option (we can work on the naming) will be added to the remap transform to configure whether . is emitted after the transform runs. It will default to true to preserve the current behavior, but users can set it to false to suppress this behavior. Admittedly, I'm not wild about introducing this additional config option, but I'm not seeing another great alternative.
This can be combined with the iteration mechanism to be introduced in #6031 to emit a variable number of events. Naively, this might look something like:
for stooge in .stooges
emit_log(stooge)
end
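The runtime side of emit_log and the loop above can be modeled as follows. The model shows the two properties the RFC describes: the log is emitted at the point of the call, and its metadata is copied from the input event. It assumes a hypothetical Event type with separate fields and metadata, and a hypothetical Runtime that collects emitted events. Modeling "flushed downstream" as appending to a list, and the {"source": "demo"} metadata, are illustrative assumptions.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """Hypothetical log event: its fields plus metadata carried alongside them."""

    fields: dict[str, Any]
    metadata: dict[str, Any] = field(default_factory=dict)


class Runtime:
    """Hypothetical remap runtime state while one input event is processed."""

    def __init__(self, input_event: Event) -> None:
        self.input_event = input_event
        # Stands in for events flushed downstream at the point emit_log is called.
        self.emitted: list[Event] = []

    def emit_log(self, value: dict[str, Any]) -> None:
        if not isinstance(value, dict):
            raise TypeError("emit_log expects an object")
        # The value becomes the new log; metadata is copied from the input event.
        self.emitted.append(
            Event(
                fields=copy.deepcopy(value),
                metadata=copy.deepcopy(self.input_event.metadata),
            )
        )


# Equivalent of: for stooge in .stooges; emit_log(stooge); end
rt = Runtime(Event({"stooges": [{"name": "Larry"}, {"name": "Moe"}]}, {"source": "demo"}))
for stooge in rt.input_event.fields["stooges"]:
    rt.emit_log(stooge)
for e in rt.emitted:
    print(e.fields, e.metadata)
```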
In the future we can also add functions for emitting metrics like:
emit_counter(namespace: String, name: String, timestamp: Timestamp, value: Float, kind: "absolute"|"relative")
I considered having just an emit_metric() function, but it would require users to pass in objects that exactly match our internal representation of metrics.
The remap transform would gain an extra configuration option:
emit_root = true/false # default true
When emit_root is true, the value of . will be emitted at the end of the remap program. When emit_root is false, the value of . will not be emitted; instead, users should call the emit_log function to emit events.
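The combination of emit_root and emit_log at the end of a program can be sketched as a small function. finish_remap is a hypothetical name. The default of True follows the earlier statement that emit_root defaults to true to preserve current behavior. Emitted logs come first because emit_log flushes at the point of the call, while . is emitted only after the program runs.

```python
from __future__ import annotations

from typing import Any


def finish_remap(
    root: dict[str, Any],
    emitted: list[dict[str, Any]],
    emit_root: bool = True,
) -> list[dict[str, Any]]:
    """Hypothetical: all events leaving remap for one input event."""
    # Logs passed to emit_log were already flushed while the program ran.
    out = list(emitted)
    if emit_root:
        # Current behavior: the final value of `.` is emitted after the program.
        out.append(root)
    return out


root = {"stooges": [{"name": "Larry"}, {"name": "Moe"}]}
emitted = list(root["stooges"])
print(finish_remap(root, emitted, emit_root=False))  # only the emit_log events
print(finish_remap(root, [], emit_root=True))        # today's single-event output
```

With emit_root left at its default, a program that calls emit_log would produce both the emitted logs and the original event. The alternative below (disabling automatic emission when emit_log appears in the source) aims to avoid needing the option at all.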
We could avoid having an emit_root config option on the remap transform by simply not emitting automatically when we see an emit_log function call in the user-provided source. I personally think this would be a bit surprising, but it is an option.
This would modify remap to allow setting . to an array of objects, with each element emitted independently.
This turned out to require a bigger change than I expected, because . is linked to mutating the underlying event (metric or log). It's definitely doable, but it would require substantial refactoring, which caused me to take a step back and consider the alternatives, prompting this RFC.
Using a separate explode transform keeps the responsibilities of each transform clearer and avoids having to refactor the remap transform to decouple . from the underlying Vector Event object, though we may still want to do this in the future anyway.
- Modify remap to treat setting . to an array as an indication that multiple events should be emitted
- Implement the unnest VRL function