Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to select only certain fields in VRL #118

Open
Tracked by #8258
jszwedko opened this issue May 5, 2021 · 8 comments
Open
Tracked by #8258

Add a way to select only certain fields in VRL #118

jszwedko opened this issue May 5, 2021 · 8 comments
Labels
type: enhancement A value-adding code change that enhances its existing functionality vrl: stdlib Changes to the standard library

Comments

@jszwedko
Copy link
Member

jszwedko commented May 5, 2021

Requested by user in discord: https://discord.com/channels/742820443487993987/764187584452493323/839503742813732884

We had an only_fields at one point, but it got backed out. I can't remember why though.

@jszwedko jszwedko added the type: enhancement A value-adding code change that enhances its existing functionality label May 5, 2021
@jszwedko jszwedko changed the title Add a way to select _only_ certain fields in VRL Add a way to select _only_ certain fields in VRL May 5, 2021
@jszwedko jszwedko changed the title Add a way to select _only_ certain fields in VRL Add a way to select only certain fields in VRL May 5, 2021
@JeanMertz
Copy link
Contributor

We had an only_fields at one point, but it got backed out. I can't remember why though.

The reason was that it was sufficiently different from other functions (in that it was the only one that had a (hacky) way to provide a variable number of arguments to it, up to a limit of 32) that we wanted to take our time getting that right.

It was also discussed that having iteration in the future would make this function obsolete, as you could do this via iteration + del (but this wasn't the reason we backed out of the function).

@jszwedko
Copy link
Member Author

jszwedko commented May 7, 2021

Aha I see.

We could support it with our current compiler with something like:

. = only_fields(., ["field1", "field2", "field3"])

I realize this differs a bit from del() though in that it takes paths as strings.

This would also be solved by iteration.

@JeanMertz
Copy link
Contributor

I realize this differs a bit from del() though in that it takes paths as strings.

Yeah, I'm not a fan of mixing "fields" (string type) and "paths" (path type) for the approximately same use-case. I'd prefer it if we stick to paths wherever they are needed, but I also reckon there's still some gaps there for that to be a viable solution everywhere.

This would also be solved by iteration.

Indeed. Although there we'd still have the issue that paths are currently evaluated at the call site, so you couldn't do something like foo = [.foo, .bar], as that would expand to foo = [<value of .foo>, <value of .bar>]. Nothing we can't solve, but worth pointing out as a blocker for allowing iteration over paths.

@JeanMertz JeanMertz added the vrl: stdlib Changes to the standard library label Jun 7, 2022
@fuchsnj fuchsnj transferred this issue from vectordotdev/vector Mar 28, 2023
@kkornienko-aparavi
Copy link

@jszwedko , in discord you mentioned that current event object can be copied, like:

original_event = .
. = { "message": .message }

and then copy can be changed.

Tried and faced an issue with this approach:

original_event = .
. = { "message": .message }

original_event.new_field.new_subfield = "test"  # it works
.new_field.new_subfield = "test" # fails with 'querying a field of a non-object type is unsupported'

So looks like new '.' is not the same type of object as an original.
Found a workaround, but it looks a bit strange:

. = object!(parse_json!("{}"))
.new_field.new_subfield = "test" # works

So I want to ask:

  1. Is this approach (copying original event to a new var, redefining . var to a new object) is at all safe from the memory leaking perspective?
  2. How the new object can be properly defined to be a 100% :) object (i.e. support defining sub-fields)

Thanks a lot!

@jszwedko
Copy link
Member Author

Hi @kkornienko-aparavi ,

Assignment in VRL creates a copy rather than a reference. That is:

original_event = . # this makes a copy of `.` and stores it in `original_event`
. = { "message": .message } # this updates `.` but does not modify `original_event`

original_event.new_field.new_subfield = "test"  # this only updates `original_event` and not `.`
.new_field.new_subfield = "test" # this field doesn't exist on `.` but does exist on `original_event`

Does that make sense?

@kkornienko-aparavi
Copy link

@jszwedko , thanks, I've got an idea about copy, sure.
But what I can't get why the copy and the original objects behave differently

why original_event.new_field.new_subfield = "test" works and .new_field.new_subfield = "test" doesn't

Or, differently - why this example throws an error:
https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiLiA9IHt9XG4ubmV3LnN1YmZpZWxkID0gXCJ0ZXN0XCIiLCJldmVudCI6eyJtZXNzYWdlIjoiSGVsbG8gVlJMIiwiZm9vIjoiZGVsZXRlIG1lIiwiaHR0cF9zdGF0dXMiOiIyMDAifSwiaXNfanNvbmwiOmZhbHNlLCJlcnJvciI6bnVsbH0%3D

but if the first line is commented out - it works...:
https://playground.vrl.dev/?state=eyJwcm9ncmFtIjoiIy4gPSB7fVxuLm5ldy5zdWJmaWVsZCA9IFwidGVzdFwiIiwiZXZlbnQiOnsibWVzc2FnZSI6IkhlbGxvIFZSTCIsImZvbyI6ImRlbGV0ZSBtZSIsImh0dHBfc3RhdHVzIjoiMjAwIn0sImlzX2pzb25sIjpmYWxzZSwiZXJyb3IiOm51bGx9

@pront
Copy link
Member

pront commented Oct 19, 2023

This is working as designed.

The intention is to raise an error when the compiler can determine the exact fields. When you assign . = { "message": .message }, the compiler knows exactly what fields it has; in this example it only has a message field. So when you attempt to add a .new_field.new_subfield the compiler knows that a new_field object doesn't exist. Compare this with a modified example.

The main motivation behind this choice is to avoid accidental data overwrite. Example:

o = {}
o.a = "important data"
o.a.b = 1 # attempt to overwrite

I understand it can be a bit confusing and we should probably improve the error message.

cc @fuchsnj (since he's the expert here)

@kkornienko-aparavi
Copy link

@pront , thanks a lot for such a detailed explanation! Now it's crystal clear for me.
It'd be very helpful for all users of VRL, I suppose.

My guess was that {} and . are objects of different types/classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: enhancement A value-adding code change that enhances its existing functionality vrl: stdlib Changes to the standard library
Projects
None yet
Development

No branches or pull requests

4 participants