Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New shape transform #750

Closed
binarylogic opened this issue Aug 13, 2019 · 13 comments
Closed

New shape transform #750

binarylogic opened this issue Aug 13, 2019 · 13 comments
Assignees
Labels
meta: blocked Anything that is blocked to the point where it cannot be worked on. needs: approval Needs review & approval before work can begin. needs: requirements Needs a a list of requirements before work can be begin type: feature A value-adding code addition that introduce new functionality.

Comments

@binarylogic
Copy link
Contributor

binarylogic commented Aug 13, 2019

We need a better way to add, remove, rename, and possibly coerce fields in one shot. If simple enough, this could replace the add_fields, remove_fields, and coerce transform.

I'm open to better naming, such as simply transform.

Ref #377

@binarylogic binarylogic added Type: New Feature needs: approval Needs review & approval before work can begin. labels Aug 13, 2019
@binarylogic binarylogic added this to the Improved support for schemas milestone Sep 7, 2019
@xcaptain
Copy link

xcaptain commented Sep 16, 2019

Would be awesome if this new transform support string split, like the example below

from

{
    "args": "a=1&b=2&c"
}

to

{
    "args_a": "1",
    "args_b": "2",
    "args_c": ""

nginx produces this kind of log http://nginx.org/en/docs/http/ngx_http_core_module.html#var_args

My current implementation is using lua, but it's not elegant

[transforms.parse_args]
  type = "lua"
  inputs = ["datasource"]
  source = """
function Split(szFullString, szSeparator)
local nFindStartIndex = 1
local nSplitIndex = 1
local nSplitArray = {}
while true do
   local nFindLastIndex = string.find(szFullString, szSeparator, nFindStartIndex)
   if not nFindLastIndex then
    nSplitArray[nSplitIndex] = string.sub(szFullString, nFindStartIndex, string.len(szFullString))
    break
   end
   nSplitArray[nSplitIndex] = string.sub(szFullString, nFindStartIndex, nFindLastIndex - 1)
   nFindStartIndex = nFindLastIndex + string.len(szSeparator)
   nSplitIndex = nSplitIndex + 1
end
return nSplitArray
end

local args = Split(event["args"], "&")
for i, arg in ipairs(args) do
   local arg_d = Split(arg, "=")
   if arg_d[1] and arg_d[2] then
      local key = string.format("args_%s", arg_d[1], arg_d[2])
      event[key] = arg_d[2]
   elseif arg_d[1] then
      local key = string.format("args_%s", arg_d[1])
      event[key] = ""
   end
end
"""

@Jeffail
Copy link
Contributor

Jeffail commented Dec 16, 2019

It would also be cool to add some sort of expand operator to this, so that (once our internal data is structured) we can expand an event into N events by selecting an array field (or multiple). That would allow us to turn an event such as:

{
  "id": "whatever",
  "foo": [1,2]
}

Into two events:

{ "id": "whatever", "foo": 1 }
{ "id": "whatever", "foo": 2 }

@binarylogic binarylogic added the needs: requirements Needs a a list of requirements before work can be begin label Dec 20, 2019
@binarylogic
Copy link
Contributor Author

I like the "expand" idea. I added the needs: spec label to represent the high-level work around transforming events. We need to think about the experience holistically and determine if all of these transformations are best handled with individual transforms, a single transform, a programmable transform, or all of them 😄 .

@binarylogic
Copy link
Contributor Author

Just adding a couple of libraries here for inspiration:

@binarylogic
Copy link
Contributor Author

To clarify, we probably shouldn't use these libraries, but we can expose a similar interface:

[transforms.change_it_up]
  type = "shape"

  changes = [
    {add.new_field = "my value"},
    {add.parent.child = "this is a nested field value"},
    {coerce.another_field = "int"},
    {rename.old_field = "new_name"},
    {remove.bad_field = true}
  ]

This is a very rough example, but you get the idea. The awkwardness with the above syntax is removing fields, but you get the idea.

@binarylogic
Copy link
Contributor Author

Finally, another interesting source of inspiration is the jq library and it's interface:

jq '[.[] | {message: .commit.message, name: .commit.committer.name, parents: [.parents[].html_url]}]'

It's proven to be a powerful succinct syntax that engineers already understand. I'm not proposing that this interface would power the generic shape transform, but it is interesting as an alternative jq transform that exposes the same syntax.

@anton-ryzhov
Copy link
Contributor

We need a better way to add, remove, rename

Definitely! Also don't forget fields coping.

Parsing query strings for me sounds like a task to a separate transform.

Even with add+remove+copy+rename you need to define some strict rules to not confuse users.
Like field couldn't be renamed foo=>bar and then foo deleted and vice versa, foo copied to bar and then bar added one more time.

@binarylogic
Copy link
Contributor Author

Agree! And thanks for the feedback. @Hoverbear is working all of this right now. She's putting together a rough guide on schema management that will summarize the work we do here. Would love to get your thoughts when that is done.

@Hoverbear
Copy link
Contributor

@anton-ryzhov I agree about query strings!

If we do end up combing add, rename etc it'll probably be as a rework or reshape that takes a sequence of steps, so no worry about user confusion around order of operations. :)

@Hoverbear
Copy link
Contributor

@binarylogic Bumblebee looks pretty compelling, it's just a sequence of steps. I worry Proteus feels very custom with that DSL.

@Hoverbear
Copy link
Contributor

Hoverbear commented Feb 8, 2020

@binarylogic That example in #750 (comment) is unfortunately not valid toml. :(

Here's the closest I could come up with that is valid and clean:

[transforms.change_it_up]
  type = "shape"
  inputs = ["my-source-id"]
  [[transforms.change_it_up.changes]]
  add = "new_field"
  value = "my value" # All value fields below allow templating.
  [[transforms.change_it_up.changes]]
  create = "parent.child" # Users can escape with \. for period-containing fields.
  value = "this is a nested value"
  [[transforms.change_it_up.changes]] # Consider leaving this as a separate transform.
  rename = "old_field"
  to = "new_name"
  [[transforms.change_it_up.changes]]
  remove = "bad_field" # Optionally an array

@binarylogic
Copy link
Contributor Author

binarylogic commented Feb 8, 2020

Say what? Tom... Before we decide on this change, we probably need to clarify #1731. Because I'm wondering if a higher-level solution like the pipelines syntax in #1679 or the compose transform proposed in #1653 would solve this better. Your example above is very similar to the compose transform, but less powerful.

We'll make a decision on Monday what to do here.

@binarylogic binarylogic added the meta: blocked Anything that is blocked to the point where it cannot be worked on. label Feb 8, 2020
@Hoverbear
Copy link
Contributor

We've chosen a general approach discussed in #1653, since this would be just a add/remove/rename inside of a compose. :)

This is being closed and please direct your attentions to #1653.

@binarylogic binarylogic added type: feature A value-adding code addition that introduce new functionality. and removed type: new feature labels Jun 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
meta: blocked Anything that is blocked to the point where it cannot be worked on. needs: approval Needs review & approval before work can begin. needs: requirements Needs a a list of requirements before work can be begin type: feature A value-adding code addition that introduce new functionality.
Projects
None yet
Development

No branches or pull requests

5 participants