Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add multi property to streams #110

Merged
merged 3 commits into from
Mar 2, 2021
Merged

Conversation

leehinman
Copy link
Contributor

@leehinman leehinman commented Jan 19, 2021

What does this PR do?

Adds multi property to streams.

Why is it important?

Useful when input for stream may be defined multiple times with only a
few parameters changing. For example queries for osquery, or event
providers for winlog.

Checklist

Related issues

@elasticmachine
Copy link

elasticmachine commented Jan 19, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: Pull request #110 updated

  • Start Time: 2021-03-02T22:35:41.126+0000

  • Duration: 2 min 7 sec

  • Commit: 02c4e3d

Trends 🧪

Image of Build Times

@ruflin
Copy link
Contributor

ruflin commented Jan 20, 2021

Based on #108 would be good to pull in some folks to get feedback on this:

@ruflin
Copy link
Contributor

ruflin commented Jan 20, 2021

I'm wondering if what we want to repeat is the stream or the full input? Is there a case that we have an input with multiple data streams and we want only one of the data streams to be repeated?

@mtojek mtojek requested review from mtojek and jen-huang January 20, 2021 09:00
Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see a bunch of use cases around database generic queries. Nice. I'm not convinced about the multi property name, but I can't propose anything better (collection? expandable?).

@leehinman could you please add this property to the test package (the good package)? (nevermind, I missed it's there)

Copy link
Contributor

@mtojek mtojek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing blocking from the package-registry perspective. I'll leave the final word to the Kibana's representive, @jen-huang .

description: Can stream be defined multiple times
type: boolean
default: false
examples:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: well, this is a boolean field. I don't think we need to give an example here :)

@jen-huang
Copy link
Contributor

I'm wondering if what we want to repeat is the stream or the full input? Is there a case that we have an input with multiple data streams and we want only one of the data streams to be repeated?

++ to @ruflin's question. We discussed this on the Kibana issue back in November: example of only stream repeating, example of full input repeating.

Which do you expect Kibana to generate?

@ycombinator
Copy link
Contributor

@ruflin @jen-huang, just going through PRs today after being on PTO for a week. Seems like this PR is currently blocked on this question:

I'm wondering if what we want to repeat is the stream or the full input? Is there a case that we have an input with multiple data streams and we want only one of the data streams to be repeated?

Looking at the two examples that @jen-huang linked to in her comment, @ruflin can you explain why repeating the full input in the agent config would be better than repeating the data stream? I think understanding the pros/cons of each approach might help move this discussion forward to a resolution.

@ruflin
Copy link
Contributor

ruflin commented Jan 26, 2021

For the input vs stream part: I don't think one is better than the other, it is more the question on what is trying to be solved here. Looking at elastic/kibana#83141 There are 2 checkboxes, one for the input and one for the stream. But as there is only 1 stream, the two should probably be merged (other issue). If both are merged, the only thing that is left is the input.

Where the two differ is on shared config: Repeating the stream would allow to reuse the input config for multiple streams. Repeating the input allows to have different input config for each entry.

In the case of osquery, it seems nothing is shared. So the outcome would be the same. Is osquery really using a single data stream for all the data? Or is it more similar to apm and the input configs could even be global.

What would be clarifying I think would be seeing a mockup on how it should look in the UI and decide if input configs should be shared or not.

@jamiehynds
Copy link

Is osquery really using a single data stream for all the data? Or is it more similar to apm and the input configs could even be global.

@james-elastic could you confirm which approach the new osquery integration has taken?

@aleksmaus
Copy link
Member

@jamiehynds as of now it's a single output datastream.
The results of all the queries that osquery executes end up in the same ES datastream.
Each query was configured as input, and we wanted to be able to run multiple queries at different time intervals. That's why we are asking to be able to configure multiple. If there is a better way to accomplish that, let us know.

@mtojek
Copy link
Contributor

mtojek commented Feb 16, 2021

@ruflin @leehinman is this PR still valid?

@ruflin
Copy link
Contributor

ruflin commented Feb 16, 2021

I think it is still valid but would be good to get clarification on #110 (comment) from @leehinman

@leehinman
Copy link
Contributor Author

For the input vs stream part: I don't think one is better than the other, it is more the question on what is trying to be solved here. Looking at elastic/kibana#83141 There are 2 checkboxes, one for the input and one for the stream. But as there is only 1 stream, the two should probably be merged (other issue). If both are merged, the only thing that is left is the input.

Where the two differ is on shared config: Repeating the stream would allow to reuse the input config for multiple streams. Repeating the input allows to have different input config for each entry.

In the case of osquery, it seems nothing is shared. So the outcome would be the same. Is osquery really using a single data stream for all the data? Or is it more similar to apm and the input configs could even be global.

What would be clarifying I think would be seeing a mockup on how it should look in the UI and decide if input configs should be shared or not.

For Windows Event logs I think repeating the input is a better fit, this is to support Custom Windows Event Logs, so each entry might need access to every option.

@jen-huang
Copy link
Contributor

For Windows Event logs I think repeating the input is a better fit, this is to support Custom Windows Event Logs, so each entry might need access to every option.

It sounds like we agree on repeating the full input for this use case. With that in mind, might it be better to add multi: true to the package's policy_templates.inputs on the main manifest instead of at the stream level? I think this is a more accurate representation of what we want to achieve in the UI. If we find a case in the future where we want repeatable streams, in addition to repeatable inputs, then we can add the stream-level flag at that point.

What do you think @mtojek @ruflin?

@ruflin
Copy link
Contributor

ruflin commented Feb 18, 2021

@jen-huang Agree

Copy link
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After understanding what we want the resulting agent yaml to look like, I'd like to see this property moved to the policy_templates.inputs level on the main package manifest instead of at the stream level. This makes more sense from a Kibana perspective too.

@leehinman
Copy link
Contributor Author

After understanding what we want the resulting agent yaml to look like, I'd like to see this property moved to the policy_templates.inputs level on the main package manifest instead of at the stream level. This makes more sense from a Kibana perspective too.

Made the change and rebased to master. Let me know if we need to change anything else. Thanks.

@ruflin
Copy link
Contributor

ruflin commented Feb 24, 2021

Thanks @leehinman Would be good for @jen-huang to have a quick look at it when she is back.

@mtojek @sorantis Will this work as expected with thew new introduce of multiple policy_templates? I assume yes but want to make sure I don't miss something.

Copy link
Contributor

@jen-huang jen-huang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from Kibana side. Thanks for the discourse and making the changes!

@mtojek
Copy link
Contributor

mtojek commented Mar 2, 2021

Please make sure to correct all conflicted files and merge the PR if only the status is green.

Useful when input for stream may be defined multiple times with only a
few parameters changing.  For example queries for osquery, or event
providers for winlog.
@leehinman leehinman merged commit 714ec42 into elastic:master Mar 2, 2021
@leehinman leehinman deleted the multi-streams branch March 2, 2021 22:42
rw-access pushed a commit to rw-access/package-spec that referenced this pull request Mar 23, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants