Add "data" asset type #37
Comments
The description is too vague. Could you please add more details and a use case for such assets? |
We currently have 3 proposed use cases:
The issue will be filled with more details as soon as we pick it up. At the moment it's more a placeholder. |
Adding the details and the requirements, at least from my personal perspective, when it comes to allowing modules to include enrichment policies and the data to be used for the enrichment indices. First, let's describe exactly what the use case is and why it can make a big impact: Current behaviour: This creates certain issues and restrictions on how much we can actually enrich incoming events. If the logic is simple, we include a simple "if" condition in a pipeline, and if it's complex we use a script processor. Script processors require frequent script compilation and script caching, which has a fair impact on performance in smaller deployments and a larger impact in bigger ones. Example condition:
Example script (lines 503 to 926 could be removed): Now, if I were a regular user who parses, maps and normalises the events myself, using an enrichment index would make much more sense: it removes all the logic needed and boils it down to just (just an example):
Impact with enrichment index support in modules:
What would an implementation look like? Installation Package update Package deletion Hopefully this makes sense, and I look forward to hearing any thoughts! |
Use case
In AWS CloudTrail we have the situation where one of the log fields a CloudTrail eventName (which gets mapped to event.action) of This pattern has resulted in the following script processor in the
- script:
    lang: painless
    ignore_failure: true
    params:
      AddUserToGroup:
        category:
          - iam
        kind: event
        type:
          - group
          - change
      .
      .
      .
      UpdateUser:
        category:
          - iam
        kind: event
        type:
          - user
          - change
    source: >-
      def hm = new HashMap(params.get(ctx.event.action));
      hm.forEach((k, v) -> ctx.event[k] = v);
AWS CloudTrail has thousands of eventNames, it would be nice to replace
pipeline
- enrich:
    policy_name: cloudtrail-eventname-policy
    field: event.action
    target_field: event
    max_matches: 1
policy
- match:
    indices: cloudtrail-eventname
    match_field: action
    enrich_fields:
      - category
      - kind
      - type
doc
action: UpdateUser
category:
  - iam
kind: event
type:
  - user
  - change
|
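To make the intent of the CloudTrail example concrete, here is a minimal Python sketch of the lookup that both the Painless script and the proposed enrichment data express: a table keyed by `event.action` whose values are copied into `event`. It mirrors the behaviour of the script shown above, not the internals of the Elasticsearch enrich processor. The names `EVENT_NAME_LOOKUP` and `enrich_event` are hypothetical and only the two eventNames quoted in the thread are included.

```python
from copy import deepcopy

# Hypothetical lookup table; the two entries are the ones quoted in the thread.
EVENT_NAME_LOOKUP = {
    "AddUserToGroup": {
        "category": ["iam"],
        "kind": "event",
        "type": ["group", "change"],
    },
    "UpdateUser": {
        "category": ["iam"],
        "kind": "event",
        "type": ["user", "change"],
    },
}


def enrich_event(doc, lookup=EVENT_NAME_LOOKUP):
    """Return a copy of doc with event.* fields filled in from the lookup.

    Documents whose event.action is missing or unknown are returned unchanged.
    """
    enriched = deepcopy(doc)
    action = enriched.get("event", {}).get("action")
    match = lookup.get(action)
    if match is None:
        return enriched
    # Same effect as the Painless forEach: copy every key into ctx.event.
    for key, value in match.items():
        enriched["event"][key] = deepcopy(value)
    return enriched
```

With the table kept as data rather than as script parameters, supporting another eventName means adding one more entry (or one more document in an enrichment index) and leaves the pipeline logic untouched.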
Hope this is the right place to chime in with a couple questions and ideas.
Other ideas I've had for data ingestion:
That link lets you use Google BigQuery to explore any of those datasets. It'd be super cool if we had a public hosted Kibana/Elasticsearch instance that let users play around with these datasets. I guess my question is, is the "ingest data manager" project headed towards something like that? Or if we ever wanted to have one click installs, should they be separate Kibana plugins exposed in a marketplace? |
Thanks everyone for chiming in. This is VERY useful. We will be looking into adding data in a few weeks, as the focus is on 7.9 at the moment. @stacey-gammon For your first use case: Yes. For the third-party API: I had not thought of this use case yet. It could be interesting, but would it also mean shipping code as part of the package? |
I finally got back to this issue and I have a few follow up questions:
@leehinman Is your use case also covered by what @P1llus describes above?
|
@ycombinator For awareness, that data + enrich are on the radar and potentially will need to be added to package-spec if we move forward. |
@ruflin Upgrade path: Everything is wiped on upgrade, if I understand this correctly. Will multiple packages use the same enrichment data? Would it be a problem if it is per package and perhaps some of it is duplicated? This removes many version challenges. We need support for enrichment policies in any case. Is the enrichment policy per dataset or global per package? Can you dig into a bit more detail around this temp index and why it is removed again afterwards? When you want to create an enrichment index and policy, these would be the steps:
Hopefully this makes sense, and as you might notice, after we have executed step 4-5, we will never have any need for the initial "temp" index so to speak, as we won't be updating it anytime soon, and it would just stay there on the cluster until it would have to be recreated during an upgrade anyway. That also makes it easier since we don't need to mess with things like ILM for these. |
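The numbered steps did not survive in this copy of the thread, so the following Python sketch is only an assumed reading of the workflow described: create a temporary source index, load the packaged documents, create and execute an enrich policy, then delete the temporary index. It builds the request sequence as plain data and sends nothing. The paths `/_enrich/policy/<name>` and `/_enrich/policy/<name>/_execute` are the Elasticsearch enrich policy endpoints, while the function name and the exact ordering are assumptions.

```python
def build_enrich_setup_requests(index_name, policy_name, match_field,
                                enrich_fields, docs):
    """Return (method, path, body) tuples for an assumed enrich setup flow.

    Nothing is sent over the network; the caller decides how to execute them.
    """
    requests = []
    # Create the temporary source index.
    requests.append(("PUT", f"/{index_name}", {}))
    # Index each packaged enrichment document.
    for doc in docs:
        requests.append(("POST", f"/{index_name}/_doc", doc))
    # Create the enrich policy pointing at the temporary index.
    policy = {
        "match": {
            "indices": index_name,
            "match_field": match_field,
            "enrich_fields": list(enrich_fields),
        }
    }
    requests.append(("PUT", f"/_enrich/policy/{policy_name}", policy))
    # Execute the policy, which builds the enrich index from the source.
    requests.append(("POST", f"/_enrich/policy/{policy_name}/_execute", None))
    # The temporary index is no longer needed once the policy has executed.
    requests.append(("DELETE", f"/{index_name}", None))
    return requests
```

For the CloudTrail example, a call could look like `build_enrich_setup_requests("cloudtrail-eventname", "cloudtrail-eventname-policy", "action", ["category", "kind", "type"], docs)`, where `docs` is the list of packaged documents.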
I think it should be named in a similar manner to the ingest pipeline
Yes
I think it is OK to start with enrichment data & policies per
Yes |
I'm strongly in favor of having everything per dataset as I think it simplifies things. As we currently also discuss some other assets to be added to the package, I started a checklist on what needs to be done / discussed: #27 @leehinman @P1llus To keep this moving, I'm wondering if you would have time / possibility to move this forward? My proposal would be similar to elastic/kibana#75153 (comment) |
I moved this to the package-spec repo as I think it is now more fitting here. This should not change the conversation. |
Is there anything more you would need from me @ruflin ? I am happy with the current comments from the others and the current state as long as we all agree on this topic. I didn't notice the notification since when you moved the issue to another repo it disappears from all notifications for me. |
@P1llus @leehinman as @ruflin will you be driving that on your side? |
Happy to coordinate all the efforts but would be great if @P1llus @leehinman could find out who does especially the part on the Kibana side. |
Here's another use case for having the ability to have data as part of a package. I would like to be able to create a documentation/tutorial package for ECS. This package could contain assets that are meant to explore what's in ECS. One part of this would be an index that contains all of the ECS fields, their definitions, datatype and other details. This way users could explore ECS or answer questions such as:
This could also come with a dashboard to get started exploring. This would essentially be a more polished version of what I showed in last holiday season's ECS Advent blog post :-) The "data" part of the package would simply be all ECS field definitions of a given ECS release. |
The training team is really interested in this effort. It is currently cumbersome to add a "demo" dataset for users to play with. It would be great if we could generate packages that anyone can consume to play and learn new features. |
Add support to the package to add data.
The format in which this data is stored in the package still needs to be defined.