[RFC] Threat Enrichment - Stage 1 #1400

Merged
merged 17 commits into from
Jun 10, 2021

@rylnd rylnd commented May 6, 2021

Much of this was taken from what was deleted from #1293 and is in various stages of completion. Will annotate and iterate accordingly.

RFC Preview

Notes

I've stuck with threat.enrichments for this field name, since it is agnostic of any "matching" verbiage that may not make sense universally, but that raises its own question (see below). Other potential names that were discussed:

  • threat.matches
  • threat.indicators
  • threat.indicator_matches

Outstanding Questions

  • Is matched.type sufficient to cover all enrichment cases?
    • matched.type: "manual" or matched.type: "analyst" would cover the case where an analyst manually finds a matching indicator
    • Moving forward as is; see summary below
    • Would it ever be valid to enrich an event without a field/value to reference? E.g. if an analyst suspects a potential correlation?
  • How do I indicate that the threat.indicator fields are nested (as a nested array) under threat.enrichments?
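
On the nesting question, here is a rough sketch (not the RFC's actual schema definition) of how the shape could be expressed as an Elasticsearch mapping, with threat.enrichments mapped as the `nested` type so that each array entry keeps its own indicator and matched sub-objects together. The sub-fields and their types are illustrative assumptions, and `nested_paths` is a hypothetical helper that only exists to make the structure easy to check.

```python
# Hypothetical mapping fragment; the sub-fields and their types are assumptions.
ENRICHMENTS_MAPPING = {
    "properties": {
        "threat": {
            "properties": {
                "enrichments": {
                    "type": "nested",  # each array element is indexed as its own object
                    "properties": {
                        "indicator": {
                            "properties": {
                                "type": {"type": "keyword"},
                            }
                        },
                        "matched": {
                            "properties": {
                                "atomic": {"type": "keyword"},
                                "field": {"type": "keyword"},
                                "type": {"type": "keyword"},
                            }
                        },
                    },
                }
            }
        }
    }
}


def nested_paths(mapping, prefix=""):
    """Return the dotted path of every field mapped with type 'nested'."""
    paths = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into sub-properties, if the field has any.
        paths.extend(nested_paths(spec, prefix=f"{path}."))
    return paths


print(nested_paths(ENRICHMENTS_MAPPING))
```

Only the enrichments level is marked `nested` here; the indicator and matched objects inside each entry are plain objects.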

rylnd added 2 commits May 6, 2021 17:41
Much of this was taken from what was deleted from elastic#1293 and is in
various stages of completion. Will annotate and iterate on the PR.
@rylnd rylnd added the RFC label May 6, 2021
particular enrichment. If multiple matches for this indicator object, this could
be a list */
"matched": {
"atomic": "0c415dd718e3b3728707d579cf8214f54c2942e964975a5f925e0b82fea644b4",
Contributor Author

I'm not convinced that matched.atomic and matched.field are sufficient to cover matching mechanisms more sophisticated than an exact match. If, e.g., a user wanted to write an indicator match rule with a partial match, or if there were a more sophisticated indicator that itself represented a wildcard/regex value, then the value of the indicator would not be identical to the value from the event, and so we may want two fields here.

Similarly, we would not be able to reproduce the exact conditions of the match with only one field value. While I've been told that the indicator field being matched upon should be self-evident, it seems safer to state it explicitly in another field, and I wanted to bring it up one more time before I shut up about it 😉.

In general, I'm viewing these matched.* fields as the answer to both HOW and WHY a given event was enriched, so keep that in mind and/or correct me on that thinking.

Contributor

a user wanted to write an indicator match rule with a partial match, or if there were a more sophisticated indicator that itself represented a wildcard/regex value, then the value of the indicator would not be identical to the value from the event

How are you envisioning this? Are you thinking that we need a mechanical way to match similar fields? I can think of one use case: DGAs.

So, if you know an aggressor uses a specific domain structure for C2 (like abc123\.12345abcdef\.xyz), you'd want to be able to match if they use:

abc123\.12345abcdef\.xyz
plo958\.59874qwersd\.xyz
lje456\.01258iekduh\.xyz

So, you'd need a way to match url.full:/[a-zA-Z]{3}[0-9]{3}\.[0-9]{5}[a-zA-Z]{6}\.xyz/ MATCHES threat.indicator.url.full?

Or am I off?
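
As a quick sanity check of the DGA idea in this comment, here is a small sketch that tests the three example domains against one pattern. The letter classes are written as `[a-zA-Z]` so that only ASCII letters are admitted, and `matches_dga` is a hypothetical helper name, not something from the RFC.

```python
import re

# Pattern for the hypothesized C2 naming structure:
# 3 letters, 3 digits, a dot, 5 digits, 6 letters, then ".xyz".
DGA_PATTERN = re.compile(r"[a-zA-Z]{3}[0-9]{3}\.[0-9]{5}[a-zA-Z]{6}\.xyz")


def matches_dga(domain: str) -> bool:
    """True when the whole domain fits the hypothesized structure."""
    return DGA_PATTERN.fullmatch(domain) is not None


candidates = [
    "abc123.12345abcdef.xyz",
    "plo958.59874qwersd.xyz",
    "lje456.01258iekduh.xyz",
]
for domain in candidates:
    print(domain, matches_dga(domain))
```

Note that the matched event value (one concrete domain) is not the same string as the indicator value (the pattern), which is the gap being discussed in this thread.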

Contributor Author

@peasead right, the idea is that with anything other than an exact match, the values for the LHS and RHS of the match are going to be different, and we don't have fields to account for both of those right now.

IPs are another example, where one could specify a CIDR block instead of a single IP address.
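
To make the LHS/RHS point concrete, here is a minimal sketch using Python's standard `ipaddress` module. The function name and the `event_value` / `indicator_value` keys are hypothetical and are not proposed field names; the addresses come from the documentation ranges.

```python
import ipaddress


def cidr_match(event_ip: str, indicator_cidr: str):
    """Return both sides of a non-exact match, or None when there is no match."""
    network = ipaddress.ip_network(indicator_cidr)
    if ipaddress.ip_address(event_ip) in network:
        # The two values differ, so a single "atomic" value cannot hold both.
        return {"event_value": event_ip, "indicator_value": indicator_cidr}
    return None


print(cidr_match("203.0.113.7", "203.0.113.0/24"))
print(cidr_match("198.51.100.1", "203.0.113.0/24"))
```

With an exact match the two values would be identical, which is why one field is enough in that case.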

What is SOP for DGAs within threat intel, currently? Is there any attempt to generalize the pattern/algorithm generating those values?

Contributor

I'm not saying it's a must now.

There are some experimental machine learning jobs for DGAs.

Contributor Author

I think we can continue with the implicit "exact value match" semantics for this RFC. If in the future we need to support the aforementioned functionality, I think that we can do so with the addition of two new fields as discussed.
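
For reference, the implicit "exact value match" semantics could look roughly like the sketch below. This is an illustration under assumptions, not the RFC's pipeline: `get_path` and `enrich_exact` are hypothetical helpers, the indicator documents are simplified, and the matched.type value is a placeholder label.

```python
def get_path(doc, dotted):
    """Walk a nested dict by a dotted path; return None if any segment is missing."""
    current = doc
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def enrich_exact(event, indicators, event_field, indicator_field):
    """Append one threat.enrichments entry per indicator whose value equals the event's.

    Mutates and returns `event`; threat.enrichments defaults to an empty list.
    """
    enrichments = event.setdefault("threat", {}).setdefault("enrichments", [])
    value = get_path(event, event_field)
    if value is None:
        return event
    for indicator in indicators:
        if get_path(indicator, indicator_field) == value:
            enrichments.append({
                "indicator": indicator,
                "matched": {
                    "atomic": value,       # the one value shared by both sides
                    "field": event_field,  # the event field that was compared
                    "type": "indicator_match_rule",  # placeholder label (assumption)
                },
            })
    return event


SHA256 = "0c415dd718e3b3728707d579cf8214f54c2942e964975a5f925e0b82fea644b4"
event = {"file": {"hash": {"sha256": SHA256}}}
indicators = [
    {"type": "file", "file": {"hash": {"sha256": SHA256}}},
    {"type": "file", "file": {"hash": {"sha256": "0" * 64}}},
]
enriched = enrich_exact(event, indicators, "file.hash.sha256", "file.hash.sha256")
print(len(enriched["threat"]["enrichments"]))
```

Supporting partial, regex, or CIDR matches later would mean recording the event-side and indicator-side values separately instead of the single atomic value.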

}
```

### Proposed enrichment pipeline mechanics pseudocode
Contributor Author

@dcode @peasead I know this section is out of date, but I had some trouble grokking and could use some help on this one.

Contributor

@dcode, @rylnd and I chatted about this, but may still need some specifics if you could provide those.

Contributor

@rylnd did you get the information you needed on this?

Contributor Author

@peasead I think I finally grokked and updated that section appropriately. If you and @dcode can verify that what I wrote still makes sense, lemme know 👍

@ebeahan ebeahan requested review from dcode, peasead and ebeahan May 13, 2021 13:45
@peasead peasead left a comment

LGTM. Excited to see this taking shape.

@ebeahan ebeahan left a comment

Thanks for continuing to move this forward, @rylnd!

Do we have a sponsor for this RFC yet?

rfcs/text/0021-threat-enrichment.md (outdated; resolved)
Co-authored-by: Eric Beahan
rylnd added 4 commits May 28, 2021 16:57
* Removes unnecessary field renames, as fields no longer conflict
* Adds a clause for setting a default array value for
  `threat.enrichments`
This is in fact redundant, but still useful.
* master:
  Stage 2 changes for RFC 0018 - extending the `threat.*` field set (elastic#1438)
  Remove deprecated `host.user.*` fields (elastic#1439)
  Explicitly include user identifiers in `related.user` field description (elastic#1420)
  Set the merge date on RFC 0018 stage 2 (elastic#1429)
  [RFC] Extend Threat Fieldset - Stage 2 Proposal (elastic#1395)
  [Tooling] Add --exclude flag to Generator to support field removal testing (elastic#1411)
  Add `host.user.*` deprecation notice in field reuse description (elastic#1422)
  Stage 2 changes for RFC 0015 - `elf` header (elastic#1410)
  Stage 3 changes for RFC 0012 - `orchestrator` field set (elastic#1417)
  Support `match_only_text` in Go code generator (elastic#1418)
  Stage 3 Orchestrator RFC (elastic#1343)
  moving into folder (elastic#1416)
  removing use-cases (elastic#1405)
  removing --oss (elastic#1404)
  Set the merge date on RFC 0015 stage 2 (elastic#1409)
  Consolidate `Breaking changes` sections in `CHANGELOG.next` (elastic#1408)
  RFC-Stage-0: Proposal to add a "ticket" schema / field definition to ECS (elastic#1383)
  [RFC] `match_only_text` type migration - Stage 0 (elastic#1396)
  Client port is wrongly documented (elastic#1402) (elastic#1406)

rylnd commented May 28, 2021

Updated all outstanding threads; I think this is ready for review!

@devonakerr has agreed to be sponsor on this one, as well.

@rylnd rylnd self-assigned this May 28, 2021
@ebeahan ebeahan left a comment

LGTM. Thanks, @rylnd, for the iterations here!

Can we capture @devonakerr as the sponsor under the People section in the document?

@ebeahan ebeahan requested a review from devonakerr June 2, 2021 18:08
"event": {
"provider": "Abuse.ch",
"dataset": "threatintel.abusemalware",
"module": "threatintel"
Contributor Author

note: add event.reference here as well

@devonakerr devonakerr left a comment

Tidy PR, Ryland - LGTM

ebeahan commented Jun 9, 2021

@rylnd Am I good to set the date and merge this PR?

You had this note; I wasn't sure if you wanted to make those changes here or in a later stage PR.

rylnd commented Jun 9, 2021

@ebeahan I'll make that change right now, and then we should be good to merge here 👍

rylnd commented Jun 9, 2021

@ebeahan all good on my end 👍

@ebeahan ebeahan merged commit 676a9fe into elastic:master Jun 10, 2021
@rylnd rylnd mentioned this pull request Jun 15, 2021
2 tasks
@rylnd rylnd deleted the threat-enrichment-stage-1 branch June 22, 2021 15:55
4 participants