Expand observables data types, add Name & ID pairs #960

Closed
jonrau-at-queryai opened this issue Feb 7, 2024 · 8 comments

@jonrau-at-queryai
Contributor

Currently, the scalar types enumerated in observable.type_id include several "ID" types and several "Name" types whose counterparts have not been added, and those pairs may matter to a source system. Additionally, dictionary.json defines two data types that do not have an observable type but should: port_t and subnet_t.

My proposal is as follows (a PR will come from @query-jeremy or myself):

  1. Add type_id = 11 to port_t and type_id = 12 to subnet_t.

  2. Create the following data types and type_id pairs. The object attributes whose data type would be updated are listed in [brackets].

    • user_id_t: type_id = 13 [user.uid, user.alt_uid] - the pair to User Name, covering UPNs, ARNs, and other GUIDs for users in identity providers/directories
    • group_name_t: type_id = 14 [group.name] - a net-new type denoting a variety of "groups", be they IAM, network security, or hierarchy groups. There is an argument for including organization.name and organization.ou_name as well
    • group_id_t: type_id = 15 [group.id] - the pair to group_name_t, with an argument for adding organization.id and organization.ou_id
    • vulnerability_id: type_id = 16 [cve.uid, cwe.uid] - a quick reference to any identifier of a vulnerability, weakness, or bug, such as CVE, CWE, GHSA, etc. One could argue for also mapping vulnerability.title
    • process_id_t: type_id = 17 [process.pid, process.tid, process.uid, process.parent_process.pid, process.parent_process.tid, process.parent_process.uid] - the pair to Process Name, which also accounts for the various identifiers in the process object
    • resource_name_t: type_id = 18 [resource.name, device.name, endpoint.name] - the pair to Resource ID: any name, label, or value from a "name" tag on a resource, computer, endpoint, etc.
    • user_agent_t: type_id = 19 [http_request.user_agent] - used semi-frequently as an indicator, and adds depth to network-related observables
  3. I also noticed that #891 (Observables Description and type_id change) mentioned turning some of its values into scalars, such as Location, Registry Key, Registry Value, and Container. We could take these up and add them as scalar types directly after type_id = 30.
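To make the proposal concrete, here is a minimal sketch of how a producer might emit observables for the proposed types 13 to 19 by walking dotted attribute paths in an event. It assumes the type_ids and attribute paths exactly as proposed above (not a released OCSF schema), assumes events are plain nested dicts, and uses hypothetical helper names (`PROPOSED_TYPES`, `get_path`, `build_observables`). Values are stringified because the observable object's value field is string_t.

```python
from typing import Any

# Proposed observable type_ids and the attribute paths listed for them in this
# issue (a proposal under discussion, not the released OCSF schema). port_t (11)
# and subnet_t (12) are omitted because the proposal lists no attributes for them.
PROPOSED_TYPES: dict[int, tuple[str, list[str]]] = {
    13: ("user_id_t", ["user.uid", "user.alt_uid"]),
    14: ("group_name_t", ["group.name"]),
    15: ("group_id_t", ["group.id"]),
    16: ("vulnerability_id", ["cve.uid", "cwe.uid"]),
    17: ("process_id_t", [
        "process.pid", "process.tid", "process.uid",
        "process.parent_process.pid", "process.parent_process.tid",
        "process.parent_process.uid",
    ]),
    18: ("resource_name_t", ["resource.name", "device.name", "endpoint.name"]),
    19: ("user_agent_t", ["http_request.user_agent"]),
}


def get_path(event: dict, path: str) -> Any:
    """Follow a dotted attribute path through nested dicts; None if any key is missing."""
    node: Any = event
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def build_observables(event: dict) -> list[dict]:
    """Emit one observable per populated attribute path, ordered by type_id."""
    observables = []
    for type_id, (_type_name, paths) in sorted(PROPOSED_TYPES.items()):
        for path in paths:
            value = get_path(event, path)
            if value is None:
                continue
            # observable.value is string_t, so non-string values are stringified.
            observables.append({"name": path, "type_id": type_id, "value": str(value)})
    return observables
```

For example, an event with `user.uid`, `process.pid`, and `http_request.user_agent` populated yields one observable per populated path, with the integer PID carried as a string.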

@jonrau-at-queryai jonrau-at-queryai changed the title Expand observables data types, add Name & ID pairs (and vice-versa) Expand observables data types, add Name & ID pairs Feb 7, 2024
@pagbabian-splunk
Contributor

Good suggestions / issue, I will add to this week's call agenda (2/13/24).

@query-jeremy
Contributor

My notes from this topic on today's call:

  • The concept is good. Many contributors desire more observable types.
  • The interest is in scalar observables. Nobody present wanted to defend pointers/objects as observables.
  • There's concern that type names ending in _id_t may be misleading because _id is used to denote enums everywhere else. Consider renaming the types.
  • There's concern that changing the type of specific properties (see the PR) will be a breaking change. Consider deferring the property changes to a later release or adding a new way to identify observables, perhaps by allowing an observable property of attributes.
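The last note suggests identifying observables through a property on attributes instead of changing their data types. A minimal sketch of that idea, assuming a hypothetical `"observable"` key on each object attribute (not actual OCSF schema JSON) and hypothetical helper names, shows that an attribute such as process.pid can stay integer_t while still being flagged as an observable, so existing consumers see no type change.

```python
from typing import Optional

# Hypothetical schema fragment (not actual OCSF JSON): each object attribute keeps
# its existing data type, and an optional "observable" property carries the
# observable type_id, so flagging an attribute does not change its type.
SCHEMA: dict[str, dict[str, dict]] = {
    "group": {
        "name": {"type": "string_t", "observable": 14},
        "desc": {"type": "string_t"},
    },
    "process": {
        "pid": {"type": "integer_t", "observable": 17},
    },
    "http_request": {
        "user_agent": {"type": "string_t", "observable": 19},
    },
}


def observable_type_id(obj: str, attr: str) -> Optional[int]:
    """Return the observable type_id declared on obj.attr, or None if it has none."""
    return SCHEMA.get(obj, {}).get(attr, {}).get("observable")


def observable_attributes() -> list[str]:
    """List every attribute path that is flagged as an observable."""
    return sorted(
        f"{obj}.{attr}"
        for obj, attrs in SCHEMA.items()
        for attr, spec in attrs.items()
        if "observable" in spec
    )
```

The design trade-off is that the observable mapping lives in schema metadata rather than in the data type, which avoids a breaking type change at the cost of one more property for tooling (such as the OCSF server) to understand.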

There will be a follow-up discussion at 10a PT tomorrow, Feb 14th, on how to proceed.

@jonrau-at-queryai
Contributor Author

Based on notes from today's call (14 FEB 24 @ network activity), I will modify #961 and spin off a new PR to add observable type_ids to port_t and subnet_t.

@rmouritzen-splunk
Contributor

Two things.

First, the OCSF server needs to be updated to support this new observable capability. I can start that work once we settle on the details.

Second, we need some way to support observable values other than strings. A port_t is a number, but so far all observable values are strings: the observable object's value field has type string_t.

Here are some possible paths forward:

Option 1: In the observable, the port_t value is converted to a string.
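A minimal sketch of Option 1, assuming type_id = 11 for port_t as proposed in this issue and hypothetical helper names: the producer stringifies the port, and any consumer that wants range searches has to parse it back to an integer, since string comparison is lexicographic.

```python
PORT_TYPE_ID = 11  # type_id proposed for port_t in this issue


def port_observable(name: str, port: int) -> dict:
    """Option 1: store the integer port as a string, because observable.value is string_t."""
    if not 0 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {"name": name, "type_id": PORT_TYPE_ID, "value": str(port)}


def in_port_range(observable: dict, low: int, high: int) -> bool:
    """Range searches must parse the value back to an int; comparing the strings
    lexicographically would sort "80" after "443"."""
    return low <= int(observable["value"]) <= high
```

The conversion is lossless for ports, so the main cost is the extra parse on the consumer side.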

Option 2 (much harder): Come up with a way to represent non-string values in the observable object. Here are some variations we might consider:

  • Option 2a: Perhaps the most workable solution here would be adding new value fields, like value_int and value_long; which one to use would depend on the underlying type implied by the type_id field.
  • Option 2b: The other alternative would be changing the definition of value to json_t, where json_t means "anything". This, however, gets into the current type confusion around json_t, which is not yet settled.
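The two variants of Option 2 can be contrasted in a small sketch. It assumes a hypothetical mapping in which only port_t (type_id 11, as proposed here) uses value_int and every other type_id keeps the string value field; the helper names are also hypothetical. Note that in 2a both writer and reader must consult the same type_id mapping, while in 2b the JSON value simply carries its own type.

```python
# Hypothetical mapping for Option 2a: the underlying type implied by a type_id
# decides which value field is used. Only port_t (type_id 11, as proposed here)
# is mapped; every other type_id falls back to the string "value" field.
VALUE_FIELD_BY_TYPE_ID = {11: "value_int"}
STRING_FIELD = "value"


def encode_2a(name: str, type_id: int, raw) -> dict:
    field = VALUE_FIELD_BY_TYPE_ID.get(type_id, STRING_FIELD)
    value = str(raw) if field == STRING_FIELD else raw
    return {"name": name, "type_id": type_id, field: value}


def decode_2a(observable: dict):
    # A reader must consult the same type_id mapping to know which field to read.
    field = VALUE_FIELD_BY_TYPE_ID.get(observable["type_id"], STRING_FIELD)
    return observable[field]


def encode_2b(name: str, type_id: int, raw) -> dict:
    # Option 2b: value is json_t ("anything"), so the raw JSON value is kept as-is.
    return {"name": name, "type_id": type_id, "value": raw}
```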

@query-jeremy
Contributor

  • Option 1 seems like a great "right now" solution. It's useful to treat port as an int for range searches but treating it as a string doesn't seem like too great a compromise.
  • Option 2a might complicate the use of observables. They're meant to simplify things, so 2a doesn't seem great. But it will work with more type systems than 2b and provide stronger typing than 1.
  • Option 2b might be the best long-term approach.

A third option is to add union types, e.g. "type": ["str_t", "int_t"], but many data platforms won't support it and it adds complication.

@rmouritzen-splunk
Contributor

That kind of more explicit union type is interesting, though adding a "tag" to indicate the actual type is probably helpful, especially if this helps non-JSON encodings. https://en.wikipedia.org/wiki/Tagged_union
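A tagged union pairs the value with an explicit tag naming which member of the union it holds, so a decoder never has to guess from the value itself. A minimal sketch, assuming a hypothetical `value_type` field and only two members (string_t and integer_t); nothing here is part of the OCSF schema.

```python
import json
from dataclasses import dataclass
from typing import Union

# Python type expected for each tag in this sketch; the tag names reuse OCSF
# data type names, but the "value_type" field below is hypothetical.
_EXPECTED = {"string_t": str, "integer_t": int}


@dataclass(frozen=True)
class TaggedValue:
    """A tagged union: `tag` says which member of the union `value` holds."""
    tag: str
    value: Union[str, int]

    def __post_init__(self) -> None:
        expected = _EXPECTED.get(self.tag)
        if expected is None:
            raise ValueError(f"unknown tag: {self.tag}")
        # type() rather than isinstance(), so a bool is not accepted as integer_t
        if type(self.value) is not expected:
            raise TypeError(f"{self.tag} requires {expected.__name__}, got {type(self.value).__name__}")

    def to_json(self) -> str:
        return json.dumps({"value_type": self.tag, "value": self.value})

    @staticmethod
    def from_json(text: str) -> "TaggedValue":
        data = json.loads(text)
        return TaggedValue(data["value_type"], data["value"])
```

In a binary encoding that lacks JSON's self-describing values, the same tag would tell the decoder which branch of the union to read.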

Paul and I had a chat about this, and we decided to start with converting non-string attributes to string.

Let us know if you actually do need the observable value to be of the original type. Specifically, are you OK with port_id becoming a string in its observable, or should it remain an integer_t?

@query-jeremy
Contributor

Converting port to strings in observables satisfies Query's need.

@rmouritzen-splunk
Contributor

@query-jeremy : This should be solved now after the recent PR merges.


4 participants