[RFC] Stage 0: Introduce Entity Field Set into ECS #2434

Open
wants to merge 7 commits into
base: main

tinnytintin10

Overview

An entity represents a discrete, identifiable component within an IT environment that can be described by a set of attributes and maintains its identity over time. Entities can be physical (like hosts or devices), logical (like containers or processes), or abstract (like applications or services).

Currently, ECS provides specific field sets for certain categories of entities (e.g., host, user, cloud, orchestrator) to capture their metadata. However, as IT infrastructure continues to evolve, we encounter an increasing number of entity types that don't cleanly fit into existing field sets – for example, storage services like S3, database instances like DynamoDB, or various other cloud services and IT-related infrastructure components (both digital and physical).

This RFC proposes a new entity fieldset that aims to solve this and several other challenges. Currently at Stage 0 (strawperson), seeking initial feedback on the approach and concept. See /rfcs/text/0049-entity-fields.md for more details.

PR Guidelines

  • Have you signed the contributor license agreement? ✅
  • Have you followed the contributor guidelines? ✅
  • For proposing substantial changes or additions to the schema, have you reviewed the [RFC process](https://github.com/elastic/ecs/blob/main/rfcs/README.md)? ✅
  • If submitting code/script changes, have you verified all tests pass locally using make test? N/A
  • If submitting schema/fields updates, have you generated new artifacts by running make and committed those changes? N/A
  • Is your pull request against main? Unless there is a good reason otherwise, we prefer pull requests against main and will backport as needed. ✅
  • Have you added an entry to the CHANGELOG.next.md? N/A

@tinnytintin10
Author

@MikePaquette @YulNaumenko, I've drafted the RFC to introduce the entity field set into ECS like we talked about. Before taking it out of draft, I wanted to check with you both to see if there's anything you think should be included or addressed as part of this stage. Lmk 🙏🏾

| Field | Type | Description |
|-------|------|-------------|
| entity.name | keyword, text | The human-readable name of the entity. The keyword field enables exact matches for filtering and aggregations, while the text field enables full-text search. For entities with dedicated field sets (e.g., `host`), this field should mirror the corresponding *.name value. |
| entity.address | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. |
| entity.Attributes.* | object | Entity type-specific attributes using capitalized field names to indicate custom field space. The capital `A` in "Attributes" and the capitalization of all subfields (e.g., `entity.Attributes.StorageClass`, `entity.Attributes.EngineVersion`) distinguishes these as custom entity-type-specific fields that won't be enumerated in the ECS schema. |
| entity.metadata.* | flattened | A flexible container for entity metadata that doesn't fit into other structured fields. This field uses the flattened type to allow arbitrary key-value pairs while maintaining searchability. Useful for provider-specific or non-standardized attributes that don't warrant dedicated fields. |
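
To make the field descriptions above concrete, here is a minimal sketch of how an integration might assemble an `entity` object for a storage bucket. The helper name `build_entity_doc`, the example identifier, the `example.com` address, and the `owner_team` key are all hypothetical; only the field names and the capitalization rule for `entity.Attributes.*` come from the table.

```python
import json


def build_entity_doc(entity_id, name, address, attributes=None, metadata=None):
    """Assemble a nested `entity` object using the field names from the table above.

    Hypothetical helper for illustration; not part of ECS or any Elastic library.
    """
    attributes = dict(attributes or {})
    # The proposal capitalizes every subfield under entity.Attributes,
    # so reject keys that do not start with an uppercase letter.
    for key in attributes:
        if not key[:1].isupper():
            raise ValueError(f"entity.Attributes subfield must be capitalized: {key!r}")
    return {
        "entity": {
            "id": entity_id,
            "name": name,
            "address": address,
            "Attributes": attributes,
            # Arbitrary key-value pairs; mapped as `flattened` per the table.
            "metadata": dict(metadata or {}),
        }
    }


doc = build_entity_doc(
    entity_id="arn:aws:s3:::example-bucket",               # illustrative value
    name="example-bucket",
    address="https://example.com/buckets/example-bucket",  # placeholder URL
    attributes={"StorageClass": "STANDARD"},
    metadata={"owner_team": "data-platform"},               # hypothetical key
)
print(json.dumps(doc, indent=2))
```

The capitalization check is one possible way to keep the custom `Attributes` namespace visually distinct from schema-defined fields; the RFC itself does not prescribe any enforcement.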

@tinnytintin10 would we still need it?

Author

@oren-zohar Can you elaborate? Are you referring to entity.metadata (i.e., do we still need metadata when we have Attributes)? Also, see the updated name and description (now entity.raw instead of metadata).


| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. This should be a stable, unique value that persists across different observations of the same entity. For entities with dedicated field sets (e.g., host.id, user.id), this value should match the corresponding *.id field. |

Do we want to solve the problem of different id/name sources for entities? E.g., EC2 instance ID vs. ARN?

Author

I've updated the entity.id field description to provide guidance for choosing between multiple identifiers. To be transparent - there will always be some ambiguity in this selection that we can't fully resolve yet as it depends on specific use cases and contexts. However, we can provide these basic criteria for selecting the primary identifier:

  1. Persists across the entity's lifecycle
  2. Ensures uniqueness within its scope
  3. Is commonly used for queries and correlation
  4. Is readily available in most observations (events, logs, etc.)

Alternative identifiers are preserved in entity.raw. Wdyt?
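
As a rough sketch of the selection described above: an integration could encode its judgment about the four criteria as an ordered preference list, take the first identifier that is present, and keep the rest under `entity.raw`. The function name `select_primary_id`, the `arn` and `instance_id` keys, the preference order, and the layout of `entity.raw` are assumptions for illustration; the RFC does not define them.

```python
def select_primary_id(candidates, preference):
    """Return (primary_id, alternates) from a dict of id_type -> value.

    `preference` is an ordered list of id types, most preferred first. It stands
    in for the integration author's judgment of the criteria (lifecycle
    persistence, uniqueness, common use, availability).
    """
    for id_type in preference:
        value = candidates.get(id_type)
        if value:
            # Everything that was not chosen is kept as an alternative identifier.
            alternates = {k: v for k, v in candidates.items() if k != id_type and v}
            return value, alternates
    raise ValueError("no usable identifier among candidates")


# Illustrative EC2 observation carrying both identifiers.
observed = {
    "arn": "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc123def4567890",
    "instance_id": "i-0abc123def4567890",
}

primary, alternates = select_primary_id(observed, preference=["arn", "instance_id"])
entity = {"id": primary, "raw": alternates}  # raw layout is an assumption
print(entity)
```

If a source only reports the instance ID, the same call falls back to it, which is where the remaining ambiguity shows up: two sources may still end up with different `entity.id` values for the same instance.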


Yeah, I don't have a good way to solve it, tbh. The only thing I can think of is another field, something like an id type, which could hold a value such as arn or instance-id, but I'm not sure it's a good idea.

Member

The issue with ids is beyond the scope of this RFC, in my opinion. More important than knowing what type of id is being persisted is having a consistent id for that entity when it is ingested from different sources. I don't think adding an id type field would ensure consistency; it only adds context that I'm not sure is needed.

We need consistent ids and, where needed, entity resolution. Neither of these is addressed by the RFC right now, as far as I understand.

| Field | Type | Description |
|-------|------|-------------|
| entity.id | keyword | A unique identifier for the entity. This should be a stable, unique value that persists across different observations of the same entity. For entities with dedicated field sets (e.g., host.id, user.id), this value should match the corresponding *.id field. |
| entity.source | keyword | The module or integration that provided this entity data (similar to event.module). |
| entity.category | keyword | A standardized high-level classification of the entity type. This provides a normalized way to group similar entities across different providers or systems. Example values: `bucket`, `database`, `container`, `function`, `queue`, `host`, `user`, etc. There will be an allowed set of values maintained for this field to ensure consistency. |

I'm guessing category and type were chosen to be consistent with event.*, but should we be consistent here? I personally have always found type and category confusing, especially when it comes to understanding which is higher in the hierarchy. Wouldn't it be simpler to have category and sub-category, or something like that?


@JordanSh JordanSh Feb 23, 2025

I've had some offline discussions with tin on this subject. My suggestion is type as the higher-level classification and subtype (or anything else) as the secondary one. This will better align with Observability Inventory, which uses entity.type to search, group, and filter:

[2 images]

It will also align better with our codebase, which already mentions and uses entity.type quite a bit. Having two different fields would require us to "juggle" between them depending on the part of the code we are working in.

[2 images]

Author

I like the suggestion of using more explicitly hierarchical terms. Something like type/sub_type would make the relationship self-evident without needing to know any conventions.

However, I deliberately aligned with ECS's categorization hierarchy (where event.category is higher level than event.type) when coming up with this. I'd love to get input from the ECS team on whether maintaining this consistency is warranted here. While consistency with existing patterns has value, entities serve a different purpose and might benefit from more intuitive naming.

cc @MikePaquette


> There will be an allowed set of values maintained for this field to ensure consistency.

In that case, will entity.category get an allowed_values property, similar to event.category? That would be nice, as it lets us list the expected values and include a description for every category, as in the example below:

    - name: category
      allowed_values:
        - name: host
          description: >
            Entities in this category represent computing devices such as physical machines, virtual machines, or cloud instances. This includes hosts in on-premise data centers, cloud providers (AWS EC2, GCP Compute Engine, Azure VM), and edge devices.
            Events in this category may relate to system health, performance metrics, security monitoring, and configuration changes.
        - name: user
          description: >
            This category represents human or service identities that interact with systems and resources. Users can be defined in directories such as Active Directory, IAM roles in cloud providers, or application-specific accounts.
            Events in this category include authentication attempts, role changes, permissions updates, and identity-related security incidents.
    ...
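
One thing an `allowed_values` list would enable is a simple ingest-time check. The sketch below is only an illustration: the set contains just the two categories from the example above (the real list is not defined yet), and `invalid_categories` is a hypothetical helper, not an existing ECS tool.

```python
# Assumption: only the two categories shown in the YAML example above.
ALLOWED_ENTITY_CATEGORIES = {"host", "user"}


def invalid_categories(doc, allowed=ALLOWED_ENTITY_CATEGORIES):
    """Return the entity.category values in `doc` that are not in `allowed`.

    Accepts either a single string or a list of strings; a missing field is
    treated as valid and yields an empty list.
    """
    value = doc.get("entity", {}).get("category")
    if value is None:
        return []
    values = value if isinstance(value, list) else [value]
    return [v for v in values if v not in allowed]


print(invalid_categories({"entity": {"category": "host"}}))
print(invalid_categories({"entity": {"category": ["user", "spaceship"]}}))
```

The first call reports no problems, while the second flags `spaceship` as a value outside the allowed set.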


cla-checker-service bot commented Feb 24, 2025

❌ Author of the following commits did not sign a Contributor Agreement:
c20956e, 8d583c0, 347c3d3

Please, read and sign the above mentioned agreement if you want to contribute to this project

@tinnytintin10 tinnytintin10 marked this pull request as ready for review February 24, 2025 03:34
@tinnytintin10 tinnytintin10 requested a review from a team as a code owner February 24, 2025 03:34
@tinnytintin10
Author

Reviewed this with @MikePaquette and we are good to go for broader reviews 🚀

cc @tehilashn @oren-zohar @YulNaumenko


| Field | Type | Description |
|-------|------|-------------|
| entity.risk.* | * | Fields for describing risk score and risk level of entities such as hosts and users. |

Should criticality be included in entity as well (i.e., entity.criticality)?


This approach would allow ECS to accommodate new types of entities without requiring continuous schema expansion through new field sets, while maintaining a consistent structure for entity representation.

## Fields

Shouldn't tags and labels also be included in entity?


6 participants