[Meta] Audit Logging #52125

jportner · 2019-12-03T21:16:09Z

Overview

The current state of audit logging in Kibana is not sufficient for many users' needs. Kibana outputs only a few types of events, without much detail, in the same transport as regular log messages. This can be improved in many ways.

Enhancements in scope:

More audit events and information regarding authentication -- e.g., log in and log out events
More audit events for accessing objects
Additional attributes for objects -- usernames, names, IPs, space IDs/names, object URLs, timestamps, authentication
Additional information to differentiate specific user sessions
Additional information to allow for correlation with Elasticsearch audit records
Ability to include/exclude certain events and attributes
Separate audit log transport with rotation capabilities
Fail-safe to stop the Kibana process if audit records cannot be written
Additional configuration to support all of the above enhancements

Current state vs. desired state...

Current state

Audit records in Kibana are displayed in plaintext like so:

  log   [23:26:50.059] [info][audit][saved_objects_authorization_success][security] jdoe authorized to get config
  log   [23:26:50.067] [info][audit][saved_objects_authorization_success][security] jdoe authorized to find index-pattern

If JSON output is enabled:

  {
    "type": "log",
    "@timestamp": "2020-02-18T14:58:44-05:00",
    "tags": [
      "info",
      "audit",
      "security",
      "saved_objects_authorization_success"
    ],
    "pid": 38933,
    "username": "jojo",
    "action": "get",
    "types": [
      "config"
    ],
    "args": {
      "type": "config",
      "id": "8.0.0",
      "options": {}
    },
    "eventType": "saved_objects_authorization_success",
    "message": "jojo authorized to get config"
  }
  {
    "type": "log",
    "@timestamp": "2020-02-18T14:58:44-05:00",
    "tags": [
      "info",
      "audit",
      "security",
      "saved_objects_authorization_success"
    ],
    "pid": 38933,
    "username": "jojo",
    "action": "find",
    "types": [
      "index-pattern"
    ],
    "args": {
      "options": {
        "perPage": 1,
        "page": 1,
        "type": [
          "index-pattern"
        ],
        "search": "*",
        "defaultSearchOperator": "OR",
        "searchFields": [
          "title"
        ],
        "fields": [
          "title"
        ]
      }
    },
    "eventType": "saved_objects_authorization_success",
    "message": "jojo authorized to find index-pattern"
  }

Future state

Audit records should be written in a standard format (ECS), should contain more information about the event that occurred and who originated the action, and fields should be configurable to include more or less information. Such an audit record would look something like this:

{
  "@timestamp": "2019-12-05T00:00:02.000Z",
  "event": {
    "action": "get config",
    "category": "saved_objects_authorization",
    "duration": 453000000,
    "end": "2019-12-05T00:00:02.453Z",
    "module": "security",
    "outcome": "success",
    "start": "2019-12-05T00:00:02.000Z"
  },
  "host": {
    "id": "5b2de169-2785-441b-ae8c-186a1936b17d",
    "ip": "34.56.78.90",
    "hostname": "hostname"
  },
  "http": {
    "request": {
      "body": {
        "bytes": 887,
        "content": "Hello world"
      },
      "bytes": 1437,
      "method": "get",
      "referrer": "https://blog.example.com/"
    }
  },
  "labels": {
    "spaceId": "default"
  },
  "source": {
    "address": "12.34.56.78",
    "ip": "12.34.56.78"
  },
  "url": {
    "domain": "www.elastic.co",
    "full": "https://www.elastic.co:443/search?q=elasticsearch",
    "path": "/search",
    "port": "443",
    "query": "q=elasticsearch",
    "scheme": "https"
  },
  "user": {
    "email": "[email protected]",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": [ "kibana_user" ]
  },
  "trace": {
    "id": "8a4f500d"
  }
}

Note: in the example above, the user.hash (a hash of the user.name field) would not be included by default; it would be an optional field that could be included if the user.name needed to be excluded for privacy reasons.
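The optional user.hash field could be produced roughly as below. This is a sketch under stated assumptions: the function names are hypothetical, and it uses a keyed HMAC-SHA256 instead of a bare hash because usernames are low-entropy, so an unkeyed hash of a name can be reversed by hashing candidate names. The upper-case hex output only mirrors the style of the example record.

```typescript
import { createHmac } from "crypto";

// Subset of the ECS user fields used in the example record above.
interface EcsUser {
  name?: string;
  hash?: string;
  roles?: string[];
}

// Assumption: HMAC-SHA256 keyed with a secret (e.g. from kibana.yml), not a plain hash.
function hashUserName(name: string, key: string): string {
  return createHmac("sha256", key).update(name).digest("hex").toUpperCase();
}

// Hypothetical helper: replaces user.name with user.hash when names must be excluded.
function applyUserPrivacy(
  user: EcsUser & { name: string },
  options: { excludeName: boolean; hashKey: string },
): EcsUser {
  if (!options.excludeName) {
    return { ...user };
  }
  const { name, ...rest } = user;
  return { ...rest, hash: hashUserName(name, options.hashKey) };
}
```

Because the same key always yields the same hash, records for one user can still be correlated without revealing the name.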

First Phase

Prerequisites (in progress):

Format audit records in JSON using the Elastic Common Schema (ECS): [logging] Use Elastic Common Schema (ECS) #52226
Modify Elasticsearch client to pass X-Opaque-Id header for unique events for correlation: Pipe X-Opaque-Id header to AuditTrail logs and Elasticsearch API calls #62018
Collect audit logs for ES client: Add generic AuditTrail service #60119
Implement server-side sessions: Kibana Security to use Server Side Sessions #17870

Phase 1 implementation: #54836

Future Phase

Enriching events with session ID
Support for log rotation (prerequisite: [KP] Implement Log rotation appender #56291)
Additional attributes such as IP address (Add client IP address to audit records #127481) and user profile ID (Add user profile ID to audit log events #125932)
Fail-safe to stop the Kibana process if audit records cannot be written: Plugins can initiate Kibana graceful shutdown #60636
Additional transport options (human-readable message formatting, multiple appenders)
Support for including/excluding event attributes
Include/exclude events based on attributes (such as saved object type)
Additional configuration to support the above
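A minimal sketch of the fail-safe item, assuming a synchronous sink that throws on failure. FailSafeAuditWriter, the writeLine sink, and the shutdown hook are hypothetical; the hook stands in for whatever graceful-shutdown API #60636 ends up providing. A real file appender would likely write asynchronously.

```typescript
// Persists one serialized record (file appender, etc.); assumed to throw on failure.
type WriteLine = (line: string) => void;

class FailSafeAuditWriter {
  private failed = false;

  constructor(
    private readonly writeLine: WriteLine,
    // Hypothetical hook that would trigger Kibana's graceful shutdown.
    private readonly shutdown: (reason: Error) => void,
  ) {}

  write(record: object): void {
    if (this.failed) {
      // Once the audit log is unavailable, refuse every subsequent audited action.
      throw new Error("audit log unavailable");
    }
    try {
      this.writeLine(JSON.stringify(record));
    } catch (err) {
      this.failed = true;
      const reason = err instanceof Error ? err : new Error(String(err));
      this.shutdown(reason); // request shutdown exactly once
      throw reason; // fail the current action instead of silently skipping the record
    }
  }
}
```

Rethrowing after requesting shutdown means an action is never reported as successful when its audit record was not written.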


elasticmachine · 2019-12-03T21:16:23Z

Pinging @elastic/kibana-security (Team:Security)

jportner · 2019-12-03T21:18:00Z

@arisonl FYI

joshdover · 2020-02-19T21:09:56Z

I see the output format is going to be in ECS which is great. Will we support ingesting this data into Elasticsearch and using it in the product for inspection by admins? We should be able to leverage Core's logging appenders to accomplish the ingestion piece.

jportner · 2020-02-20T14:19:14Z

I see the output format is going to be in ECS which is great. Will we support ingesting this data into Elasticsearch and using it in the product for inspection by admins? We should be able to leverage Core's logging appenders to accomplish the ingestion piece.

My take on it is that the ingestion itself is out of scope for this feature. As long as we can output to JSON on the file system (which we were intending to use Core's logging appenders to do), Filebeat can be used for ingestion. Is that what you meant? Or are the logging appenders going to support ingestion directly?

joshdover · 2020-02-20T18:06:28Z

Filebeat would definitely work. It'd be interesting if we could actually ship Filebeat with Kibana configured to do this automatically. Of course there's some complexity with that as well (process monitoring, licensing, etc.)

My broader question is about whether or not there are plans to use this data in the product. For example, it'd be great if there was a menu item on a visualization that opened a UI with a history of edits to that visualization.

jportner · 2020-02-25T19:29:39Z

My broader question is about whether or not there are plans to use this data in the product. For example, it'd be great if there was a menu item on a visualization that opened a UI with a history of edits to that visualization.

In short: no. There is overlap of what information we need / what conclusions we can draw with audit logging and what we're calling "usage data". However, there is a strong separation of concerns there. We ultimately decided to keep this at a smaller scope just for the auditing use case.

I do think that once we have all of the new audit logging in place, we'll have all of the hooks/plumbing necessary to track and provide robust usage data. But we don't want to conflate audit records and usage data.

kobelb · 2020-02-27T18:25:59Z

During a Zoom meeting today, there was some discussion about which events and attributes should be in the "normal logs" vs what should be in the "audit logs". @jportner and I discussed this further and I've summarized the consensus that we reached.

The normal logs should not include user-specific information. User information is particularly sensitive, and augmenting normal log events with this information is potentially problematic. However, it's perfectly fine for these to include opaque identifiers for the session and the HTTP request. The normal logs should include all events which are logged using the standard logging infrastructure and be filtered however the user chooses.

The audit logs should include user-specific information, and controls will be put in place to only log entries for specific users or only specific user information. The audit logs will include only audit-specific events. There is potentially some overlap here with regard to the events which appear in the normal logs and in the audit logs, but they're generally completely separate. The audit logs will include all authorization and authentication based events, in addition to events for specific operations of interest, including but not limited to: saved-object CRUD, Elasticsearch queries. The mechanism for creating the audit events for operations which aren't auth related needs to be explored further.

joshdover · 2020-03-02T16:11:43Z

Components needed:

Scoped contextual logger (Platform)
- Includes a unique identifier for current "context" which may be represented as a string (could be an X-Opaque-Id)
  - Generate if not present on incoming request, unique per request
  - Add config for only accepting X-Opaque-Id set by specific IP addresses (trusted proxies)
  - Add config to source session ID from another header
- Does not include data about the current user - we don't want this in OSS, security should add it itself (maybe we add an addScopeProvider API to the logger API?)
- Provided to HTTP routes via RouteHandlerContext
- Wired into RouteHandlerContext's elasticsearch client, SO client, and uisettings client
A way for the audit logging system to receive ALL logs (Platform)
- This is the least clear part of this plan; there may be better alternatives
- Expose a "firehose" API, e.g. records$(): Observable<LogRecord>
- Allow security to read and update the logging config dynamically so it could add its audit appender to all logging contexts
  - We could probably start with this and change to something else if it's a hassle
A new layout (Operations / Security)
- ECS (OSS, default JSON layout?)
- ECS (audit / extended)
  - an extension of the normal ECS layout that also logs user + event data
New audit log events (Security)
- Events can be sourced from the firehose or directly added as domain events in SO client for example
Other logging enhancements (Operations)
- Log rotation
- Elasticsearch query logs
- HTTP requests, responses
- Ops metrics logs
- etc. (New platform Logging service improvements #58261)
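The first two Platform components could be sketched roughly as follows: resolving the per-request identifier (reusing X-Opaque-Id only when a trusted proxy set it, optionally reading a session ID from another header) and a "firehose" the audit system could subscribe to. All names here (resolveLogContext, LogFirehose, the trustedProxies and sessionIdHeader settings) are hypothetical, and a plain listener set stands in for the Observable so the sketch has no RxJS dependency.

```typescript
import { randomUUID } from "crypto";

// Reduced view of an incoming request; header names lower-cased, as in Node.js.
interface IncomingRequestInfo {
  headers: Record<string, string | undefined>;
  remoteAddress: string;
}

// Hypothetical config for the two options listed above.
interface LogContextConfig {
  trustedProxies: string[]; // only these IPs may supply X-Opaque-Id
  sessionIdHeader?: string; // optional header to source the session ID from
}

// Deliberately contains no user data.
interface LogContext {
  requestId: string;
  sessionId?: string;
}

function resolveLogContext(
  req: IncomingRequestInfo,
  config: LogContextConfig,
  generateId: () => string = randomUUID,
): LogContext {
  const opaqueId = req.headers["x-opaque-id"];
  const fromTrustedProxy = config.trustedProxies.includes(req.remoteAddress);
  // Reuse the incoming ID only from a trusted proxy; otherwise generate a unique one.
  const requestId = opaqueId && fromTrustedProxy ? opaqueId : generateId();
  const sessionId = config.sessionIdHeader
    ? req.headers[config.sessionIdHeader.toLowerCase()]
    : undefined;
  return { requestId, sessionId };
}

interface LogRecord {
  level: "debug" | "info" | "warn" | "error";
  message: string;
  context: LogContext;
}

// Stand-in for records$(): Observable<LogRecord>.
class LogFirehose {
  private readonly listeners = new Set<(record: LogRecord) => void>();

  subscribe(listener: (record: LogRecord) => void): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  emit(record: LogRecord): void {
    this.listeners.forEach((listener) => listener(record));
  }
}
```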

Open questions:

Should the audit logger be a child context of the root Core logger or should it be its own instance of the same system that is configured separately?
What's the best way for the audit logger to access other events (eg. incoming requests, Elasticsearch queries, etc.)?
How can we make sure the scoped logger has the appropriate information needed for log events without exposing sensitive information to the normal logger (eg. username)?

mshustov · 2020-03-03T11:30:49Z

I see the Audit service as a separate top-level service (the outer circle in the onion architecture).

No plugins depend on the AuditTrail. The AuditTrail service may depend on any plugin.
The platform and plugins emit auditable events. The AuditTrail service listens to them and calls plugin APIs to collect the necessary data.

security.on('authenticationSuccess', (message: string, request: KibanaRequest) => {
  const auditData = {
    message,
    action: 'authenticationSuccess',
    user: security.getUser(request),
    spaces: spaces.getSpace(request),
    server: core.http.getServerInfo(),
    // ...other fields
  };
  // the audit logger has a well-known prefix
  log.logger(auditData);
});

As an alternative, Platform provides Auditable hook and AuditTrail service registers itself via this hook.

registerAuditable(auditor: (event: { action: string; message: string; request: KibanaRequest }) => void): void;

To define the logging layout, we can use the same approach Elasticsearch uses for SecurityAudit: add an explicit config in x-pack that enhances the OSS kibana.yml config.
https://github.com/elastic/elasticsearch/blob/fb86e8d6d67d95a8f2e99a175e3a6d7bbb4b196e/distribution/docker/src/docker/config/log4j2.properties#L47-L82
That would allow users to configure the layout and destination as required.

The open question for me: what type of unique data does each auditable event have? I suspect the dataset for Elasticsearch query events and for authentication-denied events can be different.
If the dataset for every auditable event is the same, we can use a common interface for the Audit service. Otherwise, we might want to separate common fields from event-specific fields.
AuditTrail implementation in Elasticsearch:
https://github.com/elastic/elasticsearch/blob/5775ca83dbee90d3988faa611024bfaf42b13073/x-pack/plugin/security/src/main/java/org/elasticsearch/xpack/security/audit/logfile/LoggingAuditTrail.java
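If the datasets do differ, one way to separate common fields from event-specific fields in TypeScript is a discriminated union. The field set and event names below are illustrative assumptions, not a proposed schema.

```typescript
// Fields assumed to be shared by every audit record.
interface CommonAuditFields {
  timestamp: string;
  message: string;
  traceId: string;
  user?: { name: string; roles: string[] };
  spaceId?: string;
}

// Event-specific payloads, discriminated by `action` (hypothetical event names).
type AuditEventPayload =
  | { action: "elasticsearch_query"; query: { method: string; path: string } }
  | { action: "authentication_denied"; authentication: { provider: string; reason: string } }
  | { action: "saved_object_access"; savedObject: { operation: string; type: string; id: string } };

type AuditRecord = CommonAuditFields & AuditEventPayload;

// Each branch is narrowed to its payload, so event-specific fields are only
// readable where they exist.
function summarize(record: AuditRecord): string {
  switch (record.action) {
    case "elasticsearch_query":
      return `${record.query.method} ${record.query.path}`;
    case "authentication_denied":
      return `authentication denied by ${record.authentication.provider}: ${record.authentication.reason}`;
    case "saved_object_access":
      return `${record.savedObject.operation} ${record.savedObject.type}:${record.savedObject.id}`;
    default: {
      // Adding an event type without handling it here is a compile-time error.
      const unhandled: never = record;
      return unhandled;
    }
  }
}
```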

ECS (OSS, default JSON layout?)

Elasticsearch doesn't use the ECS type. Instead, their JSON layout follows the ECS format by default.

Does not include data about the current user - we don't want this in OSS, security should add it itself (maybe we add an addScopeProvider API to the logger API?)

We already have RequestHandlerContext. It might expose addMetaData() to extend a request with additional data. If we consider some data sensitive, we shouldn't provide read access to it. The main problem with this approach is that the AuditTrail plugin doesn't have control over the shape of the data, but it needs to filter and format them into the necessary layout (which differentiates it from the telemetry plugin approach).
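The "write without read access" idea can be expressed as a capability split: Core would hand plugins only the write function and give the read function only to the AuditTrail plugin. createAuditMetadata and its methods are hypothetical names; the sketch does not solve the shape-control concern, so the audit trail would still have to validate and format whatever it reads.

```typescript
// Request-scoped metadata keyed by the request object. A WeakMap lets entries be
// garbage-collected along with the request.
function createAuditMetadata<TRequest extends object>() {
  const store = new WeakMap<TRequest, Record<string, unknown>>();

  return {
    // Would be exposed to plugins through RequestHandlerContext (write-only).
    addMetaData(request: TRequest, key: string, value: unknown): void {
      const existing = store.get(request) ?? {};
      store.set(request, { ...existing, [key]: value });
    },
    // Kept private to the AuditTrail plugin; returns a copy so readers cannot mutate it.
    readMetaData(request: TRequest): Readonly<Record<string, unknown>> {
      return { ...(store.get(request) ?? {}) };
    },
  };
}
```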

joshdover · 2020-03-03T22:02:47Z

I think we are largely on the same page here. I'd like to lay out this plan with a distinction between some of the concerns. Namely, I'd like to separate what is necessary to support general observability and tracing within Kibana logs (OSS and otherwise) and what is necessary to support audit logs (X-Pack).

General observability requirements:

Be able to find all log messages that occur during an HTTP request
- This includes Elasticsearch logs
Be able to configure log layouts and appenders to output this data
Support ECS in JSON layout

Audit logging requirements:

Be able to trace key domain actions within Kibana
- Examples: saved objects access events, authentication events, etc.
Be able to correlate domain action events with regular request logs

For the general observability case, we need a couple new components:

Contextual data on log records that includes information about the request that initiated the log
An ECS-compatible JSON log layout

I think we're both in agreement on how to accomplish these two requirements.

(1) can be solved by introducing a formal LogContext struct that is used by the Logger as well as by the Elasticsearch and SavedObjects clients. This struct would be created by Core's request context provider and injected into the ES and SO clients exposed by RequestHandlerContext. This enables every log message in those clients to include data about the current request (would not include user data).
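A rough sketch of (1): one LogContext created per request and shared by the clients handed to route handlers. The client classes are hypothetical, heavily stripped-down stand-ins for the real ES and SO clients, and the `x-opaque-id` header reflects the correlation idea discussed elsewhere in this issue.

```typescript
// Created once per request by Core's request context provider; no user data.
interface LogContext {
  readonly requestId: string;
}

interface Logger {
  info(message: string, meta: { requestId: string }): void;
}

class ScopedElasticsearchClient {
  constructor(private readonly ctx: LogContext, private readonly logger: Logger) {}

  // Returns the headers that would be sent, so the same ID reaches Elasticsearch.
  search(index: string): Record<string, string> {
    this.logger.info(`search ${index}`, { requestId: this.ctx.requestId });
    return { "x-opaque-id": this.ctx.requestId };
  }
}

class ScopedSavedObjectsClient {
  constructor(private readonly ctx: LogContext, private readonly logger: Logger) {}

  get(type: string, id: string): void {
    this.logger.info(`get ${type}:${id}`, { requestId: this.ctx.requestId });
  }
}

// Roughly what RequestHandlerContext would expose for a single request.
function createRequestHandlerContext(requestId: string, logger: Logger) {
  const ctx: LogContext = { requestId };
  return {
    elasticsearch: new ScopedElasticsearchClient(ctx, logger),
    savedObjects: new ScopedSavedObjectsClient(ctx, logger),
  };
}
```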

(2) is solved by changing our JSON log layout to be ECS-compatible.

For the audit logging case, we need:

A way to produce domain action events, a few options:
- Specific emit points in the OSS code;
- Leveraging the existing logs and translating them to domain events; or
- A registerAuditable interface as above
A way in which domain actions can be mapped back to additional context information not included in OSS
- For most cases this is being able to call security.authc.getCurrentUser with the KibanaRequest object.

(1) is where I think we need some discussion.

My only concern about adding domain-specific events is that they may be abused by other plugins for different purposes. For example, we've gotten requests to add hooks like onDelete to the SavedObjectsClient. Having generic hooks like this can lead to a complex web of business logic that relies on these hooks executing in order to keep the system in a valid state.

I think we just need to take care in how we implement such events so that the timing of when they are executed is not depended on by business logic. In other words, I want to avoid a situation where an app is dependent on these hooks in order to function correctly (other than audit logging itself).

This makes me lean slightly towards the registerAuditable interface or something similar. I think it's much more likely that these types of events are consumed responsibly if they are exposed this way, rather than each sub-system emitting these events.

legrego · 2020-03-05T13:25:55Z

Sorry for being dense, I'm a bit confused about the proposed use of registerAuditable. @restrry's example makes me think that it would be used by the service responsible for decorating and writing the already-generated audit log events to disk, but @joshdover's comment leads me to believe that it would be used to produce the domain action events that eventually get decorated and logged downstream.

Can we outline what a couple of domain action events might look like? Let's say that both Security and Spaces are enabled:


title Create Dashboard
User->Kibana: Create Dashboard Request

Kibana->Kibana: Unique Request identifier created

Kibana->Saved Objects Service: Create Dashboard Request

Saved Objects Service->Security SOC Wrapper: AuthZ Check

Security SOC Wrapper->ES: _has_privileges request

ES->Security SOC Wrapper: _has_privileges response

Security SOC Wrapper->Saved Objects Service: OK

Saved Objects Service->ES: index { bigBlobOfJSON }

ES->Saved Objects Service: { bigBlobOfJSON }

Saved Objects Service->Kibana: { bigBlobOfJSON }

Kibana->User: Create Dashboard Response

In this example, we have 2 requests made to ES: one for the privileges check, and another to actually index the saved object. In this example, I'd expect a single "Create dashboard" audit record, as the privileges check is a simple implementation detail, which would still be captured by the ES audit logs.

What about a more complex example though? Consider the "Copy to space" feature. This works by first performing a server-side export, followed by a server-side import:


title Copy to Space
User->Kibana: Copy to space Request

Kibana->Kibana: Unique Request identifier created

Kibana->Saved Objects Service: bulk_get objects to be copied

Saved Objects Service->Security SOC Wrapper: AuthZ Check

Security SOC Wrapper->ES: _has_privileges request

ES->Security SOC Wrapper: _has_privileges response

Security SOC Wrapper->Saved Objects Service: OK

Saved Objects Service->ES: bulk_get [{type: 'search', id: 'foo'}, ...]

ES->Saved Objects Service: bulk_get response [{ bigBlobOfJSON }]

Saved Objects Service->Kibana: [{ bigBlobOfJSON }]

Kibana->Saved Objects Service: bulk_create objects to be copied

Saved Objects Service->Security SOC Wrapper: AuthZ Check

Security SOC Wrapper->ES: _has_privileges request

ES->Security SOC Wrapper: _has_privileges response

Security SOC Wrapper->Saved Objects Service: OK

Saved Objects Service->ES: bulk_create [{type: 'search', id: 'foo'}, ...]

ES->Saved Objects Service: bulk_create response [{ bigBlobOfJSON }]

Saved Objects Service->Kibana: [{ bigBlobOfJSON }]

Kibana->User: Copy to space Response

How many audit records would we expect to see here? Somewhere between 1 and 3?

"Copy to space" record
"Export/ bulk_get saved objects" record
"Import / bulk_create saved objects" record

My initial reaction is that 2 and 3 are implementation details of 1, and therefore might not make sense in the audit log. They should show up in the general log, however. Someone trying to understand the audit logs might get confused that they're seeing bulk_get and bulk_create requests, when they in fact "only" performed a Copy to space action.

To make a comparison to the ES audit logs, I don't think they record shard read/writes that occur as part of a user's request. They log that the request happened, and the "implementation details" are kept out of the audit logs.

I only bring this up because it's not immediately clear to me where we'll choose to generate/emit these audit events. Doing so at the saved objects client would cause these "implementation details" to be logged for various domain action events. Emitting from the http routes (the public API) would probably get us most of the way there, but that doesn't handle actions like background jobs.

mshustov · 2020-03-05T14:47:00Z

In this example, we have 2 requests made to ES: one for the privileges check, and another to actually index the saved object. In this example, I'd expect a single "Create dashboard" audit record, as the privileges check is a simple implementation detail, which would still be captured by the ES audit logs.

I'd expect to see Dashboard created and Dashboard creation failed audit records in this example. Both should provide additional info: who performs an action, in what space, etc.

How many audit records would we expect to see here? Somewhere between 1 and 3?

The same logic applies here. I expect only one event here: Copied to space. Users do not think in terms of Export / bulk_get saved objects / Import / bulk_create saved objects. As you said, they are implementation details. However, users can find correlated low-level events in the Kibana logs via a request identifier or a background task identifier.

I only bring this up because it's not immediately clear to me where we'll choose to generate/emit these audit events.

The infrastructure level (ES / SO clients) cannot emit domain events. Plugin code emits them. Depending on the plugin workflow, this can be done:

in an HTTP route handler
in background task runner

I proposed using an Audit Trail service that receives those domain events and computes the data needed to build an audit log record:

// in plugin code
auditTrail.add({ event, message, request });
// in an HTTP request handler, the context can be bound to the request
auditTrail.add({ event, message });
// in a background task we don't have a context pattern yet and might have to introduce one
auditTrail.add({ event, message });

// in audit trail plugin code
class AuditTrail {
  on(event, message, request) {
    const auditData = {
      message,
      action: event,
      user: security.getUser(request),
      spaces: spaces.getSpace(request),
      server: core.http.getServerInfo(),
      // ...other fields
    };
    // the audit logger has a well-known prefix
    log.logger(auditData);
  }
}

Audit Logger doesn't deal with any observability concerns (ES query performance, for example).

Let me know if it makes sense to you or if I missed something.

legrego · 2020-03-05T15:01:09Z

That all makes sense, thanks. My primary question was how we would allow plugin code to emit events. Something like auditTrail.add({event, message, request}) makes perfect sense to me.

My initial confusion was around registerAuditable, and then I got distracted with those two examples I put up. So registerAuditable would be a hook provided by core, which the security plugin (for example) could call in order to be notified about all emitted audit events? Similar to how core provides a hook for security to register the auth provider?

mshustov · 2020-03-05T15:53:10Z

So registerAuditable would be a hook provided by core, which the security plugin (for example) could call in order to be notified about all emitted audit events?

I'd expect it to be used by the AuditTrail plugin to extend the platform. There are several benefits of using it in this manner:

the Audit API can be used for OSS code (ES, SO CRUD)
the Audit API can be used for audit plugin dependencies (security, spaces)

The AuditTrail plugin can depend on any plugin and use their public APIs to calculate audit data:

// kibana.json (plugin manifest)
"requiredPlugins": ["security", "spaces"],
// plugin.ts
class AuditTrail {
  on(event, message, request) {
    const auditData = {
      message,
      action: event,
      user: security.getUser(request),
      spaces: spaces.getSpace(request),
      server: core.http.getServerInfo(),
      // ...other fields
    };
    // the audit logger has a well-known prefix
    log.logger(auditData);
  }
}

// bind so `this` stays correct if `on` ever uses instance state
platform.registerAuditable(auditTrail.on.bind(auditTrail));

Probably registerAuditable is not the best name. Is registerAuditor more clear?

Also, I'd like to hear from Josh. He might have a different vision.

jportner · 2020-03-05T16:52:04Z

Good idea making a diagram @legrego --

OK, so Approach #1 as described above is to generate a single audit event for each user request.

How many audit records would we expect to see here? Somewhere between 1 and 3?

"Copy to space" record

"Export/ bulk_get saved objects" record

"Import / bulk_create saved objects" record

In Approach #2 that I've been thinking of, we would see five audit records:

In my mind it would look something like this.


{
  "event": {
    "action": "read sourcespace dashboard",
    "category": "saved_objects_authorization",
    "module": "plugin:security",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "bulk_get [sourcespace:dashboard:foo]",
    "category": "saved_objects_client",
    "module": "core",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "write destspace dashboard",
    "category": "saved_objects_authorization",
    "module": "plugin:security",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "bulk_create [destspace:dashboard:foo]",
    "category": "saved_objects_client",
    "module": "core",
    "outcome": "success"
  },
  "trace": { "id": "some-uuid" }
}
{
  "event": {
    "action": "POST",
    "category": "http",
    "module": "core",
    "outcome": "success"
  },
  "http": {
    "request": {
      "body": {
        "content": "{\"objects\":[{\"type\":\"dashboard\",\"id\":\"foo\"}],\"spaces\":[\"destspace\"],\"includeReferences\":true,\"overwrite\":true}"
      },
      "method": "POST"
    }
  },
  "source": {
    "address": "12.34.56.78",
    "ip": "12.34.56.78"
  },
  "url": {
    "domain": "www.somekibanahost.com",
    "full": "https://www.somekibanahost.com/api/spaces/_copy_saved_objects",
    "path": "/api/spaces/_copy_saved_objects",
    "port": "443",
    "query": "",
    "scheme": "https"
  },
  "user": {
    "email": "[email protected]",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": [ "kibana_user" ]
  },
  "trace": { "id": "some-uuid" }
}

Note 1: I omitted some attributes in the interest of brevity.

Note 2: each record can be correlated with each other by trace.id (which should also be sent to Elasticsearch as X-Opaque-Id).

Note 3: the four records in the audit trail with the saved_objects_client and saved_objects_authorization categories wouldn't need to contain all of the attributes (http, source, url, user) -- however, the "events" that these records are generated from would still need to have this info. This is because we want to be able to add a filter to avoid writing records based on certain attributes, such as user or IP address.
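The filtering described in Note 3 could work roughly like this: exclusion rules are evaluated against the full event, and only afterwards are request/user attributes dropped from the low-level records that are actually written. The rule shape, field names, and toAuditRecord are hypothetical.

```typescript
// Full event as seen by the audit trail, before deciding what to write.
interface AuditEventInput {
  category: string;
  action: string;
  outcome: "success" | "failure";
  traceId: string;
  userName?: string;
  sourceIp?: string;
}

// Hypothetical exclusion rule; every field that is set must match.
interface ExclusionRule {
  category?: string;
  userName?: string;
  sourceIp?: string;
}

// Categories whose written records omit user/source attributes.
const LOW_LEVEL_CATEGORIES = new Set(["saved_objects_client", "saved_objects_authorization"]);

function ruleMatches(rule: ExclusionRule, event: AuditEventInput): boolean {
  return (
    (rule.category === undefined || rule.category === event.category) &&
    (rule.userName === undefined || rule.userName === event.userName) &&
    (rule.sourceIp === undefined || rule.sourceIp === event.sourceIp)
  );
}

// Returns the record to write, or null if an exclusion rule drops the event.
function toAuditRecord(event: AuditEventInput, exclusions: ExclusionRule[]): Record<string, unknown> | null {
  if (exclusions.some((rule) => ruleMatches(rule, event))) {
    return null;
  }
  const record: Record<string, unknown> = {
    event: { action: event.action, category: event.category, outcome: event.outcome },
    trace: { id: event.traceId },
  };
  if (!LOW_LEVEL_CATEGORIES.has(event.category)) {
    if (event.userName !== undefined) record.user = { name: event.userName };
    if (event.sourceIp !== undefined) record.source = { ip: event.sourceIp };
  }
  return record;
}
```

Note that an empty rule matches every event, so a real configuration would probably reject it.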

So, this approach would generically audit all API routes and SOC calls. It would show what's happening "under the hood" for the SOC and its wrappers. Of course this is more verbose than the alternative of writing a single audit event for each request.

Potential advantages of Approach #2:

Less work to get broad auditing coverage of Kibana -- we don't need each plugin to be responsible for auditing events for all of its APIs. If an API relies on the SOC then it shouldn't need to audit anything else. From what I understand, this covers the majority of Kibana APIs. For any other APIs that don't rely on the SOC, we can add specific audit events.
- Fewer blind spots -- any new plugin would automatically get covered by these generic audit events.
- No need for us to try to understand "failure" event nuances (such as authZ failed for read, or authZ failed for write) at the API route level -- records that were generated at lower layers would include that info.
Easier to reason about the chain of events that happen when Kibana interacts with Elasticsearch on behalf of a user.
Easier to track down who did what (e.g., search for "create dashboard:foo" -- that would be revealed in this audit trail because of the bulk_create event that was logged)

Disadvantages:

More verbose (though we plan to offer the ability to filter out events by action or category if so desired)
Additional events may not always have much meaning

Thoughts?

joshdover · 2020-03-05T17:52:12Z

Probably registerAuditable is not the best name. Is registerAuditor more clear?

Also, I'd like to hear from Josh. He might have a different vision.

I think we're on the same page here. The only part I'm confused about in your example is the auditTrail.add API. This is meant to be a Core API, right? Not an API on the audit trail plugin.

If we're on the same page there, then the final result is Platform would need to expose two APIs:

registerAuditor for receiving audit events
- This is the API that audit log plugin would use to get all events, enrich with additional data, and forward to a logger.
auditTrail.add / addAuditEvent / someOtherName for adding audit events
- This is the API that Core, OSS plugins, and commercial plugins would use to add domain-events for user actions (eg. Copy to Space). These events are forwarded to any auditors registered with registerAuditor.
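A minimal sketch of those two Platform APIs. AuditTrailService is a hypothetical name, and `request` is kept generic so the sketch does not depend on Kibana types. addAuditEvent returns nothing, which keeps callers from building business logic on top of it; with no auditor registered (e.g. security disabled) events are simply dropped.

```typescript
interface AuditEvent<TRequest> {
  action: string; // e.g. "copy_to_space"
  message: string;
  request?: TRequest;
}

type Auditor<TRequest> = (event: AuditEvent<TRequest>) => void;

class AuditTrailService<TRequest> {
  private readonly auditors: Array<Auditor<TRequest>> = [];

  // For the audit log plugin: receive every event, enrich it, forward it to a logger.
  registerAuditor(auditor: Auditor<TRequest>): void {
    this.auditors.push(auditor);
  }

  // For Core, OSS plugins, and commercial plugins: report a domain event.
  addAuditEvent(event: AuditEvent<TRequest>): void {
    for (const auditor of this.auditors) {
      auditor(event);
    }
  }
}
```

In the test, the auditor plays the audit log plugin's role, enriching the event with a user taken from the request before "writing" it.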

In terms of what produces the audit events themselves (@jportner's discussion above), I think I do favor Approach #2 for its completeness. It seems less likely that we would miss a critical event that should be included in the audit log if we log the lower level details. That said, I'm not very familiar with how audit logs are used by customers. If the low-level logs are too opaque to understand, that could make these logs much less useful.

So really it seems the question is: do we favor completeness or clearer semantics?

Could we do both? Could the semantic, high-level action be provided as a "scope" for the lower-level audit events?

For example, what if we had an API that allows an HTTP endpoint to start an auditable event scope, so that all audit events produced while that scope is open are associated with the high-level semantic action?

router.post(
  { path: '/api/do_action', validate: false },
  async (context, req, res) => {
    const auditScope = context.audit.openScope('copy_to_space');
    try {
      // Any audit events produced by the SO client while the scope is open
      // would be associated with the `copy_to_space` scope.
      const result = await copyToSpace(context.savedObjects.client);
      return res.ok({ body: result });
    } finally {
      auditScope.close();
    }
  }
);

Or we could change the API a bit to:

router.post(
  { path: '/api/do_action', validate: false },
  async (context, req, res) => context.audit.openScope(
    'copy_to_space',
    async () => {
      // Any audit events produced by the SO client while the scope is open
      // would be associated with the `copy_to_space` scope.
      const result = await copyToSpace(context.savedObjects.client);
      return res.ok({ body: result });
    }
  )
);

The tricky part about this in Node.js is that these async actions are running in the same memory space, which makes associating the scope with any asynchronous code difficult. A couple of options for solving this:

Bind the context.audit object for the request to the ES and SO clients provided by context.
Use the continuation-local-storage library to associate any code executed in the promise chain with the scope. This would eliminate any problem with plugins that use their own ES client or SO repository not being associated with the scope. However, I don't have experience with this library and it may be a premature optimization.
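As an alternative to continuation-local-storage, newer Node.js releases provide AsyncLocalStorage in the async_hooks module, a higher-level API for exactly this kind of propagation. The sketch below implements the callback form of openScope on top of it; openScope, createAuditEvent, and the scope shape are hypothetical. The low-level emitter reads the scope without it being threaded through as an argument.

```typescript
import { AsyncLocalStorage } from "async_hooks";

interface AuditScope {
  name: string; // e.g. "copy_to_space"
  traceId: string;
}

interface ScopedAuditEvent {
  action: string;
  scope?: string;
  traceId?: string;
}

const auditScopes = new AsyncLocalStorage<AuditScope>();

// Callback form of context.audit.openScope(name, fn): code run inside `fn`,
// including everything it awaits, sees `scope` via getStore().
function openScope<T>(scope: AuditScope, fn: () => Promise<T>): Promise<T> {
  return auditScopes.run(scope, fn);
}

// What a low-level emitter (e.g. a saved objects client wrapper) could call.
function createAuditEvent(action: string): ScopedAuditEvent {
  const scope = auditScopes.getStore();
  return { action, scope: scope?.name, traceId: scope?.traceId };
}
```

The test runs two scopes concurrently to check that each keeps its own scope across an await.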

mshustov · 2020-03-06T11:06:42Z

If we're on the same page there, then the final result is Platform would need to expose two APIs:
registerAuditor for receiving audit events
This is the API that audit log plugin would use to get all events, enrich with additional data, and forward to a logger.
auditTrail.add / addAuditEvent / someOtherName for adding audit events
This is the API that Core, OSS plugins, and commercial plugins would use to add domain-events for user actions (eg. Copy to Space). These events are forwarded to any auditors registered with registerAuditor.
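A minimal sketch of the two APIs described above, assuming a simple synchronous fan-out to every registered auditor. The `AuditTrail` class and the event shape are hypothetical; only the method names `registerAuditor` and `addAuditEvent` come from the discussion.

```typescript
// Hypothetical event shape; a real one would follow the ECS fields discussed later.
interface AuditEvent {
  action: string;
  message: string;
}

// An auditor receives every event and may enrich it and forward it to a logger.
type Auditor = (event: AuditEvent) => void;

class AuditTrail {
  private readonly auditors: Auditor[] = [];

  // Used by the audit log plugin to receive all events.
  registerAuditor(auditor: Auditor): void {
    this.auditors.push(auditor);
  }

  // Used by Core and plugins to add domain events such as "copy_to_space";
  // each event is forwarded to every registered auditor in registration order.
  addAuditEvent(event: AuditEvent): void {
    for (const auditor of this.auditors) {
      auditor(event);
    }
  }
}
```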

Correct 👍

The tricky part about this in Node.js is that these async actions are running in the same memory space, which makes associating the scope with any asynchronous code difficult.

AFAIK Nodejs provides built-in primitives that we can try to use for this case https://nodejs.org/api/async_hooks.html
It's time to finally watch https://www.youtube.com/watch?v=omOtwqffhck from @watson 😄

joshdover · 2020-03-09T15:52:42Z

AFAIK Nodejs provides built-in primitives that we can try to use for this case nodejs.org/api/async_hooks.html

I agree async_hooks could be a solution. My concern is just that it's still experimental, even in the latest Node version. It does look like the working group is discussing stabilization. If it does go stable in v14 LTS, it could be a viable option for us.

thomheymann · 2020-07-01T10:42:00Z

Hi team, I'm new to the project and am starting to get up to speed with the audit log feature.

From speaking to different people there still seem to be a few outstanding questions and different ideas as to what the audit log should provide, to what level of detail and how it differs from existing logging.

In order to help us define a clear approach I wanted to define some guiding principles that we can agree on and then refer back to when making a decision about whether something should be included in the audit log or not and what the implementation should look like.

I have written these as statements but they are all open questions / up for debate.

I might have gotten this completely wrong, so it would be great to get your thoughts!

Guiding Principles

What’s the difference between our audit log and system log?

The purpose of an audit log is to support compliance, accountability and security by capturing who performed an action, what action was performed, when it occurred and what the outcome was
It is not the purpose of an audit log to aid with debugging the system or provide usage statistics

What events need to be captured?

Auditing requirements will vary widely between organisations so we will allow fine grained control over what gets captured with sensible defaults
At the most verbose level we should allow capturing all events that fall in the following categories:
- System access (incl. failed attempts)
- Data reads (incl. failed attempts)
- Data writes (incl. failed attempts)
Filters can then be applied to e.g. only log data mutations or failed attempts
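A minimal sketch of category- and outcome-based filtering, assuming each event is tagged with one of the three categories above. The identifiers and the filter shape are hypothetical; an omitted list means no restriction, which corresponds to the most verbose level.

```typescript
// Event categories from the guiding principles above (identifiers are hypothetical).
type AuditCategory = "system_access" | "data_read" | "data_write";
type AuditOutcome = "success" | "failure";

interface CategorizedAuditEvent {
  action: string;
  category: AuditCategory;
  outcome: AuditOutcome;
}

// Hypothetical filter settings: an omitted list places no restriction.
interface AuditFilter {
  categories?: AuditCategory[];
  outcomes?: AuditOutcome[];
}

function shouldLog(event: CategorizedAuditEvent, filter: AuditFilter): boolean {
  const categoryAllowed =
    filter.categories === undefined || filter.categories.includes(event.category);
  const outcomeAllowed =
    filter.outcomes === undefined || filter.outcomes.includes(event.outcome);
  return categoryAllowed && outcomeAllowed;
}

// The two examples from the text: only data mutations, or only failed attempts.
const onlyMutations: AuditFilter = { categories: ["data_write"] };
const onlyFailures: AuditFilter = { outcomes: ["failure"] };
```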

When are events logged?

Audit logs record the outcome of an event, so events are captured after the operation has completed

Can an action trigger multiple events (log lines)?

Actions can be a combination of different operations, each of which needs to be captured as a separate event
Multiple events that were part of the same request can be correlated in the audit log using the trace id property
A bulk operation should be logged as a single log line with metadata (e.g. which saved objects were accessed) extracted to simplify search / aggregation
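Correlating events from the same request is essentially a group-by on the trace id. The sketch below assumes each event carries the `trace.id` field that the ECS proposal in this thread uses; `groupByTrace` is a hypothetical helper.

```typescript
interface TracedEvent {
  action: string;
  trace: { id: string };
}

// Groups audit events by trace.id, preserving the order in which
// events were logged within each request.
function groupByTrace(events: TracedEvent[]): Map<string, TracedEvent[]> {
  const groups = new Map<string, TracedEvent[]>();
  for (const event of events) {
    const group = groups.get(event.trace.id);
    if (group) {
      group.push(event);
    } else {
      groups.set(event.trace.id, [event]);
    }
  }
  return groups;
}
```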

How does Kibana audit logging tie into ElasticSearch audit logging?

Kibana should provide a full picture of which saved objects were accessed by whom, since ElasticSearch has no context about Kibana sessions / user details
Kibana will not capture results from queries against users' data indices (that is the responsibility of ElasticSearch)
Audit logs of both systems can be correlated using the X-Opaque-Id header
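One way to get a shared correlation id is to reuse the client-supplied X-Opaque-Id header as the audit event's trace id and fall back to a generated id when it is absent. This is a sketch under assumptions: `resolveTraceId` is hypothetical, and correlation only works if Kibana also forwards the same value on the Elasticsearch requests it makes for that user request.

```typescript
import { randomUUID } from "node:crypto";

// Incoming request headers as Node.js exposes them: names are lower-cased and
// a value may be a string, an array of strings, or absent.
type RequestHeaders = Record<string, string | string[] | undefined>;

// Uses X-Opaque-Id as the audit trace id so Kibana and Elasticsearch audit
// records carrying the same value can be joined; otherwise generates one.
function resolveTraceId(headers: RequestHeaders): string {
  const raw = headers["x-opaque-id"];
  const opaqueId = Array.isArray(raw) ? raw[0] : raw;
  return opaqueId ? opaqueId : randomUUID();
}
```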

Examples

Do log when a user logs in or out of Kibana
Do log when any saved object was accessed / written to
Do not log when user data indices / records were accessed
Do not log Kibana implementation details (i.e. if Kibana needs to make certain checks internally but the user has no way of seeing that data in the UI or API response then that should not be logged in the audit log)

thomheymann · 2020-07-06T15:16:09Z

ECS Audit Log Proposal

Field Reference: https://www.elastic.co/guide/en/ecs/current/ecs-field-reference.html

Approach

Authorisation / privilege checks are logged as the outcome of an action rather than as a separate log line, since they are implementation details. This is the same approach the ECS standard takes for error/success results.

Bulk operations are logged as separate events. It would be less verbose to combine a bulk operation into a single log line, but that would mean that we can't record successes/failures individually using the ECS standard. Saved object details are extracted into a non-standard document field for each audit event.

The category, type and outcome fields are ECS categorisation fields with specific allowed keywords. I tried to map these as well as I could, but some of them sound slightly clunky for our use case.
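Because these fields only accept specific keywords, events can be checked before they are written. The lists below are only the subset of ECS allowed values that this proposal uses (ECS defines more), and `invalidEcsValues` is a hypothetical helper, not part of any ECS tooling.

```typescript
// Subsets of the ECS allowed values used in this proposal.
const ALLOWED_CATEGORIES: readonly string[] = ["authentication", "database", "web"];
const ALLOWED_TYPES: readonly string[] = [
  "access", "allowed", "change", "creation", "deletion", "denied", "user",
];
const ALLOWED_OUTCOMES: readonly string[] = ["success", "failure", "unknown"];

interface EcsEventFields {
  action: string; // free-form in ECS, so not validated here
  category: string[];
  type?: string[];
  outcome: string;
}

// Returns "field:value" for every categorisation value outside the allowed lists.
function invalidEcsValues(event: EcsEventFields): string[] {
  const invalid: string[] = [];
  for (const value of event.category) {
    if (!ALLOWED_CATEGORIES.includes(value)) invalid.push(`category:${value}`);
  }
  for (const value of event.type ?? []) {
    if (!ALLOWED_TYPES.includes(value)) invalid.push(`type:${value}`);
  }
  if (!ALLOWED_OUTCOMES.includes(event.outcome)) invalid.push(`outcome:${event.outcome}`);
  return invalid;
}
```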

Events

User Authentication

{
  "message": "User 'jdoe' logged in successfully using realm 'native'|Failed login attempt using realm 'native'|User re-authentication failed",
  "event": {
    "action": "user_login|user_logout|user_reauth",
    "category": ["authentication"],
    "type": ["user"],
    "outcome": "success|failure",
    "module": "kibana",
    "dataset": "kibana.audit"
  },
  "error": {
    "code": "spaces_authorization_failure",
    "message": "jdoe unauthorized to getAll spaces"
  },
  "trace": {
    "id": "opaque-id"
  }
}

Saved Object CRUD

{
  "message": "User 'jdoe' created dashboard 'new-saved-object' in space 'default'",
  "event": {
    "action": "saved_object_create",
    "category": ["database"],
    "type": ["creation|access|change|deletion", "allowed|denied"],
    "outcome": "success|failure"
  },
  "document": {
    "space": "default",
    "type": "dashboard",
    "id": "new-saved-object"
  },
  "error": {
    "code": "spaces_authorization_failure",
    "message": "jdoe unauthorized to getAll spaces"
  },
  "trace": {
    "id": "opaque-id"
  }
}
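The saved object event above is regular enough to build from a few inputs. In this hypothetical helper, the `saved_object_update` and `saved_object_delete` action names are assumptions (only create and read appear in this proposal), and the event type carries just the CRUD keyword, as in the scenarios that follow, rather than the extra `allowed|denied` value shown in the template.

```typescript
interface SavedObjectRef {
  type: string;
  id: string;
  space: string;
}

// Maps each action to its ECS event.type keyword and the verbs used in messages.
const VERBS = {
  saved_object_create: { ecsType: "creation", past: "created", base: "create" },
  saved_object_read: { ecsType: "access", past: "accessed", base: "access" },
  saved_object_update: { ecsType: "change", past: "updated", base: "update" },
  saved_object_delete: { ecsType: "deletion", past: "deleted", base: "delete" },
} as const;

type SavedObjectAction = keyof typeof VERBS;

function savedObjectEvent(
  username: string,
  action: SavedObjectAction,
  doc: SavedObjectRef,
  outcome: "success" | "failure",
) {
  const verbs = VERBS[action];
  const target = `${doc.type} '${doc.id}' in space '${doc.space}'`;
  const message =
    outcome === "success"
      ? `User '${username}' ${verbs.past} ${target}`
      : `User '${username}' not authorised to ${verbs.base} ${target}`;
  return {
    message,
    event: { action, category: ["database"], type: [verbs.ecsType], outcome },
    document: { id: doc.id, type: doc.type, space: doc.space },
  };
}
```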

HTTP Response

{
  "message": "HTTP request 'login' by user 'jdoe' succeeded",
  "event": {
    "action": "http_request",
    "category": ["web"],
    "outcome": "success|failure"
  },
  "http": {
    "request": {
      "method": "POST",
      "body": {
        "content": "{\"objects\":[{\"type\":\"dashboard\",\"id\":\"foo\"}],\"spaces\":[\"destspace\"],\"includeReferences\":true,\"overwrite\":true}"
      }
    },
    "response": {
      "status_code": 200
    }
  },
  "source": {
    "address": "12.34.56.78",
    "ip": "12.34.56.78"
  },
  "url": {
    "domain": "kibana",
    "full": "https://kibana/api/spaces/_copy_saved_objects",
    "path": "/api/spaces/_copy_saved_objects",
    "port": "443",
    "query": "",
    "scheme": "https"
  },
  "user": {
    "email": "[email protected]",
    "full_name": "John Doe",
    "hash": "D30A5F57532A603697CCBB51558FA02CCADD74A0C499FCF9D45B...",
    "sid": "2FBAF28F6427B1832F2924E4C22C66E85FE96AFBDC3541C659B67...",
    "name": "jdoe",
    "roles": [ "kibana_user" ]
  },
  "trace": {
    "id": "opaque-id"
  }
}

Scenarios

Copy to space

{
  "message": "User 'jdoe' accessed dashboard 'first-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
  "document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
  "message": "User 'jdoe' accessed dashboard 'second-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "success" },
  "document": { "id": "second-object", "type": "dashboard", "space": "default" }
}
{
  "message": "User 'jdoe' created dashboard 'first-object' in space 'copy'",
  "event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
  "document": { "id": "first-object", "type": "dashboard", "space": "copy" }
}
{
  "message": "User 'jdoe' created dashboard 'second-object' in space 'copy'",
  "event": { "action": "saved_object_create", "category": ["database"], "type": ["creation"], "outcome": "success" },
  "document": { "id": "second-object", "type": "dashboard", "space": "copy" }
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' succeeded",
  "event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}
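The success-path sequence above follows mechanically from the list of copied objects: one read per source object, one create per copied object, then a single http_request event. A sketch, assuming every operation succeeds; `copyToSpaceEvents` is hypothetical.

```typescript
interface ObjectRef {
  type: string;
  id: string;
}

interface ScenarioEvent {
  message: string;
  event: { action: string; category: string[]; type?: string[]; outcome: "success" | "failure" };
  document?: { id: string; type: string; space: string };
}

function copyToSpaceEvents(
  username: string,
  objects: ObjectRef[],
  fromSpace: string,
  toSpace: string,
): ScenarioEvent[] {
  const reads = objects.map((o): ScenarioEvent => ({
    message: `User '${username}' accessed ${o.type} '${o.id}' in space '${fromSpace}'`,
    event: { action: "saved_object_read", category: ["database"], type: ["access"], outcome: "success" },
    document: { id: o.id, type: o.type, space: fromSpace },
  }));
  const creates = objects.map((o): ScenarioEvent => ({
    message: `User '${username}' created ${o.type} '${o.id}' in space '${toSpace}'`,
    event: { action: "saved_object_create", category: ["database"], type: ["creation"], outcome: "success" },
    document: { id: o.id, type: o.type, space: toSpace },
  }));
  const request: ScenarioEvent = {
    message: `HTTP request 'copy-to-space' by user '${username}' succeeded`,
    event: { action: "http_request", category: ["web"], outcome: "success" },
  };
  return [...reads, ...creates, request];
}
```

With this shape, a request copying n objects produces 2n + 1 audit events.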

Error: User not authorised to access dashboard (Kibana authZ):

{
  "message": "User 'jdoe' not authorised to access dashboard 'first-object' in space 'default'",
  "event": { "action": "saved_object_read", "category": ["database"], "type": ["access"], "outcome": "failure" },
  "error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" },
  "document": { "id": "first-object", "type": "dashboard", "space": "default" }
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
  "event": { "action": "http_request", "category": ["web"], "outcome": "failure" },
  "error": { "code": "spaces_authorization_failure", "message": "jdoe unauthorized to getAll spaces" }
}

Error: Session expired (Kibana authN):

{
  "message": "Unknown user not authenticated to request 'copy-to-space'",
  "event": { "action": "http_request", "category": ["web", "authentication"], "type": ["denied"], "outcome": "failure" }
}

Error: User not authorised to access data index (ElasticSearch authZ):

{
  "message": "User 'jdoe' not authorised to access index 'products'"
}
{
  "message": "HTTP request 'copy-to-space' by user 'jdoe' failed",
  "event": { "action": "http_request", "category": ["web", "authentication"], "type": ["allowed"], "outcome": "failure" }
}

User login

{
  "message": "User 'jdoe' logged in successfully using realm 'native'",
  "event": { "action": "user_login", "category": ["authentication"], "type": ["user"], "outcome": "success" }
}
{
  "message": "HTTP request 'login' by user 'jdoe' succeeded",
  "event": { "action": "http_request", "category": ["web"], "outcome": "success" }
}

Open question

How does generic API request logging (http_request) tie into the other audit events? For bulk operations these make sense, as they group the other events together. For single-operation requests it feels like unnecessary duplication. (See user_login example)
Implementation constraints: Do we have all data available we need at the point of logging?

mshustov · 2020-07-07T08:44:04Z

@thomheymann thank you for the logging format proposal. I have a couple of questions about the Events section.

Is it the complete list of events for the first stage of Audit Logging? Or is it just the list for the first phase / just an example subset of events?
What support for HTTP events is required from the platform team side? I suspect we don't have to track all responses, only those for selected routes, using their context-specific information.
Is auditing of SO actions performed by the Security plugin? The SO client from core doesn't know about authz/authc restrictions.

thomheymann · 2020-07-07T10:12:56Z

Thanks for feedback Mikhail!

Is it the complete list of events for the first stage of Audit Logging? Or is it just the list for the first phase / just an example subset of events?

These are only example events; there are a lot more events we would audit, but I wanted to establish a pattern first since most of the other events would follow a similar approach. I've added a list of the possible other events below (again, not complete / reviewed).

What support for HTTP events is required from the platform team side? I suspect we don't have to track all responses, only those for selected routes, using their context-specific information.

The way I understood HTTP based audit logging is that it's a way of very quickly and easily getting most of our auditing requirements ticked off without forcing plugin authors to manually create audit specific events. It feeds into one of my open questions though around the overlap of these (i.e. do we need an http_request event for the login route in our audit log if we already log user logins as a separate event?)

Is auditing of SO actions performed by the Security plugin? The SO client from core doesn't know about authz/authc restrictions.

I have no view on this at this point, I'm purely looking at it from a requirements perspective. Would be great to get a steer in terms of what is actually feasible based on the implementation.

legrego · 2020-07-08T17:28:11Z

Thanks for the writeup @thomheymann! A quick note on your guiding principles:

Do log what indices / records were accessed

When discussing how this ties into ES audit logs, you mention:

Maybe record level audit logging could be left to ElasticSearch?

I agree with this. I wouldn't expect Kibana to log responses returned by ES that result from queries against users' data indices.

The full list of events might be easier to curate and discuss in a Google Doc. Entries under user and role management should be left to ES audit logs, as they are the authoritative source of this information. I expect Logstash pipelines fall into this category as well.

What support for HTTP events is required from the platform team side? I suspect we don't have to track all responses, only those for selected routes, using their context-specific information.

The way I understood HTTP based audit logging is that it's a way of very quickly and easily getting most of our auditing requirements ticked off without forcing plugin authors to manually create audit specific events. It feeds into one of my open questions though around the overlap of these (i.e. do we need an http_request event for the login route in our audit log if we already log user logins as a separate event?)

At the most verbose level, we may want to include everything, or almost everything here. The ability to filter this out will be critical though, and it'll probably make sense to come up with a sensible configuration so that we don't log everything by default, but instead allow administrators to opt-in to more granularity.

Perhaps the platform could add a route option to the interface to allow a route to exclude itself from auditing, if we find that we need this flexibility.
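A sketch of what such an opt-out could look like, assuming routes are audited by default and an http_request event is logged once the handler finishes. The `audit` route option and the `withHttpAudit` wrapper are hypothetical, not existing Kibana platform APIs.

```typescript
// Hypothetical route definition with an opt-out flag.
interface RouteConfig {
  path: string;
  options?: { audit?: boolean };
}

interface HttpAuditEvent {
  action: "http_request";
  path: string;
  outcome: "success" | "failure";
}

type Handler = () => Promise<{ status: number }>;

// Routes are audited unless they explicitly set `audit: false`.
function isAudited(route: RouteConfig): boolean {
  return route.options?.audit !== false;
}

// Wraps a handler so an http_request event is logged after it completes or
// throws; opted-out routes get the original handler back unchanged.
function withHttpAudit(
  route: RouteConfig,
  handler: Handler,
  log: (event: HttpAuditEvent) => void,
): Handler {
  if (!isAudited(route)) return handler;
  return async () => {
    try {
      const response = await handler();
      log({ action: "http_request", path: route.path, outcome: response.status < 400 ? "success" : "failure" });
      return response;
    } catch (err) {
      log({ action: "http_request", path: route.path, outcome: "failure" });
      throw err;
    }
  };
}
```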

Is auditing of SO actions performed by the Security plugin? The SO client from core doesn't know about authz/authc restrictions.

I have no view on this at this point, I'm purely looking at it from a requirements perspective. Would be great to get a steer in terms of what is actually feasible based on the implementation.

I'm leaning towards having the security plugin log these events (it's what we do today). It's technically possible to create a SOC without the security wrapper applied, but in those cases, we'd expect consumers to audit their own SO events. Alerting is one such example: https://github.com/gmmorris/kibana/blob/alerting/consumer-based-rbac/x-pack/plugins/alerts/server/authorization/alerts_authorization.ts#L158
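The wrapper approach amounts to a decorator around the saved objects client that records the outcome of each call. A heavily simplified sketch: the one-method client interface and the class name are hypothetical stand-ins, not the actual Kibana saved objects client or security wrapper.

```typescript
// Hypothetical, minimal saved objects client surface.
interface SavedObject {
  type: string;
  id: string;
  attributes: Record<string, unknown>;
}

interface SavedObjectsClientLike {
  get(type: string, id: string): Promise<SavedObject>;
}

type AuditLog = (event: {
  action: string;
  outcome: "success" | "failure";
  document: { type: string; id: string };
}) => void;

// Delegates to the wrapped client and logs one event per call, after the
// outcome is known; failures are logged and then rethrown to the caller.
class AuditingSavedObjectsClient implements SavedObjectsClientLike {
  private readonly inner: SavedObjectsClientLike;
  private readonly log: AuditLog;

  constructor(inner: SavedObjectsClientLike, log: AuditLog) {
    this.inner = inner;
    this.log = log;
  }

  async get(type: string, id: string): Promise<SavedObject> {
    try {
      const object = await this.inner.get(type, id);
      this.log({ action: "saved_object_read", outcome: "success", document: { type, id } });
      return object;
    } catch (err) {
      this.log({ action: "saved_object_read", outcome: "failure", document: { type, id } });
      throw err;
    }
  }
}
```

Consumers that build a client without such a wrapper would, as noted above, need to emit equivalent events themselves.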

legrego · 2020-07-08T18:18:34Z

Bulk operations are logged as separate events. It would be less verbose to combine a bulk operation into a single log line, but that would mean that we can't record successes/failures individually using the ECS standard. Saved object details are extracted into a non-standard document field for each audit event.

There might be an exception to this that I'm overlooking, but I believe all bulk operations are all-or-nothing today, so we don't need to log successes/failures individually. Our current approach (which isn't necessarily the right one) is to log bulk operations as a single entry, but that entry identifies the objects in question.
Verbosity aside, I worry about the performance of logging bulk operations as separate events. An export of 10,000 saved objects would require approximately 10,000 audit log entries, which could take a non-trivial amount of time.
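For comparison, the single-entry approach writes one record no matter how many objects are exported, at the cost of that record growing with the object count. A sketch; the `documents` field and the `saved_object_export` action name used in the usage note are hypothetical.

```typescript
interface ObjectRef {
  type: string;
  id: string;
}

// One audit entry for a whole bulk call, listing every object it touched
// in a non-standard `documents` field.
function bulkAuditEntry(username: string, action: string, objects: ObjectRef[]) {
  return {
    message: `User '${username}' performed '${action}' on ${objects.length} saved objects`,
    event: { action, category: ["database"], outcome: "success" },
    documents: objects.map(({ type, id }) => ({ type, id })),
  };
}
```

Exporting 10,000 objects this way yields one write containing 10,000 references instead of 10,000 separate writes; whether one very large entry is acceptable to downstream tooling is the trade-off.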

How does generic API request logging (http_request) tie into the other audit events? For bulk operations these make sense, as they group the other events together. For single-operation requests it feels like unnecessary duplication. (See user_login example)

It might be unnecessary duplication, but I think it's hard to definitively say that a certain API endpoint will only ever do a single operation. We could attempt to tag routes as such, but that requires manual effort on the engineering side, which could easily be overlooked during a seemingly unrelated refactor. At the moment, I'm thinking we'll accept the duplication since we'll have the ability to filter events, but we can always revisit this if we find a clear pattern to these events.

I'm interested in hearing other thoughts though! My opinions here are just that.

legrego · 2022-08-11T18:46:38Z

Closing this meta issue, as we have sub-issues open to track the remaining individual tasks that we care about at this time.

mbudge · 2023-12-14T20:54:19Z

Please can you add the saved object name/description so we can provide reports to IT controls?

Reports with the saved object ID aren't user friendly.

legrego · 2023-12-18T18:26:19Z

@mbudge your request is being tracked here: #100523.
edit: I see you discovered this already

jportner added Meta Team:Security Team focused on: Auth, Users, Roles, Spaces, Audit Logging, and more! labels Dec 3, 2019

jportner self-assigned this Dec 3, 2019

jportner mentioned this issue Dec 11, 2019

Kibana audit logging and usage data #17939

Closed

jportner mentioned this issue Dec 30, 2019

[8.0] Remove @kbn/legacy-logging #50660

Closed

14 tasks

kobelb mentioned this issue Jan 14, 2020

Audit logging: phase 1 #54836

Closed

11 tasks

elastic deleted a comment from arisonl Feb 3, 2020

jportner mentioned this issue Feb 18, 2020

[KP] Implement Log rotation appender #56291

Closed

joshdover mentioned this issue Feb 19, 2020

Enable logging for when users change saved objects #57910

Closed

mshustov mentioned this issue Mar 3, 2020

[elasticsearch/logging] Add slowlogs config option. #58086

Open

legrego mentioned this issue Mar 3, 2020

New feature in Kibana audit log #59000

Closed

This was referenced Mar 13, 2020

Add generic AuditTrail service #60119

Closed

Add execution context to log records #60122

Open

[Meta] Logging Projects #60391

Closed

mshustov mentioned this issue Mar 19, 2020

Plugins can initiate Kibana graceful shutdown #60636

Open

This was referenced Mar 31, 2020

Allow plugins to extend logging config #61976

Closed

Pipe X-Opaque-Id header to AuditTrail logs and Elasticsearch API calls #62018

Closed

mshustov mentioned this issue Jun 23, 2020

[Audit Logging] Add AuditTrail service #69278

Merged

mshustov mentioned this issue Aug 5, 2020

[Fleet] Add audit logging support #74362

Open

dmlemeshko mentioned this issue Oct 19, 2020

Identify queries associated with scenario/user workflow in log elastic/kibana-load-testing#1

Open

This was referenced Oct 26, 2020

Kibana audit logging filebeat module #81609

Closed

Kibana audit logging API #81612

Open

Kibana audit logging attribute filtering #81613

Open

legrego mentioned this issue Dec 4, 2020

Improve Kibana account management and security #84784

Closed

legrego added the Feature:Security/Audit Platform Security - Audit Logging feature label Jul 22, 2021

jportner mentioned this issue Jan 19, 2022

X-Opaque-ID contains UUID causing ES deduplication to fail #120124

Closed

jportner removed their assignment Mar 22, 2022

ppf2 mentioned this issue May 25, 2022

ECS Audit logging lack of identifiable description on saved object #100523

Open

zhongnansu mentioned this issue Aug 4, 2022

[MD] Logging and Auditing opensearch-project/OpenSearch-Dashboards#1986

Open

3 tasks

legrego closed this as completed Aug 11, 2022
