Commit

Phase 2 in conversion to asciidoc (elastic#334)
* Phase 2 in conversion to asciidoc

* Update docs/convert.asciidoc

Trying the new suggest feature for changing v7.1 to v7.0.

Co-Authored-By: karenzone <[email protected]>

* Start incorporating review comments

* Update faq.asciidoc

Fix typo in anchor
karenzone authored and Mathieu Martin committed Mar 5, 2019
1 parent d40f1af commit 1fbb02d
Showing 9 changed files with 253 additions and 77 deletions.
18 changes: 8 additions & 10 deletions docs/conventions.asciidoc
@@ -1,7 +1,7 @@
//[[ecs-conventions]]
== {ecs} Conventions

{ecs} is most effective when you understand and follow conventions.
{ecs} is most effective when you understand and follow these guidelines and conventions.

[float]
=== Multi-fields text indexing
@@ -10,15 +10,13 @@ Elasticsearch can index text using:

* *Text.* Text indexing allows for full text search, or searching arbitrary words that
are part of the field.
See https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html[Text datatype]
in the {es} Reference Guide.
See {ref}/text.html[Text datatype] in the {es} Reference Guide.
* *Keywords.* Keyword indexing offers faster exact match filtering and prefix search,
and makes aggregations (for Kibana visualizations) possible.
and makes aggregations (for {kib} visualizations) possible.
See the {es} Reference Guide for more information on
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html[exact match filtering],
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-prefix-query.html[prefix search], or
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html[aggregations].

{ref}/query-dsl-term-query.html[exact match filtering],
{ref}/query-dsl-prefix-query.html[prefix search], or
{ref}/search-aggregations.html[aggregations].

[float]
==== Default Elasticsearch convention
@@ -40,7 +38,7 @@ For monitoring use cases, `keyword` indexing is needed almost exclusively, with
full text search on very few fields. Given this premise, ECS defaults
all text indexing to `keyword` at the top level (with very few exceptions).
Any use case that requires full text search indexing on additional fields
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html[multi-field].
can add a {ref}/multi-fields.html[multi-field]
for full text search. Doing so does not conflict with ECS,
as the canonical field name will remain `keyword` indexed.
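
As a rough illustration of this convention, the mapping for one field could be
built like the Python sketch below. The helper name `keyword_with_text`, the
sub-field name `text`, and the field `message_summary` are assumptions made for
the example and are not defined by ECS; `type` and `fields` are the standard
Elasticsearch mapping parameters.

```python
import json


def keyword_with_text(subfield="text"):
    """Return a mapping fragment: keyword at the top level, text as a multi-field.

    The helper and the default sub-field name are illustrative assumptions.
    """
    return {
        "type": "keyword",
        "fields": {subfield: {"type": "text"}},
    }


# Hypothetical field that also needs full text search.
mapping = {"properties": {"message_summary": keyword_with_text()}}
print(json.dumps(mapping, indent=2))
```

The canonical field stays `keyword` indexed, and the full text variant is
reachable only through the multi-field.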

@@ -63,7 +61,7 @@ follow the multi-field convention where `text` indexing is nested in the multi-f
[float]
=== IDs and most codes are keywords, not integers

Despite the fact that IDs and codes (e.g. error codes) are often integers,
Despite the fact that IDs and codes (such as error codes) are often integers,
this is not always the case.
Since we want to make it possible to map as many systems and data sources
to ECS as possible, we default to using the `keyword` type for IDs and codes.
52 changes: 52 additions & 0 deletions docs/convert.asciidoc
@@ -0,0 +1,52 @@
[[convert-to-ecs]]
== Converting an implementation to ECS

A common schema helps you correlate and use data from various sources.

Fields for most Elastic modules and solutions (version 7.0 and later) are mapped
to the Elastic Common Schema. You may want to map data from other
implementations to ECS to help you correlate data across all of your products
and solutions.

Before you start a conversion, be sure that you understand the basics below.

[float]
[[core-or-ext]]
=== Core and extended fields

* *Core fields.* Fields that are most common across all use cases are called *core fields*.
+
These generalized fields are used by analysis content
(searches, visualizations, dashboards, alerts, machine learning jobs, reports)
across use cases. Analysis content designed to operate on these
fields should work properly on data from any relevant source.
+
Focus on populating these fields first.

* *Extended fields.* Any field that is not a core field is called an *extended field*.
Extended fields may apply to more narrow use cases, or may be more open
to interpretation depending on the use case. Extended fields are more likely to
change over time.

Each {ecs} <<ecs-fields,field>> in a table is identified as core or extended.

[float]
[[ecs-comv]]
=== An approach to mapping an existing implementation

Here's the recommended approach for converting an existing implementation to {ecs}.

. Start with Core fields.
+
Populate core fields first. Look at your set of event fields, and find
the appropriate ECS field name for each one.

. Move on to Extended fields.
+
Map fields that may be specific to various data sources using {ecs} extended
fields. Look at {ecs} extended fields, and decide how to populate these fields
with the data you have available. Even if you have already mapped a field to an
{ecs} core field, you can still map it to an extended field.

Populating both core and extended fields helps ensure reusability of ECS analysis
content.
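
The two steps above can be sketched as a simple lookup, shown here in Python.
This is only an illustration under stated assumptions: the source field names
(`ts`, `msg`, `client_addr`, `hostname`) are invented, and the split of the ECS
names into the two tables is illustrative. Check the field tables for the
actual level of each field.

```python
# Hypothetical source-field -> ECS-field tables. The source names are invented,
# and the grouping into "core" and "extended" is illustrative only.
CORE_FIELDS = {
    "ts": "@timestamp",
    "msg": "message",
}
EXTENDED_FIELDS = {
    "client_addr": "source.ip",
    "hostname": "host.name",
}


def convert(event):
    """Map a flat source event to ECS field names, core fields first."""
    out = {}
    for table in (CORE_FIELDS, EXTENDED_FIELDS):
        for src, ecs_name in table.items():
            if src in event:
                out[ecs_name] = event[src]
    # Keep unmapped fields as they are; events may carry non-ECS fields.
    mapped = set(CORE_FIELDS) | set(EXTENDED_FIELDS)
    for key, value in event.items():
        if key not in mapped:
            out[key] = value
    return out


print(convert({"ts": "2016-05-23T08:05:34.853Z", "msg": "Hello World", "hostname": "web-1"}))
```

The result uses dotted names as flat keys. Before indexing, these still need to
be turned into nested objects.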
104 changes: 104 additions & 0 deletions docs/faq.asciidoc
@@ -0,0 +1,104 @@
[[ecs-faq]]
== FAQ

[float]
[[ecs-benefits]]
=== What are the benefits of using ECS?

The benefits to a user adopting these fields and names in their clusters are:

* **Data correlation.** Ability to easily correlate data from the same or different sources, including:
** data from metrics, logs, and application performance management (APM) tools
** data from the same machines/hosts
** data from the same service
* **Ease of recall.** Improved ability to remember commonly used field names (because there is a single set, not a set per data source)
* **Ease of deduction.** Improved ability to deduce field names (because the field naming follows a small number of rules with few exceptions)
* **Reuse.** Ability to re-use analysis content (searches, visualizations, dashboards, alerts, reports, and machine learning jobs) across multiple data sources
* **Future proofing.** Ability to use any future Elastic-provided analysis content in your environment without modifications

[float]
[[conflict]]
=== What if I have fields that conflict with ECS?

The
{ref}/rename-processor.html[rename
processor] can help you resolve field conflicts. For example, imagine that you
already have a field called "user," but ECS employs `user` as an object. You can
use the rename processor at ingest time to rename your field to the matching ECS
field. If your field does not match ECS, you can rename your field to
`user.value` instead.
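
The effect of such a rename can be sketched in plain Python. This mimics the
outcome only and is not the rename processor API; the function name
`resolve_user_conflict` and its `target` parameter are assumptions made for the
example.

```python
def resolve_user_conflict(doc, target="value"):
    """If `user` is a plain value, move it under the `user` object.

    `target` is the key inside `user` that receives the old value, for
    example "name" when the value matches an ECS field, or "value" otherwise.
    """
    user = doc.get("user")
    if user is None or isinstance(user, dict):
        return doc  # no conflict: field absent or already an object
    fixed = dict(doc)  # leave the caller's document untouched
    fixed["user"] = {target: user}
    return fixed


print(resolve_user_conflict({"user": "nicolas", "message": "login"}))
print(resolve_user_conflict({"user": "nicolas"}, target="name"))
```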

[float]
[[addl-fields]]
=== What if my events have additional fields?

Events may contain fields in addition to ECS fields. These fields can follow the
ECS naming and writing rules, but this is not a requirement.

[float]
[[dot-notation]]
=== Why does ECS use a dot notation instead of an underline notation?

There are two common key formats for ingesting data into Elasticsearch:

* Dot notation: `user.firstname: Nicolas`, `user.lastname: Ruflin`
* Underline notation: `user_firstname: Nicolas`, `user_lastname: Ruflin`

ECS uses the dot notation to represent nested objects.

[float]
[[notation-diff]]
==== What is the difference between the two notations?

Ingesting `user.firstname: Nicolas` and `user.lastname: Ruflin` is identical to ingesting the following JSON:

```
{
  "user": {
    "firstname": "Nicolas",
    "lastname": "Ruflin"
  }
}
```

In Elasticsearch, `user` is represented as an {ref}/object.html[object
datatype]. In the case of the underline notation, both are just
{ref}/mapping-types.html[string datatypes].

NOTE: ECS does not use {ref}/nested.html[nested
datatypes], which are arrays of objects.
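
To make the difference concrete, here is a small Python sketch that expands
dotted keys into nested objects. The function name `expand_dots` is an
assumption made for the example; it is not part of ECS or of any Elastic
client.

```python
def expand_dots(flat):
    """Turn {"user.firstname": "Nicolas"} into {"user": {"firstname": "Nicolas"}}.

    Assumes no key is used both as a value and as a prefix of another key.
    """
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create the intermediate object on first use, then descend into it.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


print(expand_dots({"user.firstname": "Nicolas", "user.lastname": "Ruflin"}))
print(expand_dots({"user_firstname": "Nicolas", "user_lastname": "Ruflin"}))
```

Keys in underline notation contain no dots, so they pass through unchanged and
remain two unrelated top level fields.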

[float]
[[dot-adv]]
==== Advantages of dot notation

With dot notation, each prefix in Elasticsearch is an object. Each object can have
{ref}/object.html#object-params[parameters]
that control how fields inside the object are treated. In the context of ECS,
for example, these parameters would allow you to disable dynamic property
creation for certain prefixes.

Individual objects give you more flexibility on both the ingest and the event
sides. In Elasticsearch, for example, you can use the remove processor to drop
complete objects instead of selecting each key inside. You don't have to know
ahead of time which keys will be in an object.
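
The following Python sketch contrasts the two cases. It only imitates the
effect on a document held in memory and is not the remove processor itself;
both function names and the sample events are assumptions made for the example.

```python
def drop_object(event, name):
    """Return a copy of a nested event without the object `name`."""
    return {k: v for k, v in event.items() if k != name}


def drop_prefixed(event, prefix):
    """Underline-notation equivalent: every key has to be matched by prefix."""
    return {k: v for k, v in event.items() if not k.startswith(prefix + "_")}


nested = {
    "user": {"firstname": "Nicolas", "lastname": "Ruflin"},
    "user_agent": {"name": "curl"},
    "message": "hi",
}
flat = {
    "user_firstname": "Nicolas",
    "user_lastname": "Ruflin",
    "user_agent_name": "curl",
    "message": "hi",
}

print(drop_object(nested, "user"))   # only the `user` object is removed
print(drop_prefixed(flat, "user"))   # `user_agent_name` is removed as well
```

With objects, the boundary of `user` is explicit. With prefixes, a match on
`user_` also catches keys that belong to `user_agent`.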

In Beats, you can simplify the creation of events. For example, you can treat
each object as an object (or struct in Golang), which makes constructing and
modifying each part of the final event easier.

[float]
[[dot-disadv]]
==== Disadvantage of dot notation

In Elasticsearch, each key can have only one type. For example, if `user` is an
`object`, you can't use it as a `keyword` type in the same index, like `{"user":
"nicolas ruflin"}`. This restriction can be an issue in certain datasets. For
the ECS data itself, this is not an issue because all fields are predefined.
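
A quick way to spot this kind of clash before indexing is sketched below in
Python. It is a simplification that only distinguishes objects from other
values at the top level of each document; the function name `conflicting_keys`
is an assumption made for the example.

```python
def conflicting_keys(docs):
    """Return the top level keys used both as an object and as a plain value."""
    kinds = {}
    for doc in docs:
        for key, value in doc.items():
            kind = "object" if isinstance(value, dict) else "value"
            kinds.setdefault(key, set()).add(kind)
    return sorted(key for key, seen in kinds.items() if len(seen) > 1)


docs = [
    {"user": {"name": "nicolas"}, "message": "login"},
    {"user": "nicolas ruflin", "message": "logout"},
]
print(conflicting_keys(docs))
```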

[float]
[[underline]]
==== What if I already use the underline notation?

As long as there are no conflicts, underline notation and ECS dot notation can
coexist in the same document.


67 changes: 55 additions & 12 deletions docs/fields-gen.asciidoc
@@ -1,22 +1,65 @@
[[ecs-base]]
=== Base fields

The base set contains all fields which are on the top level. These fields are common across all types of events.
The `base` set contains top level fields that are common across all types of events
(such as `@timestamp`, `tags`, `message`, and `labels`).

[cols="<,<,<,<,<",options="header",]
|=======================================================================
| Field | Description | Level | Type | Example
| [[@timestamp]] | Date/time when the event originated.

[options="header"]
|=====
| Field | Description | Level

// ===============================================================

| [[@timestamp]]
| Date/time when the event originated.
For log events this is the date/time when the event was generated, and not when it was read.
Required field for all events. | core | date | `2016-05-23T08:05:34.853Z`
| tags | List of keywords used to tag each event. | core | keyword | `["production", "env2"]`
| labels | Key/value pairs.
Required field for all events.

type: date

Example: `2016-05-23T08:05:34.853Z`

| core

// ===============================================================

| tags
| List of keywords used to tag each event.

type: keyword

Example: `["production", "env2"]`

| core

// ===============================================================

| labels
| Key/value pairs.
Can be used to add meta information to events. Should not contain nested objects.
All values are stored as keyword.
Example: `docker` and `k8s` labels. | core | object | `{'application': 'foo-bar', 'env': 'production'}`
| message | For log events the message field contains the log message.
In other use cases the message field can be used to concatenate different values which are then freely searchable. If multiple messages exist, they can be combined into one message. | core | text | `Hello World`
|=======================================================================

type: object

Examples: `docker` and `k8s` labels.

Example: `{'application': 'foo-bar', 'env': 'production'}`

| core

// ===============================================================

| message
| For log events the message field contains the log message.
In other use cases the message field can be used to concatenate different values
which are then freely searchable. If multiple messages exist, they can be
combined into one message.

type: text

Example: `Hello World`

| core
|=====
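
As an illustration of the table above, this Python sketch assembles an event
that carries only the base fields. The function name `base_event` and its
parameters are assumptions made for the example. The checks reflect the
descriptions above: labels must not contain nested objects, and label values
are stored as keywords (strings here).

```python
from datetime import datetime, timezone


def base_event(timestamp, message, tags=(), labels=None):
    """Build a minimal event from the four base fields."""
    labels = dict(labels or {})
    for key, value in labels.items():
        if isinstance(value, dict):
            raise ValueError(f"label {key!r} must not be a nested object")
    return {
        # Time the event originated, not the time it was read.
        "@timestamp": timestamp.isoformat(),
        "tags": list(tags),
        "labels": {key: str(value) for key, value in labels.items()},
        "message": message,
    }


event = base_event(
    datetime(2016, 5, 23, 8, 5, 34, 853000, tzinfo=timezone.utc),
    "Hello World",
    tags=["production", "env2"],
    labels={"application": "foo-bar", "env": "production"},
)
print(event)
```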


[[ecs-agent]]
11 changes: 6 additions & 5 deletions docs/fields.asciidoc
@@ -1,16 +1,17 @@
[[ecs-fields]]
== {ecs} Fields

// Add a list of field types w/ brief description so user can get a
// sense of what we're offering without having to scroll.

// Set make to generate short description for use in the Fields overview section.
// Pull in generated field content using `include` statements

[float]
[[ecs-categories]]
=== Field categories
[cols="<,<",options="header",]
|=======================================================================
| Fields | Description
| <<ecs-base,Base>> | The base set contains all fields which are on the top level.
These fields are common across all types of events.
| <<ecs-base,Base>> | Top level fields that are common across all types of events
(such as `@timestamp`, `tags`, `message`, and `labels`).
| <<ecs-agent,Agent>> | The agent fields contain data about the
agent/client/shipper that created the event.
| <<ecs-cloud,Cloud>> | Fields related to the cloud or infrastructure the events are
3 changes: 1 addition & 2 deletions docs/glossary.asciidoc
@@ -2,8 +2,7 @@
== Glossary of {ecs} Terms

[[glossary-ecs]]
ECS ::

ECS::
Elastic Common Schema. A common set of document fields, field names, and their respective entity
relationships to be used in the storage of log messages and other data in
Elasticsearch.
14 changes: 11 additions & 3 deletions docs/guidelines.asciidoc
@@ -8,11 +8,10 @@ practices.
=== General guidelines

* The document MUST have the `@timestamp` field.
* Use the https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html[data type]
* Use the {ref}/mapping-types.html[data types]
defined for an ECS field.
* Use the `ecs.version` field to define which version of ECS is used.
* Map as many fields as possible to ECS.
* TBD: Include guidelines on when people should contribute to the spec. Link to Contributing.

[float]
==== Guidelines for writing fields
@@ -28,6 +27,15 @@ practices.
* *Singular or plural.* Use singular and plural names properly to reflect the field content. For example, use `requests_per_sec` rather than `request_per_sec`.
* *General to specific.* Organise the prefixes from general to specific to allow grouping fields into objects with a prefix like `host.*`.
* *Avoid repetition.* Avoid stuttering of words. If part of the field name is already in the prefix, do not repeat it. Example: `host.host_ip` should be `host.ip`.
* *Use prefixes.* Fields must be prefixed except for the base fields. For example all `host` fields are prefixed with `host.`. See `dot` notation in FAQ for more details.
* *Use prefixes.* Fields must be prefixed except for the base fields. For example, all `host` fields are prefixed with `host.`.
See <<dot-notation>> for more details.
+
The document structure should be nested JSON objects. If you use Beats or
Logstash, the nesting of JSON objects is done for you automatically. If you're
ingesting to Elasticsearch using the API, your fields must be nested
objects, not strings containing dots.

* *Avoid abbreviations when possible.* A few exceptions like `ip` exist.
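
Some of these guidelines lend themselves to a quick automated check. The Python
sketch below covers only the "avoid repetition" rule, using a simple heuristic.
The function name `stutters` is an assumption made for the example, and the
check is not part of any ECS tooling.

```python
def stutters(field_name):
    """True if a name segment repeats the segment before it, as in `host.host_ip`."""
    parts = field_name.split(".")
    for parent, child in zip(parts, parts[1:]):
        # Flag an exact repeat, or the parent reused as an underscore prefix.
        if child == parent or child.startswith(parent + "_"):
            return True
    return False


for name in ("host.host_ip", "host.ip", "host.hostname"):
    print(name, stutters(name))
```

The heuristic only looks at underscore-separated prefixes, so a name such as
`host.hostname` is not flagged.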


