diff --git a/rfcs/text/0023-match_only_text-data-type.md b/rfcs/text/0023-match_only_text-data-type.md
new file mode 100644
index 0000000000..3b45d0dfde
--- /dev/null
+++ b/rfcs/text/0023-match_only_text-data-type.md
@@ -0,0 +1,140 @@
+# 0023: Migrate `text` fields to `match_only_text`
+
+
+- Stage: **0 (strawperson)**
+- Date: **2021-05-11**
+
+
+
+
+
+Indexing `message` fields as the `text` type in security and application logs consumes significant disk space. Part of that space goes to index structures that support scoring and phrase queries, which are rarely used in logging use cases. Elasticsearch 7.14 introduces a new field type, `match_only_text`, which is a more space-efficient variant of the `text` field type for these logging-focused use cases.
+
+This RFC proposes migrating existing ECS `text` fields to `match_only_text`. Most current ECS datasets are focused heavily on logging use cases, and we can pass these disk space savings on to users by migrating `text` fields to `match_only_text` by default in ECS. Upcoming changes in Elasticsearch will default to indexing the `message` field as `match_only_text`, and this change in ECS will also align better with the new stack default.
+
+
+
+## Fields
+
+The following fields are currently indexed as `text` and are candidates to migrate to `match_only_text`:
+
+* `message`
+* `error.message`
+
+
+
+
+
+
+## Usage
+
+Data is indexed the same as a `text` field that has:
+
+* `index_options: docs`
+* `norms: false`
+
+`match_only_text` uses the `_source` for positional queries like `match_phrase`.
+
+The `match_only_text` type supports the same feature set as `text`, with the following exceptions:
+
+* No support for scoring: queries ignore index statistics and produce constant scores.
+* Span queries are unsupported. If a span query is run, shards where the field is mapped as `match_only_text` will be returned as failed in the search response and their hits will be ignored.
+* Phrase and intervals queries run slower.
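+
+As a minimal sketch of what the migration could look like in a mapping, the body below maps the two candidate fields from this RFC as `match_only_text`. It is illustrative only: the enclosing index or template is not shown, and the artifacts ECS eventually generates may differ. Per the equivalence described in this section, the comparable `text` mapping would set `index_options` to `docs` and `norms` to `false` on each field.
+
+```json
+{
+  "mappings": {
+    "properties": {
+      "message": {
+        "type": "match_only_text"
+      },
+      "error": {
+        "properties": {
+          "message": {
+            "type": "match_only_text"
+          }
+        }
+      }
+    }
+  }
+}
+```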
+
+Like `text`, `match_only_text` fields do not support aggregations.
+
+This new field type is part of the `text` family, so it is returned as a `text` field in the `_field_caps` output. Because it belongs to the `text` family, migrating fields from `text` to `match_only_text` is a non-breaking change, and `text` and `match_only_text` fields can be queried alongside each other.
+
+
+
+## Source data
+
+
+
+
+
+
+
+## Scope of impact
+
+
+
+## Concerns
+
+
+
+
+
+
+
+## People
+
+The following people consulted on the contents of this RFC:
+
+* @ebeahan | author
+* @jpountz | subject matter expert
+
+
+
+
+## References
+
+
+
+* https://www.elastic.co/guide/en/elasticsearch/reference/master/text.html#match-only-text-field-type
+* https://github.com/elastic/elasticsearch/pull/66172
+* https://github.com/elastic/ecs/issues/1377
+* https://github.com/elastic/elasticsearch/issues/64467
+* https://github.com/elastic/elasticsearch/blob/7.x/x-pack/plugin/core/src/main/resources/data-streams-mappings.json#L14-L22
+
+### RFC Pull Requests
+
+
+
+* Stage 0: https://github.com/elastic/ecs/pull/1396
+