Add match_only_text, a space-efficient variant of text. #72064

Merged
merged 4 commits into from
Apr 22, 2021
3 changes: 2 additions & 1 deletion docs/reference/mapping/types.asciidoc
@@ -69,7 +69,8 @@ values.
[[text-search-types]]
==== Text search types

-<<text,`text`>>:: Analyzed, unstructured text.
+<<text,`text` fields>>:: The text family, including `text` and `match_only_text`.
+Analyzed, unstructured text.
{plugins}/mapper-annotated-text.html[`annotated-text`]:: Text containing special
markup. Used for identifying named entities.
<<completion-suggester,`completion`>>:: Used for auto-complete suggestions.
59 changes: 59 additions & 0 deletions docs/reference/mapping/types/match-only-text.asciidoc
@@ -0,0 +1,59 @@
[discrete]
[[match-only-text-field-type]]
=== Match-only text field type

A variant of <<text-field-type,`text`>> that trades scoring and efficiency of
positional queries for space efficiency. This field effectively stores data the
same way as a `text` field that only indexes documents (`index_options: docs`)
and disables norms (`norms: false`). Term queries perform as fast as, if not
faster than, on `text` fields. However, queries that need positions, such as the
<<query-dsl-match-query-phrase,`match_phrase` query>>, perform slower because
they need to look at the `_source` document to verify whether a phrase matches.
All queries return constant scores that are equal to 1.0.

Analysis is not configurable: text is always analyzed with the
<<specify-index-time-default-analyzer,default analyzer>>
(<<analysis-standard-analyzer,`standard`>> by default).

<<span-queries,Span queries>> are not supported with this field. Use
<<query-dsl-intervals-query,interval queries>> instead, or the
<<text-field-type,`text`>> field type if you absolutely need span queries.

Other than that, `match_only_text` supports the same queries as `text`. Like
`text`, it doesn't support sorting or aggregating.

[source,console]
--------------------------------
PUT logs
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}
--------------------------------
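
As a rough illustration of the positional-query behavior described above, the
following sketch indexes one document into the `logs` index and then runs a
`match_phrase` query against `message`. The document ID and field values are
made up for this example. Based on the description above, the phrase would be
verified against `_source`, and a matching hit would be expected to carry a
constant score of 1.0.

[source,console]
--------------------------------
PUT logs/_doc/1?refresh
{
  "@timestamp": "2021-04-22T12:00:00Z",
  "message": "connection reset by peer"
}

GET logs/_search
{
  "query": {
    "match_phrase": {
      "message": "reset by peer"
    }
  }
}
--------------------------------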

[discrete]
[[match-only-text-params]]
==== Parameters for match-only text fields

The following mapping parameters are accepted:

[horizontal]

<<multi-fields,`fields`>>::

Multi-fields allow the same string value to be indexed in multiple ways for
different purposes, such as one field for search and a multi-field for
sorting and aggregations, or the same string value analyzed by different
analyzers.

<<mapping-field-meta,`meta`>>::

Metadata about the field.
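
As a sketch of the `fields` parameter, the mapping below adds a `keyword`
multi-field under `message` so that exact values can be sorted or aggregated
on, which `match_only_text` itself does not support. The index name
`logs-with-keyword` and the sub-field name `raw` are arbitrary choices for this
example. Note that the extra sub-field takes additional disk space, which
partly offsets the space savings of `match_only_text`.

[source,console]
--------------------------------
PUT logs-with-keyword
{
  "mappings": {
    "properties": {
      "message": {
        "type": "match_only_text",
        "fields": {
          "raw": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}
--------------------------------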
18 changes: 17 additions & 1 deletion docs/reference/mapping/types/text.asciidoc
@@ -1,9 +1,23 @@
[testenv="basic"]
[[text]]
-=== Text field type
+=== Text type family
++++
<titleabbrev>Text</titleabbrev>
++++

The text family includes the following field types:

* <<text-field-type,`text`>>, the traditional field type for full-text content
such as the body of an email or the description of a product.
* <<match-only-text-field-type,`match_only_text`>>, a space-optimized variant
of `text` that disables scoring and performs slower on queries that need
positions. It is best suited for indexing log messages.


[discrete]
[[text-field-type]]
=== Text field type

A field to index full-text values, such as the body of an email or the
description of a product. These fields are `analyzed`, that is, they are passed through an
<<analysis,analyzer>> to convert the string into a list of individual terms
@@ -258,3 +272,5 @@ PUT my-index-000001
}
}
--------------------------------------------------

include::match-only-text.asciidoc[]
2 changes: 1 addition & 1 deletion modules/mapper-extras/build.gradle
@@ -16,6 +16,6 @@ esplugin {

restResources {
  restApi {
-    include '_common', 'cluster', 'nodes', 'indices', 'index', 'search', 'get'
+    include '_common', 'cluster', 'field_caps', 'nodes', 'indices', 'index', 'search', 'get'
  }
}
@@ -0,0 +1,155 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.analysis.CannedTokenStream;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.IndexableFieldType;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.plugins.Plugin;
import org.hamcrest.Matchers;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;

import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.instanceOf;

public class MatchOnlyTextFieldMapperTests extends MapperTestCase {

    @Override
    protected Collection<Plugin> getPlugins() {
        return Collections.singleton(new MapperExtrasPlugin());
    }

    @Override
    protected Object getSampleValueForDocument() {
        return "value";
    }

    public final void testExists() throws IOException {
        MapperService mapperService = createMapperService(fieldMapping(b -> { minimalMapping(b); }));
        assertExistsQuery(mapperService);
        assertParseMinimalWarnings();
    }

    @Override
    protected void registerParameters(ParameterChecker checker) throws IOException {
        checker.registerUpdateCheck(b -> {
            b.field("meta", Collections.singletonMap("format", "mysql.access"));
        }, m -> assertEquals(Collections.singletonMap("format", "mysql.access"), m.fieldType().meta()));
    }

    @Override
    protected void minimalMapping(XContentBuilder b) throws IOException {
        b.field("type", "match_only_text");
    }

    public void testDefaults() throws IOException {
        DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
        assertEquals(Strings.toString(fieldMapping(this::minimalMapping)), mapper.mappingSource().toString());

        ParsedDocument doc = mapper.parse(source(b -> b.field("field", "1234")));
        IndexableField[] fields = doc.rootDoc().getFields("field");
        assertEquals(1, fields.length);
        assertEquals("1234", fields[0].stringValue());
        IndexableFieldType fieldType = fields[0].fieldType();
        assertThat(fieldType.omitNorms(), equalTo(true));
        assertTrue(fieldType.tokenized());
        assertFalse(fieldType.stored());
        assertThat(fieldType.indexOptions(), equalTo(IndexOptions.DOCS));
        assertThat(fieldType.storeTermVectors(), equalTo(false));
        assertThat(fieldType.storeTermVectorOffsets(), equalTo(false));
        assertThat(fieldType.storeTermVectorPositions(), equalTo(false));
        assertThat(fieldType.storeTermVectorPayloads(), equalTo(false));
        assertEquals(DocValuesType.NONE, fieldType.docValuesType());
    }

    public void testNullConfigValuesFail() throws MapperParsingException {
        Exception e = expectThrows(
            MapperParsingException.class,
            () -> createDocumentMapper(fieldMapping(b -> b.field("type", "match_only_text").field("meta", (String) null)))
        );
        assertThat(e.getMessage(), containsString("[meta] on mapper [field] of type [match_only_text] must not have a [null] value"));
    }

    public void testSimpleMerge() throws IOException {
        XContentBuilder startingMapping = fieldMapping(b -> b.field("type", "match_only_text"));
        MapperService mapperService = createMapperService(startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        merge(mapperService, startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        XContentBuilder newField = mapping(b -> {
            b.startObject("field")
                .field("type", "match_only_text")
                .startObject("meta")
                .field("key", "value")
                .endObject()
                .endObject();
            b.startObject("other_field").field("type", "keyword").endObject();
        });
        merge(mapperService, newField);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));
        assertThat(mapperService.documentMapper().mappers().getMapper("other_field"), instanceOf(KeywordFieldMapper.class));
    }

    public void testDisabledSource() throws IOException {
        XContentBuilder mapping = XContentFactory.jsonBuilder().startObject().startObject("_doc");
        {
            mapping.startObject("properties");
            {
                mapping.startObject("foo");
                {
                    mapping.field("type", "match_only_text");
                }
                mapping.endObject();
            }
            mapping.endObject();

            mapping.startObject("_source");
            {
                mapping.field("enabled", false);
            }
            mapping.endObject();
        }
        mapping.endObject().endObject();

        MapperService mapperService = createMapperService(mapping);
        MappedFieldType ft = mapperService.fieldType("foo");
        SearchExecutionContext context = createSearchExecutionContext(mapperService);
        TokenStream ts = new CannedTokenStream(new Token("a", 0, 3), new Token("b", 4, 7));
        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> ft.phraseQuery(ts, 0, true, context));
        assertThat(e.getMessage(), Matchers.containsString("cannot run positional queries since [_source] is disabled"));

        // Term queries are ok
        ft.termQuery("a", context); // no exception
    }

    @Override
    protected Object generateRandomInputValue(MappedFieldType ft) {
        assumeFalse("We don't have a way to assert things here", true);
        return null;
    }

    @Override
    protected void randomFetchTestFieldConfig(XContentBuilder b) throws IOException {
        assumeFalse("We don't have a way to assert things here", true);
    }
}
@@ -29,6 +29,7 @@ public Map<String, Mapper.TypeParser> getMappers() {
        mappers.put(RankFeatureFieldMapper.CONTENT_TYPE, RankFeatureFieldMapper.PARSER);
        mappers.put(RankFeaturesFieldMapper.CONTENT_TYPE, RankFeaturesFieldMapper.PARSER);
        mappers.put(SearchAsYouTypeFieldMapper.CONTENT_TYPE, SearchAsYouTypeFieldMapper.PARSER);
        mappers.put(MatchOnlyTextFieldMapper.CONTENT_TYPE, MatchOnlyTextFieldMapper.PARSER);
        return Collections.unmodifiableMap(mappers);
    }
