Add match_only_text, a space-efficient variant of text. #66172

Merged
merged 28 commits into from
Apr 22, 2021
Commits
0bfd387
Add `match_only_text`, a space-efficient variant of `text`.
jpountz Dec 7, 2020
6b0cb21
iter
jpountz Dec 10, 2020
7525e4f
Merge branch 'master' into feature/source_phrase_queries
jpountz Dec 16, 2020
e57699e
Use source lookup from the shard context.
jpountz Dec 16, 2020
9ec31c6
Update release version.
jpountz Dec 16, 2020
7a03a0f
Consolidate docs with `text`.
jpountz Dec 16, 2020
5774bc9
Fail phrase queries when _source is disabled.
jpountz Dec 17, 2020
c0be502
Remove support for `store`.
jpountz Dec 17, 2020
feaf2f8
Add tests for span and intervals queries.
jpountz Dec 17, 2020
d51db6c
Test for fuzzy query.
jpountz Dec 17, 2020
71adb75
More tests.
jpountz Dec 17, 2020
4f33106
Merge branch 'master' into feature/source_phrase_queries
jpountz Feb 1, 2021
24b345e
Fix compilation.
jpountz Feb 1, 2021
34743ef
iter
jpountz Feb 9, 2021
efdb3ba
Merge branch 'master' into feature/source_phrase_queries
jpountz Mar 30, 2021
7114fdc
iter
jpountz Mar 30, 2021
2030545
iter
jpountz Apr 1, 2021
96f668b
Merge branch 'master' into feature/source_phrase_queries
jpountz Apr 1, 2021
3a85af4
iter
jpountz Apr 1, 2021
448eb28
iter
jpountz Apr 1, 2021
f3e77f8
Fix compilation.
jpountz Apr 1, 2021
c5f4f04
Analysis is no longer configurable.
jpountz Apr 2, 2021
4818edc
iter
jpountz Apr 7, 2021
339c8dc
Merge branch 'master' into feature/source_phrase_queries
jpountz Apr 21, 2021
e652aa4
Intervals unit tests.
jpountz Apr 21, 2021
31a5bba
Fix docs now that `match_only_text` supports all interval queries.
jpountz Apr 21, 2021
3783f18
Undo testing hack.
jpountz Apr 21, 2021
edaa5b0
Merge branch 'master' into feature/source_phrase_queries
jpountz Apr 21, 2021
3 changes: 2 additions & 1 deletion docs/reference/mapping/types.asciidoc
@@ -69,7 +69,8 @@ values.
[[text-search-types]]
==== Text search types

<<text,`text`>>:: Analyzed, unstructured text.
<<text,`text` fields>>:: The text family, including `text` and `match_only_text`.
Analyzed, unstructured text.
{plugins}/mapper-annotated-text.html[`annotated-text`]:: Text containing special
markup. Used for identifying named entities.
<<completion-suggester,`completion`>>:: Used for auto-complete suggestions.
59 changes: 59 additions & 0 deletions docs/reference/mapping/types/match-only-text.asciidoc
@@ -0,0 +1,59 @@
[discrete]
[[match-only-text-field-type]]
=== Match-only text field type

A variant of <<text-field-type,`text`>> that trades scoring and efficiency of
positional queries for space efficiency. This field effectively stores data the
same way as a `text` field that only indexes documents (`index_options: docs`)
and disables norms (`norms: false`). Term queries perform as fast as, if not
faster than, on `text` fields. However, queries that need positions, such as the
<<query-dsl-match-query-phrase,`match_phrase` query>>, perform slower because
they need to look at the `_source` document to verify whether a phrase matches.
All queries return constant scores that are equal to 1.0.
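
The trade-off described above can be illustrated with a minimal, self-contained sketch. It is not the Elasticsearch or Lucene implementation: the class `SourcePhraseSketch`, its methods, and the crude `analyze` tokenizer are all hypothetical names introduced for illustration. The sketch assumes a docs-only inverted index (no positions or frequencies) and a copy of the original text standing in for `_source`. A phrase query first intersects the postings of every term, then re-analyzes the stored text of each candidate to check that the terms really appear consecutively.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Elasticsearch.
class SourcePhraseSketch {
    // Docs-only postings: term -> ids of documents containing it (no positions).
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // Stand-in for _source: the original text of each document.
    private final Map<Integer, String> source = new HashMap<>();

    // Crude stand-in for an analyzer: lowercase, split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public void index(int docId, String text) {
        source.put(docId, text);
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    public List<Integer> phraseQuery(String phrase) {
        List<String> terms = analyze(phrase);
        if (terms.isEmpty()) {
            return List.of();
        }
        // Phase 1 (cheap): keep documents that contain every term.
        Set<Integer> candidates = null;
        for (String term : terms) {
            Set<Integer> docs = postings.getOrDefault(term, Set.of());
            if (candidates == null) {
                candidates = new TreeSet<>(docs);
            } else {
                candidates.retainAll(docs);
            }
        }
        // Phase 2 (slow): re-analyze the stored text and verify adjacency.
        List<Integer> hits = new ArrayList<>();
        for (int docId : candidates) {
            if (Collections.indexOfSubList(analyze(source.get(docId)), terms) >= 0) {
                hits.add(docId);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SourcePhraseSketch idx = new SourcePhraseSketch();
        idx.index(1, "Connection refused by host");
        idx.index(2, "host refused the connection");
        System.out.println(idx.phraseQuery("connection refused")); // prints [1]
    }
}
```

Both documents pass the first phase because each contains both terms; only the second phase, which reads the stored text, rejects document 2. That extra read is the cost that a positional index avoids, which is why the sketch is slower on phrases but needs less index data.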

Analysis is not configurable: text is always analyzed with the
<<specify-index-time-default-analyzer,default analyzer>>
(<<analysis-standard-analyzer,`standard`>> by default).

<<span-queries,Span queries>> are not supported with this field. Use
<<query-dsl-intervals-query,interval queries>> instead, or the
<<text-field-type,`text`>> field type if you absolutely need span queries.

Other than that, `match_only_text` supports the same queries as `text`. And
like `text`, it doesn't support sorting or aggregating.

[source,console]
--------------------------------
PUT logs
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date"
      },
      "message": {
        "type": "match_only_text"
      }
    }
  }
}
--------------------------------

[discrete]
[[match-only-text-params]]
==== Parameters for match-only text fields

The following mapping parameters are accepted:

[horizontal]

<<multi-fields,`fields`>>::

Multi-fields allow the same string value to be indexed in multiple ways for
different purposes, such as one field for search and a multi-field for
sorting and aggregations, or the same string value analyzed by different
analyzers.

<<mapping-field-meta,`meta`>>::

Metadata about the field.
18 changes: 17 additions & 1 deletion docs/reference/mapping/types/text.asciidoc
@@ -1,9 +1,23 @@
[testenv="basic"]
[[text]]
=== Text field type
=== Text type family
++++
<titleabbrev>Text</titleabbrev>
++++

The text family includes the following field types:

* <<text-field-type,`text`>>, the traditional field type for full-text content
such as the body of an email or the description of a product.
* <<match-only-text-field-type,`match_only_text`>>, a space-optimized variant
of `text` that disables scoring and performs slower on queries that need
positions. It is best suited for indexing log messages.


[discrete]
[[text-field-type]]
=== Text field type

A field to index full-text values, such as the body of an email or the
description of a product. These fields are `analyzed`, that is they are passed through an
<<analysis,analyzer>> to convert the string into a list of individual terms
@@ -253,3 +267,5 @@ PUT my-index-000001
}
}
--------------------------------------------------

include::match-only-text.asciidoc[]
2 changes: 1 addition & 1 deletion modules/mapper-extras/build.gradle
@@ -16,6 +16,6 @@ esplugin {

restResources {
restApi {
include '_common', 'cluster', 'nodes', 'indices', 'index', 'search', 'get'
include '_common', 'cluster', 'field_caps', 'nodes', 'indices', 'index', 'search', 'get'
}
}
@@ -0,0 +1,156 @@
/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License
 * 2.0 and the Server Side Public License, v 1; you may not use this file except
 * in compliance with, at your election, the Elastic License 2.0 or the Server
 * Side Public License, v 1.
 */

package org.elasticsearch.index.mapper;

import org.apache.lucene.analysis.CannedTokenStream;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.IndexOptions;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.IndexableFieldType;
import org.elasticsearch.common.Strings;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;
import org.elasticsearch.index.query.SearchExecutionContext;
import org.elasticsearch.plugins.Plugin;
import org.hamcrest.Matchers;

import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

import static org.hamcrest.Matchers.containsString;
import static org.hamcrest.Matchers.equalTo;
import static org.hamcrest.Matchers.instanceOf;

public class MatchOnlyTextFieldMapperTests extends MapperTestCase {

    @Override
    protected Collection<Plugin> getPlugins() {
        return List.of(new MapperExtrasPlugin());
    }

    @Override
    protected Object getSampleValueForDocument() {
        return "value";
    }

    public final void testExists() throws IOException {
        MapperService mapperService = createMapperService(fieldMapping(b -> { minimalMapping(b); }));
        assertExistsQuery(mapperService);
        assertParseMinimalWarnings();
    }

    @Override
    protected void registerParameters(ParameterChecker checker) throws IOException {
        checker.registerUpdateCheck(b -> {
            b.field("meta", Collections.singletonMap("format", "mysql.access"));
        }, m -> assertEquals(Collections.singletonMap("format", "mysql.access"), m.fieldType().meta()));
    }

    @Override
    protected void minimalMapping(XContentBuilder b) throws IOException {
        b.field("type", "match_only_text");
    }

    public void testDefaults() throws IOException {
        DocumentMapper mapper = createDocumentMapper(fieldMapping(this::minimalMapping));
        assertEquals(Strings.toString(fieldMapping(this::minimalMapping)), mapper.mappingSource().toString());

        ParsedDocument doc = mapper.parse(source(b -> b.field("field", "1234")));
        IndexableField[] fields = doc.rootDoc().getFields("field");
        assertEquals(1, fields.length);
        assertEquals("1234", fields[0].stringValue());
        IndexableFieldType fieldType = fields[0].fieldType();
        assertThat(fieldType.omitNorms(), equalTo(true));
        assertTrue(fieldType.tokenized());
        assertFalse(fieldType.stored());
        assertThat(fieldType.indexOptions(), equalTo(IndexOptions.DOCS));
        assertThat(fieldType.storeTermVectors(), equalTo(false));
        assertThat(fieldType.storeTermVectorOffsets(), equalTo(false));
        assertThat(fieldType.storeTermVectorPositions(), equalTo(false));
        assertThat(fieldType.storeTermVectorPayloads(), equalTo(false));
        assertEquals(DocValuesType.NONE, fieldType.docValuesType());
    }

    public void testNullConfigValuesFail() throws MapperParsingException {
        Exception e = expectThrows(
            MapperParsingException.class,
            () -> createDocumentMapper(fieldMapping(b -> b.field("type", "match_only_text").field("meta", (String) null)))
        );
        assertThat(e.getMessage(), containsString("[meta] on mapper [field] of type [match_only_text] must not have a [null] value"));
    }

    public void testSimpleMerge() throws IOException {
        XContentBuilder startingMapping = fieldMapping(b -> b.field("type", "match_only_text"));
        MapperService mapperService = createMapperService(startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        merge(mapperService, startingMapping);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));

        XContentBuilder newField = mapping(b -> {
            b.startObject("field")
                .field("type", "match_only_text")
                .startObject("meta")
                .field("key", "value")
                .endObject()
                .endObject();
            b.startObject("other_field").field("type", "keyword").endObject();
        });
        merge(mapperService, newField);
        assertThat(mapperService.documentMapper().mappers().getMapper("field"), instanceOf(MatchOnlyTextFieldMapper.class));
        assertThat(mapperService.documentMapper().mappers().getMapper("other_field"), instanceOf(KeywordFieldMapper.class));
    }

    public void testDisabledSource() throws IOException {
        XContentBuilder mapping = XContentFactory.jsonBuilder().startObject().startObject("_doc");
        {
            mapping.startObject("properties");
            {
                mapping.startObject("foo");
                {
                    mapping.field("type", "match_only_text");
                }
                mapping.endObject();
            }
            mapping.endObject();

            mapping.startObject("_source");
            {
                mapping.field("enabled", false);
            }
            mapping.endObject();
        }
        mapping.endObject().endObject();

        MapperService mapperService = createMapperService(mapping);
        MappedFieldType ft = mapperService.fieldType("foo");
        SearchExecutionContext context = createSearchExecutionContext(mapperService);
        TokenStream ts = new CannedTokenStream(new Token("a", 0, 3), new Token("b", 4, 7));
        IllegalArgumentException e = expectThrows(IllegalArgumentException.class, () -> ft.phraseQuery(ts, 0, true, context));
        assertThat(e.getMessage(), Matchers.containsString("cannot run positional queries since [_source] is disabled"));

        // Term queries are ok
        ft.termQuery("a", context); // no exception
    }

    @Override
    protected Object generateRandomInputValue(MappedFieldType ft) {
        assumeFalse("We don't have a way to assert things here", true);
        return null;
    }

    @Override
    protected void randomFetchTestFieldConfig(XContentBuilder b) throws IOException {
        assumeFalse("We don't have a way to assert things here", true);
    }
}
@@ -29,6 +29,7 @@ public Map<String, Mapper.TypeParser> getMappers() {
mappers.put(RankFeatureFieldMapper.CONTENT_TYPE, RankFeatureFieldMapper.PARSER);
mappers.put(RankFeaturesFieldMapper.CONTENT_TYPE, RankFeaturesFieldMapper.PARSER);
mappers.put(SearchAsYouTypeFieldMapper.CONTENT_TYPE, SearchAsYouTypeFieldMapper.PARSER);
mappers.put(MatchOnlyTextFieldMapper.CONTENT_TYPE, MatchOnlyTextFieldMapper.PARSER);
return Collections.unmodifiableMap(mappers);
}
