-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deprecate sorting in reindex #49458
Deprecate sorting in reindex #49458
Changes from 6 commits
688f2d0
dc8a875
d47eb08
5c49c63
459b66c
fcc5a73
aa2a7e0
6993834
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -478,6 +478,10 @@ which defaults to a maximum size of 100 MB. | |||||
`sort`::: | ||||||
(Optional, list) A comma-separated list of `<field>:<direction>` pairs to sort by before indexing. | ||||||
Use in conjunction with `max_docs` to control what documents are reindexed. | ||||||
deprecated[7.6, sort in reindex is deprecated] *NOTE: Sorting in reindex is deprecated.* | ||||||
Sorting in reindex was never guaranteed to index | ||||||
documents in order and prevents future evolution of reindex such as resilience and performance | ||||||
improvements. If used in combination with `max_docs`, consider using a query filter instead. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I feel like the I'd place this content into a block
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried this now, but I get an error saying:
which is why I originally went for the simpler solution. My asciidoc skills are not super-advanced, so maybe there is a trick to make this work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was able to get this working on my end without that error. Here's the commit on my fork if that helps: Feel free to cherry-pick it or copy/paste the contents. There are a few tricks:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What you have is workable though. If you're not able to get the streamlined deprecated note working, I wouldn't hold up this PR. We can reformat it as a separate issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||||||
|
||||||
`_source`::: | ||||||
(Optional, string) If `true` reindexes all source fields. | ||||||
|
@@ -602,8 +606,8 @@ POST _reindex | |||||
-------------------------------------------------- | ||||||
// TEST[setup:twitter] | ||||||
|
||||||
[[docs-reindex-select-sort]] | ||||||
===== Reindex select documents with sort | ||||||
[[docs-reindex-select-max-docs]] | ||||||
===== Reindex select documents with `max_docs` | ||||||
|
||||||
You can limit the number of processed documents by setting `max_docs`. | ||||||
For example, this request copies a single document from `twitter` to | ||||||
|
@@ -624,28 +628,6 @@ POST _reindex | |||||
-------------------------------------------------- | ||||||
// TEST[setup:twitter] | ||||||
|
||||||
You can use `sort` in conjunction with `max_docs` to select the documents you want to reindex. | ||||||
Sorting makes the scroll less efficient but in some contexts it's worth it. | ||||||
If possible, it's better to use a more selective query instead of `max_docs` and `sort`. | ||||||
|
||||||
For example, following request copies 10000 documents from `twitter` into `new_twitter`: | ||||||
|
||||||
[source,console] | ||||||
-------------------------------------------------- | ||||||
POST _reindex | ||||||
{ | ||||||
"max_docs": 10000, | ||||||
"source": { | ||||||
"index": "twitter", | ||||||
"sort": { "date": "desc" } | ||||||
}, | ||||||
"dest": { | ||||||
"index": "new_twitter" | ||||||
} | ||||||
} | ||||||
-------------------------------------------------- | ||||||
// TEST[setup:twitter] | ||||||
|
||||||
[[docs-reindex-multiple-indices]] | ||||||
===== Reindex from multiple indices | ||||||
|
||||||
|
@@ -824,11 +806,10 @@ POST _reindex | |||||
"index": "twitter", | ||||||
"query": { | ||||||
"function_score" : { | ||||||
"query" : { "match_all": {} }, | ||||||
"random_score" : {} | ||||||
"random_score" : {}, | ||||||
"min_score" : 0.9 <1> | ||||||
} | ||||||
}, | ||||||
"sort": "_score" <1> | ||||||
} | ||||||
}, | ||||||
"dest": { | ||||||
"index": "random_twitter" | ||||||
|
@@ -837,8 +818,8 @@ POST _reindex | |||||
---------------------------------------------------------------- | ||||||
// TEST[setup:big_twitter] | ||||||
|
||||||
<1> `_reindex` defaults to sorting by `_doc` so `random_score` will not have any | ||||||
effect unless you override the sort to `_score`. | ||||||
<1> you may need to adjust the `min_score` depending on the relative amount of | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||
data extracted from source. | ||||||
|
||||||
[[reindex-scripts]] | ||||||
===== Modify documents during reindexing | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
/* | ||
* Licensed to Elasticsearch under one or more contributor | ||
* license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright | ||
* ownership. Elasticsearch licenses this file to you under | ||
* the Apache License, Version 2.0 (the "License"); you may | ||
* not use this file except in compliance with the License. | ||
* You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
|
||
package org.elasticsearch.index.reindex; | ||
|
||
import org.elasticsearch.index.query.RangeQueryBuilder; | ||
import org.elasticsearch.plugins.Plugin; | ||
import org.elasticsearch.search.sort.SortOrder; | ||
import org.elasticsearch.test.ESSingleNodeTestCase; | ||
|
||
import java.util.Arrays; | ||
import java.util.Collection; | ||
|
||
import static org.elasticsearch.index.reindex.ReindexTestCase.matcher; | ||
import static org.elasticsearch.test.hamcrest.ElasticsearchAssertions.assertHitCount; | ||
|
||
public class ReindexSingleNodeTests extends ESSingleNodeTestCase { | ||
@Override | ||
protected Collection<Class<? extends Plugin>> getPlugins() { | ||
return Arrays.asList(ReindexPlugin.class); | ||
} | ||
|
||
public void testDeprecatedSort() { | ||
int max = between(2, 20); | ||
for (int i = 0; i < max; i++) { | ||
client().prepareIndex("source").setId(Integer.toString(i)).setSource("foo", i).get(); | ||
} | ||
|
||
client().admin().indices().prepareRefresh("source").get(); | ||
assertHitCount(client().prepareSearch("source").setSize(0).get(), max); | ||
|
||
// Copy a subset of the docs sorted | ||
int subsetSize = randomIntBetween(1, max - 1); | ||
ReindexRequestBuilder copy = new ReindexRequestBuilder(client(), ReindexAction.INSTANCE) | ||
.source("source").destination("dest").refresh(true); | ||
copy.maxDocs(subsetSize); | ||
copy.request().addSortField("foo", SortOrder.DESC); | ||
assertThat(copy.get(), matcher().created(subsetSize)); | ||
|
||
assertHitCount(client().prepareSearch("dest").setSize(0).get(), subsetSize); | ||
assertHitCount(client().prepareSearch("dest").setQuery(new RangeQueryBuilder("foo").gte(0).lt(max-subsetSize)).get(), 0); | ||
assertWarnings(Reindexer.SORT_DEPRECATED_MESSAGE); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with "further" rather than "future"