
Fix ShardSplittingQuery to respect nested documents. #27398

Merged
4 commits merged into elastic:master on Nov 16, 2017

Conversation

@s1monw (Contributor) commented Nov 15, 2017

Today, if nested docs are used in an index that is split, the operation
only works correctly if the index is not routing-partitioned, or if
routing is used. This change fixes the query that selects the docs to
delete so that it also selects the nested docs of every selected parent.

Closes #27378

@s1monw added labels :Core/Infra/Core (Core issues without another label), >bug, v6.1.0, v7.0.0 on Nov 15, 2017
@s1monw requested review from martijnvg and jimczi on November 15, 2017 at 15:51
@s1monw (Contributor, Author) commented Nov 16, 2017

@elasticmachine test this please

@jimczi (Contributor) left a comment

I left some minor comments regarding naming and caching, otherwise LGTM

};
}

private void markChildDocs(BitSet parentDocs, BitSet deletedDocs) {

jimczi:
Why deletedDocs? It's the opposite, no? The bitset of the matching parents plus their children, i.e. allDocs?


s1monw (Author):
naming issue, I will fix. In theory it will hold all docs that need to be deleted by the IndexWriter.
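
For illustration, here is a minimal sketch of what such a method can look like (a reconstruction for this discussion, not necessarily the merged code). It leans on Lucene's block-join encoding, where nested children are indexed immediately before their parent, so the children of a marked parent are exactly the docs between the previous parent and the marked one:

import org.apache.lucene.search.DocIdSetIterator;
import org.apache.lucene.util.BitSet;

// Sketch: extend the marks in docsToDelete (which initially holds matching
// parent docs only) so that they also cover each marked parent's children.
private static void markChildDocs(BitSet parentDocs, BitSet docsToDelete) {
    int currentDeleted = 0;
    while (currentDeleted < docsToDelete.length()
            && (currentDeleted = docsToDelete.nextSetBit(currentDeleted)) != DocIdSetIterator.NO_MORE_DOCS) {
        // everything between the previous parent and the deleted parent is a
        // nested child of the deleted parent; delete it as well
        int previousParent = parentDocs.prevSetBit(Math.max(0, currentDeleted - 1));
        for (int doc = previousParent + 1; doc < currentDeleted; doc++) {
            docsToDelete.set(doc);
        }
        currentDeleted++;
    }
}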

/*
* this is used internally to obtain a bitset for parent documents. We don't cache this since we never access the same reader more
* than once
*/

jimczi:
We warm this query per segment in BitsetFilterCache#BitSetProducerWarmer, so why not use the per-segment cache directly?


s1monw (Author):
we use this only for a delete-by-query that is executed on a recovery-private index writer. There is no point in caching it, and it won't get a cache hit either.
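
To make the trade-off concrete, an uncached producer can be as simple as the following sketch (written against a recent Lucene API; the parentFilter parameter is illustrative, the real code derives its own parent filter):

import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;
import org.apache.lucene.search.join.BitSetProducer;
import org.apache.lucene.util.BitSet;

// Sketch: build the parent bitset from scratch for every leaf. A cache could
// never produce a hit here because the recovery-private IndexWriter never
// exposes the same reader twice.
private static BitSetProducer newParentDocBitSetProducer(Query parentFilter) {
    return context -> {
        IndexSearcher searcher = new IndexSearcher(ReaderUtil.getTopLevelContext(context));
        searcher.setQueryCache(null); // deliberately uncached
        Weight weight = searcher.createWeight(searcher.rewrite(parentFilter), ScoreMode.COMPLETE_NO_SCORES, 1f);
        Scorer scorer = weight.scorer(context);
        return scorer == null ? null : BitSet.of(scorer.iterator(), context.reader().maxDoc());
    };
}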


s1monw (Author):
I left a comment


jimczi:
Ok I missed the recovery-private thing. Thanks


@martijnvg (Member) left a comment

This looks good @s1monw! I left a few questions and small remarks.

@@ -83,53 +104,91 @@ public void testSplitOnRouting() throws IOException {
.numberOfShards(numShards)
.setRoutingNumShards(numShards * 1000000)
.numberOfReplicas(0).build();
boolean hasNested = true || randomBoolean();

martijnvg (Member):
s/true || randomBoolean();/randomBoolean();/


s1monw (Author):
oh yeah 🗡

IndexMetaData metaData = IndexMetaData.builder("test")
.settings(Settings.builder().put(IndexMetaData.SETTING_VERSION_CREATED, Version.CURRENT))
.numberOfShards(numShards)
.setRoutingNumShards(numShards * 1000000)
.numberOfReplicas(0).build();
boolean hasNested = true;randomBoolean();

martijnvg (Member):
same here

assert indexMetaData.isRoutingPartitionedIndex() == false;
findSplitDocs(IdFieldMapper.NAME, includeInShard, leafReader, bitSet::set);
} else {
final BitSet parentBitSet;
final IntPredicate includeDoc;

martijnvg (Member):
So just to double check, the includeDoc is to ensure that only root docs get selected and later we select the nested docs of the selected root docs in markChildDocs(...), right?


s1monw (Author):
correct
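
A toy end-to-end example of that two-step selection (hypothetical doc layout and shard-ownership rule, with the child-marking logic inlined to keep it standalone):

import java.util.BitSet;
import java.util.function.IntPredicate;

public class TwoStepSelectionDemo {
    public static void main(String[] args) {
        // block-join layout: docs 0..2 are children of parent 3,
        // docs 4..5 are children of parent 6
        BitSet parentBitSet = new BitSet();
        parentBitSet.set(3);
        parentBitSet.set(6);
        // pretend the shard being built does not own parent 3
        IntPredicate includeDoc = doc -> parentBitSet.get(doc) && doc == 3;
        BitSet docsToDelete = new BitSet();
        // step 1: select matching root docs only
        for (int doc = 0; doc < 7; doc++) {
            if (includeDoc.test(doc)) {
                docsToDelete.set(doc);
            }
        }
        // step 2: extend each mark to the parent's children, i.e. the docs
        // between the previous parent and the marked one (what markChildDocs does)
        for (int parent = docsToDelete.nextSetBit(0); parent >= 0;
                parent = docsToDelete.nextSetBit(parent + 1)) {
            int prevParent = parent > 0 ? parentBitSet.previousSetBit(parent - 1) : -1;
            docsToDelete.set(prevParent + 1, parent); // children only, parent already set
        }
        // prints delete for docs 0..3 and keep for docs 4..6
        for (int doc = 0; doc < 7; doc++) {
            System.out.println("doc " + doc + (docsToDelete.get(doc) ? " -> delete" : " -> keep"));
        }
    }
}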

* this is used internally to obtain a bitset for parent documents. We don't cache this since we never access the same reader more
* than once
*/
private static BitSetProducer newParentDocBitSetProducer() {

martijnvg (Member):
👍

}

@Override
public boolean matches() throws IOException {

martijnvg (Member):
I was trying to understand why this works, because of the forward iteration here (with nested, we usually seek backwards (BitSet.prevSetBit(...))). So this works because all live doc ids (root docs and nested docs) are evaluated in order.


s1monw (Author):
correct, I will leave a comment
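
To spell the invariant out with a toy example (not the ES code): children always precede their parent, and live doc ids are visited in ascending order, so the parent that decides a doc's fate is always found by scanning forward with nextSetBit; no backwards seek via prevSetBit is needed:

import java.util.BitSet;

public class ForwardIterationDemo {
    public static void main(String[] args) {
        // children 0..2 belong to parent 3, children 4..5 to parent 6
        BitSet parents = new BitSet();
        parents.set(3);
        parents.set(6);
        for (int doc = 0; doc < 7; doc++) {
            // for a parent this returns the doc itself; for a child it returns
            // the first parent at or after it, i.e. its own parent
            int decidingParent = parents.nextSetBit(doc);
            System.out.println("doc " + doc + " is decided by parent " + decidingParent);
        }
    }
}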

@s1monw (Contributor, Author) commented Nov 16, 2017

@jimczi @martijnvg I pushed new commits if you wanna take another look


@martijnvg (Member) left a comment

LGTM

@s1monw merged commit 303e0c0 into elastic:master on Nov 16, 2017
@s1monw deleted the fix_split_on_nested branch on November 16, 2017 at 10:35
s1monw added a commit that referenced this pull request Nov 16, 2017
jasontedor added a commit that referenced this pull request Nov 16, 2017
* master:
  Stop skipping REST test after backport of #27056
  Fix default value of ignore_unavailable for snapshot REST API (#27056)
  Add composite aggregator (#26800)
  Fix `ShardSplittingQuery` to respect nested documents. (#27398)
  [Docs] Restore section about multi-level parent/child relation in parent-join (#27392)
  Add TcpChannel to unify Transport implementations (#27132)
  Add note on plugin distributions in plugins folder
  Remove implementations of `TransportChannel` (#27388)
  Update Google SDK to version 1.23 (#27381)
  Fix Gradle 4.3.1 compatibility for logging (#27382)
  [Test] Change Elasticsearch startup timeout to 120s in packaging tests
  Docs/windows installer (#27369)
@jimczi added the v7.0.0-beta1 label and removed the v7.0.0 label on Feb 7, 2019
Successfully merging this pull request may close: shard splitting doesn't always respect nested docs (#27378)