[Concurrent Segment Search] CardinalityIT/NestedIT test failures with concurrent search and AssertingCodec enabled #8095

Closed
sohami opened this issue Jun 16, 2023 · 0 comments · Fixed by #8303
Assignees
Labels
distributed framework :test Adding or fixing a test

sohami commented Jun 16, 2023

Describe the bug
With concurrent search enabled, the tests below sometimes fail when AssertingCodec is used, because it verifies that the thread accessing the codec stays the same throughout search execution. In concurrent mode, postCollection on an aggregation, which happens in the reduce phase, can access the docValues. This access happens on the parent search thread and not on the index_searcher thread, which causes the failure.
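
The failure mode described above can be reproduced in isolation with a minimal sketch. This is not Lucene or OpenSearch code: `ThreadBoundValues` and `PostCollectOnWrongThread` are hypothetical names, and the check only imitates the kind of thread-ownership assertion reported in the stack trace. The values are acquired on a thread named `index_searcher` and then consumed on the calling thread, which plays the role of the search thread running the reduce phase.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for doc values wrapped by an asserting reader; not Lucene code.
final class ThreadBoundValues {
    // The thread that acquired the values.
    private final Thread owner = Thread.currentThread();

    String lookupOrd(long ord) {
        Thread current = Thread.currentThread();
        if (current != owner) {
            // Thrown explicitly so the sketch does not depend on the JVM -ea flag.
            throw new AssertionError(
                "acquired in " + owner.getName() + " and consumed in " + current.getName());
        }
        return "term-" + ord;
    }
}

final class PostCollectOnWrongThread {
    // Returns the assertion message, or null if no ownership violation was detected.
    static String run() {
        ExecutorService indexSearcher =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "index_searcher"));
        try {
            // Collection phase: the slice thread acquires the values.
            Callable<ThreadBoundValues> acquire = ThreadBoundValues::new;
            ThreadBoundValues values = indexSearcher.submit(acquire).get();
            // Reduce phase: postCollection runs on the calling thread instead.
            try {
                values.lookupOrd(0);
                return null;
            } catch (AssertionError e) {
                return e.getMessage();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            indexSearcher.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the lookup happens only after `get()` has returned, so the two threads never touch the values at the same time, yet the ownership check still trips.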

Test Failures:

Cardinality/Nested

 - org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker
 - org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.classMethod
`REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT" -Dtests.seed=5A2A89155E5AADBB -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=UTC -Druntime.java=20`
- org.opensearch.search.aggregations.metrics.CardinalityIT.testMultiValuedString
`REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.aggregations.metrics.CardinalityIT.testMultiValuedString" -Dtests.seed=5A2A89155E5AADBB -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=es-PE -Dtests.timezone=Asia/Qyzylorda -Druntime.java=20`

- org.opensearch.search.aggregations.bucket.NestedIT.testNestedAsSubAggregation
`REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.aggregations.bucket.NestedIT.testNestedAsSubAggregation" -Dtests.seed=5A2A89155E5AADBB -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=mk-MK -Dtests.timezone=Europe/Minsk -Druntime.java=20`

Sample Stack Trace:

2> REPRODUCE WITH: ./gradlew ':server:internalClusterTest' --tests "org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT" -Dtests.seed=5A2A89155E5AADBB -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=en-US -Dtests.timezone=UTC -Druntime.java=20
  2> com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=79, name=opensearch[node_s1][search][T#4], state=RUNNABLE, group=TGRP-CardinalityWithRequestBreakerIT]

        Caused by:
        java.lang.AssertionError: Sorted doc values are only supposed to be consumed in the thread in which they have been acquired. But was acquired in Thread[#86,opensearch[node_s1][index_searcher][T#4],5,TGRP-CardinalityWithRequestBreakerIT] and consumed in Thread[#79,opensearch[node_s1][search][T#4],5,TGRP-CardinalityWithRequestBreakerIT].
            at __randomizedtesting.SeedInfo.seed([5A2A89155E5AADBB]:0)
            at org.apache.lucene.tests.index.AssertingLeafReader.assertThread(AssertingLeafReader.java:67)
            at org.apache.lucene.tests.index.AssertingLeafReader$AssertingSortedDocValues.lookupOrd(AssertingLeafReader.java:908)
            at org.apache.lucene.index.SingletonSortedSetDocValues.lookupOrd(SingletonSortedSetDocValues.java:95)
            at org.opensearch.search.aggregations.metrics.CardinalityAggregator$OrdinalsCollector.postCollect(CardinalityAggregator.java:338)
            at org.opensearch.search.aggregations.metrics.CardinalityAggregator.postCollectLastCollector(CardinalityAggregator.java:155)
            at org.opensearch.search.aggregations.metrics.CardinalityAggregator.doPostCollection(CardinalityAggregator.java:165)
            at org.opensearch.search.aggregations.AggregatorBase.postCollection(AggregatorBase.java:278)
            at org.opensearch.search.aggregations.AggregatorBase.postCollection(AggregatorBase.java:279)
            at org.opensearch.search.aggregations.AggregationCollectorManager.reduce(AggregationCollectorManager.java:80)
            at org.opensearch.search.aggregations.NonGlobalAggCollectorManager.reduce(NonGlobalAggCollectorManager.java:22)
            at org.opensearch.search.aggregations.AggregationCollectorManager.reduce(AggregationCollectorManager.java:31)
            at org.apache.lucene.search.MultiCollectorManager.reduce(MultiCollectorManager.java:73)
            at org.opensearch.search.query.QueryCollectorManagerContext$QueryCollectorManager.reduce(QueryCollectorManagerContext.java:41)
            at org.opensearch.search.query.QueryCollectorManagerContext$QueryCollectorManager.reduce(QueryCollectorManagerContext.java:27)
            at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:733)
            at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:688)
            at org.opensearch.search.query.ConcurrentQueryPhaseSearcher.searchWithCollectorManager(ConcurrentQueryPhaseSearcher.java:81)
            at org.opensearch.search.query.ConcurrentQueryPhaseSearcher.searchWithCollector(ConcurrentQueryPhaseSearcher.java:51)
            at org.opensearch.search.query.QueryPhase$DefaultQueryPhaseSearcher.searchWith(QueryPhase.java:423)
            at org.opensearch.search.query.QueryPhaseSearcherWrapper.searchWith(QueryPhaseSearcherWrapper.java:62)
            at org.opensearch.search.query.QueryPhase.executeInternal(QueryPhase.java:278)
            at org.opensearch.search.query.QueryPhase.execute(QueryPhase.java:151)
            at org.opensearch.search.SearchService.lambda$executeQueryPhase$1(SearchService.java:681)
            at org.opensearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:73)
            at org.opensearch.action.ActionRunnable$2.doRun(ActionRunnable.java:88)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at org.opensearch.threadpool.TaskAwareRunnable.doRun(TaskAwareRunnable.java:78)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at org.opensearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:59)
            at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:806)
            at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)

To Reproduce
Steps to reproduce the behavior:

  1. Enable the concurrent search feature flag in OpenSearchIntegTestCase and override the slices method in ContextIndexSearcher to compute slices with docCount and segmentCount set to 1, which forces multiple slices.
  2. Also set the system property -Dtests.codec=Asserting so that AssertingCodec is used during execution.
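
To show why limits of 1 force multiple slices, here is a simplified model of slice computation. It is an assumption-laden sketch, not the actual ContextIndexSearcher or Lucene implementation: `SliceModel` and its greedy grouping rule are hypothetical, and segments are represented only by their document counts.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of grouping segments into slices; not Lucene's code.
final class SliceModel {
    // Greedy grouping: a slice is closed once it holds maxSegmentsPerSlice segments
    // or its total document count exceeds maxDocsPerSlice.
    static List<List<Integer>> slices(List<Integer> segmentDocCounts,
                                      int maxDocsPerSlice,
                                      int maxSegmentsPerSlice) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docs = 0;
        for (int docCount : segmentDocCounts) {
            current.add(docCount);
            docs += docCount;
            if (current.size() >= maxSegmentsPerSlice || docs > maxDocsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docs = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> segments = List.of(40, 25, 10);
        // Limits of 1: every segment ends up in its own slice.
        System.out.println(slices(segments, 1, 1));         // [[40], [25], [10]]
        // Illustrative larger limits: all segments share one slice.
        System.out.println(slices(segments, 250_000, 5));   // [[40, 25, 10]]
    }
}
```

With one slice per segment, even the small indices used in integration tests are searched on several index_searcher threads, which is what makes the thread mismatch observable.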

Expected behavior
Ideally, in the concurrent model, even though access happens across multiple threads (the index_searcher and search thread pools), these accesses happen one after another and not concurrently. However, AssertingCodec validates against the thread that created the codec (that is, during leafCollector creation), and it fails if any access, sequential or concurrent, comes from another thread. Ref for AssertingCodec
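
The distinction between sequential and concurrent cross-thread access can be illustrated with a guard that flags only overlapping access. This is purely a hypothetical sketch to make the point concrete: `OverlapOnlyGuard` is not part of Lucene or OpenSearch, and it is not a proposed change to AssertingCodec.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical guard that rejects only overlapping access, not access from a different thread.
final class OverlapOnlyGuard {
    private final AtomicReference<Thread> active = new AtomicReference<>();

    <T> T access(Supplier<T> body) {
        Thread current = Thread.currentThread();
        // Fails only if another access is still in progress.
        if (!active.compareAndSet(null, current)) {
            throw new AssertionError("overlapping access detected in " + current.getName());
        }
        try {
            return body.get();
        } finally {
            active.set(null);
        }
    }

    // Two accesses from different threads, strictly one after another.
    static boolean sequentialHandOffPasses() {
        OverlapOnlyGuard guard = new OverlapOnlyGuard();
        String[] fromWorker = new String[1];
        Thread worker = new Thread(
            () -> fromWorker[0] = guard.access(() -> "collected"), "index_searcher");
        worker.start();
        try {
            worker.join(); // collection finishes before the reduce step starts
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        String fromCaller = guard.access(() -> "reduced");
        return "collected".equals(fromWorker[0]) && "reduced".equals(fromCaller);
    }

    public static void main(String[] args) {
        System.out.println(sequentialHandOffPasses());
    }
}
```

A hand-off of this shape passes an overlap-only check, whereas a check keyed on the creating thread rejects the second access regardless of ordering.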

Plugins
N/A

Screenshots
N/A

@sohami sohami changed the title from "[BUG] CardinalityIT/NestedIT test failures with concurrent search and AssertingCodec enabled" to "[Concurrent Segment Search] CardinalityIT/NestedIT test failures with concurrent search and AssertingCodec enabled" Jun 16, 2023
@sohami sohami self-assigned this Jun 16, 2023
@sohami sohami added :test Adding or fixing a test distributed framework and removed untriaged labels Jun 16, 2023
@sohami sohami moved this from Todo to In Progress in Concurrent Search Jul 11, 2023
@github-project-automation github-project-automation bot moved this from In Progress to Done in Concurrent Search Jul 13, 2023