fix concurrent modification exceptions in thread context #14084

Conversation

ansjcy
Member

@ansjcy ansjcy commented Jun 7, 2024

Description

The current implementation for injecting headers into the thread context caused concurrent modification errors, because other threads can access the headers at the same time as we delete from them.
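The failure below comes from the fail-fast check in `HashMap`'s iterator. A minimal single-threaded sketch of that mechanism follows. In the real bug the removal happens on another thread, which makes it nondeterministic, and the JDK documents fail-fast detection as best-effort only. Removing inside the loop reproduces the same `HashMap$HashIterator.nextNode` failure deterministically. `FailFastIteratorDemo` is a hypothetical class used only for illustration.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastIteratorDemo {
    // Returns true if removing a key while an iterator over the same HashMap is live
    // makes the iterator's next() call throw ConcurrentModificationException.
    static boolean removalDuringIterationThrows() {
        Map<String, String> headers = new HashMap<>();
        headers.put("a", "1");
        headers.put("b", "2");
        headers.put("c", "3");
        try {
            for (Map.Entry<String, String> entry : headers.entrySet()) {
                // A structural change made outside the iterator bumps the map's modCount;
                // the next call to next() sees the mismatch and fails fast.
                headers.remove(entry.getKey());
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("CME thrown: " + removalDuringIterationThrows());
    }
}
```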

java.lang.AssertionError: Unexpected ShardFailures: [shard [[6aZksFMiT4WtFkh9Ni-mog][test][7]], reason [RemoteTransportException[[node_t2][127.0.0.1:35247][indices:data/read/search[phase/query/id]]]; nested: NotSerializableExceptionWrapper[concurrent_modification_exception: null]; ], cause [NotSerializableExceptionWrapper[concurrent_modification_exception: null]
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1605)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1638)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1636)
	at org.opensearch.common.util.concurrent.ThreadContext$ThreadContextStruct.putResponseHeaders(ThreadContext.java:710)
	at org.opensearch.common.util.concurrent.ThreadContext.lambda$newStoredContext$4(ThreadContext.java:278)
	at org.opensearch.common.util.concurrent.ThreadContext$StoredContext.restore(ThreadContext.java:568)
	at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:112)
	at org.opensearch.transport.NativeMessageHandler.handleRequest(NativeMessageHandler.java:279)
	at org.opensearch.transport.NativeMessageHandler.handleMessage(NativeMessageHandler.java:147)
	at org.opensearch.transport.NativeMessageHandler.messageReceived(NativeMessageHandler.java:127)
	at org.opensearch.transport.InboundHandler.messageReceivedFromPipeline(InboundHandler.java:121)
	at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:113)
	at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:788)
	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.forwardFragments(NativeInboundBytesHandler.java:156)
	at org.opensearch.transport.nativeprotocol.NativeInboundBytesHandler.doHandleBytes(NativeInboundBytesHandler.java:93)
	at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:143)
	at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:119)
	at org.opensearch.transport.nio.MockNioTransport$MockTcpReadWriteHandler.consumeReads(MockNioTransport.java:343)
	at org.opensearch.nio.SocketChannelContext.handleReadBytes(SocketChannelContext.java:246)
	at org.opensearch.nio.BytesChannelContext.read(BytesChannelContext.java:59)
	at org.opensearch.nio.EventHandler.handleRead(EventHandler.java:152)
	at org.opensearch.transport.nio.TestEventHandler.handleRead(TestEventHandler.java:167)
	at org.opensearch.nio.NioSelector.handleRead(NioSelector.java:438)
	at org.opensearch.nio.NioSelector.processKey(NioSelector.java:264)
	at org.opensearch.nio.NioSelector.singleLoop(NioSelector.java:191)
	at org.opensearch.nio.NioSelector.runLoop(NioSelector.java:148)
	at java.lang.Thread.run(Thread.java:1583)
]]
Expected: <0>
     but: was <1>

This is caused by removing a key from a map while other threads are iterating over the same map.
This PR removes the delete logic and makes the logic for replacing an existing key self-contained in the put-headers function.
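One way to make "replace an existing key" self-contained without deleting from a shared map is copy-on-write. The sketch below assumes headers live in an immutable snapshot that gets replaced rather than mutated. `HeaderSnapshot` and `putResponseHeader(key, value, replaceExisting)` are hypothetical names. They are not the OpenSearch `ThreadContext` API and not the diff in this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical immutable header holder; illustrates the idea only.
public final class HeaderSnapshot {
    private final Map<String, List<String>> responseHeaders;

    public HeaderSnapshot(Map<String, List<String>> responseHeaders) {
        // Defensive copy, exposed read-only, so no caller can mutate shared state.
        this.responseHeaders = Collections.unmodifiableMap(new HashMap<>(responseHeaders));
    }

    public Map<String, List<String>> responseHeaders() {
        return responseHeaders;
    }

    // Returns a NEW snapshot. With replaceExisting, the old values for key are overwritten
    // in a private copy, so no separate delete on a map other threads may be reading.
    public HeaderSnapshot putResponseHeader(String key, String value, boolean replaceExisting) {
        Map<String, List<String>> copy = new HashMap<>(responseHeaders);
        if (replaceExisting) {
            copy.put(key, List.of(value));
        } else {
            List<String> values = new ArrayList<>(copy.getOrDefault(key, List.of()));
            values.add(value);
            copy.put(key, List.copyOf(values));
        }
        return new HeaderSnapshot(copy);
    }

    public static void main(String[] args) {
        HeaderSnapshot before = new HeaderSnapshot(Map.of("usage", List.of("old")));
        HeaderSnapshot after = before;
        // Iterating the old snapshot while "updating" is safe: updates never touch it.
        for (Map.Entry<String, List<String>> e : before.responseHeaders().entrySet()) {
            after = after.putResponseHeader(e.getKey(), "new", true);
        }
        System.out.println(before.responseHeaders() + " -> " + after.responseHeaders());
    }
}
```

The trade-off is an extra map copy per update, in exchange for readers never observing a structural modification.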

Related Issues

Related to the above build failure

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ansjcy ansjcy changed the title from "fix concurrent modification issue in thread context" to "fix concurrent modification exceptions in thread context" Jun 7, 2024
@jed326 jed326 requested a review from deshsidd June 7, 2024 22:47
@deshsidd
Contributor

deshsidd commented Jun 7, 2024

LGTM apart from the minor comment. Will approve after the change.

@ansjcy ansjcy force-pushed the fix-concurrent-modification-issue-in-thread-context branch from 666af8c to 86c4909 Compare June 7, 2024 22:59
@ansjcy ansjcy force-pushed the fix-concurrent-modification-issue-in-thread-context branch from 86c4909 to 768da23 Compare June 7, 2024 23:01
Contributor

github-actions bot commented Jun 7, 2024

❕ Gradle check result for 666af8c: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.


codecov bot commented Jun 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.69%. Comparing base (b15cb0c) to head (95bcf01).
Report is 371 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #14084      +/-   ##
============================================
+ Coverage     71.42%   71.69%   +0.27%     
- Complexity    59978    61622    +1644     
============================================
  Files          4985     5082      +97     
  Lines        282275   289233    +6958     
  Branches      40946    41853     +907     
============================================
+ Hits         201603   207360    +5757     
- Misses        63999    64732     +733     
- Partials      16673    17141     +468     
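The totals above are consistent with Codecov's coverage ratio, hits / (hits + misses + partials), where partially covered lines count as uncovered. A small check, assuming that definition (`CoverageRatio` is a throwaway name):

```java
public class CoverageRatio {
    // Coverage as a percentage: hits / (hits + misses + partials).
    // Partially covered lines count as not covered.
    static double coveragePercent(long hits, long misses, long partials) {
        long lines = hits + misses + partials;
        return 100.0 * hits / lines;
    }

    public static void main(String[] args) {
        double base = coveragePercent(201603, 63999, 16673); // main
        double head = coveragePercent(207360, 64732, 17141); // #14084
        System.out.printf("base %.2f%%, head %.2f%%, delta %+.2f%n", base, head, head - base);
    }
}
```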


Contributor

github-actions bot commented Jun 8, 2024

❌ Gradle check result for 86c4909:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Jun 8, 2024

❌ Gradle check result for 768da23:

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@ansjcy ansjcy force-pushed the fix-concurrent-modification-issue-in-thread-context branch from 768da23 to 70f95c0 Compare June 8, 2024 04:45
@ansjcy ansjcy force-pushed the fix-concurrent-modification-issue-in-thread-context branch from 70f95c0 to 95bcf01 Compare June 8, 2024 04:45
Contributor

github-actions bot commented Jun 8, 2024

❌ Gradle check result for 70f95c0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Contributor

github-actions bot commented Jun 8, 2024

✅ Gradle check result for 95bcf01: SUCCESS

@jed326 jed326 merged commit c8f0b6d into opensearch-project:main Jun 10, 2024
30 checks passed
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-14084-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c8f0b6da6def4b8e78cd17274e5a4271e844a71f
# Push it to GitHub
git push --set-upstream origin backport/backport-14084-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-14084-to-2.x.

parv0201 pushed a commit to parv0201/OpenSearch that referenced this pull request Jun 10, 2024
ansjcy added a commit to ansjcy/OpenSearch that referenced this pull request Jun 10, 2024
ansjcy added a commit to ansjcy/OpenSearch that referenced this pull request Jun 10, 2024
jed326 pushed a commit that referenced this pull request Jun 10, 2024
…s tracking (#14085)

* Query-level resource usages tracking (#13172)

* Query-level resource usages tracking

Signed-off-by: Chenyang Ji <[email protected]>

* Moving TaskResourceTrackingService to clusterService

Signed-off-by: Chenyang Ji <[email protected]>

* use shard response header to piggyback task resource usages

Signed-off-by: Chenyang Ji <[email protected]>

* split changes for query insights plugin

Signed-off-by: Chenyang Ji <[email protected]>

* improve the supplier logic and other misc items

Signed-off-by: Chenyang Ji <[email protected]>

* track resource usage for failed requests

Signed-off-by: Chenyang Ji <[email protected]>

* move resource usages interactions into TaskResourceTrackingService

Signed-off-by: Chenyang Ji <[email protected]>

---------

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit 3d1fa98)

* fix concurrent modification issue in thread context (#14084)

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit c8f0b6d)

* consume query level cpu and memory usage in query insights (#13739)

* consume query level cpu and memory usage in query insights

Signed-off-by: Chenyang Ji <[email protected]>

* handle failed requests metrics in query insights

Signed-off-by: Chenyang Ji <[email protected]>

* refactor the code to make it more maintainable

Signed-off-by: Chenyang Ji <[email protected]>

---------

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit 04a417a)

* fix japicmp check for threadContext

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit b403fdc)
kkewwei pushed a commit to kkewwei/OpenSearch that referenced this pull request Jul 24, 2024
…s tracking (opensearch-project#14085)

* Query-level resource usages tracking (opensearch-project#13172)

* Query-level resource usages tracking

Signed-off-by: Chenyang Ji <[email protected]>

* Moving TaskResourceTrackingService to clusterService

Signed-off-by: Chenyang Ji <[email protected]>

* use shard response header to piggyback task resource usages

Signed-off-by: Chenyang Ji <[email protected]>

* split changes for query insights plugin

Signed-off-by: Chenyang Ji <[email protected]>

* improve the supplier logic and other misc items

Signed-off-by: Chenyang Ji <[email protected]>

* track resource usage for failed requests

Signed-off-by: Chenyang Ji <[email protected]>

* move resource usages interactions into TaskResourceTrackingService

Signed-off-by: Chenyang Ji <[email protected]>

---------

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit 3d1fa98)

* fix concurrent modification issue in thread context (opensearch-project#14084)

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit c8f0b6d)

* consume query level cpu and memory usage in query insights (opensearch-project#13739)

* consume query level cpu and memory usage in query insights

Signed-off-by: Chenyang Ji <[email protected]>

* handle failed requests metrics in query insights

Signed-off-by: Chenyang Ji <[email protected]>

* refactor the code to make it more maintainable

Signed-off-by: Chenyang Ji <[email protected]>

---------

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit 04a417a)

* fix japicmp check for threadContext

Signed-off-by: Chenyang Ji <[email protected]>
(cherry picked from commit b403fdc)
Signed-off-by: kkewwei <[email protected]>
wdongyu pushed a commit to wdongyu/OpenSearch that referenced this pull request Aug 22, 2024
Labels
backport 2.x Backport to 2.x branch backport-failed skip-changelog v2.15.0 Issues and PRs related to version 2.15.0
3 participants