POC: Cap shard failure lists to a fixed small size (March 2024) #106135

quux00 · 2024-03-08T17:34:23Z

This is an exploratory proof of concept (POC) to address #103708 and #99220.

After some earlier exploratory code, I decided not to change the AtomicArray of ShardSearchFailures in AbstractSearchAsyncAction. Changing it would badly disrupt the lock-free thread-safety model of that class. In addition, other classes keep AtomicArrays of all shard results, so this is not the only offender.
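To illustrate the concern, here is a minimal sketch of a slot-per-shard structure. It uses the JDK's `AtomicReferenceArray` as a stand-in for AtomicArray, and the class and method names (`PerShardFailureSlots`, `onShardFailure`, `onShardSuccess`) are hypothetical, not taken from AbstractSearchAsyncAction. The assumption is that each shard only ever touches its own index, which is what makes the pattern safe without locks.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical illustration only; not the Elasticsearch implementation. */
final class PerShardFailureSlots {
    // One slot per shard; null means "no failure recorded for this shard".
    private final AtomicReferenceArray<String> slots;

    PerShardFailureSlots(int numShards) {
        this.slots = new AtomicReferenceArray<>(numShards);
    }

    // Each shard writes only its own index, so writers never wait on a lock.
    void onShardFailure(int shardIndex, String reason) {
        slots.set(shardIndex, reason);
    }

    // Assumed behavior: a later success for the same shard clears just that slot.
    void onShardSuccess(int shardIndex) {
        slots.set(shardIndex, null);
    }

    // Counts the slots that currently hold a failure.
    int failureCount() {
        int count = 0;
        for (int i = 0; i < slots.length(); i++) {
            if (slots.get(i) != null) {
                count++;
            }
        }
        return count;
    }
}
```

A bounded shared list would need some coordination to decide which entries to keep or evict when a shard fails or recovers, which is the kind of change this slot-per-shard layout avoids.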

Instead, I focused on reducing the number of failures reported in the SearchResponse. SearchResponse does not track the failed shard count independently of the ShardSearchFailure array, so a new field had to be added for it.
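The idea can be sketched roughly as follows. This is not the code in this PR: `CappedShardFailures`, `MAX_REPORTED_FAILURES` and its value of 3 are hypothetical, and failures are plain strings rather than ShardSearchFailure objects. It only shows why a separate count is needed once the reported list is truncated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the failure portion of a search response. */
final class CappedShardFailures {
    // Assumed cap for illustration; the PR description does not state the value.
    static final int MAX_REPORTED_FAILURES = 3;

    private final int failedShards;      // true number of failed shards
    private final List<String> reported; // at most MAX_REPORTED_FAILURES entries

    private CappedShardFailures(int failedShards, List<String> reported) {
        this.failedShards = failedShards;
        this.reported = reported;
    }

    // Keeps the full count but only the first few failure details.
    static CappedShardFailures of(List<String> allFailures) {
        int keep = Math.min(allFailures.size(), MAX_REPORTED_FAILURES);
        List<String> head = new ArrayList<>(allFailures.subList(0, keep));
        return new CappedShardFailures(allFailures.size(), Collections.unmodifiableList(head));
    }

    int failedShards() {
        return failedShards;
    }

    List<String> reportedFailures() {
        return reported;
    }
}
```

Without the separate `failedShards` value, a caller could only infer the failure count from the length of the list, which would under-report as soon as the list is capped.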

Most tests are passing, but some still need further work. Also, CCS with MRT=false (ccs_minimize_roundtrips=false) does not yet truncate the number of failures in the _cluster/details/failures section, so I need to track down where that occurs.

quux00 added 3 commits March 8, 2024 11:40

Init commit - starting to copy over from search/limit-shard-failures branch

df4864d

Added SearchResponse Builder but not using it yet

ae7ed38

Added failedShards to de/serialization of SearchResponse - tests appear to be passing; next need to add logic to AbstractSearchAsyncAction

eca31a7

elasticsearchmachine added the v8.14.0 label Mar 8, 2024

quux00 added 3 commits March 8, 2024 14:26

AbstractSearchAsyncAction now limits the number of shard failures when creating a SearchResponse. Updated AbstractSearchAsyncActionTests. Improved the new SearchResponse.Builder. SearchResponseMerger changes not made yet. CCSDuelIT failing for testShardFailures.

c73f4f1

Moved SearchResponseMerger to SearchResponse.Builder

756f9d1

SearchResponseMerger now limits the total number of shard failures

94ba5c7

SearchResponseMergerTests modified to match new functionality. Fixed JdbcShardFailureIT and RestSqlTestCase to no longer expect suppressed header warnings, since we are limiting the number of failures returned.

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024
