
Support task resource tracking in OpenSearch #3982

Merged

Conversation

ketanv3
Contributor

@ketanv3 ketanv3 commented Jul 22, 2022

Description

Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking.
Currently, SearchTask and SearchShardTask support resource tracking, but the framework can be extended to any other task type in the future.
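
As background, the per-task numbers the framework reports are aggregated from the CPU time and memory consumed by the threads that execute the task. Below is a minimal sketch of that kind of per-thread sampling using the standard JDK thread MXBean; ThreadResourceSampler is an illustrative name, not an OpenSearch class, and the cast assumes a HotSpot-style JVM that exposes com.sun.management.ThreadMXBean.

```java
import java.lang.management.ManagementFactory;

// Illustrative sketch only; not the OpenSearch implementation. It shows the
// per-thread CPU/memory sampling a task resource tracking framework can build on.
// The cast assumes a HotSpot-style JVM exposing com.sun.management.ThreadMXBean.
public class ThreadResourceSampler {
    private static final com.sun.management.ThreadMXBean THREADS =
        (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();

    /** CPU time consumed by the given thread so far, in nanoseconds. */
    static long cpuTimeNanos(long threadId) {
        return THREADS.getThreadCpuTime(threadId);
    }

    /** Heap bytes allocated by the given thread so far. */
    static long allocatedBytes(long threadId) {
        return THREADS.getThreadAllocatedBytes(threadId);
    }

    public static void main(String[] args) {
        long id = Thread.currentThread().getId();
        long cpu0 = cpuTimeNanos(id);
        long mem0 = allocatedBytes(id);

        // Simulate some work on this thread.
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += Integer.toString(i).hashCode();
        }

        System.out.println("cpu delta (ns): " + (cpuTimeNanos(id) - cpu0));
        System.out.println("alloc delta (bytes): " + (allocatedBytes(id) - mem0) + " (sum=" + sum + ")");
    }
}
```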

Changes since #2639:

  • Replaced the usage of AutoQueueAdjustingExecutorBuilder with ResizableExecutorBuilder
  • Fixed a race condition where a Task is unregistered before its threads have stopped
  • Resolved merge conflicts
  • Fixed broken tests

Signed-off-by: Ketan Verma [email protected]

Issues Resolved

#1179

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


@codecov-commenter

codecov-commenter commented Jul 23, 2022

Codecov Report

Attention: Patch coverage is 77.67442% with 48 lines in your changes missing coverage. Please review.

Project coverage is 70.71%. Comparing base (740f75d) to head (2027602).
Report is 3229 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---:|---|
| .../opensearch/tasks/TaskResourceTrackingService.java | 77.38% | 12 Missing and 7 partials ⚠️ |
| ...org/opensearch/action/support/TransportAction.java | 50.00% | 7 Missing ⚠️ |
| ...erver/src/main/java/org/opensearch/tasks/Task.java | 75.00% | 6 Missing and 1 partial ⚠️ |
| ...a/org/opensearch/threadpool/TaskAwareRunnable.java | 69.56% | 4 Missing and 3 partials ⚠️ |
| ...rc/main/java/org/opensearch/tasks/TaskManager.java | 62.50% | 4 Missing and 2 partials ⚠️ |
| ...ster/node/tasks/list/TransportListTasksAction.java | 66.66% | 1 Missing ⚠️ |
| ...ensearch/common/util/concurrent/ThreadContext.java | 80.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3982      +/-   ##
============================================
+ Coverage     70.50%   70.71%   +0.20%     
- Complexity    56848    57041     +193     
============================================
  Files          4583     4585       +2     
  Lines        273931   274122     +191     
  Branches      40158    40178      +20     
============================================
+ Hits         193146   193845     +699     
+ Misses        64561    64001     -560     
- Partials      16224    16276      +52     


@ketanv3 ketanv3 marked this pull request as ready for review July 23, 2022 13:17
@ketanv3 ketanv3 requested review from a team and reta as code owners July 23, 2022 13:17
Collaborator

@Bukhtawar Bukhtawar left a comment


Thanks @ketanv3 for the changes. What is the additional delay the await mechanism might introduce? We might need to run benchmarks for this change.

@ketanv3
Contributor Author

ketanv3 commented Jul 23, 2022

Thanks @ketanv3 for the changes. What is the additional delay the await mechanism might introduce? We might need to run benchmarks for this change.

Yes, I'm working towards running performance benchmarks.


ketanv3 added 4 commits July 31, 2022 01:10
Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking.
Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

Changes since opensearch-project#2639:
* Replaced the usage of AutoQueueAdjustingExecutorBuilder with ResizableExecutorBuilder
* Resolved merge conflicts
* Fixed broken tests

Signed-off-by: Ketan Verma <[email protected]>
@ketanv3 ketanv3 force-pushed the feature/resource-tracking-framework branch from dcdaf6e to a09a60a on July 31, 2022 09:22

@ketanv3
Contributor Author

ketanv3 commented Jul 31, 2022

Comparing the accuracy of the two approaches to marking a task as unregistered – (1) not waiting for the task's threads to complete, and (2) using a callback to keep track of active threads and waiting for them to finish.

An existing integration test was re-used to perform a large number of search requests with predictable CPU/memory usage. Measurements were taken for:

  • Number of times thread usage was reported before task tracking was stopped (thread usage accounted).
  • Number of times thread usage was reported after task tracking was stopped (thread usage lost).

Both tests executed 500 search queries and reported resource usages for ~9020 tasks.

| | approach 1 | approach 2 |
|---|---:|---:|
| thread usage accounted | 2641 | 4392 |
| thread usage lost | 3721 | 237 |
| tasks completed | 9020 | 9021 |

Based on these results, approach (2) has been used for the implementation as it gives better accuracy.
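
As an illustration of approach (2), here is a minimal, self-contained sketch: each task keeps a counter of active threads that start/stop callbacks update, and completion listeners fire only once the task is unregistered and the counter drops to zero. The names (TrackedTask, threadStarted, threadStopped) are hypothetical and heavily simplified relative to the actual OpenSearch classes.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified sketch of approach (2); not the actual OpenSearch classes.
public class TrackedTask {
    private final AtomicInteger activeThreads = new AtomicInteger();
    private final AtomicBoolean unregistered = new AtomicBoolean();
    private final AtomicBoolean listenersNotified = new AtomicBoolean();
    private final AtomicLong totalCpuNanos = new AtomicLong();
    private final AtomicLong totalAllocatedBytes = new AtomicLong();
    private final List<Runnable> completionListeners = new CopyOnWriteArrayList<>();

    /** Invoked by a thread-start callback before a task thread begins work. */
    public void threadStarted() {
        activeThreads.incrementAndGet();
    }

    /** Invoked by a thread-completion callback; records usage, then checks for completion. */
    public void threadStopped(long cpuNanos, long allocatedBytes) {
        totalCpuNanos.addAndGet(cpuNanos);
        totalAllocatedBytes.addAndGet(allocatedBytes);
        if (activeThreads.decrementAndGet() == 0 && unregistered.get()) {
            notifyListenersOnce();
        }
    }

    /** Invoked when the task is unregistered from the task manager. */
    public void unregister() {
        unregistered.set(true);
        if (activeThreads.get() == 0) {
            notifyListenersOnce();
        }
    }

    public void addCompletionListener(Runnable listener) {
        completionListeners.add(listener);
    }

    // Fires the registered listeners at most once, mirroring a notify-once listener.
    private void notifyListenersOnce() {
        if (listenersNotified.compareAndSet(false, true)) {
            completionListeners.forEach(Runnable::run);
        }
    }
}
```

Under approach (1), unregister() would fire the listeners immediately regardless of activeThreads, which is why usage reported by threads that were still running ends up in the "thread usage lost" row above.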


@ketanv3
Contributor Author

ketanv3 commented Aug 1, 2022

Benchmark results

  • Used a c5.2xlarge EC2 instance type
  • Used ./gradlew localDistro to generate a distribution from each commit
  • Used the default configuration to launch a cluster
  • Used the nyc_taxis workload

Baseline commit: 740f75d
Contender commit: a09a60a

opensearch-benchmark compare --baseline 740f75d2051 --contender a09a60acbbb

   ____                  _____                      __       ____                  __                         __
  / __ \____  ___  ____ / ___/___  ____ ___________/ /_     / __ )___  ____  _____/ /_  ____ ___  ____ ______/ /__
 / / / / __ \/ _ \/ __ \\__ \/ _ \/ __ `/ ___/ ___/ __ \   / __  / _ \/ __ \/ ___/ __ \/ __ `__ \/ __ `/ ___/ //_/
/ /_/ / /_/ /  __/ / / /__/ /  __/ /_/ / /  / /__/ / / /  / /_/ /  __/ / / / /__/ / / / / / / / / /_/ / /  / ,<
\____/ .___/\___/_/ /_/____/\___/\__,_/_/   \___/_/ /_/  /_____/\___/_/ /_/\___/_/ /_/_/ /_/ /_/\__,_/_/  /_/|_|
    /_/


Comparing baseline
  TestExecution ID: 740f75d2051
  TestExecution timestamp: 2022-07-31 13:29:54
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

with contender
  TestExecution ID: a09a60acbbb
  TestExecution timestamp: 2022-08-01 09:25:35
  TestProcedure: append-no-conflicts
  ProvisionConfigInstance: external

------------------------------------------------------
    _______             __   _____
   / ____(_)___  ____ _/ /  / ___/_________  ________
  / /_  / / __ \/ __ `/ /   \__ \/ ___/ __ \/ ___/ _ \
 / __/ / / / / / /_/ / /   ___/ / /__/ /_/ / /  /  __/
/_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
------------------------------------------------------

|                                                        Metric |                     Task |    Baseline |   Contender |     Diff |   Unit |
|--------------------------------------------------------------:|-------------------------:|------------:|------------:|---------:|-------:|
|                    Cumulative indexing time of primary shards |                          |       126.4 |     126.926 |   0.5262 |    min |
|             Min cumulative indexing time across primary shard |                          |       126.4 |     126.926 |   0.5262 |    min |
|          Median cumulative indexing time across primary shard |                          |       126.4 |     126.926 |   0.5262 |    min |
|             Max cumulative indexing time across primary shard |                          |       126.4 |     126.926 |   0.5262 |    min |
|           Cumulative indexing throttle time of primary shards |                          |           0 |           0 |        0 |    min |
|    Min cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
| Median cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
|    Max cumulative indexing throttle time across primary shard |                          |           0 |           0 |        0 |    min |
|                       Cumulative merge time of primary shards |                          |     52.5814 |      54.486 |  1.90467 |    min |
|                      Cumulative merge count of primary shards |                          |         208 |         212 |        4 |        |
|                Min cumulative merge time across primary shard |                          |     52.5814 |      54.486 |  1.90467 |    min |
|             Median cumulative merge time across primary shard |                          |     52.5814 |      54.486 |  1.90467 |    min |
|                Max cumulative merge time across primary shard |                          |     52.5814 |      54.486 |  1.90467 |    min |
|              Cumulative merge throttle time of primary shards |                          |     1.37742 |     1.66843 |  0.29102 |    min |
|       Min cumulative merge throttle time across primary shard |                          |     1.37742 |     1.66843 |  0.29102 |    min |
|    Median cumulative merge throttle time across primary shard |                          |     1.37742 |     1.66843 |  0.29102 |    min |
|       Max cumulative merge throttle time across primary shard |                          |     1.37742 |     1.66843 |  0.29102 |    min |
|                     Cumulative refresh time of primary shards |                          |    0.690283 |    0.595333 | -0.09495 |    min |
|                    Cumulative refresh count of primary shards |                          |          76 |          80 |        4 |        |
|              Min cumulative refresh time across primary shard |                          |    0.690283 |    0.595333 | -0.09495 |    min |
|           Median cumulative refresh time across primary shard |                          |    0.690283 |    0.595333 | -0.09495 |    min |
|              Max cumulative refresh time across primary shard |                          |    0.690283 |    0.595333 | -0.09495 |    min |
|                       Cumulative flush time of primary shards |                          |     1.51388 |     1.44558 |  -0.0683 |    min |
|                      Cumulative flush count of primary shards |                          |          31 |          33 |        2 |        |
|                Min cumulative flush time across primary shard |                          |     1.51388 |     1.44558 |  -0.0683 |    min |
|             Median cumulative flush time across primary shard |                          |     1.51388 |     1.44558 |  -0.0683 |    min |
|                Max cumulative flush time across primary shard |                          |     1.51388 |     1.44558 |  -0.0683 |    min |
|                                       Total Young Gen GC time |                          |      64.188 |      66.636 |    2.448 |      s |
|                                      Total Young Gen GC count |                          |       17754 |       18124 |      370 |        |
|                                         Total Old Gen GC time |                          |           0 |           0 |        0 |      s |
|                                        Total Old Gen GC count |                          |           0 |           0 |        0 |        |
|                                                    Store size |                          |     24.3704 |     24.3655 | -0.00482 |     GB |
|                                                 Translog size |                          | 5.12227e-08 | 5.12227e-08 |        0 |     GB |
|                                        Heap used for segments |                          |           0 |           0 |        0 |     MB |
|                                      Heap used for doc values |                          |           0 |           0 |        0 |     MB |
|                                           Heap used for terms |                          |           0 |           0 |        0 |     MB |
|                                           Heap used for norms |                          |           0 |           0 |        0 |     MB |
|                                          Heap used for points |                          |           0 |           0 |        0 |     MB |
|                                   Heap used for stored fields |                          |           0 |           0 |        0 |     MB |
|                                                 Segment count |                          |          30 |          30 |        0 |        |
|                                                Min Throughput |                    index |      139816 |      140101 |  284.462 | docs/s |
|                                               Mean Throughput |                    index |      140748 |      141834 |  1086.13 | docs/s |
|                                             Median Throughput |                    index |      140712 |      141860 |  1147.85 | docs/s |
|                                                Max Throughput |                    index |      141949 |      143104 |   1155.2 | docs/s |
|                                       50th percentile latency |                    index |      459.04 |     461.438 |  2.39825 |     ms |
|                                       90th percentile latency |                    index |     641.768 |     667.389 |  25.6212 |     ms |
|                                       99th percentile latency |                    index |      1401.7 |     1471.98 |  70.2789 |     ms |
|                                     99.9th percentile latency |                    index |     2154.13 |     2254.37 |  100.243 |     ms |
|                                    99.99th percentile latency |                    index |     2801.73 |      2947.4 |   145.67 |     ms |
|                                      100th percentile latency |                    index |     2950.47 |     3561.21 |  610.743 |     ms |
|                                  50th percentile service time |                    index |      459.04 |     461.438 |  2.39825 |     ms |
|                                  90th percentile service time |                    index |     641.768 |     667.389 |  25.6212 |     ms |
|                                  99th percentile service time |                    index |      1401.7 |     1471.98 |  70.2789 |     ms |
|                                99.9th percentile service time |                    index |     2154.13 |     2254.37 |  100.243 |     ms |
|                               99.99th percentile service time |                    index |     2801.73 |      2947.4 |   145.67 |     ms |
|                                 100th percentile service time |                    index |     2950.47 |     3561.21 |  610.743 |     ms |
|                                                    error rate |                    index |           0 |           0 |        0 |      % |
|                                                Min Throughput | wait-until-merges-finish |  0.00545798 |  0.00717392 |  0.00172 |  ops/s |
|                                               Mean Throughput | wait-until-merges-finish |  0.00545798 |  0.00717392 |  0.00172 |  ops/s |
|                                             Median Throughput | wait-until-merges-finish |  0.00545798 |  0.00717392 |  0.00172 |  ops/s |
|                                                Max Throughput | wait-until-merges-finish |  0.00545798 |  0.00717392 |  0.00172 |  ops/s |
|                                      100th percentile latency | wait-until-merges-finish |      183218 |      139394 | -43824.2 |     ms |
|                                 100th percentile service time | wait-until-merges-finish |      183218 |      139394 | -43824.2 |     ms |
|                                                    error rate | wait-until-merges-finish |           0 |           0 |        0 |      % |
|                                                Min Throughput |                  default |     3.01589 |     3.01582 |   -7e-05 |  ops/s |
|                                               Mean Throughput |                  default |     3.02592 |      3.0258 | -0.00011 |  ops/s |
|                                             Median Throughput |                  default |     3.02365 |     3.02349 | -0.00016 |  ops/s |
|                                                Max Throughput |                  default |     3.04569 |     3.04548 | -0.00022 |  ops/s |
|                                       50th percentile latency |                  default |      5.7814 |     5.91026 |  0.12886 |     ms |
|                                       90th percentile latency |                  default |     6.43334 |     6.83153 |  0.39818 |     ms |
|                                       99th percentile latency |                  default |     8.34059 |     10.2823 |  1.94167 |     ms |
|                                      100th percentile latency |                  default |      9.3404 |     11.1879 |  1.84747 |     ms |
|                                  50th percentile service time |                  default |     3.23102 |     3.30837 |  0.07735 |     ms |
|                                  90th percentile service time |                  default |     3.64748 |     3.97348 |  0.32599 |     ms |
|                                  99th percentile service time |                  default |     5.69504 |     7.69124 |   1.9962 |     ms |
|                                 100th percentile service time |                  default |      6.9198 |     8.15025 |  1.23045 |     ms |
|                                                    error rate |                  default |           0 |           0 |        0 |      % |
|                                                Min Throughput |                    range |    0.703708 |    0.703334 | -0.00037 |  ops/s |
|                                               Mean Throughput |                    range |    0.706099 |    0.705482 | -0.00062 |  ops/s |
|                                             Median Throughput |                    range |    0.705548 |    0.704986 | -0.00056 |  ops/s |
|                                                Max Throughput |                    range |    0.711012 |    0.709893 | -0.00112 |  ops/s |
|                                       50th percentile latency |                    range |     230.942 |     228.752 | -2.18983 |     ms |
|                                       90th percentile latency |                    range |     232.469 |      232.27 | -0.19822 |     ms |
|                                       99th percentile latency |                    range |     281.604 |      268.33 | -13.2736 |     ms |
|                                      100th percentile latency |                    range |     282.024 |     274.873 | -7.15022 |     ms |
|                                  50th percentile service time |                    range |     224.228 |     221.927 | -2.30112 |     ms |
|                                  90th percentile service time |                    range |     225.436 |     225.098 | -0.33821 |     ms |
|                                  99th percentile service time |                    range |     274.534 |     261.367 | -13.1669 |     ms |
|                                 100th percentile service time |                    range |     274.651 |     268.094 |  -6.5567 |     ms |
|                                                    error rate |                    range |           0 |           0 |        0 |      % |
|                                                Min Throughput |      distance_amount_agg |     2.01208 |     2.01214 |    6e-05 |  ops/s |
|                                               Mean Throughput |      distance_amount_agg |     2.01986 |     2.01997 |  0.00011 |  ops/s |
|                                             Median Throughput |      distance_amount_agg |     2.01805 |     2.01815 |   0.0001 |  ops/s |
|                                                Max Throughput |      distance_amount_agg |     2.03565 |      2.0359 |  0.00025 |  ops/s |
|                                       50th percentile latency |      distance_amount_agg |     5.11124 |     5.33593 |  0.22468 |     ms |
|                                       90th percentile latency |      distance_amount_agg |     5.49016 |      5.5372 |  0.04703 |     ms |
|                                       99th percentile latency |      distance_amount_agg |     5.91635 |     5.86624 | -0.05011 |     ms |
|                                      100th percentile latency |      distance_amount_agg |     5.94142 |     6.09461 |  0.15319 |     ms |
|                                  50th percentile service time |      distance_amount_agg |     1.83493 |     1.90533 |   0.0704 |     ms |
|                                  90th percentile service time |      distance_amount_agg |     2.09502 |     2.09756 |  0.00254 |     ms |
|                                  99th percentile service time |      distance_amount_agg |     2.26754 |     2.35835 |  0.09081 |     ms |
|                                 100th percentile service time |      distance_amount_agg |     2.44594 |     2.45766 |  0.01172 |     ms |
|                                                    error rate |      distance_amount_agg |           0 |           0 |        0 |      % |
|                                                Min Throughput |            autohisto_agg |     1.50055 |     1.50018 | -0.00037 |  ops/s |
|                                               Mean Throughput |            autohisto_agg |     1.50088 |     1.50029 | -0.00059 |  ops/s |
|                                             Median Throughput |            autohisto_agg |     1.50081 |     1.50027 | -0.00055 |  ops/s |
|                                                Max Throughput |            autohisto_agg |     1.50158 |      1.5005 | -0.00108 |  ops/s |
|                                       50th percentile latency |            autohisto_agg |     432.987 |     447.084 |  14.0965 |     ms |
|                                       90th percentile latency |            autohisto_agg |     440.025 |     454.874 |   14.849 |     ms |
|                                       99th percentile latency |            autohisto_agg |     445.834 |     462.858 |  17.0243 |     ms |
|                                      100th percentile latency |            autohisto_agg |     452.361 |     466.995 |  14.6342 |     ms |
|                                  50th percentile service time |            autohisto_agg |     430.369 |      445.16 |  14.7912 |     ms |
|                                  90th percentile service time |            autohisto_agg |     437.996 |     452.532 |  14.5358 |     ms |
|                                  99th percentile service time |            autohisto_agg |       443.3 |     460.938 |  17.6376 |     ms |
|                                 100th percentile service time |            autohisto_agg |     450.203 |     463.866 |  13.6633 |     ms |
|                                                    error rate |            autohisto_agg |           0 |           0 |        0 |      % |
|                                                Min Throughput |       date_histogram_agg |     1.50276 |     1.50321 |  0.00044 |  ops/s |
|                                               Mean Throughput |       date_histogram_agg |     1.50448 |     1.50523 |  0.00075 |  ops/s |
|                                             Median Throughput |       date_histogram_agg |     1.50409 |     1.50477 |  0.00068 |  ops/s |
|                                                Max Throughput |       date_histogram_agg |     1.50791 |     1.50925 |  0.00134 |  ops/s |
|                                       50th percentile latency |       date_histogram_agg |     460.007 |      446.69 | -13.3169 |     ms |
|                                       90th percentile latency |       date_histogram_agg |     469.084 |     455.775 | -13.3096 |     ms |
|                                       99th percentile latency |       date_histogram_agg |     478.717 |      467.45 | -11.2664 |     ms |
|                                      100th percentile latency |       date_histogram_agg |     491.838 |     468.051 | -23.7876 |     ms |
|                                  50th percentile service time |       date_histogram_agg |     457.659 |     444.775 | -12.8836 |     ms |
|                                  90th percentile service time |       date_histogram_agg |     466.225 |     453.912 | -12.3126 |     ms |
|                                  99th percentile service time |       date_histogram_agg |     475.816 |     465.039 |  -10.777 |     ms |
|                                 100th percentile service time |       date_histogram_agg |     488.723 |     466.383 | -22.3394 |     ms |
|                                                    error rate |       date_histogram_agg |           0 |           0 |        0 |      % |


-------------------------------
[INFO] SUCCESS (took 0 seconds)
-------------------------------

@ketanv3 ketanv3 force-pushed the feature/resource-tracking-framework branch from 4e77476 to d89de65 on August 1, 2022 19:25

@ketanv3 ketanv3 requested a review from Bukhtawar August 2, 2022 04:03
Comment on lines 415 to 417
public void addResourceTrackingCompletionListener(NotifyOnceListener<Task> listener) {
    resourceTrackingCompletionListeners.add(listener);
}
Collaborator


We shouldn't call addResourceTrackingCompletionListener if the count is zero; otherwise, it's possible that the newly added listener is never called.

Contributor Author


Fair point, updated.

Contributor Author


Though there is still a rare possibility of a race condition:

  • Task resource tracking is completed (num threads = 0), and the existing completion listeners are invoked.
  • Delayed thread execution starts for the task (num threads = 1).
  • A new completion listener added at this point may succeed.
  • Delayed thread execution stops for the task (num threads = 0).
  • The new completion listener is invoked.

To solve this, we may have to bring back the isResourceTrackingCompleted atomic boolean in the Task. It's not a concern at the moment, as listeners are added during Task creation, not in the middle of execution.
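
If that guarantee is ever needed, here is a minimal sketch of how an isResourceTrackingCompleted-style flag could close the window, with late registrations fired immediately rather than silently dropped. The class name and the coarse synchronization are illustrative choices for clarity, not the production design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch only: an explicit "tracking completed" state on the task
// lets late listener registrations fire immediately instead of being dropped.
// Coarse synchronization is used here purely for clarity.
public class ResourceTrackingState {
    private boolean trackingCompleted = false;
    private final List<Consumer<ResourceTrackingState>> listeners = new ArrayList<>();

    /** Registers a listener, or fires it immediately if tracking has already completed. */
    public synchronized void addCompletionListener(Consumer<ResourceTrackingState> listener) {
        if (trackingCompleted) {
            listener.accept(this);
        } else {
            listeners.add(listener);
        }
    }

    /** Called once, when the last thread of an already unregistered task stops. */
    public synchronized void markTrackingCompleted() {
        if (!trackingCompleted) {
            trackingCompleted = true;
            listeners.forEach(l -> l.accept(this));
            listeners.clear();
        }
    }
}
```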

Contributor Author


On a different note, this may not even be a problem because a (delayed) thread is still a part of the task, and the newly added listener would just receive the more recent/accurate usage stats.

Collaborator


Yeah, we also haven't strictly synchronised adding listeners and invoking them, so I am fine with this limitation as long as it doesn't overcomplicate the use case.

Collaborator

@Bukhtawar Bukhtawar left a comment


Thanks @ketanv3 for the changes, one minor comment


@Bukhtawar Bukhtawar merged commit 5eac54d into opensearch-project:main Aug 2, 2022
@Bukhtawar Bukhtawar added the backport 2.x Backport to 2.x branch label Aug 2, 2022
@opensearch-trigger-bot
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.x 2.x
# Navigate to the new working tree
cd .worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-3982-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 5eac54d4ade73f6d0ed80b0f2408a104a98e3232
# Push it to GitHub
git push --set-upstream origin backport/backport-3982-to-2.x
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-3982-to-2.x.

ketanv3 added a commit to ketanv3/OpenSearch that referenced this pull request Aug 2, 2022
* Support task resource tracking in OpenSearch

* Reopens changes from opensearch-project#2639 (reverted in opensearch-project#3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

* Fixed a race-condition when Task is unregistered before its threads are stopped

* Improved error handling and simplified task resource tracking completion listener

* Avoid registering listeners on already completed tasks

Signed-off-by: Ketan Verma <[email protected]>
Bukhtawar pushed a commit that referenced this pull request Aug 2, 2022
* [Backport 2.x] Support task resource tracking in OpenSearch

* Reopens changes from #2639 (reverted in #3046) to add a framework for task resource tracking. Currently, SearchTask and SearchShardTask support resource tracking but it can be extended to any other task.

* Fixed a race-condition when Task is unregistered before its threads are stopped

* Improved error handling and simplified task resource tracking completion listener

* Avoid registering listeners on already completed tasks

Signed-off-by: Ketan Verma <[email protected]>
PritLadani added a commit to PritLadani/OpenSearch that referenced this pull request Sep 6, 2022
Backporting pull requests opensearch-project#2089 and opensearch-project#3982

Signed-off-by: PritLadani <[email protected]>