Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Reader thread termination gracefully #474

Closed
wants to merge 9 commits into from

Conversation

khushbr
Copy link
Collaborator

@khushbr khushbr commented Aug 13, 2023

Is your feature request related to a problem? Please provide an existing Issue # , or describe.
#468

Describe the solution you are proposing
RCA Agent process creates 4 thread as part of process bootstrap. The Reader thread is most critical as it is responsibly for reading, processing and cleaning up upstream metrics generated by Performance Analyzer(PA) plugin. In case of Reader thread crash, the metrics will start accumulating and filling up the disk, while the other components (RCA Graph, Webserver and GRPC Server) perform no work as the metrics are the leaf nodes to all the processing.

The initial design, thus, was to keep creating new reader thread in an infinite loop. This, however, isn't efficient in case of non-retryable/permanent failures. This code changes adds:

  1. Max retry attempt to bring up the Reader Thread
  2. In case Reader Thread doesn't come up, handle the failure by disabling PA plugin and gracefully exiting (using shutdownhook) with process Runtime exit.

Describe alternatives you've considered
The alternative to not terminating the RCA process is to keep the other threads and the process running, while the Reader thread has crashed. In the absence of no metrics flowing through analysis graph, the Controller thread and Servers (web and grpc) do no work.

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@khushbr khushbr changed the title Hanlde Reader thread termination gracefully Handle Reader thread termination gracefully Aug 13, 2023
@codecov
Copy link

codecov bot commented Aug 14, 2023

Codecov Report

Merging #474 (1c73519) into main (572e7bc) will increase coverage by 0.12%.
The diff coverage is 0.00%.

❗ Current head 1c73519 differs from pull request most recent head e93740f. Consider uploading reports for the commit e93740f to get more accurate results

@@             Coverage Diff              @@
##               main     #474      +/-   ##
============================================
+ Coverage     74.78%   74.91%   +0.12%     
- Complexity     2668     2682      +14     
============================================
  Files           316      317       +1     
  Lines         16243    16294      +51     
  Branches       1272     1276       +4     
============================================
+ Hits          12147    12206      +59     
- Misses         3581     3584       +3     
+ Partials        515      504      -11     
Files Changed Coverage Δ
...rch/performanceanalyzer/ESLocalhostConnection.java 0.00% <0.00%> (ø)
...ch/performanceanalyzer/PerformanceAnalyzerApp.java 33.51% <0.00%> (-16.49%) ⬇️
...rg/opensearch/performanceanalyzer/rca/Version.java 0.00% <ø> (ø)

... and 17 files with indirect coverage changes

@khushbr khushbr closed this Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant