Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing Event Log file cleanup issue #36

Merged
merged 9 commits into from
Jul 13, 2021
Merged

Conversation

khushbr
Copy link
Collaborator

@khushbr khushbr commented Jul 13, 2021

Is your feature request related to a problem?
Issue : #33
[performance-analyzer-rca] PR: opensearch-project/performance-analyzer-rca#27
Previous PR [now closed] : #34

Describe the solution you are proposing

  1. The solution removes the 'MetricsPurgeActivity' collector and moves the responsibility for event log file cleanup to 'EventLogQueueProcessor.' The purgeQueueAndPersist() invokes deleteFiles() every filesCleanupPeriod (default to 60s), to cleanup the older event log files and then writes the latest event log files with writeAndRotate()
  2. Removing MetricsPurgeActivity class instantiation.

Describe alternatives you've considered
Another approach was to launch a new thread and invoke 'MetricsPurgeActivity' within it. We will again run into the same issue if this thread dies, thus to keep the cleanup and write within same thread was better.

Testing
Tested by spinning up a docker container. Manually copied 100 dummy files to /dev/shm/performanceanalyzer/.
Enabled DEBUG logs to verify cleanup is working as expected

[2021-07-01T20:30:52,950][DEBUG][c.a.o.e.p.r.EventLogFileHandler] Starting to delete old writer files
[2021-07-01T20:30:52,950][DEBUG][c.a.o.e.p.r.EventLogFileHandler] Files discovered 169
[2021-07-01T20:30:52,977][DEBUG][c.a.o.e.p.r.EventLogFileHandler] '153' Old writer files cleaned up.

Metrics:

Metrics=EventLogFilesDeletionTime=27.0 millis aggr|MEAN,EventLogFilesDeletionTime=27 millis 
aggr|MAX,EventLogFilesDeletionTime=27 millis aggr|SUM,EventLogFilesDeleted=153 count 
aggr|SUM,EventLogFilesDeleted=153 count aggr|MAX

Metric
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

sruti1312 and others added 9 commits April 15, 2021 11:30
* Add latency and failure metrics for Publish Cluster State Metrics from master's perspective

* Update Javadoc

* Add new line at the end

* Empty Queue before running each test

* Addressed comments

* Metrics for disabled collector

* Fixing bug in MasterClusterStateUpdateStatsCollector

Co-authored-by: Arpita Mathur <[email protected]>
// In case files deletion takes longer/fails, we are okay with eventQueue reaching
// its max size (100000), post that {@link PerformanceAnalyzerMetrics#emitMetric()}
// will emit metric {@link WriterMetrics#METRICS_WRITE_ERROR} and return.
cleanup();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the cleanup is called after the files are written, the first time this thread runs, it will delete even the most recently written files because the call to cleanup will not have the lastCleanupTimeBucket and hence deleteAllFiles will be called deleting everything but the .tmp files.

Because the full cleanup is just a one time cleanup and happens at the start, let's do it in the constructor itself ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This would cause to lose the first timebucket metrics. I would address this as part of separate PR I will have for StatCollector refactoring.

@khushbr khushbr merged commit 14b300b into main Jul 13, 2021
sruti1312 pushed a commit that referenced this pull request Aug 25, 2021
sruti1312 added a commit that referenced this pull request Aug 25, 2021
* Create writer file if metrics are available (#31)

Signed-off-by: Sruti Parthiban <[email protected]>

* Add tests to check for writer file only if metrics are present (#35)

Signed-off-by: Sruti Parthiban <[email protected]>

* Merge pull request #36 from opensearch-project/khushbr-writer-purge-fix

Fixing Event Log file cleanup issue

* Moving deleteAllFiles() to inside scheduleExecutor()

* Fixing the Link Checker errors, updating the official documentation

* nit: Fixing spotlessJava indentation issue

* Merge pull request #37 from khushbr/feature/purge-fix

Handling purging of lingering files before scheduleExecutor start.

* Fix failing file handler test (#38)

Signed-off-by: Sruti Parthiban <[email protected]>

* Remove dependency on main branch when running spotless. (#47)

Signed-off-by: Marc Handalian <[email protected]>

* Updates to gradle build file (#48)

* Updates to gradle build file

Signed-off-by: Sruti Parthiban <[email protected]>

* Add ability to specify RCA branch

Signed-off-by: Sruti Parthiban <[email protected]>

* Fix build when opensearch_version flag is provided. (#52)

Signed-off-by: Marc Handalian <[email protected]>

* Update the version to 1.0.1

Signed-off-by: Sruti Parthiban <[email protected]>

Co-authored-by: Khushboo Rajput <[email protected]>
Co-authored-by: Khushboo Rajput <[email protected]>
Co-authored-by: Marc Handalian <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants