Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

apps sc: Alert size over limit only on active indicies #2203

Merged
merged 1 commit into from
Jan 8, 2025

Conversation

aarnq
Copy link
Contributor

@aarnq aarnq commented Jul 10, 2024

Warning

This is a public repository, ensure not to disclose:

  • personal data beyond what is necessary for interacting with this pull request, nor
  • business confidential information, such as customer names.

What kind of PR is this?

Required: Mark one of the following that is applicable:

  • kind/feature
  • kind/improvement
  • kind/deprecation
  • kind/documentation
  • kind/clean-up
  • kind/bug
  • kind/other

Optional: Mark one or more of the following that are applicable:

Important

Breaking changes should be marked kind/admin-change or kind/dev-change depending on type
Critical security fixes should be marked with kind/security

  • kind/admin-change
  • kind/dev-change
  • kind/security
  • kind/adr

What does this PR do / why do we need this PR?

This changes the index over size limit alert to only consider indices with a positive document rate over the last hour, meaning it will eventually stop alerting for indices that are inactive such as after a successful rollover.

Information to reviewers

I haven't had the time to test it yet.

Checklist

  • Proper commit message prefix on all commits
  • Change checks:
    • The change is transparent
    • The change is disruptive
    • The change requires no migration steps
    • The change requires migration steps
    • The change upgrades CRDs
    • The change updates the config and the schema
  • Metrics checks:
    • The metrics are still exposed and present in Grafana after the change
    • The metrics names didn't change (Grafana dashboards and Prometheus alerts are not affected)
    • The metrics names did change (Grafana dashboards and Prometheus alerts were fixed)
  • Logs checks:
    • The logs do not show any errors after the change
  • Pod Security Policy checks:
    • Any changed pod is covered by Pod Security Admission
    • Any changed pod is covered by Gatekeeper Pod Security Policies
    • The change does not cause any pods to be blocked by Pod Security Admission or Policies
  • Network Policy checks:
    • Any changed pod is covered by Network Policies
    • The change does not cause any dropped packets in the NetworkPolicy Dashboard
  • Audit checks:
    • The change does not cause any unnecessary Kubernetes audit events
    • The change requires changes to Kubernetes audit policy
  • Falco checks:
    • The change does not cause any alerts to be generated by Falco
  • Bug checks:
    • The bug fix is covered by regression tests

@aarnq aarnq added the kind/improvement Improvement of existing features, e.g. code cleanup or optimizations. label Jul 10, 2024
@aarnq aarnq self-assigned this Jul 10, 2024
@aarnq
Copy link
Contributor Author

aarnq commented Dec 13, 2024

@elastisys/goto-logging-stack Would anyone of you like to take over this idea to make the alerts only report indices that are actively written to?
I have not the time to properly test it.

@lunkan93
Copy link
Contributor

Do you have time for this @salehsedghpour ? I might have time next week, but otherwise I wont have time until January, if not I can take it anyway.

@salehsedghpour
Copy link

Yeah sure, I'll test it, I might have time for it next week.

@salehsedghpour salehsedghpour self-assigned this Dec 13, 2024
@salehsedghpour
Copy link

I've tested it and it works perfectly. I believe the use of elasticsearch_indices_docs_primary instead of elasticsearch_indices_docs_total would be more precise, as the primary shard should get the and validate the values and then forward to other replicas. What do you think @lunkan93?

@lunkan93
Copy link
Contributor

I've tested it and it works perfectly. I believe the use of elasticsearch_indices_docs_primary instead of elasticsearch_indices_docs_total would be more precise, as the primary shard should get the and validate the values and then forward to other replicas. What do you think @lunkan93?

Nice! 👍

I agree that it makes more sense to use elasticsearch_indices_docs_primary 😄

@salehsedghpour salehsedghpour marked this pull request as ready for review December 17, 2024 08:51
@salehsedghpour salehsedghpour requested a review from a team as a code owner December 17, 2024 08:51
@salehsedghpour salehsedghpour assigned aarnq and unassigned aarnq Dec 17, 2024
@lunkan93 lunkan93 force-pushed the aarnq/opensearch-active-index-size-alert branch from 337deb3 to ec90572 Compare January 7, 2025 13:58
Copy link
Contributor Author

@aarnq aarnq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, cannot approve though as I opened it 😅

@lunkan93 lunkan93 force-pushed the aarnq/opensearch-active-index-size-alert branch from ec90572 to c151444 Compare January 8, 2025 08:12
@lunkan93 lunkan93 merged commit 8c3fc63 into main Jan 8, 2025
12 checks passed
@lunkan93 lunkan93 deleted the aarnq/opensearch-active-index-size-alert branch January 8, 2025 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/improvement Improvement of existing features, e.g. code cleanup or optimizations.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants