salt: Lower Prometheus rule alert disk thresholds #3479

Merged: 3 commits, Aug 5, 2021


@alexandre-allard commented Aug 4, 2021

Component: salt, prometheus

Context:
When free disk space drops below 10%, kubelet starts evicting pods to reclaim space.
The issue is that our alert thresholds (5% and 3% of free space) are below kubelet's eviction
threshold, so by the time an alert fires, kubelet is already evicting pods and the disk is nearly full.

Summary:
We change the thresholds of the following alerts (expressed as the percentage of free space or free inodes left) so that they fire earlier:

NodeFilesystemAlmostOutOfSpace:

  • warning from 5% to 20%
  • critical from 3% to 12%

NodeFilesystemAlmostOutOfFiles:

  • warning from 5% to 15%
  • critical from 3% to 8%
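
To sanity-check the reasoning above, the sketch below compares each alert threshold with kubelet's eviction thresholds (10% of free disk space and 5% of free inodes, the values cited in this PR). An alert only helps if it fires before eviction starts, i.e. if its free-percentage threshold is strictly above the eviction threshold. The dictionaries and the `alerts_fire_before_eviction` helper are hypothetical, not part of the salt states, and the check deliberately ignores any `for:` duration or extra conditions the real rule expressions may have.

```python
# Hypothetical check: does each alert fire before kubelet starts evicting pods?
# All values are "free percentage below which the alert (or eviction) triggers".

# Kubelet eviction thresholds cited in this PR.
KUBELET_EVICTION = {"space": 10, "inodes": 5}

OLD_THRESHOLDS = {
    "NodeFilesystemAlmostOutOfSpace": ("space", {"warning": 5, "critical": 3}),
    "NodeFilesystemAlmostOutOfFiles": ("inodes", {"warning": 5, "critical": 3}),
}

NEW_THRESHOLDS = {
    "NodeFilesystemAlmostOutOfSpace": ("space", {"warning": 20, "critical": 12}),
    "NodeFilesystemAlmostOutOfFiles": ("inodes", {"warning": 15, "critical": 8}),
}


def alerts_fire_before_eviction(thresholds):
    """Return {(alert, severity): bool}, True when the alert fires first.

    As free space shrinks, an alert with threshold X fires once free < X and
    kubelet evicts once free < E, so the alert fires first only if X > E.
    """
    result = {}
    for alert, (resource, severities) in thresholds.items():
        eviction = KUBELET_EVICTION[resource]
        for severity, threshold in severities.items():
            result[(alert, severity)] = threshold > eviction
    return result


if __name__ == "__main__":
    for label, thresholds in (("old", OLD_THRESHOLDS), ("new", NEW_THRESHOLDS)):
        for (alert, severity), ok in alerts_fire_before_eviction(thresholds).items():
            print(f"{label:3} {alert} {severity}: {'before' if ok else 'not before'} eviction")
```

With the old values no alert fires strictly before eviction (the old inode warning at 5% only ties with kubelet's 5%), while every new threshold sits above the matching eviction threshold.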

Acceptance criteria:


Closes: #3474

We lower the thresholds for the following alerts

NodeFilesystemAlmostOutOfSpace:
  - warning from 5% to 20%
  - critical from 3% to 12%
NodeFilesystemAlmostOutOfFiles:
  - warning from 5% to 15%
  - critical from 3% to 8%

Otherwise we don't receive any alert before kubelet
starts evicting pods, since its eviction thresholds
are set to 10% of available disk space and 5% of
free inodes.

Refs: #3474

@alexandre-allard requested a review from a team as a code owner August 4, 2021 15:07
@bert-e commented Aug 4, 2021

Hello alexandre-allard-scality,

My role is to assist you with the merge of this
pull request. Please type @bert-e help to get information
on this process, or consult the user documentation.

Status report is not available.

@bert-e commented Aug 4, 2021

Integration data created

I have created the integration data for the additional destination branches.

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

You can set the option create_pull_requests if you need me to create
integration pull requests in addition to integration branches, with:

@bert-e create_pull_requests

@bert-e commented Aug 4, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

@gdemonet previously approved these changes Aug 4, 2021
@thomasdanan previously approved these changes Aug 4, 2021
@alexandre-allard (author) commented:

/approve

@bert-e commented Aug 4, 2021

Build failed

The build for commit did not succeed in branch bugfix/3474-lower-disk-alert-thresholds.

The following options are set: approve

```
./tools/rule_extractor/rule_extractor.py \
  -i <control-plane-ip> -p 8443 -t rules
```

Refs: #3474
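
The extraction command above is not reproduced or explained in this PR. As an illustration only, assuming the tool works from the JSON that Prometheus returns on its `GET /api/v1/rules` HTTP endpoint (the `-i`/`-p` options suggest it connects to the control plane, but that is an assumption), the sketch below pulls alert names, severities and expressions out of such a response. The `extract_alert_rules` function and the sample payload, including its simplified query, are hypothetical; the real `rule_extractor.py` may work differently.

```python
import json

# Hypothetical, trimmed payload in the documented shape of GET /api/v1/rules.
# The query is a simplified placeholder, not the actual rule expression.
SAMPLE_RESPONSE = """
{
  "status": "success",
  "data": {
    "groups": [
      {
        "name": "node-exporter",
        "rules": [
          {
            "type": "alerting",
            "name": "NodeFilesystemAlmostOutOfSpace",
            "query": "node_filesystem_avail_bytes / node_filesystem_size_bytes * 100 < 20",
            "labels": {"severity": "warning"},
            "duration": 3600
          },
          {
            "type": "recording",
            "name": "instance:example:ratio",
            "query": "vector(1)"
          }
        ]
      }
    ]
  }
}
"""


def extract_alert_rules(payload):
    """Return (name, severity, query) for every alerting rule in the payload."""
    data = json.loads(payload)
    if data.get("status") != "success":
        raise ValueError("Prometheus API did not return a successful response")
    rules = []
    for group in data["data"]["groups"]:
        for rule in group["rules"]:
            if rule.get("type") != "alerting":
                continue  # recording rules carry no alert to check
            severity = rule.get("labels", {}).get("severity", "")
            rules.append((rule["name"], severity, rule["query"]))
    return rules


for name, severity, query in extract_alert_rules(SAMPLE_RESPONSE):
    print(f"{name} [{severity}]: {query}")
```

Working on the parsed response rather than the live endpoint keeps the sketch offline; against a real cluster the payload would first have to be fetched over HTTP(S).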

The `debug` could only contain alphabetic characters
and dashes, but the stage name now includes the dist
version (e.g. centos-7), which contains digits.
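
A minimal sketch of why that check fails, assuming it is a regular-expression match (the actual pattern and where it is applied are not shown here, so both patterns below are hypothetical): a letters-and-dashes pattern rejects `centos-7`, while adding digits to the character class accepts it.

```python
import re

# Hypothetical original rule: letters and dashes only.
OLD_PATTERN = re.compile(r"[A-Za-z-]+")
# Hypothetical relaxed rule that also allows digits, e.g. "centos-7".
NEW_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def is_valid(name, pattern):
    """Return True if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


for name in ("debug", "centos-7", "debug-centos-7"):
    print(name, is_valid(name, OLD_PATTERN), is_valid(name, NEW_PATTERN))
```

Using `fullmatch` rather than `match` ensures a name like `centos-7` is rejected as a whole instead of matching its `centos-` prefix.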

@bert-e commented Aug 5, 2021

Waiting for approval

The following approvals are needed before I can proceed with the merge:

  • the author

  • one peer

Peer approvals must include at least 1 approval from the following list:

The following options are set: approve

@bert-e commented Aug 5, 2021

Build failed

The build for commit did not succeed in branch w/2.11/bugfix/3474-lower-disk-alert-thresholds.

The following options are set: approve

@bert-e commented Aug 5, 2021

In the queue

The changeset has received all authorizations and has been added to the
relevant queue(s). The queue(s) will be merged in the target development
branch(es) as soon as builds have passed.

The changeset will be merged in:

  • ✔️ development/2.10

  • ✔️ development/2.11

The following branches will NOT be impacted:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

There is no action required on your side. You will be notified here once
the changeset has been merged. In the unlikely event that the changeset
fails permanently on the queue, a member of the admin team will
contact you to help resolve the matter.

IMPORTANT

Please do not attempt to modify this pull request.

  • Any commit you add on the source branch will trigger a new cycle after the
    current queue is merged.
  • Any commit you add on one of the integration branches will be lost.

If you need this pull request to be removed from the queue, please contact a
member of the admin team now.

The following options are set: approve

@bert-e commented Aug 5, 2021

I have successfully merged the changeset of this pull request
into targeted development branches:

  • ✔️ development/2.10

  • ✔️ development/2.11

The following branches have NOT changed:

  • development/2.0
  • development/2.1
  • development/2.2
  • development/2.3
  • development/2.4
  • development/2.5
  • development/2.6
  • development/2.7
  • development/2.8
  • development/2.9

Please check the status of the associated issue None.

Goodbye alexandre-allard-scality.

@bert-e merged commit 18bf239 into development/2.10 Aug 5, 2021
@bert-e deleted the bugfix/3474-lower-disk-alert-thresholds branch August 5, 2021 19:59
gdemonet added a commit that referenced this pull request Aug 5, 2021