
Adjust memory requests and limits for elastic-agent when run in Kubernetes cluster #5614

Merged

Conversation

@MichaelKatsoulis (Contributor) commented Sep 25, 2024

What does this PR do?

This PR adjusts the default memory requests and limits for elastic-agent when it runs in a Kubernetes cluster.
The new requests and limits will be:

resources:
  limits:
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 500Mi

These values are adequate even under heavy load (tested with 95 pods per node).
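
For context, a minimal sketch of where these values sit in the elastic-agent DaemonSet manifest; only the resources block reflects this PR, while the surrounding fields (names, image tag, namespace, labels) are illustrative assumptions:

# Abridged elastic-agent DaemonSet; everything except the resources block is illustrative
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: elastic-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: elastic-agent
  template:
    metadata:
      labels:
        app: elastic-agent
    spec:
      containers:
        - name: elastic-agent
          image: docker.elastic.co/elastic-agent/elastic-agent:8.15.0   # tag assumed for illustration
          resources:
            limits:
              memory: 1Gi        # raised from the previous default to avoid OOM kills
            requests:
              cpu: 100m
              memory: 500Mi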

Why is it important?

In the latest elastic-agent versions the baseline memory consumption is higher, so adding the Kubernetes and System
integrations often leads to OOM-killed pods.
The previously configured memory limit was 700Mi.

After investigating with various numbers of pods, we have seen that in 8.15.x versions the memory consumption can reach
750 to 950 MB, depending on the number of pods a single Elastic Agent has to monitor.
For example:

  • 45 pods per node: memory consumption up to 810 MB
  • 60 pods per node: memory consumption up to 830 MB
  • 75 pods per node: memory consumption up to 930 MB
  • 95 pods per node: memory consumption up to 990 MB

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

The inadequate memory limits often lead to elastic-agent being OOM-killed in newer versions.

How to test this PR locally

Deploy Elastic Agent with the System and Kubernetes integrations following the Kibana instructions, then watch the
pod's memory consumption with watch "kubectl top pod -n kube-system" and check its status (see the sketch below).
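
A minimal sketch of these steps, assuming the default manifest filename produced by the Kibana instructions and the standard app=elastic-agent pod label (both are assumptions, not taken from this PR):

# Deploy the manifest downloaded from Kibana (filename assumed; adjust to yours)
kubectl apply -f elastic-agent-managed-kubernetes.yaml

# Check pod status (label selector assumed from the standard manifest)
kubectl get pods -n kube-system -l app=elastic-agent

# Watch memory consumption; kubectl top requires metrics-server in the cluster
watch "kubectl top pod -n kube-system"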

Related issues

mergify bot commented Sep 25, 2024

This pull request does not have a backport label. Could you fix it @MichaelKatsoulis? 🙏
To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-./d./d is the label to automatically backport to the 8./d branch. /d is the digit

mergify bot commented Sep 25, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.
If you don't need it, please use the backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 25, 2024
@MichaelKatsoulis MichaelKatsoulis added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 25, 2024
@pkoutsovasilis (Contributor) left a comment


Hey @MichaelKatsoulis 👋 could you check the failures in CI? I think that some other files also need the limits change.

@MichaelKatsoulis (Contributor, Author) commented:

@pchila do you have any idea why these Buildkite integration tests are failing? It is not obvious to me.
The only change in this PR is a slight increase of the memory requests and limits for elastic-agent in Kubernetes.

@pchila (Member) commented Sep 26, 2024

@MichaelKatsoulis Not sure why I got pulled into this PR as there are already a lot of knowledgeable people involved. The integration tests on this PR are failing because those same tests are currently broken on main as well, and an investigation is ongoing. You should update your branch once main is green again and re-check the outcome of those tests.

If your PR has tight time constraints and you need to merge it before main becomes green again, I guess you can present your case to one of the repo admins for a force merge.

@MichaelKatsoulis (Contributor, Author) commented:

@pchila I pulled you in because it seemed to me that you have worked a lot with those failing integration tests. Thanks for your time and the information.

Quality Gate passed

Issues
  • 0 new issues
  • 0 fixed issues
  • 0 accepted issues

Measures
  • 0 security hotspots
  • No data about coverage
  • No data about duplication

See analysis details on SonarQube

@gizas merged commit e06e786 into elastic:main Oct 2, 2024
14 checks passed
mergify bot pushed a commit that referenced this pull request Oct 2, 2024
…netes cluster (#5614)

* Adjust memory requests and limits for elastic-agent when run in Kubernetes cluster

(cherry picked from commit e06e786)
mergify bot pushed a commit that referenced this pull request Oct 2, 2024
…netes cluster (#5614)

* Adjust memory requests and limits for elastic-agent when run in Kubernetes cluster

(cherry picked from commit e06e786)
gizas added a commit that referenced this pull request Oct 2, 2024
…netes cluster (#5614) (#5657)

* Adjust memory requests and limits for elastic-agent when run in Kubernetes cluster

(cherry picked from commit e06e786)

Co-authored-by: Michael Katsoulis <[email protected]>
Co-authored-by: Andrew Gizas <[email protected]>
gizas pushed a commit that referenced this pull request Oct 2, 2024
…netes cluster (#5614) (#5656)

* Adjust memory requests and limits for elastic-agent when run in Kubernetes cluster

(cherry picked from commit e06e786)

Co-authored-by: Michael Katsoulis <[email protected]>
Labels: backport-8.x (automated backport to the 8.x branch with mergify), backport-8.15 (automated backport to the 8.15 branch with mergify)