Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring] JVM memory usage alert #79039

Merged
merged 12 commits into from
Oct 5, 2020

Conversation

igoristic
Copy link
Contributor

Resolves #74824

This is part of the "Additional Alerting" effort for Stack Monitoring
Screen Shot 2020-09-30 at 5 13 06 PM
Screen Shot 2020-09-30 at 5 13 13 PM

The check calculates each data node to make sure the JVM memory usage is below the implied threshold.

Testing:

  1. Create a Stack Monitoring environment
  2. Through the regular Setup Mode > Alert Edit flow/ux, set the threshold to something low (like 10%)
    Screen Shot 2020-09-30 at 5 13 23 PM

*Note: that it might take a couple of minutes for the notification to show up in the UI

@elasticmachine
Copy link
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking good! I found a few things

nextSteps: [
createLink(
i18n.translate('xpack.monitoring.alerts.memoryUsage.ui.nextSteps.tuneThreadPools', {
defaultMessage: '#start_linkTune thread pools#end_link',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this listed in Ravi's doc. Was it a last minute addition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't on the list, but based on my research it is also a valuable resource to mitigate high memory usage. cc: @ravikesarwani wdyt?

@igoristic igoristic requested a review from chrisronline October 2, 2020 01:45
if (ccs) {
globalState.push(`ccs:${ccs}`);
}
globalState.push('refreshInterval:(pause:!f,value:10000),time:(from:now-1h,to:now)');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why add this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep it consistent with our defaults, otherwise it'll use timerange default of 15min, which will then persist to next state once they start navigating the app

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but it won't transfer to url_state unless we explicitly provide it (like I'm doing above). You can test this for yourself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but I'm still not following.

This is what I'm seeing which seems fine:
Kapture 2020-10-02 at 14 31 51

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't much to figure out. Like I said it falls back to defaults, and for me it's:
Screen Shot 2020-10-02 at 4 16 06 PM

Anyways, we're digressing here. These are all trivial things that are not really related to the scope of this PR and can be addressed/debated as a separate discussion/issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My defaults are the same:
Screen Shot 2020-10-02 at 4 31 15 PM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It also relies on localStorage. Maybe try clearing it, or at least the following key:
kibana.timepicker.timeHistory: "[{"from":"now-1h","to":"now"},{"from":"now-15m","to":"now"}]"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it isn't related to this PR and we should defer the conversation, but we should have it at some point. I'm curious if there is an actual bug around this code that we need to fix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey catching up on this, I think introducing more than one place to define defaults is probably not a good idea and we should understand why it's needed before we introduce that kind of foot-gun.

For now, if the goal is to get this alert in before FF, I recommend removing that line and then during the test plan phase try to understand why you both are seeing these different behaviors, and then document that back on this PR or in a new ticket. Hopefully it's not necessary at all, but if it is, we should export the defaults as a constant and then use it in both places at the very least.

@igoristic igoristic requested a review from chrisronline October 2, 2020 18:07
@igoristic igoristic added this to the Logs UI 7.10 milestone Oct 5, 2020
@igoristic
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

@kbn/optimizer bundle module count

id before after diff
monitoring 635 638 +3

async chunks size

id before after diff
monitoring 1.2MB 1.2MB +60.0B

distributable file count

id before after diff
default 47085 47087 +2

page load bundle size

id before after diff
monitoring 222.7KB 247.7KB +25.0KB

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

Copy link
Contributor

@chrisronline chrisronline left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@igoristic igoristic merged commit f490268 into elastic:master Oct 5, 2020
@igoristic igoristic deleted the jvm-memory-usage branch October 5, 2020 15:22
igoristic added a commit that referenced this pull request Oct 5, 2020
* Memory usage first draft

* Fixed tests

* CR feedback

* Feedback and tests

* Added size to optimize query

* Removed scheduled check

* Removed globalstate date

Co-authored-by: Kibana Machine <[email protected]>

Co-authored-by: Kibana Machine <[email protected]>
@igoristic
Copy link
Contributor Author

Backport:
7.x: 246f163

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Monitoring][Additional-Alerting] JVM Heap Usage
7 participants