Show max memory limit in the UI #10268

Open
DingoEatingFuzz opened this issue Mar 31, 2021 · 4 comments

Comments

@DingoEatingFuzz
Contributor

#10247 introduces the ability to describe memory as both a soft and a hard limit. The soft limit (memory) tells the scheduler how much memory needs to be set aside; the hard limit (memory_max) tells Nomad at what point a task should be OOM-killed.

This nuance also needs to be communicated in the UI. There are three pieces to this of varying scope.

  1. Show this metadata in the task group details ribbon
  2. Show both the soft and hard limit in the memory utilization graph for both allocations and tasks
  3. Show oversubscription at a client level on both the client detail page and the topology visualization

Show this metadata in the task group details ribbon

This one is straightforward. On the task group detail page, mimic the language and data used in the CLI updates. The numbers in this ribbon are already an aggregate of individual task requirements.

If a task group has no memory_max set, then this ribbon should be unchanged.

[Image: Standardoversubscription]

Show both the soft and hard limit in the memory utilization graph for both allocations and tasks

First and foremost, this can be deferred. If we make no changes to this graph, it will naturally report utilization percentages above 100% and the y-axis will adjust, just like it already does for CPU soft limits. This is still pretty confusing though, since it's unclear whether the percentage is based on the soft limit or the hard limit.

We can improve this by doing the following:

  1. Change the y-axis to be based on the hard limit so utilization can never go over 100%
  2. Add the soft limit as a horizontal annotation, just like the reserved capacities on clients are presented now (currently unreleased on main)
  3. Potentially segment the point-in-time utilization progress bar to make it immediately clear when the soft limit threshold is surpassed

If an allocation has no memory_max set, this graph should have no annotation.

[Image: alloc-detail-oversubscription]
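
The three graph changes proposed above could be sketched roughly as follows. This is only an illustration in Python, not the UI's actual code: the names `MemoryChartScale`, `memory_chart_scale`, and `utilization_segments` are hypothetical, and treating a `memory_max` that is missing or not above `memory` as "no hard limit" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MemoryChartScale:
    y_max_mb: int                             # top of the y-axis (100% utilization)
    soft_limit_annotation_mb: Optional[int]   # horizontal annotation, or None


def memory_chart_scale(memory_mb: int, memory_max_mb: Optional[int]) -> MemoryChartScale:
    """Pick the y-axis maximum and the optional soft-limit annotation."""
    # Assumption: a missing memory_max, or one that does not exceed memory,
    # means the graph stays as it is today (no annotation).
    if not memory_max_mb or memory_max_mb <= memory_mb:
        return MemoryChartScale(y_max_mb=memory_mb, soft_limit_annotation_mb=None)
    # Hard limit drives the axis; soft limit becomes the annotation.
    return MemoryChartScale(y_max_mb=memory_max_mb, soft_limit_annotation_mb=memory_mb)


def utilization_segments(
    used_mb: int, memory_mb: int, memory_max_mb: Optional[int]
) -> Tuple[float, float]:
    """Split point-in-time usage into (within soft limit, above soft limit),
    each expressed as a fraction of the y-axis maximum."""
    scale = memory_chart_scale(memory_mb, memory_max_mb)
    within = min(used_mb, memory_mb) / scale.y_max_mb
    above = max(used_mb - memory_mb, 0) / scale.y_max_mb
    return within, above
```

For example, a task with `memory = 256` and `memory_max = 512` that is currently using 384 MiB would get a 512 MiB axis, an annotation at 256 MiB, and a progress bar split into half "within" and a quarter "above" the soft limit.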

Show oversubscription at a client level on both the client detail page and the topology visualization

There are no designs for this yet. Just wanted to mention it here to track the concept.

@DingoEatingFuzz DingoEatingFuzz added this to the 1.1.0 milestone Mar 31, 2021
backspace added a commit that referenced this issue Apr 27, 2021
backspace added a commit that referenced this issue Apr 28, 2021
This is the first step in #10268. If a maximum is not specified, the
task group sum uses the memory number instead. The maximum is only
shown when it’s higher than the memory sum.
@tgross
Member

tgross commented May 4, 2021

Closed by #10459

@tgross tgross closed this as completed May 4, 2021
@backspace
Contributor

I’ve only accomplished item 1 from the first list here and am working on 2 at the moment:

[Image]

I’ll reopen but let me know if there’s some better way to track this?

@backspace backspace reopened this May 4, 2021
@tgross
Member

tgross commented May 4, 2021

> I’ve only accomplished item 1 from the first list here and am working on 2 at the moment:

Oops, sorry!

@backspace
Contributor

backspace commented May 14, 2021

I’m leaning toward #10459 being an incorrect implementation, now that I understand this better. Or at least subpar, as I’m not sure how else to accomplish it…

When I run a Nomad dev agent without memory oversubscription enabled, submitting a job with a memory_max-configured task produces a warning that, since oversubscription isn’t enabled, that configuration will be ignored. But the API response for the job still returns the memory_max within the task’s Resources:

[Image]

The task group details ribbon checks whether the sum of the memory_max values provided on its tasks is greater than the sum of their memory values, and shows the bracketed maximum if so. This shows regardless of whether oversubscription is actually working.
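
As a rough sketch of that ribbon check (combined with the commit note that a task without a maximum contributes its memory number instead), something like the following would reproduce the behaviour. The function names, the dictionary shape, and the label format are all hypothetical and chosen only for illustration; the real ribbon lives in the UI codebase.

```python
from typing import Dict, List, Optional, Tuple


def task_group_memory_summary(tasks: List[Dict[str, int]]) -> Tuple[int, Optional[int]]:
    """Return (memory sum, memory_max sum or None) for a task group.

    Each task is a dict with a "memory" key and an optional "memory_max" key
    (hypothetical shape, for illustration only).
    """
    memory_sum = sum(task["memory"] for task in tasks)
    # A task with no memory_max contributes its memory value to the max sum.
    max_sum = sum(task.get("memory_max") or task["memory"] for task in tasks)
    # The maximum is only reported when it is higher than the memory sum.
    return memory_sum, (max_sum if max_sum > memory_sum else None)


def ribbon_label(tasks: List[Dict[str, int]]) -> str:
    """Format the ribbon text; the exact wording here is an assumption."""
    memory_sum, max_sum = task_group_memory_summary(tasks)
    if max_sum is None:
        return f"{memory_sum} MiB"
    return f"{memory_sum} MiB ({max_sum} MiB max)"
```

Note that this uses only the configured job values, which is exactly the limitation described here: nothing in these inputs says whether the cluster will honour `memory_max`.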

I’ve subsequently understood that the allocation response is a place to determine the true situation vs the configured one. In this screenshot, I have #10508 running against two different dev agents; the left has oversubscription enabled, the right does not. You can see that AllocatedResources and Resources in the allocation response reflect the true state of things. The primary metric chart only shows the oversubscription annotation on the left, as expected.

[Image]
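
A minimal sketch of reading the effective limits from an allocation response, rather than from the job configuration, might look like this. The field names `AllocatedResources.Tasks[...].Memory.MemoryMB` and `MemoryMaxMB` are my understanding of the response shape and should be checked against the API; the assumption that an ignored `memory_max` shows up as zero, missing, or equal to `MemoryMB` is also unverified. The function names are hypothetical.

```python
from typing import Any, Dict, Tuple


def effective_memory_limits(allocation: Dict[str, Any]) -> Tuple[int, int]:
    """Sum the soft and hard memory limits actually granted to an allocation.

    Assumes per-task values under AllocatedResources.Tasks[name].Memory,
    with MemoryMB as the soft limit and MemoryMaxMB as the hard limit.
    """
    tasks = (allocation.get("AllocatedResources") or {}).get("Tasks") or {}
    memory_mb = 0
    memory_max_mb = 0
    for task in tasks.values():
        memory = task.get("Memory") or {}
        soft = memory.get("MemoryMB", 0)
        # Assumption: a zero or missing MemoryMaxMB means no effective hard
        # limit beyond the soft limit, so fall back to the soft value.
        hard = memory.get("MemoryMaxMB") or soft
        memory_mb += soft
        memory_max_mb += hard
    return memory_mb, memory_max_mb


def shows_oversubscription_annotation(allocation: Dict[str, Any]) -> bool:
    """The chart annotation only makes sense when the hard limit is higher."""
    memory_mb, memory_max_mb = effective_memory_limits(allocation)
    return memory_max_mb > memory_mb
```

Under these assumptions, the same job would produce an annotation on the agent with oversubscription enabled and none on the agent without it, matching the two charts in the screenshot.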

So… I’m not sure what to do about the task group details ribbon, as it seems incorrect to me to present the configured memory_max even when it’s ignored, but it’s also not possible to know whether it’s been ignored from the information available to it 🤔

The allocation metric annotation is correct now, at least, but I’m struggling with accessing AllocatedResources.Tasks to properly determine the task metric annotation 😢 ETA the answer is: task states, the data is already there 😆

@backspace backspace assigned backspace and unassigned notnoop May 17, 2021
backspace added a commit that referenced this issue May 18, 2021
This is a reversion from #10459, more background here:
#10268 (comment)
@tgross tgross removed this from the 1.1.1 milestone Jun 3, 2021
@mikenomitch mikenomitch moved this to Backlog in Nomad UI Dec 22, 2022
@mikenomitch mikenomitch moved this from Backlog to Todo in Nomad UI Jan 19, 2023