# nomad.client.allocs.memory.usage reports 0 for jobs with raw_exec driver #9073
## Comments
Hi @fredwangwang I was able to verify this on the current version:

```hcl
job "example" {
  datacenters = ["dc1"]

  group "group" {
    task "task" {
      driver = "raw_exec"

      config {
        command = "ping"
        args    = ["8.8.8.8"]
      }

      resources {
        cpu    = 200
        memory = 200
      }
    }
  }
}
```

Resulting metrics output:

```json
{
"Labels": {
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default"
},
"Name": "nomad.client.allocs.cpu.allocated",
"Value": 200
}
{
"Labels": {
"namespace": "default",
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task"
},
"Name": "nomad.client.allocs.cpu.system",
"Value": 0.016665318980813026
}
{
"Labels": {
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux"
},
"Name": "nomad.client.allocs.cpu.throttled_periods",
"Value": 0
}
{
"Labels": {
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux"
},
"Name": "nomad.client.allocs.cpu.throttled_time",
"Value": 0
}
{
"Labels": {
"task": "task",
"namespace": "default",
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f"
},
"Name": "nomad.client.allocs.cpu.total_percent",
"Value": 0.016665318980813026
}
{
"Labels": {
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default"
},
"Name": "nomad.client.allocs.cpu.total_ticks",
"Value": 0.38380229473114014
}
{
"Labels": {
"namespace": "default",
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task"
},
"Name": "nomad.client.allocs.cpu.user",
"Value": 0
}
{
"Labels": {
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux",
"job": "example"
},
"Name": "nomad.client.allocs.memory.allocated",
"Value": 209715200
}
{
"Labels": {
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default"
},
"Name": "nomad.client.allocs.memory.cache",
"Value": 0
}
{
"Labels": {
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux"
},
"Name": "nomad.client.allocs.memory.kernel_max_usage",
"Value": 0
}
{
"Labels": {
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux"
},
"Name": "nomad.client.allocs.memory.kernel_usage",
"Value": 0
}
{
"Labels": {
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux",
"job": "example"
},
"Name": "nomad.client.allocs.memory.max_usage",
"Value": 0
}
{
"Labels": {
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task",
"namespace": "default",
"host": "linux"
},
"Name": "nomad.client.allocs.memory.rss",
"Value": 33705984
}
{
"Labels": {
"namespace": "default",
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f",
"task": "task"
},
"Name": "nomad.client.allocs.memory.swap",
"Value": 0
}
{
"Labels": {
"task": "task",
"namespace": "default",
"host": "linux",
"job": "example",
"task_group": "group",
"alloc_id": "ccfbfe98-911c-0c10-2cdc-af8bac62820f"
},
"Name": "nomad.client.allocs.memory.usage",
"Value": 0
}
```

I've also verified that this appears to be working as expected with the `exec` driver.

I took a second look at that and I'm realizing we don't actually test that the results are non-zero, though. So that might be where it's slipping through. I'll dig into this further.

Hey. I am also seeing discrepancies between the actual CPU and memory utilization of the exe on the node and the values shown in the Nomad UI.

@dbkukku can you open a new issue for that explaining what you're seeing in more detail? That seems like it's a different problem.

I dug a little bit deeper and I believe I found the issue. The flow approximately is:

Since the resource usage is reporting correctly for docker and exec, for example, it has to be an issue in the raw_exec driver:

```go
ExecutorBasicMeasuredMemStats = []string{"RSS", "Swap"}

ms.RSS = memInfo.RSS
ms.Swap = memInfo.Swap
ms.Measured = ExecutorBasicMeasuredMemStats
```

raw_exec only exposes those two metrics. That makes sense, since raw_exec depends greatly on the underlying OS to provide the metrics, which can be hugely different. But since what's been measured is propagated back to the task runner by setting `ms.Measured`, the task runner should be able to skip the metrics that weren't actually measured instead of reporting them as 0. @tgross could you provide some thoughts on this? Thanks!

Hi @fredwangwang looks like that's exactly it! I compared the results and code path of the `exec` driver, and I think you're right that the best approach would be to not report the metrics that we're not collecting. It looks like the gauges are being written in the client, in the task runner.

Thanks for following up @tgross! I think we should be able to work with the metrics (RSS specifically) that we are getting, but I still need to understand that a bit more. The confusion was mainly that certain metrics report 0 where they should really just not show up, IMO. Looking forward to seeing it get fixed, or hearing more about why it is kept reporting 0.

I had a conversation with folks here internally and that section of …

Thanks @tgross! A small note: the docs at https://www.nomadproject.io/docs/telemetry/metrics#allocation-metrics probably need to be updated as well to call out that the memory metrics emitted depend on the driver type a task uses.

Just for reference: Using …

Any update on this? Would love to have that available.

Hi @it-ben, there are no updates currently. When an engineer is assigned to this and working on it, updates will be provided in the issue.

---

### Nomad version

Nomad v0.12.3 (2db8abd)

### Operating system and Environment details

This happens on both Windows nodes and Linux nodes.

### Issue

Nomad reports a wrong (`0`) `nomad.client.allocs.memory.usage` metric for jobs with the `raw_exec` driver.

### Reproduction steps

Deploy the job (see above) and check http://host-with-above-alloc/v1/metrics; a sketch for pulling out the relevant gauges follows below.
### Job file (if appropriate)

See above.