Missing allocation resource use metrics #5928
Comments
Are there any jobs registered with running allocations on the client that you're talking to? For example:
|
@cgbaker
There are certainly running allocations on all the up clients:
|
@damoxc , there is a reported issue where, under load on the client node, all of the |
@cgbaker are you able to share the configuration files for your simple example so I could compare them to what I have?
They weren't saved before the node was torn down, but the telemetry section was the same as you posted above. I will spin up a cluster and try again.
I've added the same |
And to be clear, the prod cluster has the same configuration:
|
Yes. We also have:
|
We're also seeing this affecting our Nomad deployment with Ubuntu Linux 16.04 clients and Nomad v0.9.2.
@endocrimes We've actually had jobs start out successfully reporting metrics, only to later have all allocation usage metrics stop on a nomad-client server. Here's an example job we use for deploying traefik:
In fact, for this job, we have 3 instances, where two are currently reporting metrics, and one is not, each on its own Nomad-client server. None of the allocations on the affected Nomad-client server are reporting allocation utilization metrics.
@mgeggie That's very useful - thank you! Could you possibly send us any client logs from the bad clients to |
@endocrimes I sent logs over to |
Thanks @endocrimes. We've seen that host disk collection error since starting our Nomad cluster a few months back. I'll check out #6349 to see about resolving the issue. Also note that we've restarted the Nomad process on our afflicted Nomad client, and the allocation stats that had been reporting and then stopped did not recover after the restart. We're about to upgrade our Nomad cluster to 0.9.5. I'll report back on the status of our afflicted nodes once that's complete.
Hi @endocrimes, just an update on our upgrade: after upgrading our troublesome nomadclient to 0.9.5 and rebooting the server, allocation resource use metrics are once again being produced.
Hey there. Since this issue hasn't had any activity in a while, we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this. Thanks!
Hey @tgross, I only received the waiting-reply label and notification, and the issue was closed within the same hour. I'm still curious how our cluster entered a state where it stopped producing allocation metrics. I can report that in the 3 months since upgrading to Nomad 0.9.5 we haven't had any further issues with losing allocation metrics.
Hi @mgeggie sorry about the confusion over that. I saw the notification from the bot and it looked to me like the issue had been resolved with the upgrade? I know we updated the prometheus and libcontainer clients in 0.9.4 so the issue may have been upstream, but we also improved CPU utilization for busy clusters in that same release and have seen Nomad CPU utilization impact metrics collection for other users. I've recently added extra testing for both host and allocation metrics collection in our end-to-end test suite and I've got an open ticket to upgrade our |
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
Nomad version
Nomad v0.9.3 (c5e8b66c3789e4e7f9a83b4e188e9a937eea43ce)
Operating system and Environment details
Server: Ubuntu 18.04
Client: Windows 2019
Driver: Docker
Issue
There doesn't appear to be any allocation resource usage metrics exposed after adding to both server and client's configuration:
I've tried querying the server and clients. There don't appear to be any nomad.client metrics exposed on the server, and on the clients there are only basic totals, none of the allocation metrics listed at https://www.nomadproject.io/docs/telemetry/metrics.html#allocation-metrics.
Server Metrics
Client Metrics
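For anyone trying to reproduce this check, here is a minimal sketch (not taken from the original report) of one way to test whether a Nomad agent's metrics payload contains any allocation metrics. It assumes the agent serves JSON at `/v1/metrics` with `Gauges`, `Counters`, and `Samples` lists whose entries carry a `Name` field, and that allocation metrics use the `nomad.client.allocs.` prefix from the telemetry docs linked above. The address, the helper names, and the sample payload are all hypothetical.

```python
import json
import urllib.request

# Prefix assumed from the allocation metrics section of the Nomad telemetry docs.
ALLOC_PREFIX = "nomad.client.allocs."


def fetch_metrics(addr="http://127.0.0.1:4646"):
    """GET /v1/metrics from a Nomad agent (address is an assumption) and decode the JSON."""
    with urllib.request.urlopen(f"{addr}/v1/metrics") as resp:
        return json.load(resp)


def allocation_metric_names(payload):
    """Return the sorted, de-duplicated names of allocation metrics in a payload."""
    names = set()
    for section in ("Gauges", "Counters", "Samples"):
        # A section may be missing or null when the agent has nothing to report.
        for entry in payload.get(section) or []:
            name = entry.get("Name", "")
            if name.startswith(ALLOC_PREFIX):
                names.add(name)
    return sorted(names)


# Hypothetical payload for illustration only; real output has many more fields.
sample = {
    "Gauges": [
        {"Name": "nomad.client.uptime", "Value": 120},
        {"Name": "nomad.client.allocs.memory.rss", "Value": 1048576},
    ],
    "Counters": None,
    "Samples": [
        {"Name": "nomad.client.allocs.cpu.total_percent", "Count": 1},
    ],
}

print(allocation_metric_names(sample))
```

An empty list from `allocation_metric_names(fetch_metrics())` on a client that has running allocations would match the symptom described in this issue.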