Question client: error fetching host resource usage stats: permission denied #1680

Closed
discount-elf opened this issue Sep 2, 2016 · 14 comments

Comments

@discount-elf

If you have a question, prepend your issue with [question] or preferably use the nomad mailing list.

If filing a bug please include the following:

Nomad version

Nomad v0.4.1

Operating system and Environment details

Ubuntu 16.04 amd64

Issue

While running nomad as a non root user:
client: error fetching host resource usage stats: permission denied

Looking at:

func (h *HostStatsCollector) Collect() (*HostStats, error) {

I don't think any of these collection actions should require root, though I could be wrong. I am guessing scheduling will not work correctly without proper host stats?

Reproduction steps

Run nomad as non root user

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

error fetching host resource usage stats: permission denied

Job file (if appropriate)

@diptanu
Contributor

diptanu commented Sep 2, 2016

Can you go through the client logs and share with us the WARN messages related to this?
If the client can't collect stats, it writes the following log line:
[WARN] client: error fetching host resource usage stats
It would help me understand which file the stats collector is not able to read.

@dadgar
Contributor

dadgar commented Nov 10, 2016

@sigil66 Are you still running into this?

@dadgar dadgar added this to the v0.5.1 milestone Nov 10, 2016
@discount-elf
Author

@dadgar We switched to running Nomad as root on worker nodes to work around the issue. If the issue should be fixed, I can test as an unprivileged user.

@dadgar
Contributor

dadgar commented Nov 10, 2016

@sigil66 I don't think we were ever able to reproduce it. I mainly wanted to see if it was still reproducible for you. If you do test, would you mind trying on 0.5.0-rc2? https://releases.hashicorp.com/nomad/0.5.0-rc2/

@discount-elf
Author

@dadgar Sure thing. I will do so when I get a few spare cycles and will update accordingly.

@christoe
Contributor

christoe commented Dec 8, 2016

We've now started seeing this on some(!) of our hosts:

nomad: 2016/12/08 07:16:05.780502 [WARN] client: error fetching host resource usage stats: permission denied
nomad[12155]: client: error fetching host resource usage stats: permission denied

In our case it seems as if this started appearing after upgrading to 0.5.0. I'm currently trying to chase down what differences there might be between the hosts that may be causing this.

@christoe
Contributor

christoe commented Dec 8, 2016

After some digging, strace gave me the answer (at least for my case):

28666 statfs("/var/lib/<mounted_partition>", 0xc4205b0af8) = -1 EACCES (Permission denied)

It turns out the disk library used by Nomad to determine disk usage gets all mounted filesystems from /etc/mtab. However, a non-root user is not allowed to stat every one of those filesystems, and the failing statfs call is what causes the permission denied error.
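For readers who want to check a host without strace, here is a minimal Go sketch of the same probe. It assumes a Linux host; `statMount` is a hypothetical helper written for this illustration and is not part of Nomad or its disk library. It calls statfs on each path given on the command line and reports whether a failure was specifically EACCES.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

// statMount (hypothetical helper) calls statfs(2) on path. It returns the
// filesystem size in bytes, whether a failure was a permission error
// (EACCES), and the raw error, if any.
func statMount(path string) (uint64, bool, error) {
	var st syscall.Statfs_t
	if err := syscall.Statfs(path, &st); err != nil {
		return 0, errors.Is(err, syscall.EACCES), err
	}
	// Total size = number of blocks * block size.
	return st.Blocks * uint64(st.Bsize), false, nil
}

func main() {
	// Probe the paths given as arguments, or "/" when none are given.
	paths := os.Args[1:]
	if len(paths) == 0 {
		paths = []string{"/"}
	}
	for _, p := range paths {
		total, denied, err := statMount(p)
		switch {
		case denied:
			fmt.Printf("%s: permission denied for this user\n", p)
		case err != nil:
			fmt.Printf("%s: %v\n", p, err)
		default:
			fmt.Printf("%s: %d bytes total\n", p, total)
		}
	}
}
```

Running this as the same unprivileged user that runs Nomad, with the mountpoints listed in /etc/mtab as arguments, should point out which ones that user cannot stat.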

@diptanu
Contributor

diptanu commented Dec 8, 2016

@sigil66 I am of the opinion that we close this ticket since reading /etc/mtab is the most reliable way to get the disk information out, which is what tools like df use as well. Do you have any opinions/thoughts?

@christoe
Contributor

christoe commented Dec 9, 2016

@diptanu However (correct me if I'm wrong), after looking at the code it seems that Collect() returns as soon as Usage() returns an error. Shouldn't the code be more resilient, i.e. continue to warn about the failed stat but still collect data from the partitions the user actually has permission to read?
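A rough sketch of the more resilient loop suggested above, under stated assumptions: `DiskStats`, `usageFunc`, `collectDiskStats`, `fakeUsage`, and the log message are all hypothetical names invented for this illustration and do not reflect Nomad's actual types or the change that was eventually merged. The point is only the control flow: warn and `continue` on a failed mountpoint instead of returning the error.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// DiskStats is a hypothetical, trimmed-down stand-in for per-disk stats.
type DiskStats struct {
	Mountpoint string
	UsedBytes  uint64
}

// usageFunc abstracts the per-mountpoint lookup so the loop can be
// exercised without touching real filesystems.
type usageFunc func(mountpoint string) (uint64, error)

// collectDiskStats queries every mountpoint. A failure is logged as a
// warning and recorded in skipped, and the loop moves on to the next
// mountpoint instead of aborting the whole collection.
func collectDiskStats(mountpoints []string, usage usageFunc) (stats []DiskStats, skipped []string) {
	for _, mp := range mountpoints {
		used, err := usage(mp)
		if err != nil {
			log.Printf("[WARN] skipping mountpoint %s: %v", mp, err)
			skipped = append(skipped, mp)
			continue
		}
		stats = append(stats, DiskStats{Mountpoint: mp, UsedBytes: used})
	}
	return stats, skipped
}

// fakeUsage simulates a host where one partition is unreadable by the
// current user; every other mountpoint reports 1024 bytes used.
func fakeUsage(mp string) (uint64, error) {
	if mp == "/var/lib/restricted" {
		return 0, os.ErrPermission
	}
	return 1024, nil
}

func main() {
	mounts := []string{"/", "/var/lib/restricted", "/home"}
	stats, skipped := collectDiskStats(mounts, fakeUsage)
	fmt.Println(len(stats), "collected;", len(skipped), "skipped")
}
```

With this shape, one unreadable partition costs a warning and a missing entry rather than the entire host stats sample.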

@dadgar
Contributor

dadgar commented Dec 12, 2016

@christoe I agree, could you file an issue with that project and cross link the issues?

@christoe
Contributor

@dadgar: I think this should be fixed in Nomad. See reference PR #2090.

diptanu added a commit that referenced this issue Dec 13, 2016
#1680: Continue after not being able to stat a mountpoint
@dadgar
Contributor

dadgar commented Dec 13, 2016

Closed via #2090

@dadgar dadgar closed this as completed Dec 13, 2016
@dadgar
Contributor

dadgar commented Dec 13, 2016

Thanks!

@tgross tgross modified the milestones: near-term, unscheduled Jan 9, 2020
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 14, 2022