[BUG] Can't gather metrics from nvme drive #344

Closed
jorgepimentel opened this issue Jul 30, 2022 · 6 comments
Labels
bug Something isn't working waiting for response

Comments

jorgepimentel commented Jul 30, 2022

I define my NVMe drive in the devices section of the docker compose file, as I did on version 0.3xx, but the drive does not show up in the device list in the web UI.

The expected behaviour is for the NVMe drive to show up and have metrics.

I debugged this a bit by going into the Scrutiny container and running:
scrutiny-collector-metrics run

I can see that it fails to get metrics for the nvme drive:

INFO[0000] Executing command: smartctl --info --json --device nvme /dev/nvme0  type=metrics
ERRO[0000] Could not retrieve device information for nvme0: exit status 2  type=metrics

When trying to run that same smartctl command on the host, I get this error:

{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      1
    ],
    "svn_revision": "5022",
    "platform_info": "x86_64-linux-5.4.0-122-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--info",
      "--json",
      "--device",
      "nvme",
      "/dev/nvme0",
      "type=metrics"
    ],
    "messages": [
      {
        "string": "ERROR: smartctl takes ONE device name as the final command-line argument.",
        "severity": "error"
      }
    ],
    "exit_status": 1
  }
}

So clearly it can't get the metrics for the nvme drive. Bug in the command?!

Let me know if I can help with anything else regarding this issue.
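
A side note on the host-side reproduction: the argv array in the JSON above shows that type=metrics was passed to smartctl as an extra argument. That token looks like a field appended by the collector's logger rather than part of the command (an assumption based on the log format, not something confirmed in this thread), which would explain the "ONE device name" error seen on the host. A minimal shell sketch for recovering the command from the log line:

```shell
# Log message as printed by the collector in the original post.
log_line='Executing command: smartctl --info --json --device nvme /dev/nvme0  type=metrics'

# Drop the "Executing command: " prefix.
cmd=${log_line#"Executing command: "}
# Drop the trailing "  type=..." token, assumed here to be a logger field.
cmd=${cmd%%"  type="*}

echo "$cmd"
```

Running the resulting command (without type=metrics) on the host should reproduce what the collector actually executes.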

@jorgepimentel jorgepimentel added the bug Something isn't working label Jul 30, 2022
AnalogJ (Owner) commented Aug 3, 2022

This is definitely weird. Can you try running smartctl manually with the following command?

smartctl --info --json /dev/nvme0

What's the output?

AnalogJ (Owner) commented Aug 17, 2022

Hey @jorgepimentel
I'm going to close this issue as stale for now. If you're still having this issue, please comment below and we can re-open it.

Thanks!

@AnalogJ AnalogJ closed this as completed Aug 17, 2022
jorgepimentel (Author) commented Oct 3, 2022

Hi @AnalogJ, sorry, I went on holiday and missed this completely. Strangely enough, it started working a few days later: it detected the NVMe drive and showed it in the web UI. But I had to take the container down the other day, and when I spun it back up it was no longer detecting the NVMe drive, failing with the same error I pasted in my original post.

Here is the output of smartctl --info --json /dev/nvme0, as you requested:

root@8ee717262a45:/opt/scrutiny# smartctl --info --json /dev/nvme0
{
  "json_format_version": [
    1,
    0
  ],
  "smartctl": {
    "version": [
      7,
      2
    ],
    "svn_revision": "5155",
    "platform_info": "x86_64-linux-5.4.0-126-generic",
    "build_info": "(local build)",
    "argv": [
      "smartctl",
      "--info",
      "--json",
      "/dev/nvme0"
    ],
    "messages": [
      {
        "string": "Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Permission denied",
        "severity": "error"
      }
    ],
    "exit_status": 2
  },
  "device": {
    "name": "/dev/nvme0",
    "info_name": "/dev/nvme0",
    "type": "nvme",
    "protocol": "NVMe"
  }
}

EDIT: I noticed that running that same command on bare metal gives me exactly the same result, and to get correct results I have to run it with elevated privileges (with sudo). Running the same command in the collector container against /dev/sda works fine, so I am not sure why /dev/nvme0 would need elevated privileges.
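
One possible explanation, offered as an assumption rather than something established in this thread: the "Permission denied" comes from NVME_IOCTL_ADMIN_CMD, and the Linux NVMe driver is generally understood to require CAP_SYS_ADMIN for admin passthrough commands, while /dev/sda evidently works without it. A rough shell sketch for checking which of the two relevant capabilities the current shell holds, by decoding the CapEff mask from /proc/self/status (bit numbers 17 and 21 are taken from linux/capability.h; has_cap is a hypothetical helper name):

```shell
# has_cap MASK BIT: succeed if bit BIT is set in the hex capability mask MASK.
has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Read the effective capability mask of this shell (Linux only).
cap_eff=
if [ -r /proc/self/status ]; then
  while read -r key val; do
    case $key in
      CapEff:) cap_eff=$val ;;
    esac
  done < /proc/self/status
fi

if [ -n "$cap_eff" ]; then
  # Bit 17 is CAP_SYS_RAWIO, bit 21 is CAP_SYS_ADMIN.
  if has_cap "$cap_eff" 17; then echo "CAP_SYS_RAWIO: present"; else echo "CAP_SYS_RAWIO: missing"; fi
  if has_cap "$cap_eff" 21; then echo "CAP_SYS_ADMIN: present"; else echo "CAP_SYS_ADMIN: missing"; fi
fi
```

Run inside the collector container, this would show whether the container was started with the capabilities in question; under sudo on the host, both should normally be present.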

jorgepimentel (Author) commented

Please close this. I read through the docs more carefully and found that I was missing:


cap_add:
      - SYS_ADMIN

With the following, it now detects the NVMe drive fine:


cap_add:
      - SYS_RAWIO
      - SYS_ADMIN

Sorry about that. Documentation on GitHub can get a bit confusing sometimes.
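
For reference, the fix above can be sketched as a Compose override file written from the shell. This is only a sketch: the service name scrutiny and the file name docker-compose.override.yml in the current directory are assumptions for illustration; only the two capabilities and /dev/nvme0 come from this thread.

```shell
# Write a Compose override that adds the capabilities discussed above.
# Assumption: the Scrutiny service in docker-compose.yml is named "scrutiny".
override_file=docker-compose.override.yml

cat > "$override_file" <<'EOF'
services:
  scrutiny:
    cap_add:
      - SYS_RAWIO   # listed in the thread alongside SYS_ADMIN
      - SYS_ADMIN   # the capability the author was missing
    devices:
      - "/dev/nvme0:/dev/nvme0"
EOF

echo "wrote $override_file"
```

Docker Compose normally picks up docker-compose.override.yml automatically when it sits next to docker-compose.yml, so recreating the container should apply the extra capabilities. With older Compose file formats the override may also need a matching version key.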

RapidFire05 commented

I had this same issue and had to change two things. First, I had to add SYS_ADMIN in addition to SYS_RAWIO. Second, when I ran fdisk -l to view devices, my NVMe device appeared as /dev/nvme0n1 with two partitions, nvme0n1p1 and nvme0n1p2, so I passed nvme0n1. On a hunch I tried passing the device as nvme0 instead, and then it worked.
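
The device-name detail above can be illustrated with a small shell helper. In the usual Linux naming, /dev/nvme0 is the controller device, /dev/nvme0n1 is a namespace on it, and nvme0n1p1 is a partition. The collector log in the original post queries /dev/nvme0, so passing that node through appears to be what matters. nvme_controller is a hypothetical helper name, and the sketch assumes the standard nvmeXnYpZ naming:

```shell
# nvme_controller DEV: print the controller device for an NVMe namespace or
# partition path; print any other path unchanged.
nvme_controller() {
  case $1 in
    /dev/nvme[0-9]*)
      # Strip the longest suffix that starts with "n<digit>", e.g. "n1" or "n1p2".
      printf '%s\n' "${1%%n[0-9]*}"
      ;;
    *)
      printf '%s\n' "$1"
      ;;
  esac
}

nvme_controller /dev/nvme0n1p2
nvme_controller /dev/nvme0n1
nvme_controller /dev/sda
```

The first two calls map a partition and a namespace back to the controller path; the last shows that non-NVMe paths pass through untouched.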

grigio commented Nov 25, 2023

I also had this issue

cap_add:
      - SYS_RAWIO
      - SYS_ADMIN

This should be in the docker-compose.yml example file.
