Issue #2616: ZFS Zpool properties on Linux #6724

Closed
wants to merge 1 commit

Conversation

yvasiyarov

Updated the Linux ZFS agent to collect the information exposed by zpool list.
It was requested in #2616 but for some reason was never implemented.

Since that information was already collected by the ZFS agent on FreeBSD, I just reused that implementation and updated the unit tests accordingly. The shared part of the Linux and FreeBSD agents has been moved to the zfs.go file.
It was tested on Debian and FreeBSD 11.

Required for all PRs:

  • [x] Signed CLA.
  • [x] Associated README.md updated.
  • [x] Has appropriate unit tests.

@danielnelson added the feat (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) label on Dec 2, 2019
@yvasiyarov
Author

@danielnelson Is there anything I can do to get this PR merged?

Add freeing and leaked options collection
@antifuchs
Contributor

antifuchs commented Apr 11, 2020

While this is waiting to be merged (I really hope it gets merged soon!), I hacked around the absence of useful zpool health checks with this mildly ridiculous python script. Feel free to use it in the meantime:

#!/usr/bin/python

import subprocess
import re
import sys

class ZpoolCheck:
    def check_pool_health(self, pool):
        # universal_newlines=True makes check_output return text on both
        # Python 2 and 3, so the regex match below works either way.
        line = subprocess.check_output(['/sbin/zpool', 'status', '-x', pool],
                                       universal_newlines=True).splitlines()[0]
        if re.match("pool '%s' is healthy" % pool, line):
            # healthy, yay: status 0.
            print("zpool_health,pool=%s health_status=0" % pool)
        else:
            # unhealthy, boo: status 1.
            print("zpool_health,pool=%s health_status=1" % pool)

if __name__ == '__main__':
    c = ZpoolCheck()
    for pool in sys.argv[1:]:
        c.check_pool_health(pool)

Install it in a place where it can be found, as check_zpool_health.py, and use the following config:

[[inputs.exec]]
  commands = ["/path/to/check_zpool_health.py rpool"]
  data_format = "influx"
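
Since the script iterates over all of its command-line arguments, several pools can be checked from a single exec entry, e.g. commands = ["/path/to/check_zpool_health.py rpool tank"] (the pool names here are only placeholders).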

@Skaronator

@danielnelson would it be possible to get this merged for the next version? It seems to have been ready for review since December. Thanks!

@danielnelson
Contributor

As the current plugin works using only procfs, I'd prefer not to add a dependency on the zpool command on the Linux side. Its performance isn't very good, and it introduces a lot of potential issues, including differences in output, complex permission errors, and timeouts.

Is it possible to get the critical information we need from /proc? I read that pool health was recently added in openzfs/zfs#7563, though I'm not sure which version it is available in.
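
For reference, openzfs/zfs#7563 exposes the pool state as a per-pool kstat file, so on a new enough module something along these lines can read it without shelling out. This is only a minimal sketch; the exact path and its availability are assumptions based on that PR and depend on the installed ZFS version:

package main

import (
	"fmt"
	"os"
	"strings"
)

// Minimal sketch: read the per-pool state kstat added by openzfs/zfs#7563.
// The path below is an assumption based on that PR and does not exist on
// older ZFS releases.
func main() {
	for _, pool := range os.Args[1:] {
		data, err := os.ReadFile("/proc/spl/kstat/zfs/" + pool + "/state")
		if err != nil {
			fmt.Fprintf(os.Stderr, "no state kstat for %s: %v\n", pool, err)
			continue
		}
		fmt.Printf("zpool_health,pool=%s state=\"%s\"\n", pool, strings.TrimSpace(string(data)))
	}
}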

@yvasiyarov
Author

Hello @danielnelson,
unfortunately that information is not available in /proc, and the openzfs/zfs#7563 PR will add only the pool state itself, without any other metrics.

There are only two options for getting zpool properties:

  • use the ioctl syscall directly
  • parse the output of the zpool command

Neither option is perfect, but I've chosen the second one, because:

  • It's much easier to support a plugin that parses text output than one operating through binary bindings.
  • It's much easier to install the ZFS utilities than to carry around and compile binary bindings to the syscall.

To address your concerns about adding a dependency on the zpool command on Linux:

  • First of all, this dependency already exists: the current implementation of the zfs plugin on FreeBSD already depends on the zpool command.
  • Second, even if the zpool command does not exist, or its output format has changed, nothing breaks; the plugin simply does not report the zpool metrics.
  • Third, there is no performance impact on the Telegraf service itself. zpool runs in a separate process, so even if it hangs, it will not affect Telegraf.

So in the worst case we will report only the metrics available in /proc, but normally all plugin users can benefit from the new metrics out of the box, without ad-hoc solutions like the one described in #6724 (comment).
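
To make the trade-off concrete, a rough sketch of the zpool-parsing approach described above could look like the following. This is an illustration only, not the code in this PR; the flags, property list, and measurement layout are assumptions loosely modeled on the existing FreeBSD implementation:

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Sketch: shell out to `zpool list` with -H (no headers, tab-separated)
// and -p (parseable numbers), then split each line into properties.
func main() {
	out, err := exec.Command("zpool", "list", "-Hp",
		"-o", "name,size,allocated,free,fragmentation,capacity,dedupratio,health").Output()
	if err != nil {
		// zpool is missing or failed: report nothing rather than break.
		return
	}
	for _, line := range strings.Split(strings.TrimSpace(string(out)), "\n") {
		fields := strings.Split(line, "\t")
		if len(fields) < 8 {
			continue
		}
		fmt.Printf("zfs_pool,pool=%s,health=%s size=%s,allocated=%s,free=%s\n",
			fields[0], fields[7], fields[1], fields[2], fields[3])
	}
}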

@tomtastic

I agree with @yvasiyarov; the zpool command dependency has been there for FreeBSD since the start. We could/should have had this available on Linux many years ago had the previous attempt not been derailed.
It's about time this got resolved once and for all!

@danielnelson
Contributor

danielnelson commented May 12, 2020

@richardelling I'd like to get your input on this issue and how we should move forward. Should we proceed with shelling out to zpool on Linux, would it make more sense to use your zpool_influxdb utility, or is there another recommended solution? Perhaps OpenZFS has considered adding additional information to the procfs interface?

@richardelling
Contributor

@danielnelson I believe the containment of a blocked ioctl() is best done outside of Telegraf itself. It is quite common for us to prototype new and special-purpose collectors using the socket_listener interface, so I'll propose the following:

  1. I'll contribute zpool_influxdb to the OpenZFS community, so it can be built alongside ZFS, which is the only viable way to keep up with the unstable (as in interface stability) internal ZFS data structures.
  2. As part of that, I'll modify it to optionally run as a daemon and output to a socket, allowing Telegraf folks to simply use the existing socket_listener input plugin.
  3. The existing stdout output remains, so if someone really wants to run it via the exec input plugin, it should work, though perhaps less robustly.
  4. There is also histogram data available inside OpenZFS. I'd like to know the best way to send that to InfluxDB for heatmaps. In Prometheus the buckets are labels (tags) because they have only one field. So is it better to have more fields or higher cardinality?

@danielnelson
Contributor

The socket_listener approach works for me; it improves on the zpool method as it doesn't need to be launched each interval and has machine-readable output.

One more option that we now have is the execd input, which would allow Telegraf to run zpool_influxdb as a long-lived child process and read its standard output. This would act in a similar way to the socket_listener method, in that it would be zpool_influxdb's job to push updates.

Our current recommendation for doing histograms is essentially the Prometheus model, using tags for the bucket bounds and the same field key for all buckets. Cardinality actually ends up the same since each unique field key is a new series.

cpu,cpu=cpu1,le=0.0 usage_idle_bucket=0i
cpu,cpu=cpu1,le=10.0 usage_idle_bucket=1i
cpu,cpu=cpu1,le=50.0 usage_idle_bucket=2i
cpu,cpu=cpu1,le=100.0 usage_idle_bucket=4i
cpu,cpu=cpu1,le=+Inf usage_idle_bucket=4i
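
For reference, the socket_listener arrangement mentioned above would be configured roughly like this (only a sketch; the service_address value is an example, not a prescribed path):

[[inputs.socket_listener]]
  service_address = "unixgram:///tmp/zpool_influxdb.sock"
  data_format = "influx"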

@richardelling
Contributor

Got it.
We definitely want the ZFS ioctl consumer in a separate process from Telegraf, and it is better kept in the OpenZFS source tree.

@richardelling
Contributor

I have added histograms and set up zpool_influxdb to work well with the (awesome) execd input plugin, as an option. I'll push this into my master branch after a little bit of soak time and then work to get it added to the OpenZFS contrib directory. Meanwhile, more eyeballs are welcome.
https://github.com/richardelling/zpool_influxdb/tree/histograms

@danielnelson
Contributor

@richardelling This is really great, thanks so much for doing this.

While it is super easy to set up with execd (below), it still might make sense to integrate this directly into the zfs input to give a more polished interface. If someone is willing to do this, I'd be happy to merge it.

[[inputs.execd]]
  command = ["zpool_influxdb", "--execd"]
  signal = "STDIN"
  restart_delay = "10s"
  data_format = "influx"

@richardelling
Contributor

@danielnelson if I understand correctly, we could hide complexity by automatically running zpool_influxdb in the zfs plugin, if zpool_influxdb exists on the system. This is certainly doable.

As I'm putting together the merge into openzfs upstream, I'll include a telegraf.d/zpool_influxdb.conf as well as some dashboards. Would it also make sense to install into /etc/telegraf/telegraf.d if it exists?

@danielnelson
Contributor

Would it also make sense to install into /etc/telegraf/telegraf.d if it exists?

This is an interesting idea; I like it because you automatically start receiving the stats if you are a Telegraf user. I can, however, think of a few cases where this could cause problems:

  • If you are running Telegraf < 1.14, the execd plugin won't exist and you would get an error starting Telegraf.
  • Since it's not a pattern we generally use, it may conflict with manual setup; unlikely at first, but more likely if we integrate this with the zfs plugin.

Maybe it would be safer to include the configuration as part of the package documentation?

@Mic92
Contributor

Mic92 commented Nov 6, 2020

This is a pure awk solution to get data errors and pool status in the influx line protocol:

#!/usr/bin/env awk -f
BEGIN {
    while ("zpool status" | getline) {
        # "pool: <name>" starts a new pool: emit the measurement and tag.
        if ($1 ~ /pool:/) { printf "zpool_status,name=%s ", $2 }
        # "state: <STATE>" becomes a string field.
        if ($1 ~ /state:/) { printf "state=\"%s\",", $2 }
        # "errors: No known data errors" -> 0, otherwise the leading error count.
        if ($1 ~ /errors:/) {
            if (index($2, "No")) printf "errors=0i\n"; else printf "errors=%di\n", $2
        }
    }
}

$ sudo awk -f ./test.sh
zpool_status,name=data state="ONLINE",errors=0i
zpool_status,name=zroot state="ONLINE",errors=0i

original output:

$ sudo zpool status
  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:25:09 with 0 errors on Sun Nov  1 08:09:42 2020
config:

        NAME                               STATE     READ WRITE CKSUM
        zroot                              ONLINE       0     0     0
          nvme-eui.ace42e0090114c31-part5  ONLINE       0     0     0

errors: No known data errors

  pool: ztest
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 1 errors on Fri Nov  6 10:47:10 2020
config:

        NAME        STATE     READ WRITE CKSUM
        ztest       ONLINE       0     0     0
          test      ONLINE       0     0     1

errors: 1 data errors, use '-v' for a list

@reimda removed the request for review from danielnelson on June 17, 2021 at 21:04
@powersj
Contributor

powersj commented Oct 26, 2021

Hi Folks,

This has been sitting for almost a year now. I think there are a couple of solutions here already and so I am going to close this PR.

Thanks!
