Issue #2616: ZFS Zpool properties on Linux #6724
Conversation
@danielnelson Is there anything I can do to get this PR merged?
While this is waiting to be merged (I really hope it gets merged soon!), I hacked around the absence of useful zpool health checks with this mildly ridiculous Python script. Feel free to use it in the meantime:

```python
#!/usr/bin/python
import re
import subprocess
import sys


class ZpoolCheck:
    def check_pool_health(self, pool):
        # `zpool status -x <pool>` prints a one-line summary for healthy pools.
        line = subprocess.check_output(
            ['/sbin/zpool', 'status', '-x', pool]
        ).decode().splitlines()[0]
        if re.match("pool '%s' is healthy" % pool, line):
            # healthy, yay: status 0.
            print("zpool_health,pool=%s health_status=0" % pool)
        else:
            # unhealthy, boo: status 1.
            print("zpool_health,pool=%s health_status=1" % pool)


if __name__ == '__main__':
    c = ZpoolCheck()
    for pool in sys.argv[1:]:
        c.check_pool_health(pool)
```

Install it in a place where it can be found, and run it via the exec input:

```toml
[[inputs.exec]]
  commands = ["/path/to/check_zpool_health.py rpool"]
  data_format = "influx"
```
@danielnelson would it be possible to get this merged for the next version? It seems to have been ready for review since December. Thanks!
As the current plugin works using only procfs, I'd prefer not to add a dependency on the zpool command on the Linux side. Its performance isn't very good, and it introduces a lot of potential issues, including differences in the output, complex permission errors, and timeouts. Is it possible to get the critical information we need from procfs?
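For context, here is a minimal sketch of what the procfs-only approach looks like. The path and the kstat text layout are assumed from typical ZFS-on-Linux installs, and `read_kstat` is a hypothetical helper, not the plugin's actual parser:

```python
# ZFS-on-Linux exposes kstats as text files under /proc/spl/kstat/zfs.
# The first line is a kstat header, the second a column legend, and the
# rest are "name type value" rows (layout assumed from typical installs).
def read_kstat(path="/proc/spl/kstat/zfs/arcstats"):
    stats = {}
    with open(path) as f:
        for line in f.read().splitlines()[2:]:
            parts = line.split()
            if len(parts) == 3:
                name, _ktype, value = parts
                stats[name] = int(value)
    return stats

if __name__ == "__main__":
    arc = read_kstat()
    print("arcstats hits=%di misses=%di" % (arc["hits"], arc["misses"]))
```

Most of the pool-level properties reported by `zpool list` (capacity, fragmentation, dedup ratio) are not available through these kstats, which is the gap this PR targets.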
Hello @danielnelson, there are only two options for getting zpool properties:

Neither of these options is perfect, but I've chosen the second one, because:

To address your concerns about adding a dependency on the zpool command on Linux:

So in the worst case we will report only the metrics available in /proc, but normally all plugin users can benefit from the new metrics available out of the box, without ad-hoc solutions like the one described in #6724 (comment).
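A hedged sketch of the fallback behaviour described above, where shelling out to zpool degrades gracefully to the existing /proc metrics. The field selection and the five-second timeout are illustrative assumptions, not the PR's actual code:

```python
import subprocess

def gather_pool_metrics():
    """Prefer `zpool list` for pool properties, but return None if the
    binary is missing, fails, or hangs, so the caller can fall back to
    the /proc/spl/kstat/zfs metrics it already collects."""
    try:
        out = subprocess.check_output(
            ["zpool", "list", "-Hp", "-o", "name,size,alloc,free,health"],
            timeout=5,  # illustrative guard against a blocked zpool command
        ).decode()
    except (OSError, subprocess.SubprocessError):
        return None
    metrics = []
    for row in out.splitlines():
        # -H prints tab-separated, headerless rows; -p prints exact numbers
        name, size, alloc, free, health = row.split("\t")
        metrics.append((name, int(size), int(alloc), int(free), health))
    return metrics
```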
I agree with @yvasiyarov; the zpool command dependency has been there for FreeBSD since the start. We could and should have had this available in Linux many years ago, had the previous attempt not been derailed.
@richardelling I'd like to get your input on this issue and how we should move forward. Should we proceed with shelling out to zpool on Linux, would it make more sense to use your zpool_influxdb utility, or is there another recommended solution? Perhaps OpenZFS has considered adding additional information to the procfs interface?
@danielnelson I believe the containment of a blocked ioctl() is best done outside of Telegraf itself. It is quite common for us to prototype new collectors and special-purpose tools using the socket_listener input.
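A sketch of that prototyping pattern, assuming a Telegraf [[inputs.socket_listener]] accepting line protocol on UDP port 8094; the address and the metric are made up for illustration:

```python
import socket

# Push one line-protocol metric to a local socket_listener input.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(b"zpool_status,name=rpool health=0i\n", ("127.0.0.1", 8094))
sock.close()
```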
Socket_listener works for me; this improves on the zpool method, as it doesn't need to be launched each interval and it has machine-readable output. One more option that we now have is the execd input, which would allow Telegraf to run zpool_influxdb as a long-lived child process and read its standard output. This would act in a similar way to the socket_listener method, in that it would be zpool_influxdb's job to push updates. Our current recommendation for doing histograms is essentially the Prometheus model: use tags for the bucket bounds and the same field key for all buckets. Cardinality actually ends up the same, since each unique field key is a new series.
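A hypothetical illustration of that encoding in line protocol, with the bucket bound as a `le` tag and one shared field key; the measurement name and counts are made up:

```python
# Prometheus-style histogram: cumulative bucket counts keyed by an
# upper-bound ("le") tag, all sharing the same field key.
buckets = [("1ms", 12), ("10ms", 40), ("+Inf", 45)]
for le, count in buckets:
    print('zpool_latency,name=rpool,le=%s read_count=%di' % (le, count))
```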
Got it.
I have added histograms and set up …
@richardelling This is really great, thanks so much for doing this. It is super easy to set up with:

```toml
[[inputs.execd]]
  command = ["zpool_influxdb", "--execd"]
  signal = "STDIN"
  restart_delay = "10s"
  data_format = "influx"
```
@danielnelson If I understand correctly, we could hide the complexity by automatically running zpool_influxdb for the user. As I'm putting together the merge into openzfs upstream, I'll include a telegraf.d/zpool_influxdb.conf as well as some dashboards. Would it also make sense to install into /etc/telegraf/telegraf.d if it exists?
This is an interesting idea; I like it because you automatically start receiving the stats if you are a Telegraf user. I can, however, think of a few cases where this could cause problems:

Maybe it would be safer to include the configuration as part of the package documentation?
This is a pure awk solution to get data errors and pool status in the influx line protocol:

```awk
#!/usr/bin/env awk -f
BEGIN {
    while ("zpool status" | getline) {
        if ($1 ~ /pool:/)  { printf "zpool_status,name=%s ", $2 }
        if ($1 ~ /state:/) { printf "state=\"%s\",", $2 }
        if ($1 ~ /errors:/) {
            # "errors: No known data errors" vs. "errors: N data errors, ..."
            if (index($2, "No")) printf "errors=0i\n"; else printf "errors=%di\n", $2
        }
    }
}
```

```
$ sudo awk -f ./test.sh
zpool_status,name=data state="ONLINE",errors=0i
zpool_status,name=zroot state="ONLINE",errors=0i
```

Original output:

```
$ sudo zpool status
  pool: zroot
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 00:25:09 with 0 errors on Sun Nov  1 08:09:42 2020
config:

        NAME                               STATE     READ WRITE CKSUM
        zroot                              ONLINE       0     0     0
          nvme-eui.ace42e0090114c31-part5  ONLINE       0     0     0

errors: No known data errors

  pool: ztest
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:00 with 1 errors on Fri Nov  6 10:47:10 2020
config:

        NAME        STATE     READ WRITE CKSUM
        ztest       ONLINE       0     0     0
          test      ONLINE       0     0     1

errors: 1 data errors, use '-v' for a list
```
Hi folks, this has been sitting for almost a year now. I think there are a couple of solutions here already, so I am going to close this PR. Thanks!
Updated the Linux ZFS agent to collect the information exposed by zpool list.
This was requested in #2616 but for some reason was never implemented.
Since that information was already collected by the ZFS agent on FreeBSD, I just reused that implementation and updated the unit tests accordingly. The part shared between the Linux and FreeBSD agents has been moved to the zfs.go file.
It was tested on Debian and FreeBSD 11.
Required for all PRs: