inputs.zfs misses pools without objset-* files #14993
Comments
Hi, looking at the plugin code: is there some heuristic we can use to determine which one to read from, other than that one has all zeros? As the plugin treats the files under …
In my case, the normal-use pools contain an io file for the entire pool and objset files for each contained dataset, while the Lustre-use pools only have the io file. To complicate matters, the io file measures different things than the objset-* files, and it is also removed in zfs >= 2.1 (openzfs/zfs#13810 describes the problem better). This also affects Prometheus: prometheus/node_exporter#2068. I'll poke around the Lustre and ZFS groups to see why no objset is created, as having one would likely solve this problem. Otherwise, periodically parsing the output of …
Yeah, the comments in the code and that thread seem to indicate that the io method is the old/deprecated one, which is why it is the fallback and not the default. Happy to see if we can figure something out, but it does seem like the Lustre pool creation (?) is doing something in the "old" or legacy way. Let us know what you find out, or if you think of a way to determine which way to parse.
I think this is what's happening: the objset file for a dataset is only created when that dataset is mounted (openzfs/zfs#10928), but Lustre doesn't mount the object-store dataset; it accesses it at a lower level. One big difference between the old io file and the new per-dataset objset files is that the new way will not capture non-dataset pool I/O. I really want to see the effects of pool scrubs and resilvering operations, which should not appear in any dataset stats. It may be simplest to grab the stats directly from the disks that make up each pool, then aggregate. Still, eek.
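The fallback behavior being discussed can be sketched as follows. This is a minimal, hypothetical heuristic, not the actual telegraf code: prefer objset-* files when any exist for a pool, otherwise fall back to the legacy io file.

```python
import glob
import os

def kstat_sources(pool_dir):
    """Pick the kstat files to parse for one pool: per-dataset
    objset-* files when present, otherwise the legacy pool-wide
    'io' file (removed in ZFS >= 2.1). Hypothetical sketch."""
    objsets = sorted(glob.glob(os.path.join(pool_dir, "objset-*")))
    if objsets:
        return objsets
    legacy = os.path.join(pool_dir, "io")
    return [legacy] if os.path.exists(legacy) else []
```

Note this still can't distinguish a Lustre pool (io file only, by design) from a pool where objset creation failed, which is exactly the ambiguity raised above.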
As a workaround to help aggregate, the device_tags option in inputs.diskio is quite handy; the pool name is included as "ID_FS_LABEL".
The other options are there because the disks are all multipath and aliased with descriptive DM names, and double-counting is an issue. This captures the zpool scrub reads, but not reads from the ARC or the effects of compression, so the pool read speed from inputs.zfs can be much higher than the raw read from disk (I think). One caveat: disks in a zpool that hasn't been imported, or disks that were once in a pool, will still have ID_FS_LABEL set.
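In telegraf.conf, the workaround described above might look roughly like this. A sketch only: device_tags and ID_FS_LABEL come from the discussion, while DM_NAME is an assumption based on the multipath/DM aliasing mentioned, and the exact tags available depend on what udev exposes for your disks.

```toml
[[inputs.diskio]]
  ## Expose udev properties as tags so per-disk metrics can be
  ## grouped by zpool at query time. ID_FS_LABEL carries the pool
  ## name; DM_NAME (hypothetical here) helps avoid double-counting
  ## multipath devices.
  device_tags = ["ID_FS_LABEL", "DM_NAME"]
```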
Hi @NateCrawford, is your workaround using diskio sufficient? Or is there anything we could do in the zfs plugin for this case?
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!
Relevant telegraf.conf
Logs from Telegraf
System info
Telegraf 1.29.5, CentOS 7.9, ZFS 2.0.7
Docker
No response
Steps to reproduce
1. Create zpool "datapool"
2. Create a Lustre OST on datapool
3. Start Telegraf
...
Expected behavior
Telegraf should detect the zpool and gather metrics
Actual behavior
Telegraf does not detect the zpool as there are no /proc/spl/kstat/zfs/datapool/objset-* files.
Additional info
There are 9 zpools on this host:
Compare contents of xd2_local and xdatapool-10:
The objset files contain data for datasets created in a zpool, but the pool-level data (which is what I actually expect) appears to be in the "io" file:
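For reference, the objset files follow the usual three-column kstat layout (header line, column-name line, then one stat per line), which can be parsed with a sketch like this. The sample contents below are hypothetical, shaped like real objset files but not taken from this host.

```python
def parse_kstat_table(text):
    """Parse the name/type/data table of a ZFS kstat file (e.g. an
    objset-* file) into a dict of raw string values. A sketch that
    assumes the usual kstat layout: one header line, one
    column-name line, then one stat per line."""
    stats = {}
    for line in text.strip().splitlines()[2:]:
        name, _ktype, data = line.split(None, 2)
        stats[name] = data
    return stats

# Hypothetical objset-* contents, shaped like the real files:
sample = """36 1 0x01 7 2160 5132483940 5384722675
name                            type data
dataset_name                    7    datapool/fs1
writes                          4    42
nwritten                        4    1024
reads                           4    7
nread                           4    512
"""
stats = parse_kstat_table(sample)
```

The legacy io file uses a different, column-per-stat layout, which is part of why the two sources measure different things and can't simply be merged.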
It is likely that the dataset created by Lustre lacks some feature that would require an objset file.