JSON output support for zfs and zpool commands #16217
Looks very helpful. Currently parsing especially 'zpool status' output is painful. One question: when you use the -p option you would get raw numbers and not human-readable ones? |
Yes, that is correct. Specifying
|
That's very cool, but if the properties are simple numbers, should they really be transmitted as strings? |
There is an upper bound on what can be natively shown as an integer in different JSON libraries that can be used to consume the output. So, returning the numbers as string appears to be the proper solution. To make it standard across the JSON output, all numbers are returned as strings, except for |
I can't say what is right or wrong for JSON, I have no idea about existing libraries and practices, but my thinking is that machine-parsable format should be maximally machine-parsable, that means no strings or suffixes for numbers, etc. Most of numbers in ZFS are uint64_t, and these days I would not consider library unable to handle uint64_t. Though I'll leave it to somebody else to decide. |
The limitation is not in the JSON specification but in JavaScript. Older versions of JavaScript could only represent integers accurately up to 53 bits in length. However, modern versions of the language have a BigInt type which works around this limitation. What counts as the "proper" way of doing this is subjective, but if you want to try and appease all consumers of this API, strings should be returned. |
Yes, because JavaScript usually uses double-precision IEEE-754 to handle numbers. But for the data exported by these ZFS commands, does it really matter if "9,007,199,254,740,993" cannot be represented exactly by JavaScript, and is rounded to 9,007,199,254,740,992 or 9,007,199,254,740,994? |
For sizes that might be OK, but for GUIDs it would not be a good idea for them to be lossy. |
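To make the precision concern concrete, here is a minimal Python sketch. It is not part of the PR: `parse_int=float` is used only to imitate a consumer that stores every JSON number as an IEEE-754 double, and the two documents are made-up inputs.

```python
import json

GUID = 9007199254740993  # 2**53 + 1, the value quoted in the discussion

# Hypothetical documents: the same GUID as a bare number and as a string.
raw_number_doc = '{"guid": 9007199254740993}'
string_doc = '{"guid": "9007199254740993"}'

# parse_int=float imitates a consumer that keeps all numbers as doubles.
lossy = json.loads(raw_number_doc, parse_int=float)["guid"]

# A string survives any parser untouched and can be converted afterwards.
exact = int(json.loads(string_doc)["guid"])

print(int(lossy) == GUID)  # the double cannot hold 2**53 + 1
print(exact == GUID)       # the string round-trips exactly
```

The rounding happens without any error being raised, which is why a lossy GUID would be hard to notice.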
Nice feature addition! Request -- for the Also, can we add the pool status above the config, like the scan state, raidz expansion status, etc since those are not properties that we can otherwise obtain. |
Right, and you don't want to do math with GUIDs anyway, so a string is great. |
As already mentioned, it is a JavaScript limitation. I've been working on a RESTful API daemon for the last two years and at the beginning I couldn't agree to this, it felt wrong, but after a while I gave up, it is just easier to use strings for large numbers. JavaScript also won't return an error if the number is too large, it will silently round it down, which can lead to some serious problems. In other words I agree with the author to represent the large numbers as strings. Because this is JSON, we could consider having both machine- and human-readable values in a single output and not depend on the -p option, eg. "total_space" and "total_space_hr" or something... |
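The idea of carrying both forms side by side could look roughly like the sketch below. The `_hr` suffix comes from the comment above; the `humanize` helper, its one-decimal formatting, and the `both_forms` function are assumptions for illustration and do not reproduce how ZFS itself formats sizes.

```python
def humanize(num_bytes: int) -> str:
    """Format a byte count with binary suffixes (illustrative formatting only)."""
    units = ["B", "K", "M", "G", "T", "P", "E"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f}{unit}"
        value /= 1024


def both_forms(key: str, num_bytes: int) -> dict:
    """Emit the raw value as a string plus a human-readable companion key."""
    return {key: str(num_bytes), f"{key}_hr": humanize(num_bytes)}


print(both_forms("total_space", 1536))
```

A consumer would then pick whichever key it needs without having to pass a separate option.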
Do you mean we should always show
This is how it is currently implemented. Scan state, checkpoint state, raidz expansion state are added above config. These objects are added to the output if they are found in the
|
I agree with this 👍 |
I have updated and |
First off, great work on this! 👏 I started kicking the tires on
Or to use a real-world example with two pools:
Example JSON from these two pools here ⬆️ (lots of fields excluded for brevity):
This makes a lot of common queries very easy:
# show me all leaf vdevs objects
cat zpool-status.json | jq '.pools[].vdevs[]'
{
"name": "file1",
"pool": "pool1",
"disk_type": "file",
"group_type": "raidz",
"vdev_type": "data",
"parent": "raidz-0",
"read_errors": 5
}
{
"name": "file2",
"pool": "pool1",
"disk_type": "file",
"group_type": "raidz",
...
# show me all leaf vdev names
cat zpool-status.json | jq '.pools[].vdevs[] | .name'
"file1"
"file2"
"file3"
"file6"
...
# show me all leaf vdevs objects with read errors
cat zpool-status.json | jq '.pools[].vdevs[] | select (.read_errors > 0)'
{
"name": "file1",
"pool": "pool1",
"disk_type": "file",
"group_type": "raidz",
"vdev_type": "data",
"parent": "raidz-0",
"read_errors": 5
}
# show me all mirror vdevs objects on pool1
cat zpool-status.json | jq '.pools[].vdevs[] | select (.pool == "pool1" and .group_type == "mirror")'
{
"name": "file4",
"pool": "pool1",
"disk_type": "file",
"group_type": "mirror",
"vdev_type": "special",
"parent": "mirror-1",
"read_errors": 0
}
{
"name": "file5",
"pool": "pool1",
...
# show me all pool names
cat zpool-status.json | jq '.pools[] | .pool_name'
"pool1"
"pool2"
Thoughts? I'll start trying out the other commands soon. |
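For consumers that do not use jq, the same queries translate directly to list comprehensions. The sketch below assumes the flat layout proposed in this comment (not necessarily what was merged); the sample data is made up, with field names copied from the example output above.

```python
# Made-up sample in the proposed flat layout: one "vdevs" array per pool.
status = {
    "pools": [
        {"pool_name": "pool1", "vdevs": [
            {"name": "file1", "pool": "pool1", "disk_type": "file",
             "group_type": "raidz", "vdev_type": "data",
             "parent": "raidz-0", "read_errors": 5},
            {"name": "file2", "pool": "pool1", "disk_type": "file",
             "group_type": "raidz", "vdev_type": "data",
             "parent": "raidz-0", "read_errors": 0},
            {"name": "file4", "pool": "pool1", "disk_type": "file",
             "group_type": "mirror", "vdev_type": "special",
             "parent": "mirror-1", "read_errors": 0},
        ]},
        {"pool_name": "pool2", "vdevs": [
            {"name": "file6", "pool": "pool2", "disk_type": "file",
             "group_type": "mirror", "vdev_type": "data",
             "parent": "mirror-0", "read_errors": 0},
        ]},
    ]
}


def leaf_vdevs(status):
    """Equivalent of jq '.pools[].vdevs[]'."""
    return [v for pool in status["pools"] for v in pool["vdevs"]]


# Equivalent of '... | .name'
names = [v["name"] for v in leaf_vdevs(status)]

# Equivalent of '... | select(.read_errors > 0)'
with_errors = [v for v in leaf_vdevs(status) if v["read_errors"] > 0]

# Equivalent of '... | select(.pool == "pool1" and .group_type == "mirror")'
pool1_mirrors = [v for v in leaf_vdevs(status)
                 if v["pool"] == "pool1" and v["group_type"] == "mirror"]

# Equivalent of '.pools[] | .pool_name'
pool_names = [pool["pool_name"] for pool in status["pools"]]

print(names, pool_names)
```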
The downside of this is that you can't easily do numerical comparisons with
cat junk.json | jq '.pools[].vdevs[] | select (.read_errors > 1)'
{
"name": "file1",
"read_errors": "5"
}
{
"name": "file2",
"read_errors": "0"
}
Yes, you can get around it with
cat junk.json | jq '.pools[].vdevs[] | .read_errors |= tonumber | select (.read_errors > 1)'
Note that we do have the benefit of knowing from the nvlist which fields are numbers and which ones are not. |
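The jq result above includes "file2" because jq orders any string after any number, so `"0" > 1` is true. Other consumers hit the same class of problem in different ways. A small Python sketch, using the two made-up objects from the output above and a hypothetical helper name, shows the explicit conversion that string-encoded counters require:

```python
# The two objects from the jq output above, with counters as strings.
vdevs = [
    {"name": "file1", "read_errors": "5"},
    {"name": "file2", "read_errors": "0"},
]


def with_read_errors_above(vdevs, threshold):
    """Convert the string counter to int before comparing (like jq's tonumber)."""
    return [v for v in vdevs if int(v["read_errors"]) > threshold]


# Comparing the strings themselves is lexicographic, which is wrong for counts.
lexicographic_trap = "10" > "9"

print([v["name"] for v in with_read_errors_above(vdevs, 1)])
print(lexicographic_trap)
```

In Python, comparing the raw string against an integer raises a `TypeError` instead of silently giving a wrong answer, but the conversion step is needed either way.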
@tonyhutter thanks for trying this out and taking a deeper look. While initially implementing the VDEVs part for Current implementation keeps it closer to text output which was one of the ideas. While having the flat hierarchy makes queries easier, nested output provides an accurate picture of how VDEVs are organized in the pool and accessing class VDEVs is simply looking up the class object like If more of us think flat hierarchy for VDEVs would be more beneficial, I can definitely update and have it organized like that. Also, I noticed in the example JSON output that you have shared, it does not contain VDEV object for class devices (like raidz or mirror). Is it intentional and we don't want to include those here? |
@usaleem-ix new idea - consider a variation on the flat vdev array that includes ALL vdevs. So the
Each of these vdevs objects would also include all their child vdevs. So you would still get a hierarchy of vdevs if you selected, say, the root vdev object, or the TLDs. The root vdev is actually important to include, since the root can have its own vdev properties. Example here (with most fields excluded for brevity). You might want to view the JSON as an interactive tree in https://jsoneditoronline.org/ since it is hard to conceptualize by looking at the raw JSON.
Queries:
# Select any root or top-level-vdevs. root and TLDs have a .vdevs[] key
# since they have child vdevs.
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(has("vdevs")) | .name'
# Select all leaf vdevs. Leaf vdevs do not have a .vdevs[] array, since they have
# no children vdevs.
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(has("vdevs") == false)'
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.vdevs == null)'
# List all top level vdevs and their pool name
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "tld")| [.pool,.name]'
# List all root vdevs objects
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "root")'
# List all file based vdevs objects
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "file")' |
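A rough Python equivalent of these queries, assuming the layout proposed in this comment: every vdev appears once in the flat array, and root and top-level vdevs additionally nest their children under a `vdevs` key. The sample data is made up for illustration.

```python
# Made-up sample: leaves have no "vdevs" key; root and TLDs nest their children.
file1 = {"name": "file1", "pool": "pool1", "type": "file"}
file2 = {"name": "file2", "pool": "pool1", "type": "file"}
mirror0 = {"name": "mirror-0", "pool": "pool1", "type": "tld",
           "vdevs": [file1, file2]}
root = {"name": "pool1", "pool": "pool1", "type": "root", "vdevs": [mirror0]}

status = {"pools": [{"pool_name": "pool1",
                     "vdevs": [root, mirror0, file1, file2]}]}


def all_vdevs(status):
    """Equivalent of jq '.pools[].vdevs[]'."""
    return [v for pool in status["pools"] for v in pool["vdevs"]]


# select(has("vdevs")) | .name
non_leaf_names = [v["name"] for v in all_vdevs(status) if "vdevs" in v]

# select(has("vdevs") == false)
leaves = [v for v in all_vdevs(status) if "vdevs" not in v]

# select(.type == "tld") | [.pool, .name]
tlds = [[v["pool"], v["name"]] for v in all_vdevs(status) if v["type"] == "tld"]

print(non_leaf_names, [v["name"] for v in leaves], tlds)
```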
@tonyhutter If I understand you right, your proposed output will be twice as big as it has to be. Please consider for a second that there can be 1000 vdevs with all the per-vdev properties. I don't know how many people really need to parse output with |
@amotin correct, it can be 2-4x larger output, depending on the vdev groups. The trade-off is that the JSON is easier to iterate over and query. I tested out a 1000 vdev pool to see how bad it could get:
If it's 80ms difference from vanilla
The benefits of having easily queryable JSON should not be understated. Right now our scripts need to screen scrape to get even the most basic information. Here's one example from ZTS:
#
# Given a pool, and this function list all disks in the pool
#
function get_disklist # pool
{
echo $(zpool iostat -v $1 | awk '(NR > 4) {print $1}' | \
grep -vEe '^-----' -e "^(mirror|raidz[1-3]|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
}
To replace this with
I'm not sure how you could do it with the current JSON from this PR, since the vdevs are not arrayed together. |
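With a purely nested layout, a disk list can still be obtained outside of jq by walking the tree recursively. The sketch below is only an illustration: the thread does not show the exact nested shape, so the `pools` and `vdevs` mappings keyed by name, the sample data, and the Python `get_disklist` function are all assumptions.

```python
# Assumed nested shape: "vdevs" maps a vdev name to an object that may
# contain its own "vdevs" mapping. All names and data here are made up.
nested_status = {
    "pools": {
        "pool1": {
            "vdevs": {
                "pool1": {
                    "vdevs": {
                        "raidz-0": {"vdevs": {"file1": {}, "file2": {}, "file3": {}}},
                        "file6": {},
                    }
                }
            }
        }
    }
}


def leaf_names(vdevs):
    """Collect the names of vdevs that have no children, depth first."""
    names = []
    for name, vdev in vdevs.items():
        children = vdev.get("vdevs")
        if children:
            names.extend(leaf_names(children))
        else:
            names.append(name)
    return names


def get_disklist(status, pool):
    """Hypothetical JSON-based counterpart of the ZTS shell helper above."""
    return leaf_names(status["pools"][pool]["vdevs"])


print(get_disklist(nested_status, "pool1"))
```

The recursion is what the flat array proposal avoids: with all vdevs in one array, the same result is a single filter.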
@tonyhutter I have updated the
Each of these objects also include their child objects in If Can you please take a look and try it out? |
Thank you! I just did another round of testing. Some suggestions that came to mind:
zpool list -j | jq '.data."4555311788698535685".properties.health.value'
vs:
zpool list -j | jq '.data.tank.properties.health.value'
It's going to make it easier to use in ZTS if you can use the pool name to query individual values. It also keeps things consistent with the vdev keys, which use the vdev names, not GUIDs.
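The difference between the two keying schemes is easy to see from the consumer side. In this sketch the key path `data.<key>.properties.health.value` is taken from the jq commands above, while the sample values and the `name` field inside each GUID-keyed entry are assumptions made for illustration.

```python
# Made-up listings following the key paths in the two jq commands above.
by_guid = {"data": {"4555311788698535685": {
    "name": "tank",  # assumed field; needed to find a pool when keys are GUIDs
    "properties": {"health": {"value": "ONLINE"}}}}}

by_name = {"data": {"tank": {
    "properties": {"health": {"value": "ONLINE"}}}}}


def health_by_name_key(listing, pool):
    """Direct lookup when pools are keyed by name."""
    return listing["data"][pool]["properties"]["health"]["value"]


def health_by_guid_key(listing, pool):
    """With GUID keys, the caller has to search the entries for the name first."""
    for entry in listing["data"].values():
        if entry.get("name") == pool:
            return entry["properties"]["health"]["value"]
    raise KeyError(pool)


print(health_by_name_key(by_name, "tank"), health_by_guid_key(by_guid, "tank"))
```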
Example:
This will help with querying: "give me all special leaf vdevs", "give me all vdevs names that are normal vdevs, but not spares", "give me all vdev paths that are actual physical drives", etc.
It also prevents shooting yourself in the foot when you do |
There are some good ideas and thinking in here, and I don't have anything to contribute to this bikeshed, so I will not. I did want to just raise a point about ongoing maintenance. This is going to address a longstanding gripe about ZFS, in that its output (especially My worry is that without care, it's going to cause similar issues we have with I don't think it has to be a big deal, so long as we have this in mind as stuff is added and removed to the output. That means some sort of criteria or rubric for deciding what and how to present things, and some rule for removing things that is clear to consumers. My light-touch suggestion for such a policy:
This is less rigid than I might like, but I think is probably about as minimal as we can comfortably enforce given our freewheeling nature :) |
@tonyhutter I have updated as per your suggestions, can you please take a look and try it out?
|
This commit adds support for JSON output for zfs list using '-j' option. Information is collected in JSON format which is later printed in JSON format. Existing options for zfs list also work with '-j'. man pages are updated with relevant information. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
This commit adds support for zfs mount to display mounted file systems in JSON format using '-j' option. Data is collected in nvlist which is printed in JSON format. man page for zfs mount is updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
This commit adds support for zpool version to output in JSON format using '-j' option. Userland kernel module version is collected in nvlist which is later displayed in JSON format. man page for zpool is updated. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
This commit adds support for zpool get command to output the list of properties for ZFS Pools and VDEVS in JSON format using '-j' option. Man page for zpool get is updated to include '-j' option. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
This commit adds support for zpool list command to output the list of ZFS pools in JSON format using '-j' option. Information about available pools is collected in nvlist which is later printed to stdout in JSON format. Existing options for zpool list command work with '-j' flag. man page for zpool list is updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
This commit adds support for zpool status command to display status of ZFS pools in JSON format using '-j' option. Status information is collected in nvlist which is later dumped on stdout in JSON format. Existing options for zpool status work with '-j' flag. man page for zpool status is updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes #16217
Run basic JSON validation tests on the new `zfs|zpool -j` output. Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #16217
This fixes things so mirrored special vdevs report themselves as "class=special" rather than "class=normal". This happens due to the way the vdev nvlists are constructed:
mirrored special devices - The 'mirror' vdev has allocation bias as "special" and its leaf vdevs are "normal"
single or RAID0 special devices - Leaf vdevs have allocation bias as "special".
This commit adds in code to check if a leaf's parent is a "special" vdev to see if it should also report "special". Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Umer Saleem <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes #16217
This commit adds support for JSON output for zfs version and zfs get commands. '-j' flag can be used to get output in JSON format. Information is collected in nvlist objects which is later printed in JSON format. Existing options that work for zfs get and zfs version also work with '-j' flag. man pages for zfs get and zfs version are updated accordingly. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Signed-off-by: Umer Saleem <[email protected]> Closes openzfs#16217
Motivation and Context
JSON output for ZFS commands can help enhance the consumability of ZFS data. Apart from libzfs, JSON output can provide a more friendly way to access ZFS data to external API users.
Description
This PR adds JSON output support for the following zfs and zpool commands:
zfs list
zfs get
zfs mount
zfs version
zpool status
zpool list
zpool get
zpool version
Information is collected in an nvlist object in the callback structure and later printed to stdout using nvlist_print_json.
For future improvements and modifications, the output contains an output_version object that contains the version number:
ZFS properties for datasets and pools are organized as below:
A dataset object for dataset will look like below:
A pool object will contain following information:
Man pages for the commands listed above have been updated, and some examples have been added. Below are some more examples to demonstrate the JSON output:
How Has This Been Tested?
Manually tested in different pool configurations.