
JSON output support for zfs and zpool commands #16217

Closed
wants to merge 9 commits

Conversation

usaleem-ix
Contributor

@usaleem-ix commented May 22, 2024

Motivation and Context

JSON output for ZFS commands makes ZFS data easier to consume programmatically. Apart from linking against libzfs, JSON output gives external API users a friendlier way to access ZFS data.

Description

This PR adds JSON output support for the following zfs and zpool commands:

  • zfs list
  • zfs get
  • zfs mount
  • zfs version
  • zpool status
  • zpool list
  • zpool get
  • zpool version

Information is collected in an nvlist object in the callback structure and later printed to stdout using nvlist_print_json().

For future improvements and modifications, the output contains an output_version object that contains the version number:

"output_version": {
  "command": <str>,
  "vers_major": <int>,
  "vers_minor": <int>
}
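A consumer can gate its parsing on this object before touching the rest of the document. A minimal Python sketch (the check_output_version helper and SUPPORTED_MAJOR constant are illustrative, not part of the PR):

```python
import json

SUPPORTED_MAJOR = 0  # the output major version this consumer understands

def check_output_version(doc):
    """Validate the output_version object of a parsed zfs/zpool JSON
    document; returns (major, minor) and raises on an unsupported major."""
    ver = doc["output_version"]
    if ver["vers_major"] != SUPPORTED_MAJOR:
        raise ValueError("unsupported %s output version %d.%d"
                         % (ver["command"], ver["vers_major"],
                            ver["vers_minor"]))
    return ver["vers_major"], ver["vers_minor"]

doc = json.loads('{"output_version": {"command": "zfs version",'
                 ' "vers_major": 0, "vers_minor": 1}}')
major, minor = check_output_version(doc)
```

In practice the document would come from running the command with -j rather than from a literal string.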

ZFS properties for datasets and pools are organized as below:

"property": {
  "value": <str>,
  "source": {
    "type": <str>, 
    "data": <str>             // If property is inherited, where is it inherited from
  }
}

A dataset object will look like this:

"dataset_name" : {
  "name": <str>,
  "type": <str>,
  "pool": <str>,
  "createtxg": <str>
  "properties": {
    .
    .
  }
}

A pool object will contain the following information:

<pool_guid> : {
  "name": <str>,
  "type": <str>,
  "state": <str>,
  "guid": <str>,
  "txg": <str>,
  "spa_version": <str>,
  "zpl_version": <str>,
  "properties" : {
  .
  .
  }
}

Man pages for the commands listed above have been updated and some examples have been added. Below are some more examples to demonstrate the JSON output:

# query zfs version in JSON format
$ zfs version -j | jq
{
  "output_version": {
    "command": "zfs version",
    "vers_major": 0,
    "vers_minor": 1
  },
  "zfs_version": {
    "userland": "zfs-2.2.99-1",
    "kernel": "zfs-kmod-2.2.99-1"
  }
}

# list a dataset and only show mountpoint property in JSON format
$ zfs list -j -o mountpoint tank | jq
{
  "output_version": {
    "command": "zfs list",
    "vers_major": 0,
    "vers_minor": 1
  },
  "data": {
    "tank": {
      "name": "tank",
      "type": "FILESYSTEM",
      "pool": "tank",
      "createtxg": "1",
      "properties": {
        "mountpoint": {
          "value": "/mnt/tank",
          "source": {
            "type": "DEFAULT",
            "data": "-"
          }
        }
      }
    }
  }
}
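Output shaped like the above is straightforward to consume from a script. A sketch in Python, using a document trimmed to the fields shown in the example (the mounts mapping is just an illustration):

```python
import json

# Parsed `zfs list -j -o mountpoint tank` output, trimmed to the
# fields shown in the example above
listing = json.loads("""{
  "output_version": {"command": "zfs list", "vers_major": 0, "vers_minor": 1},
  "data": {
    "tank": {
      "name": "tank",
      "type": "FILESYSTEM",
      "pool": "tank",
      "createtxg": "1",
      "properties": {
        "mountpoint": {
          "value": "/mnt/tank",
          "source": {"type": "DEFAULT", "data": "-"}
        }
      }
    }
  }
}""")

# Map each dataset name to its mountpoint property value
mounts = {name: ds["properties"]["mountpoint"]["value"]
          for name, ds in listing["data"].items()}
```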

# list all pools and only show size property
$ zpool list -j -o size | jq
{
  "output_version": {
    "command": "zpool list",
    "vers_major": 0,
    "vers_minor": 1
  },
  "data": {
    "1409156761762403139": {
      "name": "tpool",
      "type": "POOL",
      "state": "ONLINE",
      "guid": "1409156761762403139",
      "txg": "134928",
      "spa_version": "5000",
      "zpl_version": "5",
      "properties": {
        "size": {
          "value": "31.5G",
          "source": {
            "type": "NONE",
            "data": "-"
          }
        }
      }
    }
  }
}

# below is a zpool status of a single disk stripe pool:
$ zpool status -j tpool | jq
{
  "output_version": {
    "command": "zpool status",
    "vers_major": 0,
    "vers_minor": 1
  },
  "data": {
    "1409156761762403139": {
      "name": "tpool",
      "state": "ONLINE",
      "guid": "1409156761762403139",
      "txg": "134928",
      "spa_version": "5000",
      "zpl_version": "5",
      "status": "OK",
      "vdev_stats": {
        "tpool": {
          "name": "tpool",
          "alloc_space": "812K",
          "total_space": "31.5G",
          "def_space": "31.5G",
          "read_errors": "0",
          "write_errors": "0",
          "checksum_errors": "0",
          "children": {
            "scsi-0QEMU_QEMU_HARDDISK_drive-scsi1": {
              "name": "scsi-0QEMU_QEMU_HARDDISK_drive-scsi1",
              "vdev_type": "disk",
              "guid": "5501883177021187658",
              "path": "/dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part1",
              "phys_path": "pci-0000:09:02.0-scsi-0:0:0:1",
              "devid": "scsi-0QEMU_QEMU_HARDDISK_drive-scsi1-part1",
              "state": "HEALTHY",
              "alloc_space": "812K",
              "total_space": "31.5G",
              "def_space": "31.5G",
              "rep_dev_size": "31.5G",
              "phys_space": "32.0G",
              "read_errors": "0",
              "write_errors": "0",
              "checksum_errors": "0"
            }
          }
        }
      },
      "error_count": "0"
    }
  }
}
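Because child vdevs nest recursively under children objects, a consumer walks the tree to aggregate per-vdev counters. A short Python sketch over trimmed-down data in the shape above (the error count on disk1 is made up for illustration):

```python
import json

def iter_vdevs(vdev):
    """Depth-first walk over a vdev object and its nested children."""
    yield vdev
    for child in vdev.get("children", {}).values():
        yield from iter_vdevs(child)

# Trimmed-down `zpool status -j`-style data; disk1's error count is
# invented for the example
status = json.loads("""{
  "data": {
    "1409156761762403139": {
      "name": "tpool",
      "vdev_stats": {
        "tpool": {
          "name": "tpool",
          "read_errors": "0",
          "children": {
            "disk1": {"name": "disk1", "read_errors": "2"}
          }
        }
      }
    }
  }
}""")

# Sum read errors across every vdev in every pool (counters are strings)
total_read_errors = sum(
    int(v["read_errors"])
    for pool in status["data"].values()
    for root in pool["vdev_stats"].values()
    for v in iter_vdevs(root))
```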

How Has This Been Tested?

Manually tested in different pool configurations.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@pjd
Contributor

pjd commented May 23, 2024

Looks very helpful. Currently, parsing output (especially from 'zpool status') is painful.

One question: when you use the -p option, do you get raw numbers instead of human-readable ones?

@usaleem-ix
Contributor Author

usaleem-ix commented May 23, 2024

One question: when you use the -p option, do you get raw numbers instead of human-readable ones?

Yes, that is correct. Specifying the -p option gives raw numbers instead of human-readable ones.

$ zpool list -jp -o size,allocated tank | jq
{
  "output_version": {
    "command": "zpool list",
    "vers_major": 0,
    "vers_minor": 1
  },
  "data": {
    "3920273586464696295": {
      "name": "tank",
      "type": "POOL",
      "state": "ONLINE",
      "guid": "3920273586464696295",
      "txg": "16597",
      "spa_version": "5000",
      "zpl_version": "5",
      "properties": {
        "size": {
          "value": "16454019710976",
          "source": {
            "type": "NONE",
            "data": "-"
          }
        },
        "allocated": {
          "value": "67213430784",
          "source": {
            "type": "NONE",
            "data": "-"
          }
        }
      }
    }
  }
}

@ceedriic
Contributor

Yes, that is correct. Specifying the -p option gives raw numbers instead of human-readable ones.

          "value": "16454019710976",
          "source": {
            "type": "NONE",
            "data": "-"
          }
        },

That's very cool, but if the properties are simple numbers, should they really be transmitted as strings?
Same question for "spa_version" and "zpl_version".

@usaleem-ix
Contributor Author

If the properties are simple numbers, should they really be transmitted as strings?

There is an upper bound on what can be natively represented as an integer in the various JSON libraries that may consume this output. So returning the numbers as strings appears to be the proper solution. To make it standard across the JSON output, all numbers are returned as strings, except for vers_major and vers_minor in output_version, which are integers.
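The concern is concrete: a 64-bit GUID such as the one in the zpool examples above does not survive a round trip through an IEEE-754 double, the number representation used by JavaScript and by many JSON parsers. A quick Python illustration:

```python
# The pool GUID from the zpool status example above
guid = 1409156761762403139

# Values above 2**53 are generally not exactly representable as an
# IEEE-754 double
assert guid > 2 ** 53

# The double round-trip loses the low-order digits...
assert int(float(guid)) != guid

# ...while a string round-trips exactly, which is why the output
# uses strings for large numbers
assert int("1409156761762403139") == guid
```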

@amotin
Member

amotin commented May 23, 2024

I can't say what is right or wrong for JSON, since I have no idea about existing libraries and practices, but my thinking is that a machine-parsable format should be maximally machine-parsable, which means no strings or suffixes for numbers, etc. Most numbers in ZFS are uint64_t, and these days I would not consider a library unable to handle uint64_t. Though I'll leave it to somebody else to decide.

@yocalebo

yocalebo commented May 23, 2024

I can't say what is right or wrong for JSON, since I have no idea about existing libraries and practices, but my thinking is that a machine-parsable format should be maximally machine-parsable, which means no strings or suffixes for numbers, etc. Most numbers in ZFS are uint64_t, and these days I would not consider a library unable to handle uint64_t. Though I'll leave it to somebody else to decide.

The limitation is not in the JSON specification but in JavaScript. Older versions of JavaScript could only accurately represent integers up to 53 bits in length. However, modern versions of the language have a BigInt type which works around this limitation. It's subjective what the "proper" way of doing this is, but if you want to try to appease all consumers of this API, strings should be returned.

@ceedriic
Contributor

The limitation is not in the JSON specification but in JavaScript. Older versions of JavaScript could only accurately represent integers up to 53 bits in length.

Yes, because JavaScript usually uses double-precision IEEE-754 to handle numbers.

But for the data exported by these ZFS commands, does it really matter if "9,007,199,254,740,993" cannot be represented exactly by JavaScript, and is rounded to 9,007,199,254,740,992 or 9,007,199,254,740,994?

@don-brady
Contributor

But for the data exported by these ZFS commands, does it really matter if "9,007,199,254,740,993" cannot be represented exactly by JavaScript, and is rounded to 9,007,199,254,740,992 or 9,007,199,254,740,994?

For sizes that might be OK, but for GUIDs it would not be a good idea for them to be lossy.

@don-brady
Contributor

Nice feature addition!

Request -- for the zpool status -j can we make -s a default (i.e., implied by -j)?

Also, can we add the pool status above the config (like the scan state, raidz expansion status, etc.), since those are not properties that we can otherwise obtain?

@ceedriic
Contributor

But for the data exported by these ZFS commands, does it really matter if "9,007,199,254,740,993" cannot be represented exactly by JavaScript, and is rounded to 9,007,199,254,740,992 or 9,007,199,254,740,994?

For sizes that might be OK, but for GUIDs it would not be a good idea for them to be lossy.

Right, and you don't want to do math with GUIDs anyway, so a string is great.

@pjd
Contributor

pjd commented May 24, 2024

Looks very helpful. Currently, parsing output (especially from 'zpool status') is painful.

One question: when you use the -p option, do you get raw numbers instead of human-readable ones?

I can't say what is right or wrong for JSON, since I have no idea about existing libraries and practices, but my thinking is that a machine-parsable format should be maximally machine-parsable, which means no strings or suffixes for numbers, etc. Most numbers in ZFS are uint64_t, and these days I would not consider a library unable to handle uint64_t. Though I'll leave it to somebody else to decide.

As already mentioned, it is a JavaScript limitation. I've been working on a RESTful API daemon for the last two years, and at the beginning I couldn't agree to this, it felt wrong, but after a while I gave up; it is just easier to use strings for large numbers. JavaScript also won't return an error if the number is too large, it will silently round it, which can lead to some serious problems.

In other words I agree with the author to represent the large numbers as strings.

Because this is JSON, we could consider having both machine- and human-readable values in a single output and not depend on the -p option, e.g. "total_space" and "total_space_hr" or something...

@usaleem-ix
Contributor Author

usaleem-ix commented May 24, 2024

Request -- for the zpool status -j can we make -s a default (i.e., implied by -j)?

Do you mean we should always show slow_ios for VDEVs, regardless of whether the -s option is specified?

can we add the pool status above the config (like the scan state, raidz expansion status, etc.), since those are not properties that we can otherwise obtain?

This is how it is currently implemented. Scan state, checkpoint state, and raidz expansion state are added above the config. These objects are added to the output if they are found in the vdev_tree object in the config. Below is the status output of a raidz1 pool that was expanded after creation:

  "data": {
    "13426222672292961779": {
      "name": "tank",
      "state": "ONLINE",
      "guid": "13426222672292961779",
      "txg": "732",
      "spa_version": "5000",
      "zpl_version": "5",
      "status": "OK",
      "scan_stats": {
        "function": "SCRUB",
        "state": "FINISHED",
        "start_time": "Tue May 21 04:14:14 PDT 2024",
        "end_time": "Tue May 21 04:14:19 PDT 2024",
        "to_examine": "2.30G",
        "examined": "2.30G",
        "skipped": "600K",
        "processed": "0B",
        "errors": "0",
        "bytes_per_scan": "0B",
        "pass_start": "1716290054",
        "scrub_pause": "-",
        "scrub_spent_paused": "0",
        "issued_bytes_per_scan": "2.30G",
        "issued": "2.30G"
      },
      "checkpoint_stats": {
        "state": "EXISTS",
        "start_time": "Tue May 21 04:16:47 PDT 2024",
        "space": "552K"
      },
      "raidz_expand_stats": {
        "name": "raidz1",
        "state": "FINISHED",
        "expanding_vdev": "0",
        "start_time": "Tue May 21 04:13:39 PDT 2024",
        "end_time": "Tue May 21 04:14:14 PDT 2024",
        "to_reflow": "2.30G",
        "reflowed": "2.30G",
        "waiting_for_resilver": "0"
      },
      "vdev_stats": {

@tonyhutter
Contributor

Request -- for the zpool status -j can we make -s a default (i.e., implied by -j)?

I agree with this 👍

@behlendorf added the Status: Code Review Needed label May 25, 2024
@usaleem-ix
Contributor Author

I have updated the PR; slow_ios for leaf VDEVs is now always shown, regardless of whether the -s option is specified.

@tonyhutter
Contributor

First off, great work on this! 👏

I started kicking the tires on zpool status -j. I see that the current JSON output is very close to our internal representation of the data (nvlists), which unfortunately doesn't make for easy queries. For example, the main vdevs are in the vdev_stat.children array, but the logs, spares, l2cache, special, and dedup vdevs are two levels up in the pool's object. It made me consider what would be the easiest-to-use JSON tree for everyday users. I came up with roughly this:

 pools[]
	pool1
		vdevs[]
			vdev1
			vdev2
 
	pool2
		vdevs[]
			vdev1
			vdev2
			vdev3

Or to use a real-world example with two pools:

  pool: pool1
 state: ONLINE
config:

	NAME                        STATE     READ WRITE CKSUM
	pool1                       ONLINE       2     0     0
	  raidz1-0                  ONLINE       2     0     0
	    /home/hutter/zfs/file1  ONLINE       2     0     0
	    /home/hutter/zfs/file2  ONLINE       0     0     0
	    /home/hutter/zfs/file3  ONLINE       0     0     0
	dedup	
	  /home/hutter/zfs/file6    ONLINE       0     0     0
	special	
	  mirror-1                  ONLINE       0     0     0
	    /home/hutter/zfs/file4  ONLINE       0     0     0
	    /home/hutter/zfs/file5  ONLINE       0     0     0
	logs	
	  /home/hutter/zfs/file7    ONLINE       0     0     0
	cache
	  /home/hutter/zfs/file8    ONLINE       0     0     0
	spares
	  /home/hutter/zfs/file9    AVAIL   

errors: No known data errors

  pool: pool2
 state: ONLINE
config:

	NAME                         STATE     READ WRITE CKSUM
	pool2                        ONLINE       0     0     0
	  mirror-0                   ONLINE       0     0     0
	    /home/hutter/zfs/file10  ONLINE       0     0     0
	    /home/hutter/zfs/file11  ONLINE       0     0     0
	    /home/hutter/zfs/file12  ONLINE       0     0     0

Example JSON from these two pools here ⬆️ (lots of fields excluded for brevity):

This makes a lot of common queries very easy:

# show me all leaf vdevs objects
cat zpool-status.json | jq '.pools[].vdevs[]'

  "name": "file1",
  "pool": "pool1",
  "disk_type": "file",
  "group_type": "raidz",
  "vdev_type": "data",
  "parent": "raidz-0",
  "read_errors": 5
}
{
  "name": "file2",
  "pool": "pool1",
  "disk_type": "file",
  "group_type": "raidz",
...

# show me all leaf vdev names
cat zpool-status.json | jq '.pools[].vdevs[] | .name'
"file1"
"file2"
"file3"
"file6"
...

# show me all leaf vdevs objects with read errors
cat zpool-status.json | jq '.pools[].vdevs[] | select (.read_errors > 0)'
{
  "name": "file1",
  "pool": "pool1",
  "disk_type": "file",
  "group_type": "raidz",
  "vdev_type": "data",
  "parent": "raidz-0",
  "read_errors": 5
}

# show me all mirror vdevs objects on pool1
cat zpool-status.json | jq '.pools[].vdevs[] | select (.pool == "pool1" and .group_type == "mirror")'
{
  "name": "file4",
  "pool": "pool1",
  "disk_type": "file",
  "group_type": "mirror",
  "vdev_type": "special",
  "parent": "mirror-1",
  "read_errors": 0
}
{
  "name": "file5",
  "pool": "pool1",
...

# show me all pool names
cat zpool-status.json | jq '.pools[] | .pool_name'
"pool1"
"pool2"

Thoughts?

I'll start trying out the other commands soon.

@tonyhutter
Contributor

In other words I agree with the author to represent the large numbers as strings.

The downside of this is that you can't easily do numerical comparisons with jq. If the number is a string, it can lead to...

cat junk.json | jq '.pools[].vdevs[] | select (.read_errors > 1)' 
{
  "name": "file1",
  "read_errors": "5"
}
{
  "name": "file2",
  "read_errors": "0"
}

Yes, you can get around it with tonumber, but it's kind of annoying:

cat junk.json | jq '.pools[].vdevs[] | .read_errors |= tonumber | select (.read_errors > 1)'

Note that we do have the benefit of knowing from the nvlist which fields are numbers and which ones are not.
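The tonumber-style conversion is a one-liner in a general-purpose language. A hedged Python sketch over the same hypothetical vdev records used in the jq example above:

```python
import json

# Hypothetical vdev records with stringified counters, mirroring the
# jq example above
vdevs = json.loads("""[
  {"name": "file1", "read_errors": "5"},
  {"name": "file2", "read_errors": "0"}
]""")

# Convert at the comparison site instead of using jq's `tonumber`
with_errors = [v["name"] for v in vdevs if int(v["read_errors"]) > 1]
```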

@usaleem-ix
Contributor Author

usaleem-ix commented May 29, 2024

@tonyhutter thanks for trying this out and taking a deeper look.

While initially implementing the VDEVs part of zpool status, I also thought it would be nice to have a flat hierarchy of VDEVs instead of nested objects, creating a children object in each VDEV object only if it has children. We could have properties like vdev_type (for data/special/dedup/logs/cache/spare), group_type (for raidz/mirror/stripe, etc.) and parent (indicating the parent's name) in each object to show where it belongs in the hierarchy, but on the user's end it would require extra steps to figure out where a particular VDEV belongs in the hierarchy. It might also confuse users while a drive is being replaced.

The current implementation keeps the output closer to the text output, which was one of the ideas. While a flat hierarchy makes queries easier, nested output provides an accurate picture of how VDEVs are organized in the pool, and accessing class VDEVs is simply a matter of looking up the class object (like dedup) to see how that class is organized.

If more of us think a flat hierarchy for VDEVs would be more beneficial, I can definitely update and organize it like that.

Also, I noticed that the example JSON output you shared does not contain VDEV objects for class devices (like raidz or mirror). Is that intentional, in that we don't want to include those here?

@tonyhutter
Contributor

@usaleem-ix new idea: consider a variation on the flat vdev array that includes ALL vdevs. So the pools[].vdevs[] array would include:

  1. Leaf vdevs
  2. Top-level-vdevs
  3. The root vdev

Each of these vdev objects would also include all of its child vdevs, so you would still get a hierarchy of vdevs if you selected, say, the root vdev object or the TLDs. The root vdev is actually important to include, since the root can have its own vdev properties.

Example here (with most fields excluded for brevity)

You might want to view the JSON as an interactive tree in https://jsoneditoronline.org/ since it is hard to conceptualize by looking at the raw JSON.

Queries:

# Select any root or top-level-vdevs.  root and TLDs have a .vdevs[] key
# since they have child vdevs.
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(has("vdevs")) | .name'

# Select all leaf vdevs.  Leaf vdevs do not have a .vdevs[] array, since they have
# no children vdevs.
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(has("vdevs") == false)'
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.vdevs == null)'

# List all top level vdevs and their pool name
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "tld")| [.pool,.name]'

# List all root vdevs objects
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "root")'

# List all file based vdevs objects
cat new-zpool-status.json | jq '.pools[].vdevs[] | select(.type == "file")'

@amotin
Member

amotin commented May 29, 2024

@tonyhutter If I understand you right, your proposed output will be twice as big as it has to be. Please consider for a second that there can be 1000 vdevs, each with all the per-vdev properties. I don't know how many people really need to parse the output with jq, while in any reasonable programming language a limited-depth tree traversal should be a pretty trivial operation.

@tonyhutter
Contributor

tonyhutter commented May 30, 2024

@amotin correct, it can be 2-4x larger output, depending on the vdev groups. The trade-off is that the JSON is easier to iterate over and query.

I tested out a 1000 vdev pool to see how bad it could get:

zpool status 152ms, 57kb output
zpool status -j 232ms, 318kb output
difference in cmds 80ms, 261kb output

If it's an 80ms difference from vanilla zpool status, then even at 4x JSON that's only 320ms, which isn't a big deal. Most pools are going to be in the ~100 vdev or less range anyway.

The benefits of having easily queryable JSON should not be understated. Right now our scripts need to screen-scrape to get even the most basic information. Here's one example from ZTS:

#
# Given a pool, and this function list all disks in the pool
#
function get_disklist # pool
{
	echo $(zpool iostat -v $1 | awk '(NR > 4) {print $1}' | \
	    grep -vEe '^-----' -e "^(mirror|raidz[1-3]|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
}

To replace this with jq using the pools[].vdevs[] JSON layout I'm proposing, it would simply be:

zpool status -j | jq '.pools[].vdevs[] | select(.vdevs == null) | .name' | xargs echo

I'm not sure how you could do it with the current JSON from this PR, since the vdevs are not arrayed together.

@usaleem-ix force-pushed the zfs-json branch 2 times, most recently from 9e34494 to 913da7b on June 6, 2024
@usaleem-ix
Contributor Author

@tonyhutter I have updated zpool status and added a --flat option in addition to -j. If the --flat option is used with -j, the vdev_stats object will include:

  1. Leaf vdevs
  2. Top level vdevs
  3. Root vdev

Each of these objects also includes its child objects in a children object.

If the --flat option is not specified, the vdev_stats object displays vdevs in hierarchical format, as before.

Can you please take a look and try it out?

@tonyhutter
Contributor

I have updated zpool status and added a --flat option in addition to -j. If the --flat option is used with -j

Thank you!

I just did another round of testing. Some suggestions that came to mind:

  1. Rename the "data" object to "pools" in the zpool JSON and "data" to "datasets" in the zfs JSON. That just makes them more descriptive when you're reading jq queries or Python code that uses the objects.

  2. "vdev_stats" and "children" objects should be renamed "vdevs". There's really no reason to have different names for them since they represent the same concept ("the vdevs beneath me"). It might make some queries easier if the name is the same too.

  3. Earlier I was suggesting pools[] and vdevs[] be arrays, but looks like you kept them as dictionaries. I now think you may have the right idea. Dictionaries make it easy to look up individual stats if you know the pool and vdev names in advance, and you can always convert a dictionary to an array in jq, which is nice. We just need to make sure that the "pools", "vdevs", and "datasets" dictionary objects only ever contain pools/vdevs/dataset entries, and nothing else.

  4. I would suggest you use the pool name rather than GUID for keys. Consider that you have to quote the GUID, but not the pool name, if you want to query it in jq:

zpool list -j | jq '.data."4555311788698535685".properties.health.value' 

vs:

zpool list -j | jq '.data.tank.properties.health.value'

It's going to make it easier to use in ZTS if you can use the pool name to query individual values. It also keeps things consistent with the vdev keys, which use the vdev names, not GUIDs.

  5. All vdevs should have objects for:
    • Their class: "special", "dedup", "logs", "spare" ...
    • Their redundancy level: "raidz", "raidz2", "mirror" ...
    • Their type: "file", "drive".

Example:

                {
                  "name": "file1",
                  "pool": "pool1",
                  "class": "normal",
                  "redundancy": "raidz",
                  "type": "file",
                  "parent": "raidz1-0",
                  ...

This will help with querying: "give me all special leaf vdevs", "give me all vdev names that are normal vdevs, but not spares", "give me all vdev paths that are actual physical drives", etc.

  6. Numerical values should be numbers, not strings. This allows easy numerical comparisons:
jq '.pools[].vdevs[] | select (.read_errors > 1)'

It also prevents shooting yourself in the foot when you do sort_by(size) and it does a dictionary sort instead of the numeric sort you wanted.

@robn
Member

robn commented Jun 8, 2024

There are some good ideas and thinking in here, and I don't have anything to contribute to this bikeshed, so I will not. I did just want to raise a point about ongoing maintenance.

This is going to address a longstanding gripe about ZFS, in that its output (especially zpool status) is tricky to parse and use in other programs. People are going to use it. And that's great!

My worry is that without care, it's going to cause similar issues we have with zpool status in particular: too many things out there depend on the shape of its output, making it hard for us to change. In particular, I'd be concerned about exposing internal values that are really implementation details, which can be even harder to remove.

I don't think it has to be a big deal, so long as we have this in mind as stuff is added and removed to the output. That means some sort of criteria or rubric for deciding what and how to present things, and some rule for removing things that is clear to consumers.

My light-touch suggestion for such a policy:

  • in a .0 release, we can change the output entirely; structure, types, formats and fields are all up for grabs
  • within a stable series, we will only add fields if necessary, but not change the overall structure or model, and won't change types or formats. If we ever did have to remove a field in a stable series, the field would remain with a default or zero value
  • consumers can always tell what version of the output they're getting, through a key that we guarantee will always be there
    • (like output_version, but maybe I would make the version be 2.1 or 2.2 or similar, if that matched the semantic lifecycle above)

This is less rigid than I might like, but I think is probably about as minimal as we can comfortably enforce given our freewheeling nature :)

@usaleem-ix
Contributor Author

usaleem-ix commented Jun 25, 2024

@tonyhutter I have updated as per your suggestions, can you please take a look and try it out?

  1. Renamed the "data" object to "pools" in the pool objects and "data" to "datasets" in dataset objects.
  2. Renamed "vdev_stats" and "children" objects to "vdevs".
  3. Pools, vdevs and datasets objects are kept as dictionaries.
  4. I agree that having the pool name as the key rather than the GUID would simplify operations, but pool names can be identical. Commands that operate on exported pools (for example, zpool import lists available pools that can be imported) will hopefully soon have JSON output support. If we used pool names as keys for pool objects, we could not list pool objects with identical names. Since GUIDs differ even for pools with the same name, and to have a consistent key identifier for all pool objects, it makes more sense to keep the pool GUID as the key for pool objects.
  5. All vdevs have objects for vdev_type (for "file", "disk" etc.) and class (for "special", "dedup", "logs", "spare" etc.)
  6. I have added an extra parameter, --int, which when given displays all numbers as integers. It is applicable to all commands that have JSON support. One drawback is that for large numbers like GUIDs, tools like jq round the value when it is displayed as an integer. The zpool/zfs commands print integer GUID values correctly, but when jq parses them, the least significant digits are rounded. So large integer values may not show up correctly when parsed with such tools.
$ zpool get -j guid | jq
{
  "output_version": {
    "command": "zpool get",
    "vers_major": 0,
    "vers_minor": 1
  },
  "pools": {
    "1640519301551502919": {
      "name": "rdz",
      "type": "POOL",
      "state": "ONLINE",
      "pool_guid": "1640519301551502919",
      "txg": "4",
      "spa_version": "5000",
      "zpl_version": "5",
      "properties": {
        "guid": {
          "value": "1640519301551502919",
          "source": {
            "type": "NONE",
            "data": "-"
          }
        }
      }
    }
  }
}

$ zpool get -j --int guid | jq
{
  "output_version": {
    "command": "zpool get",
    "vers_major": 0,
    "vers_minor": 1
  },
  "pools": {
    "1640519301551502919": {
      "name": "rdz",
      "type": "POOL",
      "state": "ONLINE",
      "pool_guid": 1640519301551502800,
      "txg": 4,
      "spa_version": 5000,
      "zpl_version": 5,
      "properties": {
        "guid": {
          "value": 1640519301551502800,
          "source": {
            "type": "NONE",
            "data": "-"
          }
        }
      }
    }
  }
}
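The rounding shown above happens inside jq, which stores JSON numbers as doubles; the command itself emits the full value. A consumer that wants --int output but lossless GUIDs can override number parsing. A sketch in Python (the fragment below is trimmed from the example output above):

```python
import json

# A fragment of `zpool get -j --int` output, with the exact GUID as emitted
raw = '{"pool_guid": 1640519301551502919, "txg": 4}'

# Python ints are arbitrary precision, so default parsing is already lossless
doc = json.loads(raw)
assert doc["pool_guid"] == 1640519301551502919

# To survive re-serialization through double-based tools, integers can be
# forced back to strings at parse time
doc_str = json.loads(raw, parse_int=str)
assert doc_str["pool_guid"] == "1640519301551502919"
```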

@behlendorf added the Status: Accepted label and removed the Status: Code Review Needed label Aug 6, 2024
@behlendorf closed this in aa15b60 Aug 6, 2024
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for JSON output for zfs list using the '-j' option. Information is collected in an nvlist, which is later printed in JSON format. Existing options for zfs list also work with '-j'. Man pages are updated with relevant information.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for zfs mount to display mounted file systems in JSON format using the '-j' option. Data is collected in an nvlist, which is printed in JSON format. The man page for zfs mount is updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for zpool version to output in JSON format using the '-j' option. Userland and kernel module versions are collected in an nvlist, which is later displayed in JSON format. The man page for zpool is updated.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for the zpool get command to output the list of
properties for ZFS pools and vdevs in JSON format using the '-j' option.
The man page for zpool get is updated to include the '-j' option.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for the zpool list command to output the list
of ZFS pools in JSON format using the '-j' option. Information about
available pools is collected in an nvlist which is later printed to
stdout in JSON format.

Existing options for the zpool list command work with the '-j' flag. The
man page for zpool list is updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This commit adds support for the zpool status command to display the
status of ZFS pools in JSON format using the '-j' option. Status
information is collected in an nvlist which is later dumped to stdout in
JSON format. Existing options for zpool status work with the '-j' flag.
The man page for zpool status is updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes #16217
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
Run basic JSON validation tests on the new `zfs|zpool -j` output.

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #16217
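A basic validity check in the spirit of the tests above can be sketched as follows. This is a hypothetical helper, not the actual test-suite code: it only verifies that the output parses as JSON and carries the versioned `output_version` header described in the PR.

```python
import json

def validate_json_output(text, expected_command):
    """Check that `text` is valid JSON and carries the output_version
    header documented in the PR. Sketch only, not the real test code."""
    data = json.loads(text)  # raises ValueError on invalid JSON
    ver = data["output_version"]
    assert ver["command"] == expected_command
    assert isinstance(ver["vers_major"], int)
    assert isinstance(ver["vers_minor"], int)
    return data

# In the real tests this text would come from running `zfs version -j`;
# here it is an inline sample for illustration.
sample = ('{"output_version": {"command": "zfs version", '
          '"vers_major": 0, "vers_minor": 1}}')
validate_json_output(sample, "zfs version")
```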
behlendorf pushed a commit that referenced this pull request Aug 6, 2024
This fixes things so mirrored special vdevs report themselves as
"class=special" rather than "class=normal".

This happens due to the way the vdev nvlists are constructed:

mirrored special devices - The 'mirror' vdev has allocation bias as
"special" and its leaf vdevs are "normal"

single or RAID0 special devices - Leaf vdevs have allocation bias as
"special".

This commit adds code to check if a leaf's parent is a "special"
vdev to see if it should also report "special".

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes #16217
@usaleem-ix usaleem-ix deleted the zfs-json branch August 7, 2024 05:04
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for JSON output for the zfs version and zfs get
commands. The '-j' flag can be used to get output in JSON format.

Information is collected in nvlist objects which are later printed in
JSON format. Existing options that work for zfs get and zfs version
also work with the '-j' flag.

The man pages for zfs get and zfs version are updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for JSON output for zfs list using the '-j'
option. Information is collected in an nvlist which is later printed in
JSON format. Existing options for zfs list also work with '-j'. Man
pages are updated with relevant information.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for zfs mount to display mounted file systems
in JSON format using the '-j' option. Data is collected in an nvlist
which is printed in JSON format. The man page for zfs mount is updated
accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for zpool version to output in JSON format
using the '-j' option. Userland and kernel module versions are collected
in an nvlist which is later displayed in JSON format. The man page for
zpool is updated.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for the zpool get command to output the list of
properties for ZFS pools and vdevs in JSON format using the '-j' option.
The man page for zpool get is updated to include the '-j' option.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for the zpool list command to output the list
of ZFS pools in JSON format using the '-j' option. Information about
available pools is collected in an nvlist which is later printed to
stdout in JSON format.

Existing options for the zpool list command work with the '-j' flag. The
man page for zpool list is updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This commit adds support for the zpool status command to display the
status of ZFS pools in JSON format using the '-j' option. Status
information is collected in an nvlist which is later dumped to stdout in
JSON format. Existing options for zpool status work with the '-j' flag.
The man page for zpool status is updated accordingly.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Signed-off-by: Umer Saleem <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
Run basic JSON validation tests on the new `zfs|zpool -j` output.

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#16217
lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This fixes things so mirrored special vdevs report themselves as
"class=special" rather than "class=normal".

This happens due to the way the vdev nvlists are constructed:

mirrored special devices - The 'mirror' vdev has allocation bias as
"special" and its leaf vdevs are "normal"

single or RAID0 special devices - Leaf vdevs have allocation bias as
"special".

This commit adds code to check if a leaf's parent is a "special"
vdev to see if it should also report "special".

Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Umer Saleem <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#16217
@ptahmose ptahmose mentioned this pull request Nov 9, 2024