zfs-0.7.0
New Features
- Resumable zfs send/receive - Allow an interrupted zfs receive to be resumed if the stream was prematurely terminated (e.g. due to remote system or network failure).
- Compressed zfs send/receive - Use the zfs send -c option to directly send the compressed data in the ARC or on-disk to another pool without needing to decompress it.
- Multiple Import Protection - Prevents a shared pool in a fail-over configuration from being imported on different hosts at the same time. When the multihost pool property is on, an activity check is performed prior to importing the pool to verify it is not in use.
- Customized zpool iostat|status columns - Additional columns can be added to the zpool iostat and zpool status output to show more information. Several useful scripts are provided which can report drive temperature, SMART data, enclosure LED status, and more. Administrators and users can add additional scripts to meet their needs.
- Latency and request size histograms - Use the zpool iostat -l option to show on-the-fly latency stats and zpool iostat -w to generate a histogram showing the total latency of each IO. The zpool iostat -r option can be used to show the size of each IO. These statistics are available per-disk to aid in finding misbehaving devices.
- Scrub Pause - The zpool scrub -p option can be used to pause/resume an active scrub without having to cancel it.
- Delegations - The zfs allow and zfs unallow subcommands can be used to delegate ZFS administrative permissions for the file systems to non-privileged users.
- Large dnodes - This feature improves metadata performance by allowing extended attributes, ACLs, and symbolic links with long target names to be stored in the dnode. This benefits workloads such as SELinux, distributed filesystems like Lustre and Ceph, and any application which makes use of extended attributes.
- User/group object accounting and quota - This feature adds per-object user/group accounting and quota limits to the existing space accounting and quota functionality. The zfs userspace and zfs groupspace subcommands have been extended to set quota limits and report on object usage.
- Cryptographic checksums - Stronger SHA-512, Skein, or Edon-R checksums are available.
- JBOD Management
- Automatic drive online - Newly detected devices which are determined to be part of an imported pool are automatically brought online.
- Automatic drive replacement - When the autoreplace pool property is on, any new device found in the same physical location as a device that previously belonged to the pool is automatically formatted and replaced.
- Automatic hot spares - When a device is faulted, start a rebuild to a hot-spare device if one is available.
- Fault LEDs - Set the fault LED for a device when it is faulted, and clear it when the device has been replaced.
- Drive health monitoring - Automatically fault a device when an excessive number of read, write, or checksum errors are detected.
- Force fault - Use zpool offline -f to proactively fault a problematic device.
- Multipath aware - Can be used with advanced multipath configurations.
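The resumable and compressed send/receive items in the feature list above can be combined in a short script. What follows is a minimal sketch, not a definitive procedure. The function names are hypothetical, and dataset, snapshot, and host names are supplied by the caller. The zfs receive -s flag, the zfs send -t flag, and the receive_resume_token property are not named in these notes; they come from the zfs(8) man page and should be checked there.

```shell
#!/bin/sh
# Sketch only: dataset, snapshot and host names are passed in by the caller.

# Succeeds when the property value is a usable resume token.
# "zfs get" prints "-" for a property that is not set.
has_resume_token() {
    [ -n "$1" ] && [ "$1" != "-" ]
}

# Send a snapshot without decompressing its blocks (-c). The receiving side
# keeps partial state (-s) so an interrupted transfer can be resumed.
# usage: send_snapshot pool/fs@snap remotehost targetpool/fs
send_snapshot() {
    zfs send -c "$1" | ssh "$2" zfs receive -s "$3"
}

# Resume an interrupted transfer using the token stored on the receiver.
# usage: resume_snapshot remotehost targetpool/fs
resume_snapshot() {
    token=$(ssh "$1" zfs get -H -o value receive_resume_token "$2")
    if has_resume_token "$token"; then
        zfs send -t "$token" | ssh "$1" zfs receive -s "$2"
    else
        echo "no resume token on $2, start a new send instead" >&2
        return 1
    fi
}
```

The script only defines functions, so sourcing it touches no pool. A transfer would be started with send_snapshot and, after a failure, continued with resume_snapshot.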
Performance
- ARC Buffer Data (ABD) - Allocates ARC data buffers using scatter lists of pages instead of virtual memory. This approach minimizes fragmentation on the system allowing for a more efficient use of memory. The reduced demand for virtual memory also improves stability and performance on 32-bit architectures.
- Compressed ARC - Cached file data is compressed by default in memory and uncompressed on demand. This allows for a larger effective cache which improves overall performance.
- Vectorized RAIDZ - Hardware optimized RAIDZ which reduces CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, avx512bw, neon, neonx2
- Vectorized checksums - Hardware optimized Fletcher-4 checksums which reduce CPU usage. Supported SIMD instructions: sse2, ssse3, avx2, avx512f, neon
- GZIP compression offloading - Hardware optimized GZIP compression offloading with QAT accelerator.
- Metadata performance - Overall improved metadata performance. Optimizations include a multi-threaded allocator, batched quota updates, improved prefetching, and streamlined call paths.
- Faster RAIDZ resilver - When resilvering, RAIDZ intelligently skips sections of the device which don't need to be rebuilt.
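To see how much the compressed ARC enlarges the effective cache, the compressed and uncompressed sizes can be compared. This sketch assumes the counters are exposed as compressed_size and uncompressed_size in /proc/spl/kstat/zfs/arcstats, in the usual name/type/value column layout. Neither the path nor the field names appear in these notes, so verify them on your system. The function name is hypothetical.

```shell
#!/bin/sh
# Print uncompressed/compressed ARC size as a ratio with two decimals.
# Assumes kstat lines of the form: <name> <type> <value>.
# usage: arc_compress_ratio [arcstats-file]
arc_compress_ratio() {
    awk '
        $1 == "compressed_size"   { c = $3 }
        $1 == "uncompressed_size" { u = $3 }
        END {
            # Avoid dividing by zero when the counters are missing or empty.
            if (c > 0) printf "%.2f\n", u / c
            else       print "n/a"
        }
    ' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

A higher ratio means the cached data compresses well and the cache holds correspondingly more file data for the same amount of memory.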
Changes in Behavior
- Non-privileged users are allowed to run zpool list, zpool iostat, zpool status, zpool get, zfs list, and zfs get. These commands no longer need to be added to the /etc/sudoers file.
- The permissions of the /dev/zfs device have changed from 0600 to 0666 to let ZFS do access control in kernel space and make zfs allow and zfs unallow work properly. If you have been changing the permissions or group owner of the device file yourself, your change will no longer work correctly and will break the proper behavior of zfs allow. From this release forward you should be able to satisfy your use case with the officially supported zfs allow command.
- By default task queues are now dynamic and worker threads will be created and destroyed as needed. This allows the system to automatically tune itself to ensure the optimal number of threads are used for the active workload, which can result in a performance improvement.
- Accessing snapshots over NFS now requires the crossmnt option be added to the /etc/exports file. The nfsd service is now aware that snapshots are different filesystems. A result of this change is that older distributions, like CentOS 6.x, can no longer provide access to snapshots over NFS.
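Before upgrading an NFS server, it can help to find export entries that still lack the crossmnt option. The following sketch makes a simple text check of an exports file. The function name is hypothetical, and the match is deliberately naive: it looks for the word anywhere on the line and does not parse per-client option lists.

```shell
#!/bin/sh
# List export entries that do not mention crossmnt.
# Comment lines and blank lines are skipped.
# usage: exports_missing_crossmnt [exports-file]
exports_missing_crossmnt() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "${1:-/etc/exports}" |
        grep -v 'crossmnt'
}

# An entry that passes the check could look like this (path and client
# network are placeholders):
#   /tank/home  192.0.2.0/24(rw,crossmnt)
```

Any line the function prints is a candidate for adding crossmnt if snapshots under that export need to be reachable over NFS.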
Supported Kernels
- Compatible with 2.6.32 - 4.12 Linux kernels.
Module Options
- The default values for the module options were selected to yield good performance for the majority of workloads and configurations. They should not need to be tuned for most systems but are available for performance analysis and tuning. See the zfs-module-parameters(5) man page for a more complete description of the options and what they control.
- Added:
- dbuf_cache_hiwater_pct - Percent over dbuf_cache_max_bytes when dbufs must be evicted
- dbuf_cache_lowater_pct - Percent below dbuf_cache_max_bytes when dbufs stop being evicted
- dbuf_cache_max_bytes - Maximum size in bytes of the dbuf cache
- dbuf_cache_max_shift - Cap the size of the dbuf cache to a log2 fraction of arc size
- dmu_object_alloc_chunk_shift - CPU-specific allocator grabs 2^N objects at once
- send_holes_without_birth_time - Ignore hole_birth txg for zfs send
- zfetch_max_distance - Max bytes to prefetch per stream
- zfs_abd_scatter_enabled - Toggle whether ABD allocations must be linear
- zfs_abd_scatter_max_order - Maximum order allocation used for a scatter ABD
- zfs_arc_dnode_limit - Minimum bytes of dnodes in ARC
- zfs_arc_dnode_limit_percent - Percent of ARC meta buffers for dnodes
- zfs_arc_dnode_reduce_percent - Percentage of excess dnodes to try to unpin
- zfs_arc_meta_limit_percent - Percent of arc size for arc meta limit
- zfs_arc_pc_percent - Percent of pagecache to reclaim ARC to
- zfs_compressed_arc_enabled - Disable compressed arc buffers
- zfs_deadman_checktime_ms - Dead I/O check interval in milliseconds
- zfs_delete_blocks - Delete files larger than N blocks asynchronously
- zfs_dmu_offset_next_sync - Enable forcing txg sync to find holes
- zfs_free_bpobj_enabled - Enable processing of the free_bpobj
- zfs_metaslab_segment_weight_enabled - Enable segment-based metaslab selection
- zfs_metaslab_switch_threshold - Metaslab selection max buckets before switching
- zfs_multihost_fail_intervals - Max allowed period without a successful mmp write
- zfs_multihost_history - Historical statistics for last N multihost writes
- zfs_multihost_import_intervals - Number of zfs_multihost_interval periods to wait for activity
- zfs_multihost_interval - Milliseconds between mmp writes to each leaf
- zfs_multilist_num_sublists - Number of sublists used in each multilist
- zfs_per_txg_dirty_frees_percent - Percentage of dirtied blocks from frees in one TXG
- zfs_sync_taskq_batch_pct - Percentage of CPUs to run an IO worker thread
- zfs_vdev_mirror_non_rotating_inc - Non-rotating media load increment for non-seeking I/O's
- zfs_vdev_mirror_non_rotating_seek_inc - Non-rotating media load increment for seeking I/O's
- zfs_vdev_mirror_rotating_inc - Rotating media load increment for non-seeking I/O's
- zfs_vdev_mirror_rotating_seek_inc - Rotating media load increment for seeking I/O's
- zfs_vdev_mirror_rotating_seek_offset - Offset in bytes from the last I/O to trigger seek increment
- zfs_vdev_queue_depth_pct - Queue depth percentage for each top-level vdev
- zfs_vdev_raidz_impl - Select RAIDZ implementation.
- zil_slog_bulk - Limit in bytes slog sync writes per commit
- zio_dva_throttle_enabled - Throttle block allocations in the ZIO pipeline
- zvol_request_sync - Synchronously handle bio requests
- zvol_threads - Max number of threads to handle I/O requests
- zvol_volmode - Default volmode property value
- spl_max_show_tasks - Max number of tasks shown in taskq proc
- spl_panic_halt - Cause kernel panic on assertion failures
- Removed:
- l2arc_nocompress - Skip compressing L2ARC buffers
- zfetch_block_cap - Max number of blocks to fetch at a time
- zfs_arc_num_sublists_per_state - Number of sublists used in each of the ARC state lists
- zfs_disable_dup_eviction - Disable duplicate buffer eviction
- zfs_vdev_mirror_switch_us - Switch mirrors every N microseconds
- zil_slog_limit - Max commit bytes to separate log device
- Changed:
- zfs_admin_snapshot - Enable mkdir/rmdir/mv in .zfs/snapshot
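The options listed above can be inspected at runtime and set persistently. This sketch assumes the conventional Linux locations: /sys/module/<module>/parameters for current values and a file such as /etc/modprobe.d/zfs.conf for load-time settings. Neither path is stated in these notes. The function names, the PARAM_ROOT variable, and the example value are hypothetical.

```shell
#!/bin/sh
# Print the current value of a module parameter.
# usage: module_param <module> <name>
# PARAM_ROOT can be overridden for testing; it defaults to /sys/module.
module_param() {
    cat "${PARAM_ROOT:-/sys/module}/$1/parameters/$2"
}

# Print a modprobe configuration line that sets a parameter at load time.
# usage: modprobe_option <module> <name> <value>
modprobe_option() {
    printf 'options %s %s=%s\n' "$1" "$2" "$3"
}

# Example (run as root); the value 2000 is only an illustration:
#   module_param zfs zfs_multihost_interval
#   modprobe_option zfs zfs_multihost_interval 2000 >> /etc/modprobe.d/zfs.conf
```

The module name is a separate argument because the spl_ options belong to the spl module rather than to zfs.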