Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add zfs_sb_prune_compat() function #3501

Closed
wants to merge 2 commits into from

Conversation

behlendorf
Copy link
Contributor

For kernels which do not implement a per-suberblock shrinker the
shrink_dcache_parent() function was used to attempt to reclaim
dentries. This was found not be entirely reliable which could
lead to performance issues on older kernels running meta-data
heavy workloads.

To address this issue a zfs_sb_prune_compat() function has been
added to implement this functionality. It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held. The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.

This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.

The number of threads in the iput taskq has also been increased
to speed up the handling of asynchronous iputs. This improves
the rate of meta data reclaim regardless of the kernel version.

Signed-off-by: Brian Behlendorf [email protected]

@behlendorf
Copy link
Contributor Author

@dweeezil could you please review this. It address the issues with zfs_sb_prune() for old kernels without a per-filesystem shrinker. Without the patch, on CentOS 6, dbench 64 blows right through the meta limit right up to arc_max since reclaim is largely ineffective. With the patch I can run the same test with the arc_meta_limit cranked down to a tiny value and get the same behavior as newer kernels.

dbench -x -D /tank/dbench/  64 -t 1800

@odoucet
Copy link

odoucet commented Jun 17, 2015

can you be more precise on what "newer kernel" means ? In which version was introduced per-suberblock shrinker ? thank you !

@behlendorf
Copy link
Contributor Author

@odoucet this patch is targeted for kernels older that Linux 3.1 (which is mentioned in a commit comment). Commit 90947b2 which was just merged solves a similar problem for 3.12 and newer kernels.

The number of threads in the iput taskq has been increased to speed
up the number of iputs which can be handled.  This has been observed
to improve the  meta data reclaim regardless of zfs_sb_prune()
implementation in use.

The taskq has also been renamed z_iput to for consistency with the
rest of the I/O pipeline taskqs which are all named z_*.

Signed-off-by: Brian Behlendorf <[email protected]>
@dweeezil
Copy link
Contributor

@behlendorf This LGTM. I generally don't run old enough kernels on my test systems to try it but it sounds like it helped your test scenario. On a semi-related note, one item on my large-scale testing to-do list is to find other candidates for multilist_t and I have a feeling the z_all_znodes list might be a candidate.

For kernels which do not implement a per-suberblock shrinker,
those older than Linux 3.1, the shrink_dcache_parent() function
was used to attempt to reclaim dentries.  This was found not be
entirely reliable and could lead to performance issues on older
kernels running meta-data heavy workloads.

To address this issue a zfs_sb_prune_aliases() function has been
added to implement this functionality.  It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held.  The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.

This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.

Signed-off-by: Brian Behlendorf <[email protected]>
@behlendorf
Copy link
Contributor Author

@dweeezil refreshed. I took a slightly different approach with the updated version and I like it much better. It also has passed an hour of so of torture testing just fine and I'll leave it running (although I don't expect any issues). If you could look it over again I'd appreciate it.

Initially I was just going to wrap the list_node_t I added with an #ifdef but that looked pretty horrible. Instead I opted not to use a list at all and allocated a small private array of pointers. This does introduce a small memory allocation but it several big advantages.

  • All the compatibility code is now confined to a small block in zfs_vfsops.c. That's good for portability and preventing upstream merge conflicts.
  • Removes the need to take the znode->z_lock since the array is private.
  • Avoids increasing the size of a znode_t in all cases.

Other changes include:

  • Added autoconf check for d_prune_aliases() just to be extra careful.
  • Made the !HAVE_SPLIT_SHRINKER_CALLBACK && !HAVE_SHRINK && !HAVE_D_PRUNE_ALIASES case fatal at build time to ensure we always have a valid pruning mechanism.
  • Renamed zfs_sb_prune_compat() to zfs_sb_prune_aliases().
  • Split the iput taskq tweak in to its own patch where it should have always been.

On a semi-related note, one item on my large-scale testing to-do list is to find other candidates for multilist_t and I have a feeling the z_all_znodes list might be a candidate.

That's an excellent idea. Definitely z_all_znodes is a candidate and that probably makes sense for upstream. But under Linux I think we'd frankly be better off retiring it outright and using the existing super block lists.

@dweeezil
Copy link
Contributor

@behlendorf LGTM. I like the 207a3ae scheme much better. The af8e61d commit also LGTM.

@behlendorf
Copy link
Contributor Author

@dweeezil thanks for the review. I've added your signed-off-by and merged these changes:

218b4e0 Add zfs_sb_prune_aliases() function
4c6a700 Increase the number of iput taskq threads

behlendorf added a commit that referenced this pull request Jun 24, 2015
For kernels which do not implement a per-suberblock shrinker,
those older than Linux 3.1, the shrink_dcache_parent() function
was used to attempt to reclaim dentries.  This was found not be
entirely reliable and could lead to performance issues on older
kernels running meta-data heavy workloads.

To address this issue a zfs_sb_prune_aliases() function has been
added to implement this functionality.  It relies on traversing
the list of znodes for a filesystem and adding them to a private
list with a reference held.  The private list can then be safely
walked outside the z_znodes_lock to prune dentires and drop the
last reference so the inode can be freed.

This provides the same synchronous behavior as the per-filesystem
shrinker and has the advantage of depending on only long standing
interfaces.

Signed-off-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tim Chase <[email protected]>
Closes #3501
@behlendorf behlendorf mentioned this pull request Jul 3, 2015
MorpheusTeam pushed a commit to Xyratex/lustre-stable that referenced this pull request Aug 10, 2015
Updates ZFS and SPL to latest maintence version.  Includes the
following:

Bug Fixes:
* Fix panic due to corrupt nvlist when running utilities
(openzfs/zfs#3335)
* Fix hard lockup due to infinite loop in zfs_zget()
(openzfs/zfs#3349)
* Fix panic on unmount due to iput taskq (openzfs/zfs#3281)
* Improve metadata shrinker performance on pre-3.1 kernels
(openzfs/zfs#3501)
* Linux 4.1 compat: use read_iter() / write_iter()
* Linux 3.12 compat: NUMA-aware per-superblock shrinker
* Fix spurious hung task watchdog stack traces (openzfs/zfs#3402)
* Fix module loading in zfs import systemd service
(openzfs/zfs#3440)
* Fix intermittent libzfs_init() failure to open /dev/zfs
(openzfs/zfs#2556)

Signed-off-by: Nathaniel Clark <[email protected]>
Change-Id: I053087317ff9e5bedc1671bb46062e96bfe6f074
Reviewed-on: http://review.whamcloud.com/15481
Reviewed-by: Alex Zhuravlev <[email protected]>
Tested-by: Jenkins
Reviewed-by: Isaac Huang <[email protected]>
Tested-by: Maloo <[email protected]>
Reviewed-by: Oleg Drokin <[email protected]>
@behlendorf behlendorf deleted the prune-compat branch May 1, 2017 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants