-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scalability issues with zvol_create_minors(pool) #2217
Comments
Correction: zvol_remove_minors() is also called at zvol/snapshot destroy time. Also, there is no objset namespace scan in zvol_remove_minors(), instead, the in-memory zvol state list is used. So, the scalability issues discussed above affect the zvol_create_minors() code paths only. |
The solution that seems to work well and does not impact the existing 'snapdev' related functionality is as follows. The dmu_objset_find() traversal can be effectively split in two parts. First, generic, traverses datasets and their children. Second, specialized, traverses snapshots of datasets whose 'snapdev' property is set to 'visible'. The second part of the traversal is invoked from the zvol_create_minors_cb(). This does not violate the rules for invoking dmu_objset_find(). This sub-traversal does not traverse the children, and it skips the dataset itself to avoid infinite recursion. This approach results in very limited additional recursion (one or two small stack frames), and it effectively prunes traversal of snapshots for all datasets whose 'snapdev' property is not set to 'visible'. |
#2388 is related to this. |
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
before traversing snapshots
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for synchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for synchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
property before traversing snapshots of a dataset zfsonlinux issue openzfs#3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Add support for asynchronous zvol minor operations. Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue openzfs#3681. Signed-off-by: Brian Behlendorf <[email protected]>
zfsonlinux issue #2217 - zvol minor operations: check snapdev property before traversing snapshots of a dataset zfsonlinux issue #3681 - lock order inversion between zvol_open() and dsl_pool_sync()...zvol_rename_minors() Create a per-pool zvol taskq for asynchronous zvol tasks. There are a few key design decisions to be aware of. * Each taskq must be single threaded to ensure tasks are always processed in the order in which they were dispatched. * There is a taskq per-pool in order to keep the pools independent. This way if one pool is suspended it will not impact another. * The preferred location to dispatch a zvol minor task is a sync task. In this context there is easy access to the spa_t and minimal error handling is required because the sync task must succeed. Support for asynchronous zvol minor operations address issue #3681. Signed-off-by: Boris Protopopov <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes #2217 Closes #3678 Closes #3681
With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - openzfs/zfs@a0bd735 - openzfs/zfs#3681 - openzfs/zfs#2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478 git-svn-id: svn+ssh://svn.freebsd.org/base/head@362047 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - openzfs/zfs@a0bd735 - openzfs/zfs#3681 - openzfs/zfs#2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478
With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - openzfs/zfs@a0bd735 - openzfs/zfs#3681 - openzfs/zfs#2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478
With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - openzfs/zfs@a0bd735 - openzfs/zfs#3681 - openzfs/zfs#2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478 git-svn-id: https://svn.freebsd.org/base/head@362047 ccf9f872-aa2e-dd11-9fc8-001c23d0bc1f
With this change all ZVOL updates are initiated from the SPA sync context instead of a mix of the sync and open contexts. The updates are queued to be applied by a dedicated thread in the original order. This should ensure that ZVOLs always accurately reflect the corresponding datasets. ZFS ioctl operations wait on the mentioned thread to complete its work. Thus, the illusion of the synchronous ZVOL update is preserved. At the same time, the SPA sync thread never blocks on ZVOL related operations avoiding problems like reported in bug 203864. This change is based on earlier work in the same direction: D7179 and D14669 by Anthoine Bourgeois. D7179 tried to perform ZVOL operations in the open context and that opened races between them. D14669 uses a design very similar to this change but with different implementation details. This change also heavily borrows from similar code in ZoL, but there are many differences too. See: - openzfs/zfs@a0bd735 - openzfs/zfs#3681 - openzfs/zfs#2217 PR: 203864 MFC after: 5 weeks Sponsored by: CyberSecure Differential Revision: https://reviews.freebsd.org/D23478
zvol_create_minors() is a generic function that recursively scans pool namespace and invokes a callback for each object set found. The goal is to create device nodes for all the zvols and their snapshots if the snapdev property is set on the corresponding zvols.
zvol_create_minors() is invoked at spa first open/import, and at zvol, clone, and zvol snapshot create time. Similar function (zvol_remove_minors()) is invoked at last close/export time to remove the device nodes created earlier. At snapshot create time, it should only be invoked on the snapshots under processing, whereas currently, it is called on the whole pool at the end of zfs_ioc_snapshot(). This invocation should be removed.
However, other issues still remain. Specifically, if there are many zvols and snapshots, all of them will have to be visited, and device nodes created/removed (which includes interaction with udev and dynamic creation/removal of the /dev nodes). One issue here is that regardless of the value of the snapdev property ('visible' or 'hidden'), all the snapshots are visited. As the total number of snapshots can be much larger than the total number of zvols/clones, this can present a scalability problem.
The reason this happens is that zvol_create_minors() uses generic pool namespace traversal routine (dmu_fobjset_find()) that does not have a way of filtering or pruning its traversal of the objset namespace. It does not interact with the callback it invokes on the objsets under traversal in any way other than aborting the entire traversal if a callback returns non-zero value. As a result, if DS_FIND_SNAPSHOTS flag is passed, the callback will be invoked on all the snapshots of a zvol, whether of not this zvol has znapdev='visible' set (and each callback will get the zvol name and will check for the snapdev property value).
This can be addressed either by amending the objset space traversal to filter (by object type) and/or prune (if zvol does not expose the snaps, don't iterate over the snaps). Alternatively, one can filter the zvols with snapdev='hidden' in the callback, if the name of the zvol is passed in the (currently unused) argument to the callback. The latter approach cannot eliminate the unneeded snapshot traversal but it can make the callbacks very fast.
To summarize, the zvol_create_minors(poolname) at the end of zfs_ioc_snapshot() should be removed, and the overhead during spa first open/import should be controlled by filtering/pruning objset namespace scan to avoid scanning zvol snapshots if the snapdev property of the parent zvol is det to 'hidden'.
Unfortunately, unless one keeps the history of changes of snapdev property for the volumes, one is still likely to have to scan all the snapshots on export/last close to make sure the /dev is cleaned up.
The text was updated successfully, but these errors were encountered: