Timeout waiting for ZVOL device to be created #9380

Merged
merged 1 commit into openzfs:master on Oct 1, 2019

Conversation

prakashsurya
Member

We've seen cases where, after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our application gives up on waiting and reports an error.

The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.

When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:

Function = zvol_remove_minors_impl
msecs               : count     distribution
    0 -> 1          : 0        |                                        |
    2 -> 3          : 0        |                                        |
    4 -> 7          : 1        |                                        |
    8 -> 15         : 0        |                                        |
   16 -> 31         : 0        |                                        |
   32 -> 63         : 0        |                                        |
   64 -> 127        : 1        |                                        |
  128 -> 255        : 45       |****************************************|
  256 -> 511        : 5        |****                                    |

That data is from a 10-second sample, collected using the BCC
"funclatency" tool. As it shows, most calls took at least 128ms. Some
basic math (20s / ~0.128s per call ≈ 156) tells us that in any 20-second
interval we could process at most about 150 removals, which is far fewer
than the 400+ that the workload will generate.
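
For reference, a histogram like the one above can be collected for a
kernel function with the BCC "funclatency" tool; the exact path and
flags below are an assumption about a typical setup rather than a quote
from this report (-m prints a millisecond histogram, -d 10 traces for
10 seconds):

    sudo /usr/share/bcc/tools/funclatency -m -d 10 zvol_remove_minors_impl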

As a result of this, and since all ZVOL minor operations go through the
single-threaded "z_zvol" taskq, the latency of creating a single ZVOL
device can become unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.

When profiling the "zvol_remove_minors_impl" function, I saw that most
of its time was spent off-CPU, blocked in "taskq_wait_outstanding". The
way this works is that "zvol_remove_minors_impl" dispatches calls to
"zvol_free" onto the "system_taskq", and then uses
"taskq_wait_outstanding" to wait for all of those dispatched calls to
complete before "zvol_remove_minors_impl" returns.

As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_outstanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".

@prakashsurya
Member Author

cc: @sdimitro @jgallag88

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 30, 2019
Contributor

@behlendorf behlendorf left a comment


doesn't necessarily have to wait for all calls to "zvol_free" to occur before it returns.

Since the entire removal process is asynchronous, deferring the free should be OK.

@prakashsurya
Member Author

The failure in the buildbot/Debian 9 x86_64 (BUILD) test appears to be environmental, and not due to this change request:

checking for zlib.h... no
configure: error: in `/var/lib/buildbot/slaves/zfs/Debian_9_x86_64__BUILD_/build/zfs':
configure: error: 
	*** zlib.h missing, zlib-devel package required

@codecov

codecov bot commented Oct 1, 2019

Codecov Report

Merging #9380 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9380      +/-   ##
==========================================
- Coverage   79.07%   79.06%   -0.01%     
==========================================
  Files         404      404              
  Lines      122539   122538       -1     
==========================================
- Hits        96892    96880      -12     
- Misses      25647    25658      +11
Flag       Coverage Δ
#kernel    79.77% <100%> (ø) ⬆️
#user      66.67% <ø> (+0.33%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3283f13...5260983.

@prakashsurya
Member Author

The buildbot/Ubuntu 18.04 x86_64 (TEST) build failed the test case trim/trim_config:

Tests with results other than PASS that are unexpected:
    FAIL trim/trim_config (expected PASS)

The test log shows this:

02:52:51.38 Failure /mnt/trim-vdev1 is 768 MB which is not -gt than 768 MB

I'm unfamiliar with this test, but I'd be surprised if my change is causing this. Please let me know if I should look into this failure further; otherwise, I'll assume it's a flaky test.

@prakashsurya
Member Author

@behlendorf anything more that I need to do for this?

@behlendorf
Contributor

@prakashsurya nope, looks good. I'll get it merged, thanks!

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Oct 1, 2019
@behlendorf behlendorf merged commit 99573cc into openzfs:master Oct 1, 2019