traverse_visitbp() stack usage #675

Closed
behlendorf opened this issue Apr 18, 2012 · 2 comments

@behlendorf
Contributor

Observed when running zfs on top of dm multipath devices with RHEL 6.2. I suspect we are still a little stack heavy and trashed the stack. Further savings will need to be found; after a reboot we didn't hit the problem again.

SPLError: 16311:0:(arc.c:1094:remove_reference()) ASSERTION(state == arc_anon || MUTEX_HELD(hash_lock)) failed
SPLError: 16311:0:(arc.c:1094:remove_reference()) SPL PANIC
SPL: Showing stack for process 16311
Pid: 16311, comm: zpool Tainted: P        W  ----------------   2.6.32-220.7.1.7chaos.ch5.x86_64 #1
Call Trace:
 [] ? spl_debug_dumpstack+0x27/0x40 [spl]
 [] ? spl_debug_bug+0x81/0xd0 [spl]
 [] ? remove_reference+0x23f/0x2e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? arc_buf_remove_ref+0x155/0x3c0 [zfs]
 [] ? traverse_visitbp+0x1cb/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? traverse_dnode+0x80/0x110 [zfs]
 [] ? traverse_visitbp+0x400/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? arc_read+0x13f/0x2b0 [zfs]
 [] ? traverse_visitbp+0x31c/0x5e0 [zfs]
 [] ? traverse_dnode+0x80/0x110 [zfs]
 [] ? traverse_visitbp+0x4a6/0x5e0 [zfs]
 [] ? __wake_up+0x53/0x70
 [] ? traverse_impl+0x172/0x320 [zfs]
 [] ? dsl_dataset_hold_ref+0x238/0x340 [zfs]
 [] ? spa_load_verify_cb+0x0/0xb0 [zfs]
 [] ? traverse_dataset+0x37/0x40 [zfs]
 [] ? traverse_pool+0x234/0x300 [zfs]
 [] ? spa_load_verify_cb+0x0/0xb0 [zfs]
 [] ? dca_find_provider_by_dev+0x1/0xd0 [dca]
 [] ? spa_load_verify+0x94/0x270 [zfs]
 [] ? spa_load+0x11b7/0x1700 [zfs]
 [] ? nvpair_value_common+0xf9/0x140 [znvpair]
 [] ? spa_load_best+0x4e/0x230 [zfs]
 [] ? spa_import+0x194/0x750 [zfs]
 [] ? nvpair_value_common+0xf9/0x140 [znvpair]
 [] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
 [] ? zfs_ioc_pool_import+0xf4/0x130 [zfs]
 [] ? zfsdev_ioctl+0xfd/0x1d0 [zfs]
 [] ? vfs_ioctl+0x22/0xa0
 [] ? unmap_region+0x110/0x130
 [] ? do_vfs_ioctl+0x84/0x580
 [] ? sys_ioctl+0x81/0xa0
 [] ? system_call_fastpath+0x16/0x1b
@behlendorf
Contributor Author

Below is a real-world example of traverse_visitbp() causing a stack overrun. It's responsible for nearly 2k of the stack; if this recursive function could be reworked into an iterative implementation it would help considerably (a rough sketch of that approach follows the trace below).

        Depth    Size   Location    (57 entries)
        -----    ----   --------
  0)     8288     128   __rmqueue+0xf4/0x670
  1)     8160     272   get_page_from_freelist+0x802/0x9a0
  2)     7888     288   __alloc_pages_nodemask+0x1f5/0xb40
  3)     7600      80   alloc_pages_current+0xb0/0x120
  4)     7520      80   new_slab+0x2c0/0x390
  5)     7440     256   __slab_alloc+0x3a2/0x58f
  6)     7184      80   kmem_cache_alloc+0x237/0x290
  7)     7104      16   mempool_alloc_slab+0x15/0x20
  8)     7088     128   mempool_alloc+0x68/0x180
  9)     6960     208   get_request+0x3ad/0xb30
 10)     6752      96   blk_queue_bio+0x7d/0x550
 11)     6656      48   generic_make_request+0xc2/0x110
 12)     6608     112   submit_bio+0x85/0x110
 13)     6496     192   __vdev_disk_physio+0x36a/0x490 [zfs]
 14)     6304      32   vdev_disk_io_start+0x6c/0x110 [zfs]
 15)     6272      96   zio_vdev_io_start+0xdd/0x510 [zfs]
 16)     6176      80   zio_nowait+0x103/0x350 [zfs]
 17)     6096     208   vdev_raidz_io_start+0x5ff/0xa40 [zfs]
 18)     5888      96   zio_vdev_io_start+0xdd/0x510 [zfs]
 19)     5792      80   zio_nowait+0x103/0x350 [zfs]
 20)     5712     144   vdev_mirror_io_start+0x254/0x400 [zfs]
 21)     5568      96   zio_vdev_io_start+0x26f/0x510 [zfs]
 22)     5472      80   zio_nowait+0x103/0x350 [zfs]
 23)     5392      96   spa_load_verify_cb+0x8d/0xb0 [zfs]
 24)     5296     208   traverse_visitbp+0x282/0x6e0 [zfs]
 25)     5088     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 26)     4880     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 27)     4672     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 28)     4464     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 29)     4256     128   traverse_dnode+0x89/0x150 [zfs]
 30)     4128     208   traverse_visitbp+0x48d/0x6e0 [zfs]
 31)     3920     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 32)     3712     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 33)     3504     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 34)     3296     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 35)     3088     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 36)     2880     208   traverse_visitbp+0x38c/0x6e0 [zfs]
 37)     2672     128   traverse_dnode+0x89/0x150 [zfs]
 38)     2544     208   traverse_visitbp+0x534/0x6e0 [zfs]
 39)     2336     144   traverse_impl+0x17d/0x3d0 [zfs]
 40)     2192      32   traverse_dataset+0x37/0x40 [zfs]
 41)     2160     224   traverse_pool+0x240/0x420 [zfs]
 42)     1936     112   spa_load_verify+0x73/0x221 [zfs]
 43)     1824     256   spa_load+0x11f7/0x1840 [zfs]
 44)     1568     256   spa_load+0xbd5/0x1840 [zfs]
 45)     1312      96   spa_load_best+0x4d/0x200 [zfs]
 46)     1216     144   spa_open_common+0x158/0x480 [zfs]
 47)     1072      16   spa_open+0x13/0x20 [zfs]
 48)     1056     208   dsl_dir_open_spa+0x5fb/0x900 [zfs]
 49)      848     224   dmu_objset_find_spa+0x56/0x710 [zfs]
 50)      624      80   zvol_create_minors+0x10d/0x200 [zfs]
 51)      544      48   zvol_init+0xde/0x120 [zfs]
 52)      496      48   _init+0x22/0x140 [zfs]
 53)      448      16   spl__init+0x13/0x20 [zfs]
 54)      432      48   do_one_initcall+0x12a/0x180
 55)      384     256   sys_init_module+0x116e/0x2260
 56)      128     128   system_call_fastpath+0x16/0x1b
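
The rework suggested above, replacing the recursion in traverse_visitbp() with an explicit, heap-allocated work stack, would look roughly like the sketch below. This is not the actual ZFS code: the node structure, the work-item type, and the visit callback are hypothetical placeholders, and the sketch only illustrates how a depth-first block-pointer walk can run in constant call-stack space.

/*
 * Minimal sketch of an iterative depth-first traversal.  Each step costs
 * one small heap allocation instead of another ~200-byte stack frame, so
 * tree depth no longer bounds kernel stack usage.
 */
#include <stdlib.h>

struct bp_node {                        /* hypothetical tree node */
        struct bp_node **children;
        int              nchildren;
};

struct work_item {                      /* entry on the explicit stack */
        struct bp_node   *node;
        struct work_item *next;
};

static int
traverse_iterative(struct bp_node *root, int (*visit)(struct bp_node *))
{
        struct work_item *top, *item;
        int i, err = 0;

        top = malloc(sizeof (*top));
        if (top == NULL)
                return (-1);
        top->node = root;
        top->next = NULL;

        while (top != NULL && err == 0) {
                item = top;
                top = item->next;

                err = visit(item->node);

                /* Push children so they are visited after their parent. */
                for (i = 0; err == 0 && i < item->node->nchildren; i++) {
                        struct work_item *w = malloc(sizeof (*w));

                        if (w == NULL) {
                                err = -1;
                                break;
                        }
                        w->node = item->node->children[i];
                        w->next = top;
                        top = w;
                }
                free(item);
        }

        /* Free anything still queued if we bailed out early. */
        while (top != NULL) {
                item = top;
                top = item->next;
                free(item);
        }
        return (err);
}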

@behlendorf
Contributor Author

We've whittled the stack usage here down such that it's no longer a problem. We just need to be careful going forward not to let it grow again.
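
One compile-time guard against that kind of regression is GCC's -Wframe-larger-than= warning, which the kernel wires up through CONFIG_FRAME_WARN: any single function whose stack frame exceeds the limit gets flagged at build time. A minimal, hypothetical example of the sort of frame it would catch, assuming a 1 KiB limit:

/*
 * Hypothetical example: building with "gcc -c -Wframe-larger-than=1024"
 * warns that the frame of fill_scratch() exceeds 1024 bytes, the same
 * mechanism the kernel enables via CONFIG_FRAME_WARN.
 */
#include <stddef.h>
#include <string.h>

void
fill_scratch(char *dst, size_t len)
{
        char scratch[2048];     /* 2 KiB on the stack: over the 1 KiB limit */

        memset(scratch, 0xab, sizeof (scratch));
        memcpy(dst, scratch, len < sizeof (scratch) ? len : sizeof (scratch));
}

This only catches individual oversized frames; deep recursion like the traces above, where each frame is small (~208 bytes), still needs runtime monitoring such as the ftrace stack tracer.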

behlendorf removed this from the 0.6.6 milestone on Oct 6, 2014