Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASoC: core: delete component->card_list in soc_remove_component only #1191

Closed
wants to merge 1 commit into from

Conversation

bardliao
Copy link
Collaborator

We add component->card_list in the end of soc_probe_component(). In
other words, component->card_list will not be added if there is an
error in the soc_probe_component() function. So we can't delete
component->card_list in the error handling of soc_probe_component().

Signed-off-by: Bard Liao [email protected]

ranj063
ranj063 previously approved these changes Sep 10, 2019
@@ -976,7 +976,6 @@ static void soc_set_name_prefix(struct snd_soc_card *card,
static void soc_cleanup_component(struct snd_soc_component *component)
{
snd_soc_component_set_jack(component, NULL, NULL);
list_del(&component->card_list);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bardliao did you actually got a crash for this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbaluta Yes, I hit the issue when codec driver return -EPROBE_DEFER from its probe function.

Copy link

@paulstelian97 paulstelian97 Sep 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bardliao You could use INIT_LIST_HEAD early on before adding the component to the list. In that case list_del will work just fine and not cause crashes of any kind (doing list_del on an empty list_head is a no-op).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulstelian97 Yes, INIT_LIST_HEAD(&component->card_list); is called by snd_soc_component_initialize() which is before soc_probe_component(). And yes, it is fine if we call list_del() once, but it will crash if we call it twice. When codec probe return -EPROBE_DEFER, kernel will try to re probe it again, and it will crash at the second probe failed.

@@ -976,7 +976,6 @@ static void soc_set_name_prefix(struct snd_soc_card *card,
static void soc_cleanup_component(struct snd_soc_component *component)
{
snd_soc_component_set_jack(component, NULL, NULL);
list_del(&component->card_list);
snd_soc_dapm_free(snd_soc_component_get_dapm(component));
Copy link
Member

@plbossart plbossart Sep 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you follow the logic, then this should also not be done in cleanup_component but in remove_component, when we hit the err_probe in soc_probe_component we shouldn't deal with DAPM

Can you review all changes in 'ASoC: soc-core: add soc_cleanup_component()' and resubmit with a Fixes: tag?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@plbossart I see "And soc_probe_component() doesn't call snd_soc_dapm_free(), but it should" from the commit message of "ASoC: soc-core: add soc_cleanup_component()". It is right after snd_soc_dapm_new_controls() run successfully. And snd_soc_dapm_free() will do nothing if snd_soc_dapm_new_controls() fails. So I think it is fine to keep snd_soc_dapm_free() in the soc_cleanup_component()

@plbossart
Copy link
Member

@bardliao you must have pushed the wrong branch? conflicts and tons of old commits in there.

@bardliao
Copy link
Collaborator Author

@bardliao you must have pushed the wrong branch? conflicts and tons of old commits in there.

Sorry about it. I just pushed again. I added fix: tag in the commit header only.

@plbossart
Copy link
Member

@bardliao you must have pushed the wrong branch? conflicts and tons of old commits in there.

Sorry about it. I just pushed again. I added fix: tag in the commit header only.

something's wrong on my side, the commit looks the same?

@bardliao
Copy link
Collaborator Author

something's wrong on my side, the commit looks the same?

@plbossart I didn't change the commit. I just added a fix: tag on the commit header.
ASoC: core: fix: delete component->card_list in soc_remove_component only

@ranj063
Copy link
Collaborator

ranj063 commented Sep 12, 2019

ASoC: core: fix: delete component->card_list in soc_remove_component only

@bardliao what Pierre meant was to add Fixes: of the commit that introduced this issue.

We add component->card_list in the end of soc_probe_component(). In
other words, component->card_list will not be added if there is an
error in the soc_probe_component() function. So we can't delete
component->card_list in the error handling of soc_probe_component().

Fixes: 22d1423 ("ASoC: soc-core: add soc_cleanup_component()")
Signed-off-by: Bard Liao <[email protected]>
@bardliao
Copy link
Collaborator Author

ASoC: core: fix: delete component->card_list in soc_remove_component only

@bardliao what Pierre meant was to add Fixes: of the commit that introduced this issue.

Thanks @ranj063 updated.

Copy link
Member

@plbossart plbossart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.
@bardliao can you send to alsa-devel, the merge window opened and this needs to go as a fix to 5.5 and added to 5.4 stable. please Cc: Morimoto-san

@plbossart
Copy link
Member

merged upstream w/ follow-up patches, will be handled with the usual weekly upstream merge.
Thanks @bardliao

@plbossart plbossart closed this Sep 18, 2019
cujomalainey pushed a commit to cujomalainey/linux that referenced this pull request Oct 26, 2020
[ Upstream commit a47bd78 ]

Dave hit this splat during testing btrfs/078:

  ======================================================
  WARNING: possible circular locking dependency detected
  5.8.0-rc6-default+ thesofproject#1191 Not tainted
  ------------------------------------------------------
  kswapd0/75 is trying to acquire lock:
  ffffa040e9d04ff8 (&delayed_node->mutex){+.+.}-{3:3}, at: __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]

  but task is already holding lock:
  ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (fs_reclaim){+.+.}-{0:0}:
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 fs_reclaim_acquire.part.0+0x25/0x30
	 __kmalloc_track_caller+0x49/0x330
	 kstrdup+0x2e/0x60
	 __kernfs_new_node.constprop.0+0x44/0x250
	 kernfs_new_node+0x25/0x50
	 kernfs_create_link+0x34/0xa0
	 sysfs_do_create_link_sd+0x5e/0xd0
	 btrfs_sysfs_add_devices_dir+0x65/0x100 [btrfs]
	 btrfs_init_new_device+0x44c/0x12b0 [btrfs]
	 btrfs_ioctl+0xc3c/0x25c0 [btrfs]
	 ksys_ioctl+0x68/0xa0
	 __x64_sys_ioctl+0x16/0x20
	 do_syscall_64+0x50/0xe0
	 entry_SYSCALL_64_after_hwframe+0x44/0xa9

  -> #1 (&fs_info->chunk_mutex){+.+.}-{3:3}:
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 __mutex_lock+0xa0/0xaf0
	 btrfs_chunk_alloc+0x137/0x3e0 [btrfs]
	 find_free_extent+0xb44/0xfb0 [btrfs]
	 btrfs_reserve_extent+0x9b/0x180 [btrfs]
	 btrfs_alloc_tree_block+0xc1/0x350 [btrfs]
	 alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs]
	 __btrfs_cow_block+0x143/0x7a0 [btrfs]
	 btrfs_cow_block+0x15f/0x310 [btrfs]
	 push_leaf_right+0x150/0x240 [btrfs]
	 split_leaf+0x3cd/0x6d0 [btrfs]
	 btrfs_search_slot+0xd14/0xf70 [btrfs]
	 btrfs_insert_empty_items+0x64/0xc0 [btrfs]
	 __btrfs_commit_inode_delayed_items+0xb2/0x840 [btrfs]
	 btrfs_async_run_delayed_root+0x10e/0x1d0 [btrfs]
	 btrfs_work_helper+0x2f9/0x650 [btrfs]
	 process_one_work+0x22c/0x600
	 worker_thread+0x50/0x3b0
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30

  -> #0 (&delayed_node->mutex){+.+.}-{3:3}:
	 check_prev_add+0x98/0xa20
	 validate_chain+0xa8c/0x2a00
	 __lock_acquire+0x56f/0xaa0
	 lock_acquire+0xa3/0x440
	 __mutex_lock+0xa0/0xaf0
	 __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
	 btrfs_evict_inode+0x3bf/0x560 [btrfs]
	 evict+0xd6/0x1c0
	 dispose_list+0x48/0x70
	 prune_icache_sb+0x54/0x80
	 super_cache_scan+0x121/0x1a0
	 do_shrink_slab+0x175/0x420
	 shrink_slab+0xb1/0x2e0
	 shrink_node+0x192/0x600
	 balance_pgdat+0x31f/0x750
	 kswapd+0x206/0x510
	 kthread+0x137/0x150
	 ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Chain exists of:
    &delayed_node->mutex --> &fs_info->chunk_mutex --> fs_reclaim

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(fs_reclaim);
				 lock(&fs_info->chunk_mutex);
				 lock(fs_reclaim);
    lock(&delayed_node->mutex);

   *** DEADLOCK ***

  3 locks held by kswapd0/75:
   #0: ffffffff8b0c8040 (fs_reclaim){+.+.}-{0:0}, at: __fs_reclaim_acquire+0x5/0x30
   #1: ffffffff8b0b50b8 (shrinker_rwsem){++++}-{3:3}, at: shrink_slab+0x54/0x2e0
   #2: ffffa040e057c0e8 (&type->s_umount_key#26){++++}-{3:3}, at: trylock_super+0x16/0x50

  stack backtrace:
  CPU: 2 PID: 75 Comm: kswapd0 Not tainted 5.8.0-rc6-default+ thesofproject#1191
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba527-rebuilt.opensuse.org 04/01/2014
  Call Trace:
   dump_stack+0x78/0xa0
   check_noncircular+0x16f/0x190
   check_prev_add+0x98/0xa20
   validate_chain+0xa8c/0x2a00
   __lock_acquire+0x56f/0xaa0
   lock_acquire+0xa3/0x440
   ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
   __mutex_lock+0xa0/0xaf0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
   ? __lock_acquire+0x56f/0xaa0
   ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
   ? lock_acquire+0xa3/0x440
   ? btrfs_evict_inode+0x138/0x560 [btrfs]
   ? btrfs_evict_inode+0x2fe/0x560 [btrfs]
   ? __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
   __btrfs_release_delayed_node.part.0+0x3f/0x310 [btrfs]
   btrfs_evict_inode+0x3bf/0x560 [btrfs]
   evict+0xd6/0x1c0
   dispose_list+0x48/0x70
   prune_icache_sb+0x54/0x80
   super_cache_scan+0x121/0x1a0
   do_shrink_slab+0x175/0x420
   shrink_slab+0xb1/0x2e0
   shrink_node+0x192/0x600
   balance_pgdat+0x31f/0x750
   kswapd+0x206/0x510
   ? _raw_spin_unlock_irqrestore+0x3e/0x50
   ? finish_wait+0x90/0x90
   ? balance_pgdat+0x750/0x750
   kthread+0x137/0x150
   ? kthread_stop+0x2a0/0x2a0
   ret_from_fork+0x1f/0x30

This is because we're holding the chunk_mutex while adding this device
and adding its sysfs entries.  We actually hold different locks in
different places when calling this function, the dev_replace semaphore
for instance in dev replace, so instead of moving this call around
simply wrap it's operations in NOFS.

CC: [email protected] # 4.14+
Reported-by: David Sterba <[email protected]>
Signed-off-by: Josef Bacik <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants