This repository has been archived by the owner on Jun 18, 2024. It is now read-only.
-
Notifications
You must be signed in to change notification settings - Fork 29
memory leak in scx_ops_enable
#142
Comments
Do we need to be calling |
Hmm, and it looks like we're not actually calling kfree() on the disable path. Let me set up the reproducer and write up a patch. |
Byte-Lab
added a commit
that referenced
this issue
Feb 14, 2024
We're making two mistakes with respect to object lifecycle for scx_root_kobj: - We're not calling kobject_put() if kobject_init_and_add() returns an error code. According to the API for that function, we should be. - We weren't kfree()'ing scx_root_kobj on the scx_ops_disable_workfn() path. Address both of these. With these changes, we no longer trigger the memory leak detected by Syzkaller in #142. Without this patch, we see the following with memleak enabled: unreferenced object 0xffff8dd78a99f000 (size 64): comm "runner", pid 2383, jiffies 4294697813 (age 128.878s) hex dump (first 32 bytes): 78 17 5f be ff ff ff ff 08 f0 99 8a d7 8d ff ff x._............. 08 f0 99 8a d7 8d ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034bda486>] __kmem_cache_alloc_node+0x109/0x1f0 [<00000000a91612ee>] kmalloc_trace+0x29/0xa0 [<00000000bfe2a4b7>] bpf_scx_reg+0x135/0xd80 [<000000000e3a5728>] bpf_struct_ops_link_create+0xd7/0x130 [<00000000b14e1608>] __sys_bpf+0x2a9/0x530 [<000000003140300e>] __x64_sys_bpf+0x17/0x20 [<00000000fa410384>] do_syscall_64+0x4d/0xf0 [<000000008cbc0d51>] entry_SYSCALL_64_after_hwframe+0x6f/0x77 With it, we see no such entries. Signed-off-by: David Vernet <[email protected]>
Byte-Lab
added a commit
that referenced
this issue
Feb 14, 2024
We're making two mistakes with respect to object lifecycle for scx_root_kobj: - We're not calling kobject_put() if kobject_init_and_add() returns an error code. According to the API for that function, we should be. - We weren't kobject_putt()'ing scx_root_kobj on the scx_ops_disable_workfn() path. We were calling kobject_del(), but this doesn't actually call the release callback under the hood. It just unlinks the kobject from the hierarchy. Address both of these. With these changes, we no longer trigger the memory leak detected by Syzkaller in #142. Without this patch, we see the following with memleak enabled: unreferenced object 0xffff8dd78a99f000 (size 64): comm "runner", pid 2383, jiffies 4294697813 (age 128.878s) hex dump (first 32 bytes): 78 17 5f be ff ff ff ff 08 f0 99 8a d7 8d ff ff x._............. 08 f0 99 8a d7 8d ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034bda486>] __kmem_cache_alloc_node+0x109/0x1f0 [<00000000a91612ee>] kmalloc_trace+0x29/0xa0 [<00000000bfe2a4b7>] bpf_scx_reg+0x135/0xd80 [<000000000e3a5728>] bpf_struct_ops_link_create+0xd7/0x130 [<00000000b14e1608>] __sys_bpf+0x2a9/0x530 [<000000003140300e>] __x64_sys_bpf+0x17/0x20 [<00000000fa410384>] do_syscall_64+0x4d/0xf0 [<000000008cbc0d51>] entry_SYSCALL_64_after_hwframe+0x6f/0x77 With it, we see no such entries. Fixes: e7a7781 ("scx: Add /sys/kernel/sched_ext interface") Signed-off-by: David Vernet <[email protected]>
Byte-Lab
added a commit
that referenced
this issue
Feb 14, 2024
We're making two mistakes with respect to object lifecycle for scx_root_kobj: - We're not calling kobject_put() if kobject_init_and_add() returns an error code. According to the API for that function, we should be. - We weren't kobject_putt()'ing scx_root_kobj on the scx_ops_disable_workfn() path. We were calling kobject_del(), but this doesn't actually call the release callback under the hood. It just unlinks the kobject from the hierarchy. Address both of these. With these changes, we no longer trigger the memory leak detected by Syzkaller in #142. Without this patch, we see the following with memleak enabled: unreferenced object 0xffff8dd78a99f000 (size 64): comm "runner", pid 2383, jiffies 4294697813 (age 128.878s) hex dump (first 32 bytes): 78 17 5f be ff ff ff ff 08 f0 99 8a d7 8d ff ff x._............. 08 f0 99 8a d7 8d ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034bda486>] __kmem_cache_alloc_node+0x109/0x1f0 [<00000000a91612ee>] kmalloc_trace+0x29/0xa0 [<00000000bfe2a4b7>] bpf_scx_reg+0x135/0xd80 [<000000000e3a5728>] bpf_struct_ops_link_create+0xd7/0x130 [<00000000b14e1608>] __sys_bpf+0x2a9/0x530 [<000000003140300e>] __x64_sys_bpf+0x17/0x20 [<00000000fa410384>] do_syscall_64+0x4d/0xf0 [<000000008cbc0d51>] entry_SYSCALL_64_after_hwframe+0x6f/0x77 With it, we see no such entries. Fixes: e7a7781 ("scx: Add /sys/kernel/sched_ext interface") Signed-off-by: David Vernet <[email protected]>
Byte-Lab
added a commit
that referenced
this issue
Feb 14, 2024
We're making two mistakes with respect to object lifecycle for scx_root_kobj: - We're not calling kobject_put() if kobject_init_and_add() returns an error code. According to the API for that function, we should be. - We weren't kobject_putt()'ing scx_root_kobj on the scx_ops_disable_workfn() path. We were calling kobject_del(), but this doesn't actually call the release callback under the hood. It just unlinks the kobject from the hierarchy. Address both of these. With these changes, we no longer trigger the memory leak detected by Syzkaller in #142. Without this patch, we see the following with memleak enabled: unreferenced object 0xffff8dd78a99f000 (size 64): comm "runner", pid 2383, jiffies 4294697813 (age 128.878s) hex dump (first 32 bytes): 78 17 5f be ff ff ff ff 08 f0 99 8a d7 8d ff ff x._............. 08 f0 99 8a d7 8d ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<0000000034bda486>] __kmem_cache_alloc_node+0x109/0x1f0 [<00000000a91612ee>] kmalloc_trace+0x29/0xa0 [<00000000bfe2a4b7>] bpf_scx_reg+0x135/0xd80 [<000000000e3a5728>] bpf_struct_ops_link_create+0xd7/0x130 [<00000000b14e1608>] __sys_bpf+0x2a9/0x530 [<000000003140300e>] __x64_sys_bpf+0x17/0x20 [<00000000fa410384>] do_syscall_64+0x4d/0xf0 [<000000008cbc0d51>] entry_SYSCALL_64_after_hwframe+0x6f/0x77 With it, we see no such entries. Fixes: e7a7781 ("scx: Add /sys/kernel/sched_ext interface") Signed-off-by: David Vernet <[email protected]>
Thanks a lot for the report and for finding this bug. This should be addressed with #143. Could you please let us know if you still observe this issue? |
htejun
pushed a commit
that referenced
this issue
Feb 17, 2024
Recently, when running './test_progs -j', I occasionally hit the following errors: test_lwt_redirect:PASS:pthread_create 0 nsec test_lwt_redirect_run:FAIL:netns_create unexpected error: 256 (errno 0) #142/2 lwt_redirect/lwt_redirect_normal_nomac:FAIL #142 lwt_redirect:FAIL test_lwt_reroute:PASS:pthread_create 0 nsec test_lwt_reroute_run:FAIL:netns_create unexpected error: 256 (errno 0) test_lwt_reroute:PASS:pthread_join 0 nsec #143/2 lwt_reroute/lwt_reroute_qdisc_dropped:FAIL #143 lwt_reroute:FAIL The netns_create() definition looks like below: #define NETNS "ns_lwt" static inline int netns_create(void) { return system("ip netns add " NETNS); } One possibility is that both lwt_redirect and lwt_reroute create netns with the same name "ns_lwt" which may cause conflict. I tried the following example: $ sudo ip netns add abc $ echo $? 0 $ sudo ip netns add abc Cannot create namespace file "/var/run/netns/abc": File exists $ echo $? 1 $ The return code for above netns_create() is 256. The internet search suggests that the return value for 'ip netns add ns_lwt' is 1, which matches the above 'sudo ip netns add abc' example. This patch tried to use different netns names for two tests to avoid 'ip netns add <name>' failure. I ran './test_progs -j' 10 times and all succeeded with lwt_redirect/lwt_reroute tests. Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Tested-by: Eduard Zingerman <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Syzkaller found a memory leak in
scx_ops_enable
. Looking at the code, it seems that the allocation is being allocated and freed properly so we're not exactly sure what causes the unreferenced object, but this issue has popped up quite frequently from Syzkaller that we thought you might want to take a look.Commit version: 65c2651
Config file:
config.txt
Bug report:
A reproducer was found for this problem:
The text was updated successfully, but these errors were encountered: