native_queued_spin_lock_slowpath contention even after #215 fix #218
Comments
Is that specific to #215? My 8-CPU laptop doesn't show anything. Could you find an exact call-chain leading to that spinlock? |
I noticed now it happens sporadically, sometimes on the client side and sometimes on the server side |
That helps, thanks! The culprit is in the prep path; there was a bug fix that, I guess, made it worse. Could you try the patch below?
diff --git a/fs/io_uring.c b/fs/io_uring.c
index eed0d068904c..fa108a7b4fd4 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -5470,8 +5470,6 @@ static int io_req_defer_prep(struct io_kiocb *req,
if (unlikely(ret))
return ret;
- io_prep_async_work(req);
-
switch (req->opcode) {
case IORING_OP_NOP:
break; |
@axboe, btw any chance you remember why {SEND,RCV}MSG need fs? |
Only for SCM_RIGHTS iirc, which is currently disabled explicitly. |
You could probably also just try unshare(CLONE_FS) in the thread and see if that makes it better. |
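As a reference, here is a minimal sketch (my illustration, not code from this thread) of trying unshare(CLONE_FS) in a worker thread before it starts submitting requests; the worker function and main are placeholders:
#define _GNU_SOURCE
#include <sched.h>      /* unshare(), CLONE_FS */
#include <pthread.h>
#include <stdio.h>

/* Hypothetical worker thread: unshare the fs struct so this thread no longer
 * shares (and contends on) the parent's fs spinlock with its sibling threads. */
static void *worker(void *arg)
{
        (void)arg;
        if (unshare(CLONE_FS) != 0)
                perror("unshare(CLONE_FS)");
        /* ... set up the io_uring instance and run the submission loop ... */
        return NULL;
}

int main(void)
{
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_join(t, NULL);
        return 0;
}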
Yes, both fixes seem to mitigate the issue. |
Not sure I see it disabled anywhere on the io_uring side, but it |
@romange, thanks for testing! I'll see how to do it in a similar way |
It's done in __sys_sendmsg_sock():
So yes, it can be removed in io_uring, just needs a comment on why. |
And ditto __sys_recvmsg_sock(), of course, for the receive side. |
@axboe, got it, thanks! It seems this would be |
Hmm, I can't find it used anywhere near SCM_RIGHTS or net. It seems need_fs there is just for safety now. |
@axboe Thanks for pushing this forward. I've seen that you submitted part of the fixes into the 5.10 branch and some into 5.11. Could you please tell whether 5.10 will reduce the contention, or whether 5.11 is necessary? |
There's really two things here, Pavel is covering the second one and that will be fine in 5.10. For the TWA_SIGNAL based contention, it'll land in 5.11, but I plan on pushing it for -stable as well once it lands in 5.11-rc. |
Thanks! |
@romange, the previous path is gone, but that may happen if requests go through io-wq or there is another contended spinlock. Could you locate the full stack trace again? E.g. via |
p.s. IIRC there is nothing in for-5.11 yet that would help with this particular contention. |
I haven't managed to find anything meaningful; the dwarf option throws lots of errors. I will keep checking. I want to add that I did build the kernel from one of Jens' branches, I think
@isilence Pavel, I recorded the profile data, and you can see all the callchains with |
commit 2e98fb75d85c2c3b0ccb690ff138395d329a0698
Author: Pavel Begunkov <[email protected]>
Date: Fri Oct 2 22:36:54 2020 +0300
io_uring: no need in fs for {recv,send}msg
SENDMSG and RECVMSG don't actually need ->fs, it's used only for
SCM_RIGHTS, which is disallowed by __sys_{send,recv}msg_sock().
Remove ->needs_fs for them because taking fs->lock may become pretty
contended if a lot of requests are going through io-wq (or just being
async prepared).
Signed-off-by: Pavel Begunkov <[email protected]>
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 80be2184f4a5..57709a268eff 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -849,8 +849,7 @@ static const struct io_op_def io_op_defs[] = {
.pollout = 1,
.needs_async_data = 1,
.async_size = sizeof(struct io_async_msghdr),
- .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG |
- IO_WQ_WORK_FS,
+ .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG,
},
[IORING_OP_RECVMSG] = {
.needs_file = 1,
@@ -859,8 +858,7 @@ static const struct io_op_def io_op_defs[] = {
.buffer_select = 1,
.needs_async_data = 1,
.async_size = sizeof(struct io_async_msghdr),
- .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG |
- IO_WQ_WORK_FS,
+ .work_flags = IO_WQ_WORK_MM | IO_WQ_WORK_BLKCG,
},
[IORING_OP_TIMEOUT] = {
.needs_async_data = 1,
|
I don't know at a glance what the issue is, but I'd update libdwarf and build a new perf |
In general, perf.data files are not very portable; even if my perf didn't say that the format is incompatible, the kernel map is also usually needed. |
Ok, would it help if I attach here |
That would certainly be better - remember to use -g --no-children for perf report, that makes it easier to see. Pavel seems convinced that this is something new, but did you run tif-task_work.arch and see any difference? The TWA_SIGNAL-based task_work fix for threaded applications would show exactly this kind of contention; that's basically where we started with #215. But the perf report would be crucial in seeing what's going on here, and whether it's the signal lock or something else. |
By "new" I mean that it's different from what we've seen before in this issue (i.e. spinlocking on fs). It's probably tif-task_work as you pointed that it was struggling in |
I am currently struggling to build a bootable Linux kernel from source; I am checking out
|
"for-next" branch https://git.kernel.dk/cgit/linux-block/log/?h=for-next does not have |
OK, I think that settles that the newly reported case is indeed the #215 case. The contention you're seeing now is off the socket, and that's a generic thing that would apply when using epoll etc. as well. There might be room for improvement there, but that's outside the scope of io_uring. |
Oh, I just noticed now - I've commented on the wrong issue. I meant to comment yesterday on #215 and not on this one. Sorry for the confusion I caused. To conclude, #218 is fixed in 5.10 and #215 is waiting for 5.11. And once the 5.11 release window opens, Jens will try to cherry-pick the fix for #215 into 5.10. |
Closing this one out. |
A bit different scenario:
When using IOSQE_IO_LINK for chaining SQEs, there is a contention bottleneck with native_queued_spin_lock_slowpath (see the perf output below; a small sketch of how SQEs are linked follows the repro steps). I ran it on the patched kernel from the tif-task_work branch.
To reproduce, I used my own echo server: https://github.com/romange/async/blob/master/examples/echo_server.cc
Binary (compiled on Ubuntu 20.04): https://drive.google.com/file/d/1hgimzG2Y9Olf1cVVUGyssF2pgx1QiBJ3/view?usp=sharing
To run on the server side: ./echo_server --logtostderr
To run on the client side: ./echo_server --connect -n 100000 -c 5
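For readers unfamiliar with SQE linking, here is a minimal liburing sketch (my illustration, not code from the echo server above) of how two SQEs get chained with IOSQE_IO_LINK; the function name, socket fd, and fixed buffer length are placeholders, and a real echo server would use the actual number of bytes received for the send:
#include <liburing.h>
#include <stddef.h>

/* Queue a recv followed by a send on the same socket. Because the recv SQE
 * carries IOSQE_IO_LINK, the send is only started after the recv completes
 * successfully; if the recv fails, the linked send is cancelled. */
static int queue_echo_pair(struct io_uring *ring, int sockfd, char *buf, size_t len)
{
        struct io_uring_sqe *sqe;

        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -1;
        io_uring_prep_recv(sqe, sockfd, buf, len, 0);
        sqe->flags |= IOSQE_IO_LINK;            /* chain to the next SQE */

        sqe = io_uring_get_sqe(ring);
        if (!sqe)
                return -1;
        io_uring_prep_send(sqe, sockfd, buf, len, 0);

        return io_uring_submit(ring);           /* submits both linked SQEs */
}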
67.01% [kernel] [k] native_queued_spin_lock_slowpath
3.35% [kernel] [k] _raw_spin_lock
1.88% [kernel] [k] io_prep_async_work
1.01% [kernel] [k] io_dismantle_req
0.79% [ena] [k] ena_io_poll
0.61% [kernel] [k] fput_many
0.47% [kernel] [k] __io_free_req_finish
0.43% [kernel] [k] _copy_from_user
0.42% [ena] [k] ena_start_xmit
0.37% [kernel] [k] _raw_spin_lock_irq
0.36% [kernel] [k] irq_entries_start
0.30% [kernel] [k] copy_user_enhanced_fast_string
0.30% [kernel] [k] skb_release_data
PerfTop: 123479 irqs/sec kernel:91.4% exact: 0.0% lost: 0/0 drop: 0/0 [4000Hz cycles], (all, 36 CPUs)