io_uring buffer performance #692
-
This isn't strictly io_uring related, so forgive me if I'm asking in the wrong place. But it is sort of related, and I feel like the expertise and experience of people here may come in handy. I have a sample single-threaded echo server application that:
I've noticed that the application really likes order, by which I mean the placement of the buffers being used for recvs and sends affects throughput. I don't believe I'm misusing io_uring, but both IORING_OP_POLL_ADD and epoll_wait are unaffected by the position of the buffers in memory and consistently perform around the same number (237K for uring and 236K for epoll). As far as I know, the recv/send memory copies are done in task work in userspace context, so I can't see why recv/send in kernel space would be worse. My closest guess is that in user space the send happens right after the recv, but with uring the recv is processed some time after the send.

I've considered using PROVIDE_BUFFERS to eliminate the randomness of the buffer addresses, but I would prefer not to incur extra overhead if possible. I've also tried setting thread affinity every time, and it actually makes performance worse, though more consistent.

So far my hypothesis seems to hold: performing all the sends in a separate loop from the recvs brings epoll performance down noticeably to 211K and increases the variability in performance. I'm wondering if anyone has any insight into this.
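For concreteness, here is a rough sketch (not the actual benchmark code; the buffer handling and sizes are assumptions) of the two epoll loop shapes being compared:

```c
/* Illustrative sketch only -- not the poster's actual benchmark. It shows
 * the two epoll loop shapes compared above: echoing inline after each recv
 * versus batching all recvs and then all sends. */
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_EVENTS 64
#define BUF_SIZE   4096

/* Variant A: send immediately after each recv; the buffer is still hot
 * in cache when the send copies it back out (~236K in the tests above). */
static void echo_inline(int epfd)
{
    struct epoll_event events[MAX_EVENTS];
    static char buf[BUF_SIZE];
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

    for (int i = 0; i < n; i++) {
        ssize_t len = recv(events[i].data.fd, buf, BUF_SIZE, 0);
        if (len > 0)
            send(events[i].data.fd, buf, len, 0);
    }
}

/* Variant B: all recvs first, then all sends; by the time a buffer is
 * sent it has likely left cache (~211K and noisier in the tests above). */
static void echo_batched(int epfd)
{
    struct epoll_event events[MAX_EVENTS];
    static char bufs[MAX_EVENTS][BUF_SIZE];
    ssize_t lens[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

    for (int i = 0; i < n; i++)
        lens[i] = recv(events[i].data.fd, bufs[i], BUF_SIZE, 0);
    for (int i = 0; i < n; i++)
        if (lens[i] > 0)
            send(events[i].data.fd, bufs[i], lens[i], 0);
}
```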
-
If you are able to use the newer ring-provided buffers rather than the older way of providing buffers, you can use provided buffers without really incurring any extra overhead.
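For reference, a minimal sketch of what the ring-provided setup looks like with liburing's buf-ring helpers (liburing 2.4+); BUF_COUNT, BUF_SIZE, and GROUP_ID are made-up values, not anything from this thread, and error handling is mostly elided:

```c
#include <liburing.h>
#include <errno.h>
#include <stdlib.h>

#define BUF_COUNT 256   /* must be a power of two */
#define BUF_SIZE  4096
#define GROUP_ID  1

static struct io_uring_buf_ring *br;
static char *bufs;

static int setup_buf_ring(struct io_uring *ring)
{
    int err;

    /* One contiguous slab backing all the buffers. */
    bufs = malloc((size_t)BUF_COUNT * BUF_SIZE);
    if (!bufs)
        return -ENOMEM;

    /* Allocates, maps, and registers the buffer ring in one call. */
    br = io_uring_setup_buf_ring(ring, BUF_COUNT, GROUP_ID, 0, &err);
    if (!br)
        return err;

    for (int i = 0; i < BUF_COUNT; i++)
        io_uring_buf_ring_add(br, bufs + i * BUF_SIZE, BUF_SIZE, i,
                              io_uring_buf_ring_mask(BUF_COUNT), i);
    io_uring_buf_ring_advance(br, BUF_COUNT);
    return 0;
}

static void queue_recv(struct io_uring *ring, int fd)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    /* No buffer is given here: the kernel picks one from GROUP_ID
     * when data actually arrives. */
    io_uring_prep_recv(sqe, fd, NULL, BUF_SIZE, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;
    sqe->buf_group = GROUP_ID;
}
```

On completion, the chosen buffer id comes back in the CQE (`cqe->flags >> IORING_CQE_BUFFER_SHIFT`, valid when `IORING_CQE_F_BUFFER` is set), and returning the buffer to the pool is just another `io_uring_buf_ring_add()` plus `io_uring_buf_ring_advance()`, so unlike IORING_OP_PROVIDE_BUFFERS there is no extra SQE per recycled buffer.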
-
OK, I tried using different buffers per connection for epoll, and that seemed to bring the performance down to a point where it gets consistently outperformed by io_uring. I think my issue was that at a significantly higher number of connections (6144), the overhead of having that many sockets is the main bottleneck rather than the recv/send buffer addresses.