Handling non-blocking file descriptors consistently. #364

Closed
ioquatix opened this issue Jun 16, 2021 · 107 comments

Comments
@ioquatix
Contributor

This is more of a discussion than an actual issue at the moment, but I hope to provide more concrete details as we explore the use cases.

We have an interesting problem regarding IORING_OP_READV/IORING_OP_WRITEV. In a general programming language environment, we don't always know if the file descriptor is in non-blocking mode or not.

But the behaviour of IORING_OP_READV/IORING_OP_WRITEV depends on it. In non-blocking mode, we can get a result of EAGAIN.

However, this is not very useful in the case of event-driven concurrency, where we want operations like this to be "asynchronously blocking" - i.e. the CQE should not be generated until the read/write can make progress.

In general, my understanding is that we'd need 3-4 system calls to work around this, something like:

int flags = fcntl(fd, F_GETFL);
if (flags & O_NONBLOCK) fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
/* ... submit the IORING_OP_READV/IORING_OP_WRITEV for fd ... */
if (flags & O_NONBLOCK) fcntl(fd, F_SETFL, flags);

We can cache this to reduce the round trips, but it feels like this would be something better solved by a flag on IORING_OP_READV/IORING_OP_WRITEV. In fact, I'm having a hard time imagining any valid use case for non-blocking operation in the uring.

I'll try to test out specific cases where this is a problem and report back.

@YoSTEALTH
Contributor

The whole point of io_uring is that it's asynchronous by design! If a call would block, io_uring either handles it asynchronously or spins up a kernel worker thread for that request.

By using O_NONBLOCK, what you are saying is: let me choose what to do next when blocking happens, instead of letting io_uring deal with it.

There is already a flags option:

fd = open('path/to/file', O_RDONLY)  # normal blocking, read-only open
...
io_uring_prep_readv(sqe, fd, iovecs, nr_vecs, offset)
sqe.rw_flags |= RWF_NOWAIT  # now the read is non-blocking
...
# catch `-EAGAIN` in `cqe.res`, ...
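
For reference, roughly the same thing in C with liburing might look like the sketch below; queue_nowait_readv is just a hypothetical helper name and the error handling is minimal.

/* Queue a readv that must not block, even on a blocking fd, by setting the
 * per-request RWF_NOWAIT flag as described above. */
#define _GNU_SOURCE
#include <liburing.h>
#include <sys/uio.h>                   /* struct iovec; RWF_NOWAIT with _GNU_SOURCE */

static int queue_nowait_readv(struct io_uring *ring, int fd,
                              struct iovec *iov, unsigned nr_vecs)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (!sqe)
        return -1;                     /* submission queue is full */
    io_uring_prep_readv(sqe, fd, iov, nr_vecs, 0);
    sqe->rw_flags = RWF_NOWAIT;        /* non-blocking for this request only */
    return io_uring_submit(ring);      /* the CQE may then carry -EAGAIN in cqe->res */
}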

@ioquatix
Contributor Author

Is there a flag for the opposite of RWF_NOWAIT?

@YoSTEALTH
Contributor

Not that I know of. If you want a blocking read, just use a normal read(); don't use io_uring_read*.

@ioquatix
Contributor Author

We have file descriptors that might already have O_NONBLOCK set. But we still want to use them with io_uring without getting EAGAIN.

@YoSTEALTH
Contributor

Hmm, that's a tricky situation. Monkey-patching the O_NONBLOCK flag to not be non-blocking might lead to other problems/bugs.

If you have access to where the original flags are being set:

original_flags = O_CREAT | O_RDWR | O_NONBLOCK

You could add a check like

if original_flags & O_NONBLOCK:
    # raise error saying `O_NONBLOCK` is not supported, invalid or something!

I am not really proud of that solution.

Why not just catch -EAGAIN and poll it? io_uring_prep_poll_add(sqe, fd, POLLIN)
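
A sketch of that pattern with liburing: on -EAGAIN, arm a poll and retry the read once the fd is readable. handle_read_cqe is a hypothetical dispatcher in the application, and error handling is omitted.

#include <liburing.h>
#include <poll.h>
#include <errno.h>

static void handle_read_cqe(struct io_uring *ring, struct io_uring_cqe *cqe, int fd)
{
    if (cqe->res == -EAGAIN) {
        /* fd wasn't ready: wait for readability, then retry the read */
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        io_uring_prep_poll_add(sqe, fd, POLLIN);
        io_uring_submit(ring);
        /* ...when the poll CQE arrives, re-submit the original read/readv... */
    } else {
        /* cqe->res is the number of bytes read, or another error */
    }
}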

@ioquatix
Contributor Author

Yes, it's one option, but we wanted to use the direct read/write operations if possible.

@YoSTEALTH
Contributor

YoSTEALTH commented Jun 16, 2021

I suppose you can also remove the flag like so:

original_flags = O_CREAT | O_RDWR | O_NONBLOCK
new_flags = original_flags & ~O_NONBLOCK

@axboe
Owner

axboe commented Jun 16, 2021

It would be a pretty trivial change to arm poll rather than return -EAGAIN to the application, if O_NONBLOCK is set on the file. The main issue here is how to inform io_uring of that desired behavior. OP_READV/OP_WRITEV flags are really readv2/writev2 flags, so it'd then be a generic change to allow this. Which arguably would make sense, as the justification would be that you can do blocking IO without having to query/set the file flags first.

This applies to more than OP_READV/OP_WRITEV, though. It also applies to SEND/RECV and friends, so it's actually more of a generic change. Which means that an IOSQE_ flag would make more sense. The only potential issue there is that the flag space is pretty small, we have 8 bits and 6 are already used. Not a huge issue, we have two bits left, and the last bit could always be turned into a "look at extended bits" or something like that.

In any case, let me know if you want to test a patch for this. Should be easy to cook up.

@ioquatix
Contributor Author

I am happy to test and help with this.

@ioquatix
Contributor Author

ioquatix commented Jun 16, 2021

Just thinking through what you described, do you think it would make sense for us to propose a flag to readv2/writev2 to be more specific about this?

That way, this doesn't become an io_uring specific feature, but works at the kernel level in a generic way.

e.g.

       RWF_NONBLOCK
              Perform the operation as if the descriptor was in `O_NONBLOCK` mode.
       RWF_BLOCK
              Perform the operation as if the descriptor was not in `O_NONBLOCK` mode.

Maybe it's a crazy proposal, but it seems logical, at least from my POV.
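
To make the asymmetry concrete: the non-blocking direction already exists per call via preadv2(2), while the blocking direction does not. In the sketch below, RWF_BLOCK is the hypothetical flag from this comment, not a real one.

#define _GNU_SOURCE
#include <sys/uio.h>      /* preadv2(); RWF_NOWAIT with _GNU_SOURCE */

static ssize_t read_examples(int fd, const struct iovec *iov, int nr_vecs)
{
    /* Existing: force non-blocking behaviour for this one call, regardless of
     * the fd's O_NONBLOCK state; fails with errno == EAGAIN if it would block. */
    ssize_t n = preadv2(fd, iov, nr_vecs, -1, RWF_NOWAIT);

    /* Proposed above (hypothetical, not a real flag today): force blocking
     * behaviour even if O_NONBLOCK is set on fd:
     *     preadv2(fd, iov, nr_vecs, -1, RWF_BLOCK);
     */
    return n;
}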

@axboe
Owner

axboe commented Jun 16, 2021

As mentioned, that only then works for the read/write variants, not for any other opcode. Hence I'd be leaning towards making it an sqe flag instead. If it was just applicable to OP_READV and friends, then the RWF approach would be preferable.

@ioquatix
Contributor Author

Do other opcodes also take flags? e.g. accept, open, close, etc.

@axboe
Owner

axboe commented Jun 16, 2021

Each opcode has a specific set of flags, like the RWF_* for read/write, but there's also a generic set of IOSQE_* flags that apply to any opcode. That handles things like links, etc. Things that sit above the specific request type.
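
A minimal sketch of the two levels with liburing: rw_flags carries the opcode-specific readv2/writev2 flags, while io_uring_sqe_set_flags() sets the generic IOSQE_* flags.

#include <liburing.h>
#include <sys/uio.h>
#include <linux/fs.h>     /* RWF_NOWAIT */

static void two_flag_levels(struct io_uring *ring, int fd,
                            struct iovec *iov, unsigned nr_vecs)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    io_uring_prep_readv(sqe, fd, iov, nr_vecs, 0);

    /* opcode-specific flags: for OP_READV/OP_WRITEV these are readv2/writev2 flags */
    sqe->rw_flags = RWF_NOWAIT;

    /* generic per-SQE flags: apply to any opcode (links, drain, async, ...) */
    io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
}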

@ioquatix
Contributor Author

Oh that makes total sense. Okay, I'm keen to test this when you are ready.

cc @ciconia

@axboe
Owner

axboe commented Jun 17, 2021 via email

@YoSTEALTH
Contributor

It would be a pretty trivial change to arm poll rather than return -EAGAIN to the application, if O_NONBLOCK is set on the file.

@axboe O.o something like IOSQE_IO_BLOCKING_POLL*? If you can make this into a feature I would totally welcome it as well. Currently I have lines of code just checking for -EAGAIN and polling to account for both blocking/non-blocking flags. This feature would save many unnecessary round trips.

Rather than just accounting for EAGAIN, it would be nice if it accounted for "blocking" states like EAGAIN, EALREADY, EINPROGRESS, EWOULDBLOCK, ...
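
A small helper along those lines might look like this; a sketch only, and which errno values count as "blocking" is ultimately the application's call.

#include <errno.h>
#include <stdbool.h>

/* Treat a CQE result as "would have blocked" for the usual try-again /
 * in-progress errno values. cqe_res is the raw cqe->res (negative errno). */
static bool cqe_would_block(int cqe_res)
{
    switch (-cqe_res) {
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
    case EALREADY:
    case EINPROGRESS:
        return true;
    default:
        return false;
    }
}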

@ioquatix
Contributor Author

I agree; the vast majority of user-space code doesn't care about EAGAIN. It's like the halfway house of concurrency.

@noteflakes

That's a great idea, adding such a flag will be very useful! Keep in mind though that if you want to support older Linux kernel versions (once this feature lands in a kernel release), you'll still have to provide some workaround, either resetting O_NONBLOCK or handling EAGAIN.

@axboe
Owner

axboe commented Jun 17, 2021

Probably something like IOSQE_IGNORE_NONBLOCK or whatever. It'll just be a way to force the normal poll path for pollable IO even if the fd is marked non-blocking.

@axboe
Owner

axboe commented Jun 17, 2021

Here's a first attempt. Let me know if you'd prefer a patch somewhere, not sure what this does to formatting... If it destroys whitespace, probably just use patch -l to ignore that part.

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 46a25a7cb70a..25f72f35902d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -108,7 +108,7 @@
 
 #define SQE_VALID_FLAGS	(IOSQE_FIXED_FILE|IOSQE_IO_DRAIN|IOSQE_IO_LINK|	\
 				IOSQE_IO_HARDLINK | IOSQE_ASYNC | \
-				IOSQE_BUFFER_SELECT)
+				IOSQE_BUFFER_SELECT | IOSQE_IGNORE_NONBLOCK)
 
 #define IO_TCTX_REFS_CACHE_NR	(1U << 10)
 
@@ -704,6 +704,7 @@ enum {
 	REQ_F_HARDLINK_BIT	= IOSQE_IO_HARDLINK_BIT,
 	REQ_F_FORCE_ASYNC_BIT	= IOSQE_ASYNC_BIT,
 	REQ_F_BUFFER_SELECT_BIT	= IOSQE_BUFFER_SELECT_BIT,
+	REQ_F_IGNORE_NONBLOCK_BIT	= IOSQE_IGNORE_NONBLOCK_BIT,
 
 	/* first byte is taken by user flags, shift it to not overlap */
 	REQ_F_FAIL_BIT		= 8,
@@ -740,6 +741,8 @@ enum {
 	REQ_F_FORCE_ASYNC	= BIT(REQ_F_FORCE_ASYNC_BIT),
 	/* IOSQE_BUFFER_SELECT */
 	REQ_F_BUFFER_SELECT	= BIT(REQ_F_BUFFER_SELECT_BIT),
+	/* IOSQE_IGNORE_NONBLOCK */
+	REQ_F_IGNORE_NONBLOCK	= BIT(REQ_F_IGNORE_NONBLOCK_BIT),
 
 	/* fail rest of links */
 	REQ_F_FAIL		= BIT(REQ_F_FAIL_BIT),
@@ -2444,6 +2447,15 @@ static bool io_resubmit_prep(struct io_kiocb *req)
 	return true;
 }
 
+/*
+ * Honor REQ_F_NOWAIT only if REQ_F_IGNORE_NONBLOCK isn't set
+ */
+static bool io_req_nowait(struct io_kiocb *req)
+{
+	return (req->flags & (REQ_F_NOWAIT | REQ_F_IGNORE_NONBLOCK)) ==
+		REQ_F_NOWAIT;
+}
+
 static bool io_rw_should_reissue(struct io_kiocb *req)
 {
 	umode_t mode = file_inode(req->file)->i_mode;
@@ -2451,7 +2463,7 @@ static bool io_rw_should_reissue(struct io_kiocb *req)
 
 	if (!S_ISBLK(mode) && !S_ISREG(mode))
 		return false;
-	if ((req->flags & REQ_F_NOWAIT) || (io_wq_current_is_worker() &&
+	if (io_req_nowait(req) || (io_wq_current_is_worker() &&
 	    !(ctx->flags & IORING_SETUP_IOPOLL)))
 		return false;
 	/*
@@ -3223,7 +3235,7 @@ static bool io_rw_should_retry(struct io_kiocb *req)
 	struct kiocb *kiocb = &req->rw.kiocb;
 
 	/* never retry for NOWAIT, we just complete with -EAGAIN */
-	if (req->flags & REQ_F_NOWAIT)
+	if (io_req_nowait(req))
 		return false;
 
 	/* Only for buffered IO */
@@ -3303,7 +3315,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 		if (!force_nonblock && !(req->ctx->flags & IORING_SETUP_IOPOLL))
 			goto done;
 		/* no retry on NONBLOCK nor RWF_NOWAIT */
-		if (req->flags & REQ_F_NOWAIT)
+		if (io_req_nowait(req))
 			goto done;
 		/* some cases will consume bytes even on error returns */
 		iov_iter_revert(iter, io_size - iov_iter_count(iter));
@@ -3311,7 +3323,7 @@ static int io_read(struct io_kiocb *req, unsigned int issue_flags)
 	} else if (ret == -EIOCBQUEUED) {
 		goto out_free;
 	} else if (ret <= 0 || ret == io_size || !force_nonblock ||
-		   (req->flags & REQ_F_NOWAIT) || !(req->flags & REQ_F_ISREG)) {
+		   io_req_nowait(req) || !(req->flags & REQ_F_ISREG)) {
 		/* read all, failed, already did sync or don't want to retry */
 		goto done;
 	}
@@ -3434,7 +3446,7 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (ret2 == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
 		ret2 = -EAGAIN;
 	/* no retry on NONBLOCK nor RWF_NOWAIT */
-	if (ret2 == -EAGAIN && (req->flags & REQ_F_NOWAIT))
+	if (ret2 == -EAGAIN && io_req_nowait(req))
 		goto done;
 	if (!force_nonblock || ret2 != -EAGAIN) {
 		/* IOPOLL retry should happen for io-wq threads */
@@ -6455,7 +6467,7 @@ static void __io_queue_sqe(struct io_kiocb *req)
 		} else {
 			io_put_req(req);
 		}
-	} else if (ret == -EAGAIN && !(req->flags & REQ_F_NOWAIT)) {
+	} else if (ret == -EAGAIN && !io_req_nowait(req)) {
 		if (!io_arm_poll_handler(req)) {
 			/*
 			 * Queued up for async execution, worker will release
diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
index f1f9ac114b51..582eee6e898b 100644
--- a/include/uapi/linux/io_uring.h
+++ b/include/uapi/linux/io_uring.h
@@ -70,6 +70,7 @@ enum {
 	IOSQE_IO_HARDLINK_BIT,
 	IOSQE_ASYNC_BIT,
 	IOSQE_BUFFER_SELECT_BIT,
+	IOSQE_IGNORE_NONBLOCK_BIT,
 };
 
 /*
@@ -87,6 +88,8 @@ enum {
 #define IOSQE_ASYNC		(1U << IOSQE_ASYNC_BIT)
 /* select buffer from sqe->buf_group */
 #define IOSQE_BUFFER_SELECT	(1U << IOSQE_BUFFER_SELECT_BIT)
+/* ignore if file has O_NONBLOCK set */
+#define IOSQE_IGNORE_NONBLOCK	(1U << IOSQE_IGNORE_NONBLOCK_BIT)
 
 /*
  * io_uring_setup() flags
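
For the userspace side, usage of the flag from this patch might look roughly like the sketch below. IOSQE_IGNORE_NONBLOCK is only defined by the patch above, not by upstream headers, so this is illustrative only.

#include <liburing.h>

static int queue_blocking_style_read(struct io_uring *ring, int fd,
                                     void *buf, unsigned len)
{
    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

    if (!sqe)
        return -1;
    io_uring_prep_read(sqe, fd, buf, len, 0);
    /* ask io_uring to arm poll internally rather than complete with -EAGAIN,
     * even though the fd may have O_NONBLOCK set */
    io_uring_sqe_set_flags(sqe, IOSQE_IGNORE_NONBLOCK);
    return io_uring_submit(ring);
}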

@ioquatix
Contributor Author

I will try this out over the weekend.

@axboe
Owner

axboe commented Jun 17, 2021

I will try this out over the weekend.

I'll run some quick testing on this later and post an updated version here if needed.

@YoSTEALTH
Contributor

YoSTEALTH commented Jun 18, 2021

That's neat!

Does it account for cases where O_NONBLOCK is not set but the MSG_DONTWAIT flag is passed directly to the socket recv* call?

Can this be in the next Linux 5.13-rc* patch, or is it more of a 5.14 thing?

@ioquatix
Contributor Author

I'm hoping we can transparently support this by using the poll fallback - i.e. assume it's going to work, but if we get EAGAIN, go to a fallback path.
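
One way to express that fallback with io_uring itself is to link a poll to the retried read, so the kernel only issues the read once the fd is readable. A sketch, with error handling omitted.

#include <liburing.h>
#include <poll.h>

static int queue_poll_then_read(struct io_uring *ring, int fd,
                                void *buf, unsigned len)
{
    struct io_uring_sqe *sqe;

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_poll_add(sqe, fd, POLLIN);
    io_uring_sqe_set_flags(sqe, IOSQE_IO_LINK);   /* run the next SQE only after this one */

    sqe = io_uring_get_sqe(ring);
    io_uring_prep_read(sqe, fd, buf, len, 0);

    return io_uring_submit(ring);
}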

@isilence
Collaborator

Ok, it seems I need to ask the unpopular questions...

Disregarding generality for other io_uring requests, I have a problem with that design. We have a behaviour, blocking or not, and it is specified by rw flags (and also by file flags). So, it's blocking by default, and RWF_NOWAIT overrides the behaviour to non-blocking. This new flag would be overriding already-overridden behaviour, which doesn't sound good from a design perspective.
To give an idea of the concern, the reductio ad absurdum here would be to also add IOSQE_IGNORE_IGNORE_NONBLOCK.

So, the question here is: does it add something that can't currently be achieved (e.g. by clearing/not setting non-blocking flags)? Finer blocking control, or just yet another BLOCK/NONBLOCK flag doing the same thing? If there are inconsistencies in how io_uring handles non-blocking (and I remember there were at some point), we need to figure out and document how it should be treated.

Also curious, assuming that the library takes fds from the user but doesn't take over ownership, i.e. the user may use them in any other way they wish in parallel:

  1. How come you force 'blocking' behaviour when userspace passes in a non-blocking file, when non-blocking can be the desired outcome?
  2. Why can't the requirement of passing non-blocking files be forced on the user?

@isilence
Collaborator

Can this be in next Linux 5.13-rc* patch or its more of a 5.14 thing.

@YoSTEALTH, 5.13 is closed for new features and will be released pretty soon, so definitely not 5.13.

@axboe
Owner

axboe commented Jun 18, 2021

Pavel, outside of the patch being quick'n'dirty, the intent is only to ignore O_NONBLOCK if it is set. We should not be fiddling with RWF_NOWAIT, nor likewise with send/recvmsg flags. It's just for per-file state, not per-request state.

@YoSTEALTH
Contributor

@axboe My initial thought was: if cqe.res == -EAGAIN and IOSQE_new_flag is set, wait/poll for the task to finish in kernel space, thus preventing a round trip to user space and back. This way no set flags are messed with; it only checks the CQE end of the data. Is that even close to what's happening?

@axboe
Owner

axboe commented Jun 18, 2021

What you described, but only if O_NONBLOCK is set on the file. If you are setting request-specific flags, like RWF_NOWAIT, then the -EAGAIN is returned as before. At that point you are specifically asking for non-blocking behaviour; the new sqe flag should only apply to state that you don't necessarily control (e.g. the file flags).

@YoSTEALTH
Contributor

YoSTEALTH commented Jun 18, 2021

@axboe OK, thank you for clarifying. So it can only be used for *open* at the moment, which is such a small use case! Wouldn't it have been better for the developer to just remove the flag with new_flags = original_flags & ~O_NONBLOCK?

As much as I would like to see a new feature added to io_uring, I also wouldn't want to waste resources that could be better used!

ammarfaizi2 pushed a commit to ammarfaizi2/linux-block that referenced this issue Mar 11, 2023
commit c16bda3 upstream.

If we get woken spuriously when polling and fail the operation with
-EAGAIN again, then we generally only allow polling again if data
had been transferred at some point. This is indicated with
REQ_F_PARTIAL_IO. However, if the spurious poll triggers when the socket
was originally empty, then we haven't transferred data yet and we will
fail the poll re-arm. This either punts the socket to io-wq if it's
blocking, or it fails the request with -EAGAIN if not. Neither condition
is desirable, as the former will slow things down, while the latter
will make the application confused.

We want to ensure that a repeated poll trigger doesn't lead to infinite
work making no progress, that's what the REQ_F_PARTIAL_IO check was
for. But it doesn't protect against a loop post the first receive, and
it's unnecessarily strict if we started out with an empty socket.

Add a somewhat random retry count, just to put an upper limit on the
potential number of retries that will be done. This should be high enough
that we won't really hit it in practice, unless something needs to be
aborted anyway.

Cc: [email protected] # v5.10+
Link: axboe/liburing#364
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
@vasama

vasama commented Aug 15, 2023

What's the status of this issue? Is IORING_SETUP_IGNORE_ONONBLOCK going anywhere?

@axboe
Owner

axboe commented Oct 2, 2024

Should be the case now.

@ioquatix
Contributor Author

ioquatix commented Oct 2, 2024

I will try it out again.

@ioquatix
Contributor Author

ioquatix commented Oct 4, 2024

I've done a basic test, and I'm not seeing EAGAIN any more. However, it's difficult to compare since I don't really have a system to reproduce the old behaviour. Thanks everyone, this looks pretty good to me now.

@axboe axboe closed this as completed Oct 4, 2024
Cloudef added a commit to Cloudef/zig-aio that referenced this issue Jan 28, 2025
This may be a bug in io_uring, but since EventSource's were changed to
O_NONBLOCK, io_uring may sometimes return EAGAIN.

Handle this by simply ignoring the error and hope for the best,
as the tests currently seem to pass.

EAGAIN's generally should not happen though on 6.x kernels
<axboe/liburing#364>