-
-
Notifications
You must be signed in to change notification settings - Fork 39
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove old timeout when deadline changes #232
Conversation
I had a memory issue and @franz1981 suggested netty#211 as the cause. This patch is my fix for that bug, though I don't believe my mem issue was ultimately caused by this. This PR does the legwork for adding ioringOpTimeoutRemove, and implementing a test. However two things can still be improved: - [ ] could use IORING_TIMEOUT_UPDATE (see netty#211) to save one sqe. - [ ] there may be a race in IOUringEventLoop between the addTimeout and the IORING_OP_TIMEOUT handler. If the kernel fires a deadline cqe, then we send a deadline update sqe, and only then we process the first cqe, prevDeadlineNanos is NONE even though we've submitted a new deadline. I'm not sure if this can actually happen since deadline changes should only adjust the deadline downwards, not upwards? Not sure.
transport-native-io_uring/src/test/java/io/netty/incubator/channel/uring/NativeTest.java
Show resolved
Hide resolved
…nnel/uring/NativeTest.java
Thanks a lot ! I think what you did is fine... I would defer the use of IORING_TIMEOUT_UPDATE as it would also most likely bump the kernel version requirement. Maybe we can add an issue to keep track of it |
@yawkat so with this said I think we can put this out of draft state ? |
Also seems like the new added test failed :D |
i will look at the test but it might take a few days. i think the race condition is still an issue. id like to at least think about it a bit more, consider how it can happen. |
ive introduced a "generation counter". basically, every addTimeout counts up the generation and sends it to the kernel as the extra field. When a timeout occurs, i check that the generation matches the latest timeout that was added. This ensures that a previous timeout cannot make us think that the current timeout is finished. |
well done @yawkat ! I like the epoch/sequencing approach |
@@ -248,9 +261,11 @@ private void handle(int fd, int res, int flags, byte op, short data) { | |||
pendingWakeup = false; | |||
addEventFdRead(ringBuffer.ioUringSubmissionQueue()); | |||
} else if (op == Native.IORING_OP_TIMEOUT) { | |||
if (res == Native.ERRNO_ETIME_NEGATIVE) { | |||
if (res == Native.ERRNO_ETIME_NEGATIVE && data == prevTimeoutGeneration) { | |||
prevDeadlineNanos = NONE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could trace loging it too (as we do we fd reuse :( ) if we assume to be an unwelcome behaviour
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure why not, added. i don't think it's necessarily a bad thing,it's just a simple race condition that will happen sometimes, but it can't hurt to log.
@yawkat seems like there is still a test failure |
thats weird. it passes locally every time, doesnt appear flakey. |
did you run it in a loop ? |
@yawkat ping |
fixed |
@yawkat so I think this is now ready to be merged... correct ? |
yep, only potential improvement would be to use UPDATE, but not in this pr |
@yawkat ok perfect... Can you just update the commit message to match our template ? |
Can you squash merge it with this message: Remove old timeout when deadline changes (#211) When the event loop timer deadline changes, the old timer would not be removed. This could potentially lead to memory issues. Modification: Use IORING_OP_TIMEOUT_REMOVE to remove a previously set timeout before a new one is registered. Use a generation counter to prevent race conditions between timeout add/remove in sqe/cqe. Result: Old timers are removed. At any time, there is only at most one timer per event loop. |
@yawkat done... thanks a lot ! |
thanks! |
I had a memory issue and @franz1981 suggested #211 as the cause. This patch is my fix for that bug, though I don't believe my mem issue was ultimately caused by this.
This PR does the legwork for adding ioringOpTimeoutRemove, and implementing a test. However two things can still be improved: