-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task exception was never retrieved on v3.8.3 #6978
Comments
Initial investigation of the exception not retrieved: If an exception is raised, then Not investigated anything about the assertion error though. |
I've also just recently seen the exact error described by @w0rmr1d3r pop up, seemingly at random. Will report back if we're able to get any more clarity or fixes on our end. Thanks everyone for looking :) |
Same issue after updating to v3.8.3 |
Indeed @doublex . This started appearing in our services when upgrading to v3.8.3 |
There aren't too many changes (if you ignore formatting/typing changes), so if anyone can take a look and figure out which change is the issue, that'd be great. These 2 removed lines are the only ones that stand out to me, so maybe try adding them back in first: |
Hi @Dreamsorcerer , Thank you for providing the diff between the affected versions. I'm reading through the code and trying to understand. But you will have more context and knowledge on how this library works. My guessings: File web_protocol.py on line 505, will always pass the handler as None. Since it's set on line 472 to be the self._task_handler. Still, I lack complete context of the code, just writing what I see. Then, in the web_app.py file, line 437 -> 454, protocol will be a RequestHandler, but is received as None, since as seen in this line: aiohttp/aiohttp/web_protocol.py Line 51 in 8cf01ad
it can be "asyncio.Task[None]". I guess, that is setting it to None, somehow? Again, just tried to read through the code, but have 0 context on how the library works. My guesses might be completely wrong, apologies for that. |
Line 473 asserts it to be not None, so that's not possible. My suggestion was to try adding back in the lines 302-303 that were deleted and see if the problem is still reproducible. If it is, then maybe checkout the code at different commits until you find the commit that introduces the problem (you could use |
Hello @Dreamsorcerer , So, what you mean is, adding back those lines in the next release and see if the issue is fixed? I understand that to be version 3.9.0? Just wanted to clarify and get right the steps on this. Thank you for providing an answer and maintain this project 🚀 |
Yes, just clone the 3.9 branch, add those lines in, then test if the issue is fixed (also check the issue is reproducible before adding the line in). |
Hello @Dreamsorcerer , Hopefully in the future I might be able to put some effort on this to debug it or something. |
3.8.4 only added a moderate severity fix. If you can figure out this issue, we'll put it into the 3.9 release. |
Just here to confirm that we can see this issue happening in python 3.9, 3.10 and 3.11 with aiohttp 3.8.4 |
I can also confirm that this issue occurs in our project with aiohttp 3.8.3 and Python 3.11. |
@Dreamsorcerer I tried reverting these lines on top of v3.8.3 and this does not fix the issue. They were also the only lines standing out to me in the git history, so I'll need to take a deeper look. Looking at the logic, it seems like this originates from the My setup:
|
If you can run it from the git repo, then a |
I'd love to, but I have no idea how to build the package properly to test inside a Docker image (only solution I have to reproduce the issue at the moment). I've looked at the docs but I couldn't find a tip, I always end up with "aiohttp/_websocket.c: File not found". Any idea? |
In the meantime, I realized I hadn't installed my patched version correctly. It appears that applying the patch below as suggested fixes the issue.
Now, I saw in the commit history that this patch has already been applied, removed and reapplied (cf. the discussion here: #6719). What's the best course of action? I don't see cancellation exceptions in my setup, maybe the whole issue discussed in #6719 has been fixed in the meantime. |
OK, I've got a vague idea on what the cause may be now. Can you try removing the My theory is that on connection_lost on older releases (and optionally in 3.9) we cancel the handler. Now we are not cancelling, I suspect there may be an exception occurring on So, I'm hoping that by making that change, instead of an exception was not retrieved warning, you'll get the actual exception and then we can catch that error in future. |
This also suggests that we may be able to build an actual test, if anyone wants to give that a go? Hopefully, it just means we need an endpoint that takes a couple of seconds and then have a client disconnect prematurely to trigger the warning. |
I've tried replicating this issue in a test, but it passes just fine on the 3.8 branch. Am I approaching this in a wrong way? async def test_client_connection_lost(aiohttp_client: Any) -> None:
async def handler(_):
await asyncio.sleep(2)
return web.Response()
app = web.Application()
app.router.add_route("GET", "/", handler)
client = await aiohttp_client(app)
with pytest.raises(TimeoutError):
await asyncio.wait_for(
client.get("/"),
timeout=1,
)
await client.close() I've set a breakpoint in |
I've never managed to reproduce it, which is why I keep asking others to test/debug. |
I am currently facing this issue, I do not think it's easily reproduced in a test as you need to send some sort of client/server disconnection into the request/response handler.
I will go down back to |
If my theory is right, then we should be able to write a test. But, I'm still waiting on someone to confirm the cause by hopefully getting the traceback as described in #6978 (comment) |
I have been pretty busy since April but I'll try to reproduce the issue in the weeks to come. |
OK, I played around with the code, manually adding in exceptions etc. and I couldn't figure out any way to reproduce this. Hopefully, #7333 will help, but I don't think there's anything else I can do unless someone figures something else out. |
Once #7333 is released I can definitely give it a try. |
Still happening for v3.8.5 which includes #7333 |
Continuing the conversation that started here. I think this ticket is the correct one for my case given that the traceback is the same here.
Is there any way that I can add additional logs to test this in the current version of the library that I use in prod (3.8.5)? Maybe call some aiohttp method and print something upon any end point being hit? Appreciate your help with trying to figure this out. |
I'm not failiar with GAE, but can you just drop aiohttp into the project directory, where it loads your app from?
I can't think of an easy way. You could try and monkey patch the library, but you'd probably have to copy the entire function (e.g. .start() method), add the print in, and then assign it to the aiohttp class. |
Or failing that, copy into the app's directory as a module, and then change the imports to |
Good news, I made it work. The issue was that I didn't change the path in the app's yml file: I deployed with both changes you proposed.
Will keep posted when it happens. |
So the
This line below got printed a few times:
Result of the print every time: But I think that happened every time when there was another issue: |
That's right before the Did you add a print at |
Yes, this never got printed:
this got printed ~10 times in different places from
and every time |
Yep, so seems like it's definitely some kind of race condition. Not sure on the exact solution, will look at it later, but I suspect we can probably just add another check in the latter place. I'm a little short on time, so would be great if someone can take a look. A test would be good too, hopefully it's possible to write a test that closes it as soon as the waiter is set, and before the second bit of code runs (probably, |
Hi, any chance we can get a fix on this? |
Information is above if you want to try and create a test and fix. |
Well, that's because nobody has fixed it yet and why the issue is still open... |
There have been changes to the transport code so it could have been fixed. |
I still haven't had time to look, but what I mentioned before might be possible to reproduce in a test. |
Refamiliarising myself with the code, I think we need the test to get into the aiohttp/aiohttp/web_protocol.py Line 515 in 122597f
Then we can do something like The objective is to get it to resume running in .start() and reach this code while aiohttp/aiohttp/web_protocol.py Line 534 in 122597f
|
I'm still not clear how it's possible to get into this situation, but I can force it with a test like this: async def test_race_condition_exception(aiohttp_client: Any) -> None:
async def handler(request):
data = await request.text()
return web.Response(text=data)
orig_data_received = RequestHandler.data_received
def data_received(self, data: bytes) -> None:
self.force_close()
orig_data_received(self, data)
with mock.patch.object(RequestHandler, "data_received", data_received):
app = web.Application()
app.router.add_get("/", handler)
client = await aiohttp_client(app)
size = 1000000
with pytest.raises(ServerDisconnectedError):
async with client.get("/", data=io.BytesIO(b"1" * size)) as resp:
assert resp.status == 200 However, the exception or warning is not actually being logged, so there's no indication that it's happened. From adding print statements, I know it's hitting that assertion error though... |
Describe the bug
We bumped from v3.8.1 to v3.8.3 this morning and started seeing the following errors. No other changes were done to our project:
And
To Reproduce
No other changes introduced from our side, those errors started appearing.
Expected behavior
For that exception to be handled.
Logs/tracebacks
And
Python Version
aiohttp Version
multidict Version
yarl Version
OS
Docker image
Related component
Server
Additional context
I have added as much context as I had, if I left out anything relevant, happy to look for it and add it to the issue.
Code of Conduct
The text was updated successfully, but these errors were encountered: