Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: Do not block event loop when using middleware asynchronously #208

Closed
wants to merge 3 commits into from
Closed

Fix: Do not block event loop when using middleware asynchronously #208

wants to merge 3 commits into from

Conversation

adinsoon
Copy link

Hi, how are you?

I am using your library in the project and I appreciate your work. I'm currently working on async and enabling parallelization of custom processes in django by putting a django asgi container on gunicorn.
As the documentation indicates:

You will only get the benefits of a fully-asynchronous request stack if you have no synchronous middleware loaded into your site. If there is a piece of synchronous middleware, then Django must use a thread per request to safely emulate a synchronous environment for it.

And according to the changes you made a while ago here:
https://github.com/jrobichaud/django-structlog/pull/188/files#diff-4ddec93ed217fe3bb1b0720f34f6bb9c4df1f5d864026a069eb0843a3cbc3bfaR139
this middleware can be used both synchronously and asynchronously, which is also mentioned in the documentation here:
https://docs.djangoproject.com/en/4.1/topics/http/middleware/#asynchronous-support

In the project I'm working on, I also have other middlewares implemented this way. However, I ran into a stumbling block when I wanted to run a certain process entirely asynchronously (using API, database I/O, and so on). It turns out that the functions with the @sync_and_async_middleware decorator correctly select the async middleware variant, while for some reason the event loop is blocked by it and the tasks start to execute one after another instead of concurrently.

Assuming a very simple middleware call method and a very simple function:

     async def __call__(self, request):
         response = await self.get_response(request)
         return response
     async task_blocking_event_loop():
         print('Time to sleep!')
         await asyncio.sleep(30)
         return True

Turning on any middleware in the django container means that, for example, calling this function 5 times will take a total of 150 seconds and prints will appear every 30 seconds. It was tested by me that commenting out all the middlewares cleared the event loop and made all 5 prints appear.

So I started looking for a solution, and the simplest and most effective turned out to be the introduction of coroutine (literally) to middleware. For some reason get_response, which is a key element consumed in the middleware, blocks the event loop, and the ability to start and stop it with coroutine suddenly clears the entire event loop. Uncommented all middleware and it works! 5 tasks completed in just over 30 seconds on a single django instance.

I can of course import your SyncRequestMiddleware, AsyncRequestMiddleware to project and create custom version of request_middleware_router, but I think it's worth having such things in the main library, because maybe there will be others with the same problem. In my pull request, I proposed the least invasive idea to add coroutine there, but of course feel free to put them elsewhere, e.g. I was also thinking of overwriting init in AsyncRequestMiddleware. Maybe it will turn out that you have some other idea and a working solution?

Have a good one

@adinsoon
Copy link
Author

Oh I see the case, didn't think of it at first as I'm working on 3.9.

Support for generator-based coroutines is deprecated and is removed in Python 3.11.

This is getting a bit hard because just awaiting get_response blocks event loop. Do you see any solution here? It's tough for me to imagine awaiting the same callable twice, on the other hand, my case doesn't apply to solving it with async.run(...), because I need to stay in the same event loop

@jrobichaud
Copy link
Owner

jrobichaud commented Mar 22, 2023

Hi @adinsoon.

I am very new to async views and the first time it was integrated in django-structlog it caused issues to many users.

Its is currently very Alpha in django-structlog and I would be glad to have some help to support it correctly.

The tests are failing, could you please take a look?

How to run locally:
https://django-structlog.readthedocs.io/en/latest/running_tests.html

@adinsoon
Copy link
Author

Hi @jrobichaud, thanks for fast reply. Yes, I became familiar with those issues and then you rolled back those changes and introduced a workaround.

Tests are failing because of 3.11 deprecation. Nothing comes to my mind right now because I'm having a hard time imagining using the await syntax twice as they suggest one should do

@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (e48bb06) 100.00% compared to head (7876fb7) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #208   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           41        41           
  Lines         1272      1274    +2     
=========================================
+ Hits          1272      1274    +2     
Impacted Files Coverage Δ
django_structlog/middlewares/request.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jrobichaud
Copy link
Owner

I am not sure of your last change, how does that fix the problem you have?

@adinsoon
Copy link
Author

Sorry I forgot that coroutines have been deprecated in Python 3.11.
I come with a rather funny idea, which I also tested at my project and it turns out that it works just like using a coroutine and does not block the event loop. This is quite an interesting case because this extra method does nothing but await get_response, which is then awaited again. But according to Python 3.11 documentation asyncio.coroutine should be replaced with async/await syntax. It seems that this change, quite exaggerated, also applies.

@jrobichaud
Copy link
Owner

I still don't understand why:

async def to_wait():
    #...

await to_wait()

would be different than:

async def to_wait():
    #...

async def to_wait_proxy():
    return await to_wait()

await to_wait_proxy()

@adinsoon
Copy link
Author

Same to me, to some extent to be honest.
This is what I get after calling 5 times on endpoint in my project that was adjusted to await sleep right after it get request (I cleaned up some logs to hide sensitive data)
If I "double await"/use asyncio.coroutine on get_response in middlewares:

http_worker-1  | Going to sleep.
http_worker-1  | {"event": "Request started", "get_params": "{}", "level": "info", "method": "POST", "request_body": "{'xxx': '0be63c21-71f7-4fbf-9b64-3560e202efee', 'user_id': '1'}", "request_files_sizes": "{}", "request_headers": "{'CONTENT_LENGTH': '60'}", "timestamp": "2023-03-22T19:30:52.825636Z", "url": "/verification/"}
http_worker-1  | Going to sleep.
http_worker-1  | {"event": "Request started", "get_params": "{}", "level": "info", "method": "POST", "request_body": "{'xxx': 'fc5ecbfb-a72c-4755-8712-879a745deaa6', 'user_id': '1'}", "request_files_sizes": "{}", "request_headers": "{'CONTENT_LENGTH': '60'}", "timestamp": "2023-03-22T19:30:54.582418Z", "url": "/verification/"}
http_worker-1  | Going to sleep.
http_worker-1  | {"event": "Request started", "get_params": "{}", "level": "info", "method": "POST", "request_body": "{'xxx': 'f05a5968-c026-4faf-bcb2-83511a209a29', 'user_id': '1'}", "request_files_sizes": "{}", "request_headers": "{'CONTENT_LENGTH': '60'}", "timestamp": "2023-03-22T19:30:55.986523Z", "url": "/verification/"}
http_worker-1  | Going to sleep.
http_worker-1  | {"event": "Request started", "get_params": "{}", "level": "info", "method": "POST", "request_body": "{'xxx': '289cea24-38e2-441d-83ba-f7cd7534fad4', 'user_id': '1'}", "request_files_sizes": "{}", "request_headers": "{'CONTENT_LENGTH': '60'}", "timestamp": "2023-03-22T19:30:57.899958Z", "url": "/verification/"}
http_worker-1  | Going to sleep.

Looks like event loop is not blocked

If I don't "double await"/use asyncio.coroutine on get_response and keep it default (for example uncomment unmodified django-structlog)

http_worker-1  | {"event": "Request started", "get_params": "{}", "level": "info", "method": "POST", "request_body": "{'xxx': '289cea24-38e2-441d-83ba-f7cd7534fad4', 'user_id': '1'}", "request_files_sizes": "{}", "request_headers": "{'CONTENT_LENGTH': '60'}", "request_id": "951db1a2dc33bab171c657df1ec56e89", "timestamp": "2023-03-22T19:37:29.721714Z", "url": "/verification/"}
http_worker-1  | Going to sleep.
http_worker-1  | {"action": "prepare_xxx_to_async", "id": "2", "xxx": "289cea24-38e2-441d-83ba-f7cd7534fad4", "event": "Prefetching xxx to async started", "level": "info", "request_id": "951db1a2dc33bab171c657df1ec56e89", "timestamp": "2023-03-22T19:37:59.775372Z", "user_id": "1"}

Looks like the 4 remaining requests to this endpoint have to wait until the first one is completely finished

@jrobichaud
Copy link
Owner

jrobichaud commented Mar 22, 2023

Could you provide a way I could reproduce the problem with the demo project? (it is currently using python 3.9 but I will downgrade to experiment of course)

https://django-structlog.readthedocs.io/en/latest/demo.html

@adinsoon
Copy link
Author

We will continue to investigate this topic and this strange behavior tomorrow in our company.
We are very committed to resolving this case in a correct and proper way, and if we can contribute to the development of your library, we will be very pleased. If we can figure out why this particular workaround works even though it doesn't do anything special, I'd love to share it. Because I admit that I also wonder why this way works. I was able to understand using the coroutine, but double awaiting is a coincidence idea, but it still successfully unlocks the event loop.

Yes, let me/us try to do it, i will try to provide you with conditions for reproduction

@adinsoon
Copy link
Author

adinsoon commented Mar 22, 2023

// Small update

Thanks for giving a good idea of trying to reproduce this on your demo. At the moment I managed to quickly upload some similar things that I used locally for reproduction (gunicorn, uvicorn, awaited sleep), and tried to perform few shots in Postman - so far unsuccessfully - from what I can see it does not block event loop. Very strange.

django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async

We will try to check tomorrow why it can work on the demo and why it get stuck on ours, and double await is fixing it, thank you for your time

@jrobichaud
Copy link
Owner

Validate also the middleware chain, it takes 1 synchronous middleware and all requests become synchronous.

@adinsoon
Copy link
Author

adinsoon commented Mar 22, 2023

In the meantime I've done a dozen more isolated trials (full clear of containers, build and init from scratch, bumping libraries, single and mixed commenting and uncommenting of middlewares), and if it's not due to fatigue, it looks like I succeeded in my project to unlock the event loop without these fancy changes.
It's very possible that we used your middleware version 4.1.1 in this way:

     ... # ...
     ... "django_structlog.middlewares.RequestMiddleware",

being convinced that it does the same thing that is currently done by:

     ... # ...
     ... "django_structlog.middlewares.request_middleware_router"

This is the most logical explanation i can see here.
When I tested both of these versions, it turned out that the event loop is not blocked on the second. In turn, the first caused block, but adding these oddities to it resulted in unlocking the event loop too.
I'll look at it again when I'm well rested, but it's a good step towards closing this pull request. It's great that you were critical of this idea.

//

Validate also the middleware chain, it takes 1 synchronous middleware and all requests become synchronous.

and I guess that's why it could work synchronously, as I can see in the library, RequestMiddleware inherits from SyncRequestMiddleware, so we are dealing with the presence of synchronous middleware, so everything else goes synchronously. it makes sense.

@jrobichaud
Copy link
Owner

jrobichaud commented Mar 22, 2023

I first integrated the router by default then it broke the setup of several projects.

The root cause is that the middleware clickjacking appears to be coded as synchronous but it uses the MiddlewareMixin that claims the middle ware is both synchronous and asynchronous.

https://docs.djangoproject.com/fr/4.1/_modules/django/middleware/clickjacking/

https://github.com/django/django/blob/main/django/utils/deprecation.py#L98

It means that making django-structlog async as default was exposing clickjacking's problem.

These people use asgi but synchronously because of this.

Edit: I think they fixed the clickjacking middleware lately.

@jrobichaud
Copy link
Owner

Looking at the code again, I don't see why it would not work with clickjacking.

I'll look into this again eventually.

@jrobichaud
Copy link
Owner

@adinsoon , could you please test #209. I think I managed to implement async support correctly this time.

I am not sure it affects your problem but it does fix the problem django-structlog had with 'django.middleware.clickjacking.XFrameOptionsMiddleware',

I replicated the code of MiddlewareMixin.

@adinsoon
Copy link
Author

@jrobichaud
Thank you very much for your commitment. Your pull request looks solid.

I merged your changes into forked demo I sent you yesterday and ran some tests. Everything works fine

django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Started something in async
django-structlog-django_asgi-1  | Going to sleep.
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async
django-structlog-django_asgi-1  | Ended something in async

I see more changes coming to asyncio in python 3.12. It's nice that probably you took it into account and used imports from asgiref

I'm willing to bump version of your library in my project after you release it and see how it works, then we'll have a more advanced testing environment 😀

@adinsoon
Copy link
Author

The tests I carried out today allowed me to confirm that the event loop was blocked by using the wrong middleware version and forcing synchronization in the instance. Due to fixing the problem, this pull request is no longer needed and I decide to close it. Thank you very much for your help. Take care

@adinsoon adinsoon closed this Mar 23, 2023
@adinsoon adinsoon deleted the fix/do-not-block-event-loop-when-using-middleware-asynchronously branch March 23, 2023 17:18
@jrobichaud
Copy link
Owner

I'll release a version by tomorrow. I'll warn you when it's done.

@jrobichaud
Copy link
Owner

@adinsoon 5.0.0 is out.

@adinsoon
Copy link
Author

@jrobichaud Awesome, thanks. However, I noticed a problem that I will try to describe to you in that pull request

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants