-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve concurrency on Windows #286
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #286 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 517 543 +26
=========================================
+ Hits 517 543 +26 ☔ View full report in Codecov by Sentry. |
async def _process_queue(cls) -> None: | ||
while not cls._stop_event.is_set(): | ||
coro, future = await cls._coro_queue.get() | ||
asyncio.create_task(cls._handle_coro(coro, future)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may not apply here but, for the record, when implementing session management for scrapy-zyte-api, I found that I needed to keep live references to task objects until finished to prevent them from being garbage-collected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, thank you very much for the reference. I haven't seen that in my tests but it might be that my runs were not long enough. I won't include it right away just to keep things simple but I'll keep it in mind in case someone reports it as an issue.
"""Utility class to redirect coroutines to an asyncio event loop running | ||
in a different thread. This allows to use a ProactorEventLoop, which is | ||
supported by Playwright on Windows. | ||
class _ThreadedLoopAdapter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn’t this still have Windows, Proactor or something of the sort in the name? (I’m looking at def start
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I renamed it like this because it's technically possible to use it on other platforms as well. The concurrency issue also happens on Linux with this threaded loop approach, it's not an intrinsic Windows issue, so I did most of the development in Linux and only then I tried it on Windows to make sure it works.
Closes #282
Alternative approach to #285
This yields full concurrency (
'playwright/page_count/max_concurrent': 32
in stats) with the sample spider from #285: