🏗️ Arbitrarily cap unarchiving to 2 workers #2384
Conversation
Codecov Report
```
@@           Coverage Diff            @@
##           master   #2384     +/-   ##
========================================
- Coverage    74.8%   74.7%    -0.1%
========================================
  Files         516     516
  Lines       20033   20034       +1
  Branches     1971    1971
========================================
  Hits        14985   14985
  Misses       4531    4531
- Partials      517     518       +1
```
Flags with carried forward coverage won't be shown.
```diff
@@ -76,7 +76,7 @@ async def unarchive_dir(
         all tree leafs, which might include files or empty folders
         """
         with zipfile.ZipFile(archive_to_extract, mode="r") as zip_file_handler:
-            with ProcessPoolExecutor() as pool:
+            with ProcessPoolExecutor(max_workers=2) as pool:
```
please move this value into a constant named something like MAX_UNARCHIVING_WORKER_COUNT
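A minimal sketch of that suggestion, assuming the constant sits at module level next to `unarchive_dir` (the surrounding code is elided):

```python
from concurrent.futures import ProcessPoolExecutor

# Name taken from the review comment; module-level placement is an assumption.
MAX_UNARCHIVING_WORKER_COUNT = 2

with ProcessPoolExecutor(max_workers=MAX_UNARCHIVING_WORKER_COUNT) as pool:
    ...
```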
This is a library, so I would make it an optional parameter of `unarchive_dir`, with a default of 2:

```python
async def unarchive_dir(
    archive_to_extract: Path, destination_folder: Path, *, max_workers_processes: int = 2
) -> Set[Path]:
    # ...
    with ProcessPoolExecutor(max_workers=max_workers_processes) as pool:
        # ...
```
E.g. if used inside a Celery worker, we might want to use all the processors the container offers.
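A self-contained sketch of that variant. The signature and default come from the comment above; the per-member extraction helper and the event-loop plumbing are assumptions, not the repository's actual implementation:

```python
import asyncio
import zipfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Set


def _extract_member(archive: Path, member: str, destination: Path) -> Path:
    # Runs in a worker process; reopens the archive because ZipFile
    # handles cannot be pickled across process boundaries.
    with zipfile.ZipFile(archive, mode="r") as zip_file_handler:
        return Path(zip_file_handler.extract(member, path=destination))


async def unarchive_dir(
    archive_to_extract: Path,
    destination_folder: Path,
    *,
    max_workers_processes: int = 2,
) -> Set[Path]:
    loop = asyncio.get_running_loop()
    with zipfile.ZipFile(archive_to_extract, mode="r") as zip_file_handler:
        with ProcessPoolExecutor(max_workers=max_workers_processes) as pool:
            # Fan the members out to at most max_workers_processes processes.
            tasks = [
                loop.run_in_executor(
                    pool, _extract_member, archive_to_extract, name, destination_folder
                )
                for name in zip_file_handler.namelist()
            ]
            extracted = await asyncio.gather(*tasks)
    return set(extracted)
```

A caller such as a Celery worker could then opt back into every core the container offers, e.g. `await unarchive_dir(archive, dest, max_workers_processes=os.cpu_count())`.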
What do these changes do?
Currently, extracting zip files is done with a ProcessPool that uses all available CPU cores. This can be quite heavy and, for instance on AWS, can make a cluster node completely unresponsive. I suggest capping it at 2. Once this is merged, all dynamic services that consume this library need to be updated.
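For context, `ProcessPoolExecutor()` with no arguments sizes the pool to the machine's CPU count, which is what the one-line diff above changes. A minimal illustration of the two behaviours:

```python
from concurrent.futures import ProcessPoolExecutor

# Before: pool size defaults to os.cpu_count(), so a 64-core node
# would spawn up to 64 extraction processes at once.
with ProcessPoolExecutor() as pool:
    pass

# After: at most 2 extraction processes, independent of core count.
with ProcessPoolExecutor(max_workers=2) as pool:
    pass
```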