
🏗️ Arbitrarily cap unarchiving to 2 workers #2384

Merged

mguidon merged 2 commits into ITISFoundation:master on Jun 16, 2021

Conversation

@mguidon (Member) commented Jun 15, 2021

What do these changes do?

Currently, extracting zip files is done with a ProcessPool using all available CPU cores. This can be pretty heavy and, for instance on AWS, can make a cluster node completely unresponsive. I suggest capping it at 2 workers. Once this is in, all dynamic services that consume this need to be updated.
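For context, a minimal runnable sketch of the capping behavior (the extraction helper and member names here are illustrative, not from this PR). `ProcessPoolExecutor` defaults `max_workers` to `os.cpu_count()`, so passing `max_workers=2` bounds the pool to two processes:

```python
from concurrent.futures import ProcessPoolExecutor

def extract_member(name: str) -> str:
    # stand-in for the real per-file unzip work
    return name.upper()

if __name__ == "__main__":
    members = [f"member_{i}.txt" for i in range(8)]
    # With no arguments, ProcessPoolExecutor spawns os.cpu_count()
    # workers; max_workers=2 bounds the pool to two processes.
    with ProcessPoolExecutor(max_workers=2) as pool:
        print(list(pool.map(extract_member, members)))
```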

@codecov bot commented Jun 15, 2021

Codecov Report

Merging #2384 (11ca7cb) into master (a18f686) will decrease coverage by 0.1%.
The diff coverage is 100.0%.


```diff
@@           Coverage Diff            @@
##           master   #2384     +/-   ##
========================================
- Coverage    74.8%   74.7%   -0.1%
========================================
  Files         516     516
  Lines       20033   20034      +1
  Branches     1971    1971
========================================
  Hits        14985   14985
  Misses       4531    4531
- Partials      517     518      +1
```
| Flag | Coverage Δ |
|---|---|
| integrationtests | 67.3% <ø> (-0.1%) ⬇️ |
| unittests | 68.2% <100.0%> (+<0.1%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| .../service-library/src/servicelib/archiving_utils.py | 89.6% <100.0%> (+0.1%) ⬆️ |
| ...ce_webserver/resource_manager/garbage_collector.py | 72.0% <0.0%> (-0.9%) ⬇️ |
| .../director/src/simcore_service_director/producer.py | 61.0% <0.0%> (+0.2%) ⬆️ |

@mguidon requested review from GitHK and pcrespov on Jun 15, 2021, 19:27
```diff
@@ -76,7 +76,7 @@ async def unarchive_dir(
     all tree leafs, which might include files or empty folders
     """
     with zipfile.ZipFile(archive_to_extract, mode="r") as zip_file_handler:
-        with ProcessPoolExecutor() as pool:
+        with ProcessPoolExecutor(max_workers=2) as pool:
```
Contributor commented on this line:
please move this value into a constant named something like `MAX_UNARCHIVING_WORKER_COUNT`
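A minimal sketch of that suggestion (module placement assumed from the diff path above):

```python
# servicelib/archiving_utils.py (sketch)
from concurrent.futures import ProcessPoolExecutor

MAX_UNARCHIVING_WORKER_COUNT = 2

# ...
with ProcessPoolExecutor(max_workers=MAX_UNARCHIVING_WORKER_COUNT) as pool:
    ...  # same extraction logic as before
```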

@pcrespov (Member) left a comment:

Since this is a library, I would add it as an optional argument to `unarchive_dir` with a default of 2:

```python
async def unarchive_dir(
    archive_to_extract: Path, destination_folder: Path, *, max_workers_processes: int = 2
) -> Set[Path]:
    # ...
    with ProcessPoolExecutor(max_workers=max_workers_processes) as pool:
        # ...
```

E.g. if used inside a Celery worker, we might want to take all the processors offered by the container.
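A hypothetical call site along those lines (the task name and surrounding code are illustrative, not part of this PR):

```python
import os
from pathlib import Path

from servicelib.archiving_utils import unarchive_dir

async def extract_payload(archive: Path, destination: Path) -> None:
    # A Celery-style worker that owns its whole container can opt in
    # to one process per available core instead of the default of 2.
    await unarchive_dir(
        archive,
        destination,
        max_workers_processes=os.cpu_count() or 1,
    )
```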

@mguidon mguidon merged commit cfdf4f8 into ITISFoundation:master Jun 16, 2021
@sanderegg sanderegg added this to the Marmoset milestone Jun 30, 2021