✨Computational backend: connect dv2 to clusters keeper for on-demand clusters (🗃️ + ⚠️devops) #4703

Member

@sanderegg sanderegg commented Sep 4, 2023

What do these changes do?

This PR connects director-v2 to clusters-keeper.
A user can now get an on-demand cluster for their computations.
Current limitation:

  • only 1 machine of 1 type

clusters-keeper

  • fixes issue where clusters-keeper would not start in devel mode
  • moved RPC interface into models-library
  • 🗃️ added column use_on_demand_clusters into comp_runs table
  • new ENV CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION defaults to 5
  • new ENV CLUSTERS_KEEPER_TASK_INTERVAL defaults to 60
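
How the two new ENVs might interact can be sketched as follows. This is a minimal sketch, not the clusters-keeper implementation: it assumes `CLUSTERS_KEEPER_TASK_INTERVAL` is expressed in seconds (the unit is not stated here) and that a cluster is terminated once the time since its last heartbeat exceeds the interval multiplied by the allowed number of missed heartbeats. The names `load_settings` and `should_terminate` are hypothetical.

```python
import os
from datetime import datetime, timedelta, timezone

# Defaults as listed in this PR. Treating the interval as seconds is an assumption.
DEFAULT_MAX_MISSED_HEARTBEATS = 5
DEFAULT_TASK_INTERVAL_S = 60


def load_settings(env=None):
    """Read the two ENVs (hypothetical helper), falling back to the listed defaults."""
    env = os.environ if env is None else env
    max_missed = int(
        env.get(
            "CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION",
            DEFAULT_MAX_MISSED_HEARTBEATS,
        )
    )
    interval = timedelta(
        seconds=float(env.get("CLUSTERS_KEEPER_TASK_INTERVAL", DEFAULT_TASK_INTERVAL_S))
    )
    return max_missed, interval


def should_terminate(last_heartbeat, now, max_missed, interval):
    """Assumed rule: terminate once more than `max_missed` intervals passed without a heartbeat."""
    return (now - last_heartbeat) > max_missed * interval


if __name__ == "__main__":
    max_missed, interval = load_settings()
    last = datetime.now(timezone.utc) - timedelta(minutes=10)
    print(should_terminate(last, datetime.now(timezone.utc), max_missed, interval))
```

Under these assumptions and the default values, a cluster would be kept for 5 × 60 s after its last heartbeat before becoming a candidate for termination.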

director-v2

  • added exceptions for OnDemandCluster not ready or clusters-keeper not available
  • added rabbitMQ RPC client to contact clusters-keeper when on demand cluster is necessary
  • added WAITING_FOR_CLUSTER to RunningState and to StateType (🗃️ associated)
  • added use_on_demand_clusters field in ComputationCreate body to instruct dv2 to get or create a cluster on the fly
  • currently clusters-keeper is disabled by default in the ENV variables
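
The director-v2 items above can be pictured with a short sketch. It is illustrative only and uses just the standard library: the exception class names, the `project_id` field, the `STARTED` state and the `get_or_create_cluster` callable are all hypothetical stand-ins, since the actual class names and the RabbitMQ RPC client API are not shown here. Only `use_on_demand_clusters` and `WAITING_FOR_CLUSTER` come from this PR.

```python
from dataclasses import dataclass
from enum import Enum


class RunningState(str, Enum):
    # WAITING_FOR_CLUSTER is the state added in this PR.
    WAITING_FOR_CLUSTER = "WAITING_FOR_CLUSTER"
    # Placeholder for "the run can proceed"; assumed for the sketch.
    STARTED = "STARTED"


class ClustersKeeperNotAvailableError(Exception):
    """Hypothetical name: clusters-keeper could not be reached over RPC."""


class OnDemandClusterNotReadyError(Exception):
    """Hypothetical name: the on-demand cluster exists but is not ready yet."""


@dataclass
class ComputationCreate:
    # Reduced body: only the new flag plus one illustrative field.
    project_id: str
    use_on_demand_clusters: bool = False


def next_state(body, get_or_create_cluster):
    """Decide the run state; `get_or_create_cluster` stands in for the RPC call."""
    if not body.use_on_demand_clusters:
        # No on-demand cluster requested, so clusters-keeper is never contacted.
        return RunningState.STARTED
    try:
        get_or_create_cluster(body.project_id)
    except (ClustersKeeperNotAvailableError, OnDemandClusterNotReadyError):
        # Either failure leaves the run waiting; a later attempt may succeed.
        return RunningState.WAITING_FOR_CLUSTER
    return RunningState.STARTED
```

Mapping both failure modes to a waiting state, rather than failing the run, is one plausible design; whether director-v2 treats them identically is not stated here.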

frontend

  • added WAITING_FOR_CLUSTER RunningState

Related issue/s

How to test

  • need access to S3 (AWS or privately available)
  • need access to docker registry to run services
  • define a GPU oriented machine
  • all this goes in a well defined .env file that I can provide
  • this is very early testing: expect bugs, and the feature is not yet available to everyone

DevOps Checklist

  • new ENV CLUSTERS_KEEPER_MAX_MISSED_HEARTBEATS_BEFORE_CLUSTER_TERMINATION defaults to 5
  • new ENV CLUSTERS_KEEPER_TASK_INTERVAL defaults to 60

@sanderegg sanderegg added a:director-v2 issue related with the director-v2 service a:clusters-keeper labels Sep 4, 2023
@sanderegg sanderegg added this to the Baklava milestone Sep 4, 2023
@sanderegg sanderegg self-assigned this Sep 4, 2023
codecov bot commented Sep 4, 2023

Codecov Report

Merging #4703 (1af9f61) into master (1a3e766) will increase coverage by 0.9%.
The diff coverage is 77.0%.

@@           Coverage Diff            @@
##           master   #4703     +/-   ##
========================================
+ Coverage    86.0%   86.9%   +0.9%     
========================================
  Files        1111    1113      +2     
  Lines       46518   46612     +94     
  Branches     1012    1013      +1     
========================================
+ Hits        40025   40546    +521     
+ Misses       6275    5839    -436     
- Partials      218     227      +9     
Flag Coverage Δ
integrationtests 65.2% <62.9%> (+1.3%) ⬆️
unittests 84.6% <76.0%> (-0.1%) ⬇️

Files Changed Coverage Δ
...ages/models-library/src/models_library/clusters.py 100.0% <ø> (ø)
.../src/simcore_postgres_database/models/comp_runs.py 100.0% <ø> (ø)
...eeper/src/simcore_service_clusters_keeper/_meta.py 91.3% <ø> (ø)
...imcore_service_clusters_keeper/modules/clusters.py 100.0% <ø> (ø)
...ore_service_director_v2/api/routes/computations.py 93.2% <ø> (+22.2%) ⬆️
...rc/simcore_service_director_v2/core/application.py 97.2% <ø> (ø)
...e_director_v2/modules/db/repositories/comp_runs.py 94.9% <ø> (ø)
...tor-v2/src/simcore_service_director_v2/utils/db.py 100.0% <ø> (ø)
...rc/simcore_service_webserver/db_listener/_utils.py 100.0% <ø> (ø)
...odels_library/api_schemas_directorv2/comp_tasks.py 80.0% <50.0%> (-7.5%) ⬇️
... and 23 more

... and 20 files with indirect coverage changes

@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch 3 times, most recently from 146a9e0 to 601ec01 Compare September 4, 2023 19:50
@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch from feba364 to d5682f8 Compare September 11, 2023 08:11
@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch from d5682f8 to 8e7fe2b Compare September 11, 2023 11:54
@sanderegg sanderegg changed the title ✨Computational backend: connect dv2 to clusters keeper for on-demand clusters ✨Computational backend: connect dv2 to clusters keeper for on-demand clusters (🗃️) Sep 11, 2023
@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch from 8e7fe2b to 5e3936b Compare September 11, 2023 16:59
@sanderegg sanderegg marked this pull request as ready for review September 11, 2023 17:13
@sanderegg sanderegg requested a review from mguidon September 11, 2023 17:15
Review thread on services/docker-compose.yml (outdated, resolved)
@sanderegg sanderegg changed the title ✨Computational backend: connect dv2 to clusters keeper for on-demand clusters (🗃️) ✨Computational backend: connect dv2 to clusters keeper for on-demand clusters (🗃️ + ⚠️devops) Sep 12, 2023
@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch from 9705fd1 to e899656 Compare September 12, 2023 06:52
@YuryHrytsuk
Contributor

Could you, in future, add new ENVs to the DevOps Checklist so that they are clearly visible? 🙏

@sanderegg
Member Author

Could you, in future, add new ENVs to the DevOps Checklist so that they are clearly visible? 🙏

@YuryHrytsuk
done

@sanderegg sanderegg force-pushed the comp-backend/connect-dv2-to-clusters-keeper branch from 2f25e51 to 1af9f61 Compare September 12, 2023 09:36
@codeclimate

codeclimate bot commented Sep 12, 2023

Code Climate has analyzed commit 1af9f61 and detected 0 issues on this pull request.

@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

Member

@pcrespov pcrespov left a comment

🎉 Cool. Some minor comments/questions

@sanderegg sanderegg enabled auto-merge (squash) September 12, 2023 12:11
@sanderegg sanderegg disabled auto-merge September 12, 2023 13:50
@sanderegg sanderegg merged commit 6cd0ce8 into ITISFoundation:master Sep 12, 2023
@sanderegg sanderegg deleted the comp-backend/connect-dv2-to-clusters-keeper branch September 12, 2023 13:50
@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Sep 22, 2023
50 tasks
Labels
a:clusters-keeper a:director-v2 issue related with the director-v2 service
4 participants