Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 no longer fail when removing a volume from a node that does no longer exist #7036

Merged
merged 10 commits into from
Jan 23, 2025

Conversation

GitHK
Copy link
Contributor

@GitHK GitHK commented Jan 14, 2025

What do these changes do?

As part of the service shutdown proceed, director-v2 will try to remove the used volumes of the service from the node via the agent. If the node is missing for whatever reason, the below error would be raised.

This fix allows the director-v2 to no longer fail when the node with the required running agent instance is missing.

TaskClientResultError: Task simcore_service_director_v2.api.routes.dynamic_scheduler._task_cleanup_service_docker_resources.a8f28f40-2b63-4c86-99b7-fc6e71241cc6 finished with exception: 'Could not find a remote method named: 
'docker_node_id_r77hxq9vws8tvo8wmyumug2y9-service_agent-swarm_stack_name_production-simcore.remove_volumes_without_backup_for_service'. Message from remote server was returned: IncomingMessage:{'app_id': None,
 'body_size': 73,
 'cluster_id': '',
 'consumer_tag': None,
 'content_encoding': None,
 'content_type': 'application/python-pickle',
 'correlation_id': '3d52bc9a-8805-43af-ae9b-b65ba374f012',
 'delivery_mode': <DeliveryMode.NOT_PERSISTENT: 1>,
 'delivery_tag': None,
 'exchange': '',
 'expiration': 3600.0,
 'headers': {'From': 'amq_617014d185f64dd48bac7fcb1e899202'},
 'message_id': 'c632f220bf3a44489ac00bca99977605',
 'priority': 5,
 'redelivered': None,
 'reply_to': 'amq_617014d185f64dd48bac7fcb1e899202',
 'routing_key': 'docker_node_id_r77hxq9vws8tvo8wmyumug2y9-service_agent-swarm_stack_name_production-simcore.remove_volumes_without_backup_for_service',
 'timestamp': datetime.datetime(2025, 1, 14, 8, 53, 58, tzinfo=datetime.timezone.utc),
 'type': 'call',
 'user_id': None}. '
  File "/home/scu/.venv/lib/python3.11/site-packages/servicelib/long_running_tasks/_task.py", line 422, in _progress_task
    return await handler(progress, **task_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/api/routes/dynamic_scheduler.py", line 246, in _task_cleanup_service_docker_resources
    await dynamic_sidecars_scheduler.remove_service_sidecar_proxy_docker_networks_and_volumes(

  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_task.py", line 63, in remove_service_sidecar_proxy_docker_networks_and_volumes
    return await self.scheduler.remove_service_sidecar_proxy_docker_networks_and_volumes(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_core/_scheduler.py", line 215, in remove_service_sidecar_proxy_docker_networks_and_volumes
    await service_remove_sidecar_proxy_docker_networks_and_volumes(

  File "/home/scu/.venv/lib/python3.11/site-packages/simcore_service_director_v2/modules/dynamic_sidecar/scheduler/_core/_events_utils.py", line 242, in service_remove_sidecar_proxy_docker_networks_and_volumes
    await remove_volumes_without_backup_for_service(

  File "/home/scu/.venv/lib/python3.11/site-packages/servicelib/logging_utils.py", line 305, in _async_wrapper
    result = await func_or_coro(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/scu/.venv/lib/python3.11/site-packages/servicelib/rabbitmq/rpc_interfaces/agent/volumes.py", line 24, in remove_volumes_without_backup_for_service
    result = await rabbitmq_rpc_client.request(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

  File "/home/scu/.venv/lib/python3.11/site-packages/servicelib/rabbitmq/_client_rpc.py", line 99, in request
    raise RemoteMethodNotRegisteredError(

Related issue/s

How to test

Dev-ops checklist

@GitHK GitHK self-assigned this Jan 14, 2025
@GitHK GitHK added a:director-v2 issue related with the director-v2 service t:maintenance Some planned maintenance work labels Jan 14, 2025
@GitHK GitHK added this to the Singularity milestone Jan 14, 2025
@GitHK GitHK marked this pull request as ready for review January 14, 2025 09:15
Copy link

codecov bot commented Jan 14, 2025

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.04%. Comparing base (e2aeff5) to head (0194592).
Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (e2aeff5) and HEAD (0194592). Click for more details.

HEAD has 28 uploads less than BASE
Flag BASE (e2aeff5) HEAD (0194592)
unittests 29 1
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #7036       +/-   ##
===========================================
- Coverage   87.08%   70.04%   -17.05%     
===========================================
  Files        1597      665      -932     
  Lines       62455    32320    -30135     
  Branches     2050      262     -1788     
===========================================
- Hits        54389    22637    -31752     
- Misses       7729     9623     +1894     
+ Partials      337       60      -277     
Flag Coverage Δ
integrationtests 64.76% <33.33%> (-1.98%) ⬇️
unittests 84.58% <33.33%> (-0.81%) ⬇️
Components Coverage Δ
api ∅ <ø> (∅)
pkg_aws_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 77.37% <ø> (-8.02%) ⬇️
agent ∅ <ø> (∅)
api_server ∅ <ø> (∅)
autoscaling ∅ <ø> (∅)
catalog ∅ <ø> (∅)
clusters_keeper ∅ <ø> (∅)
dask_sidecar ∅ <ø> (∅)
datcore_adapter ∅ <ø> (∅)
director ∅ <ø> (∅)
director_v2 91.27% <33.33%> (-0.03%) ⬇️
dynamic_scheduler ∅ <ø> (∅)
dynamic_sidecar 89.59% <ø> (-0.19%) ⬇️
efs_guardian ∅ <ø> (∅)
invitations ∅ <ø> (∅)
osparc_gateway_server ∅ <ø> (∅)
payments ∅ <ø> (∅)
resource_usage_tracker ∅ <ø> (∅)
storage ∅ <ø> (∅)
webclient ∅ <ø> (∅)
webserver 59.18% <ø> (-25.36%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e2aeff5...0194592. Read the comment docs.

Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not be a warning??

@GitHK GitHK enabled auto-merge (squash) January 15, 2025 12:56
Copy link
Contributor

@matusdrobuliak66 matusdrobuliak66 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@matusdrobuliak66
Copy link
Contributor

Q: @GitHK So sometimes when I run manually director-v2 CLI to stop the hanging sidecars and it fails that the node ID is not in the DB anymore, this will fix it?

@matusdrobuliak66 matusdrobuliak66 mentioned this pull request Jan 20, 2025
58 tasks
@GitHK
Copy link
Contributor Author

GitHK commented Jan 21, 2025

Q: @GitHK So sometimes when I run manually director-v2 CLI to stop the hanging sidecars and it fails that the node ID is not in the DB anymore, this will fix it?

@matusdrobuliak66 that is a different issue and unrelated to this one

@GitHK GitHK merged commit a4e0406 into ITISFoundation:master Jan 23, 2025
96 of 103 checks passed
@GitHK GitHK deleted the pr-osparc-fix-issue-removing-volume branch January 23, 2025 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a:director-v2 issue related with the director-v2 service t:maintenance Some planned maintenance work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants