✨Run dynamic services via dynamic-sidecar (🏗️ OPS + CI action) #1887

Merged
merged 794 commits into from
Aug 4, 2021

@GitHK GitHK commented Oct 15, 2020

🏗️ CI: mark integration-test-director-v2-01 and integration-test-director-v2-02 as Required; [int] director-v2 is no longer used
🏗️ OPS: must be able to receive network traffic on UUID.services.DEPLOYMENT_DNS; a wildcard certificate for *.services.DEPLOYMENT_DNS is required

What do these changes do?


💣💥Development workflow change‼️

To properly display the content of a dynamic service, the frontend needs to access it via a DNS name of the form UUID.services.osparc.io. Because IP addresses do not support subdomains, a DNS name is always required during local development.

NOTE: when starting the services for development, you will be shown the correct address to open in your browser.

Please compare the old and new structure of the address below:

  • http://127.0.0.1:9081 old version
  • http://127.0.0.1.nip.io:9081 new version

For more information on nip.io, see the nip.io website.
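
As a rough illustration of the address change, the sketch below builds both address forms. The helper names are hypothetical and not part of this PR; the only assumption is the documented nip.io behaviour that `<ip>.nip.io` (and any subdomain of it) resolves back to `<ip>`.

```python
def nip_io_url(ip: str, port: int, subdomain: str = "") -> str:
    """Build a nip.io based URL that resolves back to `ip` (hypothetical helper)."""
    host = f"{ip}.nip.io"
    if subdomain:
        # nip.io also resolves <anything>.<ip>.nip.io to <ip>
        host = f"{subdomain}.{host}"
    return f"http://{host}:{port}"


def service_hostname(node_uuid: str, deployment_dns: str) -> str:
    """Hostname of a dynamic service, following UUID.services.DEPLOYMENT_DNS."""
    return f"{node_uuid}.services.{deployment_dns}"


print(nip_io_url("127.0.0.1", 9081))
print(service_hostname("UUID", "osparc.io"))
```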


👀 How the integration with the dynamic-sidecar works

Currently the system will support two types of dynamic services:

  • running in legacy mode (old services), handled by director-v0; services of this type should no longer be created from this moment onwards
  • running via the dynamic-sidecar, which is the scope of this PR

The labels below are used to generate services run via the dynamic-sidecar:

  • simcore.service.settings shared with the legacy services
  • simcore.service.paths-mapping required if the service is started via the dynamic-sidecar; its presence is what actually gets checked. It instructs the dynamic-sidecar where to mount special volumes like the outputs and inputs folders
  • simcore.service.compose-spec if provided, the dynamic-sidecar will use it instead of generating one
  • simcore.service.container-http-entrypoint required when simcore.service.compose-spec is present; instructs Traefik to send network traffic to the appropriate container inside the provided docker-compose spec
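
The dependencies between these labels can be summarised with a small check. This is only a sketch of the rules listed above; `validate_sidecar_labels` is a hypothetical helper, not code from this PR.

```python
from typing import Dict, List

PATHS_MAPPING = "simcore.service.paths-mapping"
COMPOSE_SPEC = "simcore.service.compose-spec"
HTTP_ENTRYPOINT = "simcore.service.container-http-entrypoint"


def validate_sidecar_labels(labels: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the labels are consistent."""
    problems: List[str] = []
    # paths-mapping is mandatory for services started via the dynamic-sidecar
    if PATHS_MAPPING not in labels:
        problems.append(f"{PATHS_MAPPING} is required")
    # a provided compose spec needs to name the container that receives HTTP traffic
    if COMPOSE_SPEC in labels and HTTP_ENTRYPOINT not in labels:
        problems.append(f"{HTTP_ENTRYPOINT} is required when {COMPOSE_SPEC} is present")
    return problems
```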

Platform services involved

  • webserver receives requests regarding the dynamic service and forwards them to the director-v2
  • director-v2 uses the simcore.service.paths-mapping label to check whether the target dynamic service is a legacy service, in which case the request is redirected to director-v0; otherwise the request is handled by director-v2 itself
  • director-v0 is left virtually untouched, only extended with an endpoint to extract labels from docker images
  • dynamic-sidecar is the interface between the oSPARC platform and the dynamic service
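
The routing decision made by director-v2 can be pictured roughly as follows. The function names are hypothetical and the real implementation may look different; the sketch only encodes the rule stated above (no paths-mapping label means legacy).

```python
from typing import Mapping

PATHS_MAPPING_LABEL = "simcore.service.paths-mapping"


def is_legacy_service(image_labels: Mapping[str, str]) -> bool:
    """A service without the paths-mapping label is treated as legacy."""
    return PATHS_MAPPING_LABEL not in image_labels


def select_handler(image_labels: Mapping[str, str]) -> str:
    """Name the component that should handle requests for this service."""
    return "director-v0" if is_legacy_service(image_labels) else "director-v2"
```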

🔍 A closer look at running via dynamic-sidecar

Dynamic services started via dynamic-sidecar are handled by director-v2, which does the following when a service is started:

  1. Gets the labels from the docker image, extracts and assembles the following docker items:
    • a service spec for the traefik-proxy (has access to the oSPARC platform)
    • a service spec for the dynamic-sidecar (has access to the oSPARC platform); all the resource reservations are summed up and assigned to this service
    • an attachable overlay network (no access to the oSPARC platform), has internet access
  2. The dynamic-sidecar is started.
  3. Once the dynamic-sidecar is in a running state, its assigned node id is recovered and passed as a constraint to the traefik-proxy, so both services are guaranteed to be started on the same node. This is required to send traffic to containers started via the docker-compose up command by the dynamic-sidecar. Please also note that, for this reason, the traefik-proxy is running in docker mode and not swarm mode.
  4. Finally the service is added to the monitor module (part of director-v2). More about the monitor:
    • is a subsystem of the director-v2, started at boot time, that runs in the background
    • upon boot, it will automatically pick up running dynamic services started by director-v2 and monitor them
    • it enforces a list of MonitorEvents; each event has a condition upon which it is triggered and an action (some code to run)
    • the MonitorEvents are used to easily add and extend functionality related to the lifecycle of the dynamic service
    • every 5 seconds the monitor checks that all currently monitored dynamic-sidecars are healthy and runs apply_monitoring on each of them. If an error occurs, the status of the service is set to FAILED and the frontend shows in the logs what is wrong with the service.
  5. For this PR the following MonitorEvents are defined:
    • CreateServices is where:
      • the service specs for the traefik-proxy and the dynamic-sidecar are created, together with the attachable overlay network
      • the dynamic-sidecar is started and the event waits for it to be running, or raises an error
      • the traefik-proxy is started on the same node as the dynamic-sidecar and the previously created network is connected to it
      • this event no longer triggers once it has completed
    • ServicesInspect fetches the status for each container started by the dynamic-sidecar via API calls to the dynamic-sidecar service; this information is used to compute the state of the service for the frontend
    • RunDockerComposeUp checks:
      • if the dynamic-sidecar is in a running state
      • if not in running state, will do nothing for now
      • if in a running state, will call the API on the dynamic-sidecar to invoke the docker-compose up command
      • marks was_compose_spec_submitted as True, after which this step is no longer executed
  6. The status of a service is a bit more complex to determine and is computed by the monitor:
    • while the dynamic-sidecar is not in a running state, the state from the dynamic-sidecar service is mapped to the state that the frontend expects
    • when the dynamic-sidecar is in a running state, the statuses of all tasks created via the docker-compose up command are fetched and mapped to the state that the frontend expects; because multiple containers can be started, the statuses are merged and the "lowest state" takes priority (further details can be found in parse_docker_status.py::extract_containers_minimim_statuses). Example: if container-A is in "running" and container-B is in "pulling", the "pulling" state will be reported.
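
The "lowest state takes priority" rule from the last step can be sketched as below. The ordering in `STATE_ORDER` is an assumption made for illustration only; the actual ordering and state names live in parse_docker_status.py and may differ.

```python
from typing import Iterable

# Assumed ordering, from "lowest" to "highest" (illustrative only)
STATE_ORDER = ["pending", "pulling", "starting", "running"]


def merge_states(states: Iterable[str]) -> str:
    """Return the lowest state among all containers.

    Raises ValueError if `states` is empty or contains an unknown state.
    """
    return min(states, key=STATE_ORDER.index)


# container-A is "running", container-B is "pulling"
print(merge_states(["running", "pulling"]))
```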

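The monitor and its MonitorEvents (steps 4 and 5) can be pictured with the minimal, synchronous sketch below. Only the names apply_monitoring, RunDockerComposeUp, was_compose_spec_submitted and FAILED come from the description above; the classes, fields and the `_compose_up` stand-in are hypothetical and do not mirror the real director-v2 code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceState:
    """Hypothetical per-service state tracked by the monitor."""
    sidecar_running: bool = False
    was_compose_spec_submitted: bool = False
    status: str = "PENDING"
    log: List[str] = field(default_factory=list)


@dataclass
class MonitorEvent:
    """A condition plus the action to run when the condition holds."""
    name: str
    will_trigger: Callable[[ServiceState], bool]
    action: Callable[[ServiceState], None]


def _compose_up(state: ServiceState) -> None:
    # Stand-in for the API call asking the dynamic-sidecar to run docker-compose up
    state.log.append("docker-compose up requested")
    state.was_compose_spec_submitted = True


EVENTS = [
    MonitorEvent(
        name="RunDockerComposeUp",
        will_trigger=lambda s: s.sidecar_running and not s.was_compose_spec_submitted,
        action=_compose_up,
    ),
]


def apply_monitoring(state: ServiceState, events: List[MonitorEvent] = EVENTS) -> None:
    """One monitoring pass over a single service (described above as running every 5 seconds)."""
    for event in events:
        try:
            if event.will_trigger(state):
                event.action(state)
        except Exception as err:
            # Any error marks the service as failed and is recorded for the logs
            state.status = "FAILED"
            state.log.append(f"{event.name} failed: {err}")
```

Calling apply_monitoring repeatedly on the same state triggers the compose-up action only once, because the event's condition becomes false after was_compose_spec_submitted is set.
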
🚧 How to transform a webapp into a dynamic service started via dynamic-sidecar

A service started via the dynamic-sidecar always requires a docker-compose spec. If one is not provided, it will be generated on the fly.
The required simcore.service labels are shown below.

With automatic docker-compose spec generation

Labels for dy-static-file-server-dynamic-sidecar

simcore.service.settings: '[{"name": "resources", "type": "Resources", "value": {"mem_limit":17179869184,
  "cpu_limit": 1000000000}}, {"name": "ports", "type": "int", "value": 80}, {"name":
  "constraints", "type": "string", "value": ["node.platform.os == linux"]}]'
simcore.service.paths-mapping: '{"outputs_path": "/www/outputs", "inputs_path": "/www/inputs"}'

With provided docker-compose spec

Labels for dy-static-file-server-dynamic-sidecar-compose-spec

simcore.service.settings: '[{"name": "resources", "type": "Resources", "value": {"mem_limit":17179869184,
  "cpu_limit": 1000000000}}, {"name": "ports", "type": "int", "value": 80}, {"name":
  "constraints", "type": "string", "value": ["node.platform.os == linux"]}]'
simcore.service.paths-mapping: '{"outputs_path": "/www/outputs", "inputs_path": "/www/inputs"}'
simcore.service.compose-spec: '{"services":{
 "dy-static-file-server-dynamic-sidecar-compose-spec": {"environment":["MOCK_VALUE=TheMockedValue"],
    "image":"${SIMCORE_REGISTRY}/simcore/services/dynamic/dy-static-file-server-dynamic-sidecar-compose-spec:${SERVICE_VERSION}",
    "init":true},
  "some-side-service":{"command":"top","image":"busybox:latest","init":true}},
  "version":"3.7"}'
simcore.service.container-http-entrypoint: dy-static-file-server-dynamic-sidecar-compose-spec
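
Each label value is a JSON document stored as a string. As a small sketch (not code from this PR), the labels from the first example can be decoded with the standard library; the dictionary keyed by entry name is just a convenience chosen here.

```python
import json

# Label values copied from the example above
settings_label = (
    '[{"name": "resources", "type": "Resources", "value": {"mem_limit":17179869184, '
    '"cpu_limit": 1000000000}}, {"name": "ports", "type": "int", "value": 80}, {"name": '
    '"constraints", "type": "string", "value": ["node.platform.os == linux"]}]'
)
paths_mapping_label = '{"outputs_path": "/www/outputs", "inputs_path": "/www/inputs"}'

# Index the settings entries by their "name" for easier lookup
settings = {entry["name"]: entry["value"] for entry in json.loads(settings_label)}
paths = json.loads(paths_mapping_label)

# mem_limit is given in bytes: 17179869184 bytes == 16 GiB
mem_gib = settings["resources"]["mem_limit"] / 2**30
print(mem_gib, settings["ports"], paths["outputs_path"])
```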

Related issues

How to test

make-build-devel
make-up

Use version 1.0.5 of dy-static-file-server, dy-static-file-server-dynamic-sidecar and dy-static-file-server-dynamic-sidecar-compose to test if they run. Permissions must be manually granted.

Note: it takes longer to start the dynamic-sidecar in development mode than in production, because packages are installed, sources are mounted and permissions are changed.

Checklist

  • Did you change any service's API? Then make sure to bundle document and upgrade version (make openapi-specs, git commit ... and then make version-*)
  • Unit tests for the changes exist
  • Runs in the swarm

@GitHK GitHK changed the title Adding service sidecar WIP:Adding service sidecar Oct 15, 2020
@GitHK GitHK changed the title WIP:Adding service sidecar WIP: Adding service sidecar Oct 15, 2020

codecov bot commented Oct 15, 2020

Codecov Report

Merging #1887 (822f74e) into master (3299ea6) will increase coverage by 6.0%.
The diff coverage is 75.9%.

@@           Coverage Diff            @@
##           master   #1887     +/-   ##
========================================
+ Coverage    70.0%   76.1%   +6.0%     
========================================
  Files         574     579      +5     
  Lines       21654   22147    +493     
  Branches     2082    2143     +61     
========================================
+ Hits        15168   16854   +1686     
+ Misses       6017    4713   -1304     
- Partials      469     580    +111     
Flag Coverage Δ
integrationtests 67.0% <67.7%> (?)
unittests 69.3% <53.6%> (-0.6%) ⬇️

Impacted Files Coverage Δ
...ary/src/models_library/settings/docker_registry.py 0.0% <0.0%> (ø)
...ary/src/models_library/settings/services_common.py 0.0% <0.0%> (ø)
...-v2/src/simcore_service_director_v2/core/errors.py 57.5% <0.0%> (ø)
...ce_webserver/resource_manager/garbage_collector.py 74.4% <34.3%> (+15.1%) ⬆️
...simcore_service_director_v2/modules/director_v0.py 75.2% <51.7%> (-12.8%) ⬇️
...erver/src/simcore_service_webserver/director_v2.py 75.0% <64.1%> (ø)
...e_service_director_v2/api/dependencies/__init__.py 66.6% <66.6%> (ø)
...rvice_webserver/projects/projects_node_handlers.py 77.9% <66.6%> (-1.6%) ⬇️
...v2/modules/dynamic_sidecar/docker_service_specs.py 67.5% <67.5%> (ø)
..._director_v2/modules/dynamic_sidecar/client_api.py 85.5% <77.5%> (-4.0%) ⬇️
... and 107 more

@GitHK GitHK self-assigned this Oct 15, 2020

@pcrespov pcrespov left a comment

In order to comply with osparc-simcore conventions, I highly recommend that you create this new service from one of our cookiecutters. You can always run it in a different folder and copy and paste the pieces you want. In any case, just by answering the questions, it would produce something with all the little details correct.

For aiohttp-based services we have this
https://github.com/ITISFoundation/cookiecutter-simcore-pyservice

For fastapi-based services (probably the one you need) I started creating this (not completely finished, but almost)
https://github.com/pcrespov/cookiecutter-simcore-py-fastapi

GitHK commented Dec 2, 2020

Regarding my above post, please comment, edit and contribute. I would like some feedback @pcrespov @sanderegg @mguidon

@GitHK GitHK requested review from sanderegg and mguidon December 2, 2020 17:11

mguidon commented Dec 2, 2020

I just read through your documentation. I think you are trying to solve too many problems at once. I would suggest to ignore the scheduling and the "bring the service to the data" aspects for now and concentrate on replacing the way we create dynamic services. Take for instance the sim4life application which currently consists of two docker images and make sure this works. I agree with Sylvain that for a starter, let the director create the sidecar service in the swarm and then the sidecar can spawn the dynamic services using the forwarded docker socket.
In short, Definition of Done: sim4life runs completely isolated and can be reached via /retrieve /state etc

mguidon commented Dec 3, 2020

And a question regarding uuid.services.osparc.io : Is that not going to affect certificates for subdomains? Or are you creating them on the fly.

GitHK commented Dec 3, 2020

And a question regarding uuid.services.osparc.io : Is that not going to affect certificates for subdomains? Or are you creating them on the fly.

I've previously checked with @Surfict, and issuing a wildcard certificate for *.services.osparc.io should do the trick. I'll ask him to confirm this for all 5 of our deployments.

@GitHK GitHK requested a review from pcrespov December 4, 2020 06:20

@pcrespov pcrespov left a comment

I will define the concept of a service-sidecar, which is nothing more than a service running as an interface between the oSPARC platform and the dynamic service itself.

OK

The dynamic service will not have access to the services inside the oSPARC platform, it will be placed on a separate network together with the service-sidecar and a reverse proxy.

OK

The reverse proxy's job is to make the service visible from the outside world on an address similar to uuid.services.osparc.io, removing the need to support /x/uuid and all its issues

  • Less intrusive: Avoids enforcing the dynamic services to serve in a specific base-url. Sometimes the service did not implement this as an option and we had to modify the service code!
  • Access Layer: Can be used to add an auth access layer, which is currently missing. This somehow needs to be connected with the auth module, which is currently in the web-server (it is asking to have its own service!)

Setup

  • I prefer the pod pattern that SAN proposes. Essentially it performs a similar setup to the one you describe above, but it seems more compact to me. Resources are reserved in one go, and the ability to deploy like docker-compose for developing/testing the dynamic-service in an equivalent context, without having to deploy the entire osparc-simcore stack, is very powerful. It would even allow integration tests during the submission workflow of new services into the platform.
  • How would this setup handle a dynamic-service with multiple containers like s4l that has two containers? Or replicas of the same in case some service has the ability to scale up?

Running with sidecars

  • This is the most important part since it will avoid modifying the code of dynamic services which IMO is priority 1!
  • SAN's definition is more compact: the sidecar is responsible for starting/stopping the dynamic service, pushing/pulling the state of the dynamic service, pushing/pulling the data from the inputs/outputs using a shared volume, monitoring the service, and getting the logs and pushing them to rabbit (although I could even imagine another sidecar for the logs)


@GitHK GitHK changed the title WIP: Adding service sidecar WIP: Adding service sidecar (aka dynamic-sidecar functionality) Feb 23, 2021
@GitHK GitHK requested a review from pcrespov February 23, 2021 08:06

@pcrespov pcrespov left a comment

Please, all new features should have a certain degree of quality testing!

I will continue reviewing ...

@sanderegg sanderegg left a comment

some comments... but this is huge...5000 I think the record is broken ;) not possible to go with smaller PR?
Like adding the dynamic sidecar first, with tests, then connections inside the director-v2, etc??

@GitHK GitHK requested review from sanderegg and pcrespov May 3, 2021 18:13
@GitHK GitHK added a:director issue related with the director service a:director-v2 issue related with the director-v2 service a:dynamic-sidecar dynamic-sidecar service t:enhancement Improvement or request on an existing feature labels May 4, 2021

@sanderegg sanderegg left a comment

here some early comments

@sanderegg sanderegg left a comment

some more comments

@GitHK GitHK changed the title WIP: Adding service sidecar (aka dynamic-sidecar functionality) WIP: Connecting basic dynamic-sidecar (only boots services) May 4, 2021
@GitHK GitHK requested a review from sanderegg August 3, 2021 06:40

@odeimaiz odeimaiz left a comment

The js part looks good to me 💪

# If the same message appears in the log multiple times in a row (for the same
# service) something might be wrong with the service.
logger.warning(
"No container present for %s. Usually not an issue.",

I am not sure that warning is helpful in any way... you said it takes 50 secs to start the sidecar; does that mean we will see this message 10 times for each service?

@GitHK GitHK Aug 3, 2021

GetStatus.action() will start running after CreateSidecars.action() runs. Sometimes the containers are not present at that time, on the next iteration the containers are already there.
I wanted a warning message here to help with debugging (especially if something happens in production/staging)
Any suggestions for a better message are welcome.

@GitHK GitHK requested a review from sanderegg August 3, 2021 09:23

@sanderegg sanderegg left a comment

Great! Let's go wild!

@sanderegg sanderegg merged commit f90894b into ITISFoundation:master Aug 4, 2021
Labels
a:director issue related with the director service a:director-v2 issue related with the director-v2 service a:dynamic-sidecar dynamic-sidecar service a:webserver issue related to the webserver service t:enhancement Improvement or request on an existing feature

Successfully merging this pull request may close these issues.

Proposal for the sidecar evolution Stage 1
5 participants