Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ Is2757/gc as a service implementation (⚠️⚠️ devops) #2834

Merged

Conversation

pcrespov
Copy link
Member

@pcrespov pcrespov commented Feb 14, 2022

What do these changes do?

Adds to the stack a webserver-garbage-collector service (GC) that will take the sole responsibility of running periodically "garbage collector" task.
This service uses the webserver image and starts the app with a custom config that only enables the plugins necessary for his task.

Highlights

  • .env-webserver-garbage-collector lists disabled envs. This listing is used at the end in the docker-compose specs to override default's web-server envs
  • GC service exposes v0/ route for healthcheck

⚠️⚠️ devops

  • Please check carefully changes in docker-compose specs files
  • adds .env-webserver-garbage-collector that adds the plugins disabled. Noticed how this listing is added at the end to override the default webserver config

NOTE for all: we should sync with ops and refactor the strategy with env files vs docker-compose's environment section. Currently it is all bit fuzzy

Related issue/s

How to test

  1. build osparc stack make build
  2. build locust stack cd tests/performance; make build
  3. deploy stack make up-prod
  4. open both the web-server and the web-server-GC logs in portainer.

Screen Shot 2022-03-14 at 22 36 25

5. In the latter, you should see appended logo as
INFO: STARTING UP <Application 0x7fe1b6eb28b0>...
 _    _        _
| |  | |      | |
| |  | |  ___ | |__   ___   ___  _ __ __   __ ___  _ __
| |/\| | / _ \| '_ \ / __| / _ \| '__|\ \ / // _ \| '__|
\  /\  /|  __/| |_) |\__ \|  __/| |    \ V /|  __/| |
 \/  \/  \___||_.__/ |___/ \___||_|     \_/  \___||_|     v0.7.0

with 
    (        (
    )\ )     )\
   (()/(   (((_)
    /(_))_ )\___
   (_)) __((/ __|
     | (_ || (__
      \___| \___|

  1. spawn locust make up target=platform_ping_test.py
  2. Open http://127.0.0.1:8089 and setup 100 users at 10 user/sec
  3. Monitor both GC log and performance. Opened in parallel a project with 7 jupyter-labs (see open/close notes)
    image
  4. Repeat the experiment with a master (e2p from master 3/14/2022, 7:07:46 PM - 3/14/2022, 7:08:46 PM)
    image

Checklist

  • Check with locus the effect on webserver performance
  • Openapi changes? make openapi-specs, git commit ... and then make version-*)
  • Database migration script? cd packages/postgres-database, make setup-commit, sc-pg review -m "my changes"
  • Unit tests for the changes exist
    • add check on envs for GC vs no-GC (ideas?)
  • Runs in the swarm
  • Documentation reflects the changes
  • New module? Add your github username to .github/CODEOWNERS. FIXED

@codecov
Copy link

codecov bot commented Feb 14, 2022

Codecov Report

Merging #2834 (7b9955d) into master (be6034c) will increase coverage by 0.0%.
The diff coverage is 85.7%.

Impacted file tree graph

@@          Coverage Diff           @@
##           master   #2834   +/-   ##
======================================
  Coverage    79.3%   79.4%           
======================================
  Files         673     673           
  Lines       27618   27624    +6     
  Branches     3216    3218    +2     
======================================
+ Hits        21911   21934   +23     
+ Misses       4958    4940   -18     
- Partials      749     750    +1     
Flag Coverage Δ
integrationtests 65.6% <64.2%> (-0.1%) ⬇️
unittests 74.9% <78.5%> (+0.1%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...imcore_service_webserver/garbage_collector_core.py 69.1% <ø> (+4.2%) ⬆️
...erver/src/simcore_service_webserver/director_v2.py 91.3% <77.7%> (-8.7%) ⬇️
.../web/server/src/simcore_service_webserver/_meta.py 100.0% <100.0%> (ø)
...erver/src/simcore_service_webserver/application.py 98.7% <100.0%> (+<0.1%) ⬆️
...tor_v2/modules/dynamic_sidecar/scheduler/events.py 93.7% <0.0%> (-1.2%) ⬇️
.../director/src/simcore_service_director/producer.py 62.1% <0.0%> (-0.7%) ⬇️
...ector_v2/modules/dynamic_sidecar/scheduler/task.py 86.0% <0.0%> (-0.5%) ⬇️
.../src/simcore_service_webserver/director_v2_core.py 67.9% <0.0%> (+1.1%) ⬆️
...c/simcore_service_catalog/core/background_tasks.py 68.4% <0.0%> (+2.1%) ⬆️
...log/src/simcore_service_catalog/api/routes/dags.py 65.1% <0.0%> (+2.3%) ⬆️
... and 4 more

@pcrespov pcrespov self-assigned this Feb 17, 2022
@pcrespov pcrespov added this to the R.Schumann milestone Feb 17, 2022
@pcrespov pcrespov force-pushed the is2757/gc-as-a-service-implementation branch from c7a10bc to e6459ee Compare February 21, 2022 17:37
@pcrespov pcrespov force-pushed the is2757/gc-as-a-service-implementation branch from e6459ee to 8faf620 Compare March 2, 2022 21:06
@pcrespov pcrespov changed the title ✨ Is2757/gc as a service implementation ✨ Is2757/gc as a service implementation (⚠️⚠️ devops) Mar 2, 2022
@pcrespov pcrespov removed this from the R.Schumann milestone Mar 14, 2022
@pcrespov pcrespov force-pushed the is2757/gc-as-a-service-implementation branch 2 times, most recently from ca29a29 to c680e31 Compare March 14, 2022 19:21
@pcrespov pcrespov added t:maintenance Some planned maintenance work changelog:♻️refactor labels Mar 14, 2022
@pcrespov pcrespov changed the title ✨ Is2757/gc as a service implementation (⚠️⚠️ devops) ♻️ Is2757/gc as a service implementation (⚠️⚠️ devops) Mar 14, 2022
@pcrespov pcrespov marked this pull request as ready for review March 14, 2022 21:54
Copy link
Member

@sanderegg sanderegg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! looking forward to seeing the results

WEBSERVER_GARBAGE_COLLECTOR: '{"GARBAGE_COLLECTOR_INTERVAL_S": 30}'
env_file:
- ../.env
- ../.env-webserver-garbage-collector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this one overrides what is in .env?
So I guess we should review these contents soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that what I suggest in the note above as well ... but I want to do it carefully and taking into account the devops repo

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definately need to clean up the env-var handling. They come in from all over the place, and it is hard to detangle since some env-vars (IPs etc.) are overwritten by makefiles, overwritten from the simcore repo, etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it would be clearer to just put all the variable without using the *webserver-environment (so, having two times the same thing) and set theme here to null, not in the env-webserver-garbage-collector as it adds another layer in top of 3 another layers.. It would be repeating but easier to understand

interval: 30s
timeout: 120s
retries: 3
start_period: 30s
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably also review these, in prod the start period is pretty long and usually the services are up in less than 10s

Copy link
Contributor

@GitHK GitHK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking care of it.

constraints:
- node.platform.os == linux
labels:
- io.simcore.zone=${TRAEFIK_SIMCORE_ZONE}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this required, since it does not require and endpoint?

Copy link
Member Author

@pcrespov pcrespov Mar 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, traffik should not. I removed it

Nonetheless, notice that the rest plugin is active because this service needs one of the routes /v0/ for the swarm healthcheck.

@mrnicegyu11 @Surfict i guess you also need to make sure in ops that this service has correct deploy rules?

@@ -130,6 +130,9 @@ def _default_app_config_for_integration_tests(
test_environ["WEBSERVER_LOGLEVEL"] = "WARNING"
test_environ["OSPARC_SIMCORE_REPO_ROOTDIR"] = f"{osparc_simcore_root_dir}"

# NOTE: removed from webserver_environ after split with GC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this note is not too clear for me. did you not split the GC already?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, not very clear.

@GitHK The basic idea is that since we have this new plugin functionality, we will need to split the integration tests in the future and address every different "configured service" separately

SEE details in #2896

@KZzizzle
Copy link
Contributor

nice Pedro thanks for doing this!

@pcrespov pcrespov force-pushed the is2757/gc-as-a-service-implementation branch from c89d7e1 to 20ebf05 Compare March 15, 2022 16:09
@pcrespov pcrespov force-pushed the is2757/gc-as-a-service-implementation branch from 75e1b77 to a87813f Compare March 15, 2022 18:35
@pcrespov pcrespov added this to the E.Shackleton milestone Mar 15, 2022
@mrnicegyu11
Copy link
Member

I think the io.simcore.zone=${TRAEFIK_SIMCORE_ZONE} label might be required, as far as I know it is used for assigning services to the simcore or ops traefik instance

Copy link
Member

@mrnicegyu11 mrnicegyu11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this

WEBSERVER_CATALOG=null
WEBSERVER_COMPUTATION=null
WEBSERVER_DIAGNOSTICS=null
#WEBSERVER_DIRECTOR_V2 from .env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I mapped all the plugin variables and explicitly set to null those that need to be disconnected. For those enabled, i just commented them. So to some extent, this env-file is only to disable plugins

WEBSERVER_GARBAGE_COLLECTOR: '{"GARBAGE_COLLECTOR_INTERVAL_S": 30}'
env_file:
- ../.env
- ../.env-webserver-garbage-collector
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definately need to clean up the env-var handling. They come in from all over the place, and it is hard to detangle since some env-vars (IPs etc.) are overwritten by makefiles, overwritten from the simcore repo, etc.

# for the moment using web-server as an all-in-one service.
# TODO: create integration tests using different configs
# SEE https://github.com/ITISFoundation/osparc-simcore/issues/2896
test_environ["WEBSERVER_GARBAGE_COLLECTION_INTERVAL_SECONDS"] = "30"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe out it to an uneven or prime number so it doesnt colide with any other (freshping etc.) intervals

Copy link
Member Author

@pcrespov pcrespov Mar 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha that is a nice touch :-) Not sure whether it is accurate wha you claim ... since this interval is relative to the time it takes to "garbage collect" ... but I will do so nonetheless

WEBSERVER_EMAIL=null
WEBSERVER_EXPORTER=null
WEBSERVER_FRONTEND=null
#WEBSERVER_GARBAGE_COLLECTOR explicit in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This env-file configures a webserver instance to only act as a garbage collector, correct?
It is hard to infer that from this variable being commented out (probably "true" is the default?, what does "explicit in" mean?) so maybe set it explicitly to true. Then every reader knows what is going on...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let me clarify all this for you offline

deploy:
placement:
constraints:
- node.platform.os == linux
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this service run on the Simcore node ? (in the logic of isolating Simcore services from user nodes). If yes we will need to implement that in the devops part

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it would be clearer to just put all the variable without using the *webserver-environment (so, having two times the same thing) and set theme here to null, not in the env-webserver-garbage-collector as it adds another layer in top of 3 another layers.. It would be repeating but easier to understand

I am not sure i understand but i am open to suggestions. Perhaps sometime offline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this service run on the Simcore node ? (in the logic of isolating Simcore services from user nodes). If yes we will need to implement that in the devops part

yes, thisi s part of the stack. it has to work in simcore-node. These placment constratins were there before ...

WEBSERVER_GARBAGE_COLLECTOR: '{"GARBAGE_COLLECTOR_INTERVAL_S": 30}'
env_file:
- ../.env
- ../.env-webserver-garbage-collector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it would be clearer to just put all the variable without using the *webserver-environment (so, having two times the same thing) and set theme here to null, not in the env-webserver-garbage-collector as it adds another layer in top of 3 another layers.. It would be repeating but easier to understand

@pcrespov pcrespov merged commit 1dea3f5 into ITISFoundation:master Mar 17, 2022
@pcrespov pcrespov deleted the is2757/gc-as-a-service-implementation branch March 17, 2022 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t:maintenance Some planned maintenance work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants