Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper Gunicorn bringup detection #2842

Merged
merged 2 commits into from
Jul 2, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ augur_export_env.sh
!docker.config.json
config.yml
reports.yml
*.pid

node_modules/
.idea/
Expand Down
9 changes: 8 additions & 1 deletion augur/api/routes/util.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,20 @@
import sqlalchemy as s
import pandas as pd
import json
from flask import Response, current_app
from flask import Response, current_app, jsonify

from augur.application.db.lib import get_value
from augur.application.logs import AugurLogger

logger = AugurLogger("augur").get_logger()

@app.route("/api")
def get_api_version():
return jsonify({
"status": "up",
"route": AUGUR_API_VERSION
})

@app.route('/{}/repo-groups'.format(AUGUR_API_VERSION))
def get_all_repo_groups(): #TODO: make this name automatic - wrapper?
repoGroupsSQL = s.sql.text("""
Expand Down
29 changes: 24 additions & 5 deletions augur/application/cli/backend.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,9 +10,10 @@
import logging
import psutil
import signal
from redis.exceptions import ConnectionError as RedisConnectionError
import uuid
import traceback
import requests
from redis.exceptions import ConnectionError as RedisConnectionError
from urllib.parse import urlparse

from augur.tasks.start_tasks import augur_collection_monitor, create_collection_status_records
Expand All @@ -38,14 +39,17 @@
@cli.command("start")
@click.option("--disable-collection", is_flag=True, default=False, help="Turns off data collection workers")
@click.option("--development", is_flag=True, default=False, help="Enable development mode, implies --disable-collection")
@click.option("--pidfile", default="main.pid", help="File to store the controlling process ID in")
@click.option('--port')
@test_connection
@test_db_connection
@with_database
@click.pass_context
def start(ctx, disable_collection, development, port):
def start(ctx, disable_collection, development, pidfile, port):
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
"""Start Augur's backend server."""

with open(pidfile, "w") as pidfile:
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
pidfile.write(str(os.getpid()))

try:
if os.environ.get('AUGUR_DOCKER_DEPLOY') != "1":
raise_open_file_limit(100000)
Expand Down Expand Up @@ -75,9 +79,25 @@
gunicorn_command = f"gunicorn -c {gunicorn_location} -b {host}:{port} augur.api.server:app --log-file gunicorn.log"
server = subprocess.Popen(gunicorn_command.split(" "))
sgoggins marked this conversation as resolved.
Show resolved Hide resolved

time.sleep(3)
logger.info("awaiting Gunicorn start")
while not server.poll():
try:
api_response = requests.get(f"http://{host}:{port}/api")
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
except requests.exceptions.ConnectionError as e:
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
time.sleep(0.5)
continue

if not api_response.ok:
logger.critical("Gunicorn failed to start or was not reachable. Exiting")
exit(247)
sgoggins marked this conversation as resolved.
Show resolved Hide resolved
break
else:
logger.critical("Gunicorn was shut down abnormally. Exiting")
exit(247)
sgoggins marked this conversation as resolved.
Show resolved Hide resolved

logger.info('Gunicorn webserver started...')
logger.info(f'Augur is running at: {"http" if development else "https"}://{host}:{port}')
logger.info(f"The API is available at '{api_response.json()['route']}'")

processes = start_celery_worker_processes(float(worker_vmem_cap), disable_collection)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0621: Redefining name 'processes' from outer scope (line 383) (redefined-outer-name)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
W0621: Redefining name 'processes' from outer scope (line 386) (redefined-outer-name)


Expand All @@ -91,7 +111,6 @@
celery_beat_process = subprocess.Popen(celery_command.split(" "))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pylint] reported by reviewdog 🐶
R1732: Consider using 'with' for resource-allocating operations (consider-using-with)


if not disable_collection:

with DatabaseSession(logger, engine=ctx.obj.engine) as session:

clean_collection_status(session)
Expand Down Expand Up @@ -154,7 +173,7 @@

frontend_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=1 -n frontend:{uuid.uuid4().hex}@%h -Q frontend"
max_process_estimate -= 1
process_list.append(subprocess.Popen(frontend_worker.split(" ")))

Check warning on line 176 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 R1732: Consider using 'with' for resource-allocating operations (consider-using-with) Raw Output: augur/application/cli/backend.py:176:24: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
sleep_time += 6

if not disable_collection:
Expand All @@ -162,21 +181,21 @@
#2 processes are always reserved as a baseline.
scheduling_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency=2 -n scheduling:{uuid.uuid4().hex}@%h -Q scheduling"
max_process_estimate -= 2
process_list.append(subprocess.Popen(scheduling_worker.split(" ")))

Check warning on line 184 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 R1732: Consider using 'with' for resource-allocating operations (consider-using-with) Raw Output: augur/application/cli/backend.py:184:28: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
sleep_time += 6

#60% of estimate, Maximum value of 45 : Reduced because it can be lower
core_num_processes = determine_worker_processes(.40, 50)
logger.info(f"Starting core worker processes with concurrency={core_num_processes}")
core_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={core_num_processes} -n core:{uuid.uuid4().hex}@%h"
process_list.append(subprocess.Popen(core_worker.split(" ")))

Check warning on line 191 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 R1732: Consider using 'with' for resource-allocating operations (consider-using-with) Raw Output: augur/application/cli/backend.py:191:28: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
sleep_time += 6

#20% of estimate, Maximum value of 25
secondary_num_processes = determine_worker_processes(.39, 50)
logger.info(f"Starting secondary worker processes with concurrency={secondary_num_processes}")
secondary_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={secondary_num_processes} -n secondary:{uuid.uuid4().hex}@%h -Q secondary"
process_list.append(subprocess.Popen(secondary_worker.split(" ")))

Check warning on line 198 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 R1732: Consider using 'with' for resource-allocating operations (consider-using-with) Raw Output: augur/application/cli/backend.py:198:28: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
sleep_time += 6

#15% of estimate, Maximum value of 20
Expand All @@ -184,7 +203,7 @@
logger.info(f"Starting facade worker processes with concurrency={facade_num_processes}")
facade_worker = f"celery -A augur.tasks.init.celery_app.celery_app worker -l info --concurrency={facade_num_processes} -n facade:{uuid.uuid4().hex}@%h -Q facade"

process_list.append(subprocess.Popen(facade_worker.split(" ")))

Check warning on line 206 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 R1732: Consider using 'with' for resource-allocating operations (consider-using-with) Raw Output: augur/application/cli/backend.py:206:28: R1732: Consider using 'with' for resource-allocating operations (consider-using-with)
sleep_time += 6

time.sleep(sleep_time)
Expand All @@ -201,7 +220,7 @@
"""
Sends SIGTERM to all Augur server & worker processes
"""
logger = logging.getLogger("augur.cli")

Check warning on line 223 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name) Raw Output: augur/application/cli/backend.py:223:4: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)

augur_stop(signal.SIGTERM, logger, ctx.obj.engine)

Expand All @@ -214,11 +233,11 @@
"""
Sends SIGKILL to all Augur server & worker processes
"""
logger = logging.getLogger("augur.cli")

Check warning on line 236 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name) Raw Output: augur/application/cli/backend.py:236:4: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)
augur_stop(signal.SIGKILL, logger, ctx.obj.engine)


def augur_stop(signal, logger, engine):

Check warning on line 240 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 W0621: Redefining name 'signal' from outer scope (line 12) (redefined-outer-name) Raw Output: augur/application/cli/backend.py:240:15: W0621: Redefining name 'signal' from outer scope (line 12) (redefined-outer-name)

Check warning on line 240 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name) Raw Output: augur/application/cli/backend.py:240:23: W0621: Redefining name 'logger' from outer scope (line 31) (redefined-outer-name)
"""
Stops augur with the given signal,
and cleans up collection if it was running
Expand Down Expand Up @@ -308,7 +327,7 @@
SET facade_status='Pending', facade_task_id=NULL
WHERE facade_status='Failed Clone' OR facade_status='Initializing';
"""))
#TODO: write timestamp for currently running repos.

Check warning on line 330 in augur/application/cli/backend.py

View workflow job for this annotation

GitHub Actions / runner / pylint

[pylint] reported by reviewdog 🐶 W0511: TODO: write timestamp for currently running repos. (fixme) Raw Output: augur/application/cli/backend.py:330:5: W0511: TODO: write timestamp for currently running repos. (fixme)

def assign_orphan_repos_to_default_user(session):
query = s.sql.text("""
Expand Down
Loading