Unify Logger Config for Tasks #1709

Merged · 1 commit · Nov 26, 2024
11 changes: 11 additions & 0 deletions backend/dataall/__init__.py

@@ -1,2 +1,13 @@
 from . import core, version
 from .base import utils, db, api
+import logging
+import os
+import sys
+
+logging.basicConfig(
+    level=os.environ.get('LOG_LEVEL', 'INFO'),
+    handlers=[logging.StreamHandler(sys.stdout)],
+    format='[%(levelname)s] %(message)s',
+)
+for name in ['boto3', 's3transfer', 'botocore', 'boto', 'urllib3']:
+    logging.getLogger(name).setLevel(logging.ERROR)

Review thread on the logging.basicConfig(...) line:
Contributor:
If I understand correctly, this statement here will affect all data.all backend services (Lambda, ECS tasks, etc.).
Should we be removing ALL the per-file logging configs? With a quick grep I see 206 files:

grep -rail "logging.getLogger(" backend/ | wc -l

Contributor:

Do we? I thought getLogger just gets the logger; it doesn't configure it.
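
That matches the standard library behavior: logging.getLogger(name) only fetches (or lazily creates) the named logger object, without attaching handlers or setting a level. A minimal sketch of the distinction (the logger name here is just for illustration):

import logging
import sys

log = logging.getLogger('dataall.example')  # fetches the logger, nothing more
assert not log.handlers                     # no handler was attached
assert log.level == logging.NOTSET          # no level was set; it is inherited

# Configuration is a separate step, e.g. on the root logger:
logging.basicConfig(
    level='INFO',
    handlers=[logging.StreamHandler(sys.stdout)],
    format='[%(levelname)s] %(message)s',
)
log.info('now visible through the root handler')  # [INFO] now visible ...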

Contributor Author:
You are correct, it affects every file under /backend/dataall/ - I thought this would simplify how we were formerly managing logs in each individual location.

It also solves the issue we had where a task (like the share manager task at dataall/backend/dataall/modules/shares_base/tasks/share_manager_task.py) was not recording the logs from SharingService (at dataall/backend/dataall/modules/shares_base/services/sharing_service.py) or the Processors, so we were missing logs in CloudWatch.
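
This follows from logger propagation: modules log through logging.getLogger(__name__), and those records bubble up to the root logger, so the single basicConfig in dataall/__init__.py is enough for every module imported after it. A rough sketch of why the task logs now reach CloudWatch (module name taken from the paths above):

import logging
import sys

# What backend/dataall/__init__.py now does once, at import time:
logging.basicConfig(
    level='INFO',
    handlers=[logging.StreamHandler(sys.stdout)],
    format='[%(levelname)s] %(message)s',
)

# What sharing_service.py does: a bare named logger, no handlers of its own.
log = logging.getLogger('dataall.modules.shares_base.services.sharing_service')

# The record propagates up to the root handler and lands on stdout,
# which the ECS/Lambda log drivers forward to CloudWatch.
log.info('Sharing task processed')  # -> [INFO] Sharing task processed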

For reference, in dataall-sbx-backend-graphql the log group formatting before the change:

[INFO]	2024-11-19T15:28:43.058Z	a725615b-f975-4a66-8ccb-7e69830f18b9	Current maintenance window status - INACTIVE
[INFO]	2024-11-19T15:28:43.060Z	a725615b-f975-4a66-8ccb-7e69830f18b9	SSM Parameter session in central account

And after the code change (same formatting):

[INFO]	2024-11-20T22:34:02.450Z	2133b99e-3316-44de-9b9c-68f02f96611c	Current maintenance window status - INACTIVE
[INFO]	2024-11-20T22:34:02.452Z	2133b99e-3316-44de-9b9c-68f02f96611c	SSM Parameter session in central account

Contributor Author:

To note, the above differs from the format='[%(levelname)s] %(message)s' structure. I believe Lambda has its own default logging formatter that takes precedence... the [%(levelname)s] %(message)s record format is followed in all of the ECS tasks, for example:

[Screenshot: ECS task log output, 2024-11-21]
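
That precedence is consistent with how logging.basicConfig is documented to behave: it does nothing when the root logger already has handlers, and the Lambda Python runtime pre-installs its own handler (producing the [LEVEL] / timestamp / request-id layout seen above) on the root logger before handler code runs. A sketch of the likely situation inside Lambda (not verified against a specific runtime version):

import logging

# Inside the Lambda runtime the root logger already carries the
# runtime-installed handler, so this call is a silent no-op:
logging.basicConfig(format='[%(levelname)s] %(message)s')

# Forcing the data.all format would require replacing the existing
# handler, which basicConfig supports via force=True (Python 3.8+):
# logging.basicConfig(format='[%(levelname)s] %(message)s', force=True)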

Contributor:
@dlpzx we do configure it in a lot of places.
@noah-paige I am fine if you want to refactor and remove the configs from all the files, keeping only the top one.

Contributor Author:
For a full breakdown of where we call logger.setLevel(...), which I think is the main config we do in a number of places (~26 places before this PR, now 15 files):

  • Backend files not under dataall/backend/dataall/ (5):
    • dataall/backend/api_handler.py
    • dataall/backend/aws_handler.py
    • dataall/backend/local_graphql_server.py
    • dataall/backend/search_handler.py
  • For the rest of the backend code:
    • dataall/backend/dataall/__init__.py
  • For CDK-specific activity:
    • dataall/backend/dataall/base/cdkproxy/app.py (creates a new logger named cdkapp process, always at level INFO) -- chose to leave as is
  • Custom Resource Lambdas from the data.all CDK resources:
    • dataall/backend/dataall/modules/s3_datasets/cdk/assets/gluedatabasecustomresource/index.py
    • dataall/backend/dataall/modules/s3_datasets/cdk/assets/lakeformationdefaultsettings/index.py
  • Trigger function Lambdas for the data.all pipeline:
    • dataall/backend/deployment_triggers/dbmigrations_handler.py
    • dataall/backend/deployment_triggers/dbsnapshots_handler.py
    • dataall/backend/deployment_triggers/saveperms_handler.py
  • Custom resource Lambdas for the data.all deployment:
    • dataall/deploy/custom_resources/cognito_config/cognito_urls.py
    • dataall/deploy/custom_resources/cognito_config/cognito_users.py
    • dataall/deploy/custom_resources/custom_authorizer/custom_authorizer_lambda.py
    • dataall/deploy/custom_resources/custom_authorizer/jwt_services.py
Contributor Author:
@dlpzx @petrkalos - the places where we configure logs are more intentional now, and for the majority of the backend it is dataall/backend/dataall/__init__.py, which I think is best.

Some configs that live in different compute functions or other parts of the deployment will remain separate.
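
With that in place, the per-file boilerplate removed in the diffs below shrinks to the usual module-level idiom, relying on the package-level config for handlers, format, and level:

import logging

# The only logging statement a task module still needs; handlers and
# level come from the basicConfig in backend/dataall/__init__.py.
log = logging.getLogger(__name__)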

5 changes: 1 addition & 4 deletions backend/dataall/core/environment/tasks/env_stacks_updater.py
@@ -12,11 +12,8 @@
 from dataall.base.db import get_engine
 from dataall.base.utils import Parameter

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 RETRIES = 30
 SLEEP_TIME = 30
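
Worth noting why deleting this guard is safe: root.hasHandlers() is true as soon as the package-level basicConfig has run, so the defensive branch was already dead code once dataall was imported. A quick illustration:

import logging
import sys

# Simulates the package-level config in backend/dataall/__init__.py.
logging.basicConfig(handlers=[logging.StreamHandler(sys.stdout)])

root = logging.getLogger()
# The guard this PR removes from each task file: with the root already
# configured, hasHandlers() is True and the branch never executes.
if not root.hasHandlers():
    root.addHandler(logging.StreamHandler(sys.stdout))

assert root.hasHandlers()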
4 changes: 0 additions & 4 deletions backend/dataall/core/stacks/tasks/cdkproxy.py
@@ -5,11 +5,7 @@
 from dataall.base.cdkproxy.cdk_cli_wrapper import deploy_cdk_stack
 from dataall.base.db import get_engine

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 logger = logging.getLogger(__name__)
-logger.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 if __name__ == '__main__':
4 changes: 0 additions & 4 deletions backend/dataall/modules/catalog/tasks/catalog_indexer_task.py
@@ -9,11 +9,7 @@
 from dataall.base.loader import load_modules, ImportMode
 from dataall.base.utils.alarm_service import AlarmService

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 class CatalogIndexerTask:
(file path not captured)
@@ -11,11 +11,7 @@
 from dataall.modules.omics.db.omics_repository import OmicsRepository


-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 def fetch_omics_workflows(engine):
4 changes: 0 additions & 4 deletions backend/dataall/modules/s3_datasets/tasks/tables_syncer.py
@@ -16,11 +16,7 @@
 from dataall.modules.s3_datasets.indexers.dataset_indexer import DatasetIndexer
 from dataall.modules.s3_datasets.services.dataset_alarm_service import DatasetAlarmService

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 def sync_tables(engine):
(file path not captured)
@@ -22,11 +22,7 @@
 from dataall.modules.shares_base.db.share_object_models import ShareObject
 from dataall.modules.shares_base.services.share_notification_service import DataSharingNotificationType

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))

 # TODO: review this task usage and remove if not needed
(file path not captured)
@@ -6,11 +6,8 @@
 import boto3
 from botocore.exceptions import ClientError

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 ENVNAME = os.getenv('envname', 'local')
 region = os.getenv('AWS_REGION', 'eu-west-1')
(file path not captured)
@@ -9,11 +9,7 @@
 from dataall.modules.datasets_base.db.dataset_repositories import DatasetBaseRepository


-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 def persistent_email_reminders(engine):
(file path not captured)
@@ -12,11 +12,7 @@
 from dataall.modules.shares_base.services.shares_enums import ShareObjectActions
 from dataall.modules.shares_base.services.sharing_service import SharingService

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 def share_expiration_checker(engine):
(file path not captured)
@@ -6,12 +6,7 @@
 from dataall.base.db import get_engine
 from dataall.base.loader import load_modules, ImportMode

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
-log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))
-

 if __name__ == '__main__':
     try:
(file path not captured)
@@ -11,11 +11,7 @@

 from dataall.base.loader import load_modules, ImportMode

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 class EcsBulkShareRepplyService:
(file path not captured)
@@ -11,11 +11,7 @@

 from dataall.base.loader import load_modules, ImportMode

-root = logging.getLogger()
-if not root.hasHandlers():
-    root.addHandler(logging.StreamHandler(sys.stdout))
 log = logging.getLogger(__name__)
-log.setLevel(os.environ.get('LOG_LEVEL', 'INFO'))


 def verify_shares(engine):