Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(clp-package): Add support for loading credentials from boto3 session. #681

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion components/clp-py-utils/clp_py_utils/s3_utils.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
import re
from pathlib import Path
from typing import List, Tuple
from typing import List, Optional, Tuple

import boto3
from botocore.config import Config
from botocore.credentials import ReadOnlyCredentials
from job_orchestration.scheduler.job_config import S3InputConfig

from clp_py_utils.clp_config import S3Config
Expand Down Expand Up @@ -61,6 +62,15 @@ def generate_s3_virtual_hosted_style_url(
return f"https://{bucket_name}.s3.{region_code}.{AWS_ENDPOINT}/{object_key}"


def s3_get_frozen_credentials() -> Optional[ReadOnlyCredentials]:
session = boto3.Session()
credentials = session.get_credentials()
if credentials is None:
return None
frozen_credentials = credentials.get_frozen_credentials()
return frozen_credentials
Comment on lines +65 to +71
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance error handling and logging.

The function should handle potential boto3 exceptions and log appropriate messages when credentials are not found.

Apply this diff to improve error handling:

 def s3_get_frozen_credentials() -> Optional[ReadOnlyCredentials]:
+    try:
         session = boto3.Session()
         credentials = session.get_credentials()
         if credentials is None:
+            logger.warning("No AWS credentials found in the session")
             return None
         frozen_credentials = credentials.get_frozen_credentials()
         return frozen_credentials
+    except Exception as ex:
+        logger.error(f"Failed to get AWS credentials: {ex}")
+        return None

Committable suggestion skipped: line range outside the PR's diff.



def s3_get_object_metadata(s3_input_config: S3InputConfig) -> List[FileMetadata]:
"""
Gets the metadata of all objects under the <bucket>/<key_prefix> specified by s3_input_config.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
from clp_py_utils.s3_utils import generate_s3_virtual_hosted_style_url, s3_put
from clp_py_utils.sql_adapter import SQL_Adapter
from job_orchestration.executor.compress.celery import app
from job_orchestration.executor.utils import load_session_credentials
from job_orchestration.scheduler.constants import CompressionTaskStatus
from job_orchestration.scheduler.job_config import (
ClpIoConfig,
Expand Down Expand Up @@ -162,7 +163,7 @@ def make_clp_s_command_and_env(
clp_config: ClpIoConfig,
db_config_file_path: pathlib.Path,
use_single_file_archive: bool,
) -> Tuple[List[str], Optional[Dict[str, str]]]:
) -> Tuple[Optional[List[str]], Optional[Dict[str, str]]]:
"""
Generates the command and environment variables for a clp_s compression job.
:param clp_home:
Expand All @@ -185,10 +186,17 @@ def make_clp_s_command_and_env(
# fmt: on

if InputType.S3 == clp_config.input.type:
aws_access_key_id = clp_config.input.aws_access_key_id
aws_secret_access_key = clp_config.input.aws_secret_access_key
if aws_access_key_id is None or aws_secret_access_key is None:
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

Comment on lines +189 to +195
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the credential variable assignment bug.

There is a bug in the variable assignment where both variables are assigned the access key ID.

Apply this diff to fix the variable assignment:

-            aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
+            aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aws_access_key_id = clp_config.input.aws_access_key_id
aws_secret_access_key = clp_config.input.aws_secret_access_key
if aws_access_key_id is None or aws_secret_access_key is None:
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None
aws_access_key_id = clp_config.input.aws_access_key_id
aws_secret_access_key = clp_config.input.aws_secret_access_key
if aws_access_key_id is None or aws_secret_access_key is None:
aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

compression_env_vars = {
**os.environ,
"AWS_ACCESS_KEY_ID": clp_config.input.aws_access_key_id,
"AWS_SECRET_ACCESS_KEY": clp_config.input.aws_secret_access_key,
"AWS_ACCESS_KEY_ID": aws_access_key_id,
"AWS_SECRET_ACCESS_KEY": aws_secret_access_key,
}
compression_cmd.append("--auth")
compression_cmd.append("s3")
Expand Down Expand Up @@ -276,6 +284,11 @@ def run_clp(
logger.error(f"Unsupported storage engine {clp_storage_engine}")
return False, {"error_message": f"Unsupported storage engine {clp_storage_engine}"}

if compression_cmd is None:
error_msg = "Error creating compression command"
logger.error(error_msg)
return False, {"error_message": error_msg}

# Generate list of logs to compress
input_type = clp_config.input.type
logs_list_path = data_dir / f"{instance_id_str}-log-paths.txt"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,14 +8,18 @@
from celery.utils.log import get_task_logger
from clp_py_utils.clp_config import Database, S3Config, StorageEngine, StorageType, WorkerConfig
from clp_py_utils.clp_logging import set_logging_level
from clp_py_utils.s3_utils import generate_s3_virtual_hosted_style_url, s3_put
from clp_py_utils.s3_utils import (
generate_s3_virtual_hosted_style_url,
get_frozen_credentials,
s3_put,
)
from clp_py_utils.sql_adapter import SQL_Adapter
from job_orchestration.executor.query.celery import app
from job_orchestration.executor.query.utils import (
report_task_failure,
run_query_task,
)
from job_orchestration.executor.utils import load_worker_config
from job_orchestration.executor.utils import load_session_credentials, load_worker_config
from job_orchestration.scheduler.job_config import ExtractIrJobConfig, ExtractJsonJobConfig
from job_orchestration.scheduler.scheduler_data import QueryTaskStatus

Expand Down Expand Up @@ -103,8 +107,10 @@ def _make_clp_s_command_and_env_vars(
# fmt: on
aws_access_key_id, aws_secret_access_key = s3_config.get_credentials()
if aws_access_key_id is None or aws_secret_access_key is None:
logger.error("Missing credentials for accessing archives on S3")
return None, None
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

Comment on lines +110 to +113
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the credential variable assignment bug.

There is a bug in the variable assignment where both variables are assigned the access key ID.

Apply this diff to fix the variable assignment:

-            aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
+            aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None
aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

env_vars = {
**os.environ,
"AWS_ACCESS_KEY_ID": aws_access_key_id,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@
report_task_failure,
run_query_task,
)
from job_orchestration.executor.utils import load_worker_config
from job_orchestration.executor.utils import load_session_credentials, load_worker_config
from job_orchestration.scheduler.job_config import SearchJobConfig

# Setup logging
Expand Down Expand Up @@ -73,8 +73,10 @@ def _make_core_clp_s_command_and_env_vars(
# fmt: on
aws_access_key_id, aws_secret_access_key = s3_config.get_credentials()
if aws_access_key_id is None or aws_secret_access_key is None:
logger.error("Missing credentials for accessing archives on S3")
return None, None
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

Comment on lines +76 to +79
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix the credential variable assignment bug.

There is a bug in the variable assignment where both variables are assigned the access key ID.

Apply this diff to fix the variable assignment:

-            aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
+            aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
aws_access_key_id, aws_access_key_id = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None
aws_access_key_id, aws_secret_access_key = load_session_credentials(logger)
if aws_access_key_id is None or aws_secret_access_key is None:
return None, None

env_vars = {
**os.environ,
"AWS_ACCESS_KEY_ID": aws_access_key_id,
Expand Down
14 changes: 13 additions & 1 deletion components/job-orchestration/job_orchestration/executor/utils.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
from logging import Logger
from pathlib import Path
from typing import Optional
from typing import Optional, Tuple

from clp_py_utils.clp_config import WorkerConfig
from clp_py_utils.core import read_yaml_config_file
from clp_py_utils.s3_utils import s3_get_frozen_credentials


def load_worker_config(
Expand All @@ -21,3 +22,14 @@ def load_worker_config(
except Exception:
logger.exception("Failed to load worker config")
return None


def load_session_credentials(logger: Logger) -> Tuple[Optional[str], Optional[str]]:
s3_frozen_credentials = s3_get_frozen_credentials()
if s3_frozen_credentials is None:
logger.error("Failed to get s3 credentials from local session")
return None, None
if s3_frozen_credentials.token is not None:
logger.error("Not supporting session token at the moment")
return None, None
return s3_frozen_credentials.access_key, s3_frozen_credentials.secret_key
Comment on lines +27 to +35
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive docstring with type hints.

The function lacks documentation explaining its purpose, parameters, return values, and possible error conditions.

Add a docstring following this pattern:

 def load_session_credentials(logger: Logger) -> Tuple[Optional[str], Optional[str]]:
+    """
+    Load AWS credentials from the current session.
+    
+    Args:
+        logger: Logger instance for error reporting
+    
+    Returns:
+        Tuple[Optional[str], Optional[str]]: A tuple containing (access_key, secret_key).
+        Both values will be None if credentials cannot be loaded or contain unsupported features.
+    """
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def load_session_credentials(logger: Logger) -> Tuple[Optional[str], Optional[str]]:
s3_frozen_credentials = s3_get_frozen_credentials()
if s3_frozen_credentials is None:
logger.error("Failed to get s3 credentials from local session")
return None, None
if s3_frozen_credentials.token is not None:
logger.error("Not supporting session token at the moment")
return None, None
return s3_frozen_credentials.access_key, s3_frozen_credentials.secret_key
def load_session_credentials(logger: Logger) -> Tuple[Optional[str], Optional[str]]:
"""
Load AWS credentials from the current session.
Args:
logger: Logger instance for error reporting
Returns:
Tuple[Optional[str], Optional[str]]: A tuple containing (access_key, secret_key).
Both values will be None if credentials cannot be loaded or contain unsupported features.
"""
s3_frozen_credentials = s3_get_frozen_credentials()
if s3_frozen_credentials is None:
logger.error("Failed to get s3 credentials from local session")
return None, None
if s3_frozen_credentials.token is not None:
logger.error("Not supporting session token at the moment")
return None, None
return s3_frozen_credentials.access_key, s3_frozen_credentials.secret_key

Loading