
Implement resource_dict for file executor #456

Merged
merged 3 commits on Oct 28, 2024

Conversation

Member

@jan-janssen jan-janssen commented Oct 28, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced resource management in the Executor and FileExecutor classes, allowing users to specify resources using a structured resource_dict.
    • Improved error handling for unsupported configurations in the Executor class.
  • Bug Fixes

    • Updated handling of resource parameters to ensure correct execution of tasks with new naming conventions for cores and GPUs.
  • Tests

    • Updated test cases to reflect changes in resource parameterization, ensuring compatibility with the new resource_dict format.

Contributor

coderabbitai bot commented Oct 28, 2024


📥 Commits

Files that changed from the base of the PR and between 30d0c81 and 2507bf6.

Walkthrough

The pull request introduces significant changes to the resource management approach within the executorlib library. The Executor, FileExecutor, and related functions now accept a unified resource_dict parameter for specifying resources, replacing individual parameters such as cores_per_worker and cwd. This change is reflected across constructors and method signatures, ensuring consistency in how resources are defined and used. The modifications streamline the configuration of parallel processing tasks while preserving existing functionality.

Changes

File Change Summary
executorlib/__init__.py Updated Executor class constructor and __new__ method to reflect new keys in resource_dict.
executorlib/cache/executor.py Modified FileExecutor constructor to use resource_dict instead of individual parameters.
executorlib/cache/shared.py Changed execute_tasks_h5 function to accept resource_dict and updated internal logic accordingly.
executorlib/interactive/executor.py Altered create_executor function and ExecutorWithDependencies class to accommodate new resource_dict keys.
tests/test_cache_executor_mpi.py Updated TestCacheExecutorMPI to instantiate FileExecutor with resource_dict.
tests/test_cache_executor_pysqa_flux.py Modified TestCacheExecutorPysqa to use resource_dict for FileExecutor instantiation.
tests/test_cache_executor_serial.py Adjusted TestCacheExecutorSerial to standardize resource passing using resource_dict.
tests/test_cache_hdf.py Renamed variable skip_h5io_test to skip_h5py_test and updated related logic.
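The key mechanic behind the new parameter is a non-destructive merge: user-supplied keys are kept, and only missing keys are filled from the defaults. A minimal, self-contained sketch of that merge (the default values mirror those quoted in this review for Executor.__new__; the exact key set in executorlib may differ):

```python
# Default values as quoted in this review; illustrative only.
DEFAULT_RESOURCE_DICT = {
    "cores": 1,
    "threads_per_core": 1,
    "gpus_per_core": 0,
    "cwd": None,
    "openmpi_oversubscribe": False,
    "slurm_cmd_args": [],
}


def merge_with_defaults(resource_dict=None):
    """Return resource_dict with missing keys filled from the defaults.

    User-supplied keys always win; only absent keys are populated.
    """
    if resource_dict is None:
        resource_dict = {}
    # Copy first so the caller's dictionary is not mutated.
    merged = dict(resource_dict)
    merged.update(
        {k: v for k, v in DEFAULT_RESOURCE_DICT.items() if k not in merged}
    )
    return merged
```

With this precedence, a call such as `FileExecutor(resource_dict={"cores": 2})` runs with 2 cores while every other setting falls back to its default.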

Possibly related PRs

  • Remove conda support in executorlib #418: The removal of conda support in the main PR aligns with the changes in the FileExecutor and other classes that have shifted to using a resource_dict, indicating a broader refactor in how resources are managed.
  • Automatically set hostname_localhost #433: The modifications to the hostname_localhost parameter in the Executor class in the main PR are relevant to the changes in the ExecutorWithDependencies class, which also updated its handling of parameters to include a resource_dict.
  • Cache: Use explicit arguments for serialize_funct_h5() #448: The explicit use of a resource_dict in the execute_tasks_h5 function in the main PR is relevant to the changes made in the FileExecutor and other classes that now incorporate this parameter for resource management.
  • Cache: Implement resource dict for submit() #451: The implementation of a resource_dict for the submit method in the main PR is directly related to the changes in the FileExecutor, which also adopted this approach for handling resource specifications.
  • Use resource dict for internal communication #455: The updates to the Executor class to use a resource_dict in the main PR are relevant to the changes in the ExecutorWithDependencies class, which also transitioned to this new parameter structure for resource management.




@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (6)
executorlib/cache/executor.py (1)

39-47: Consider adding type validation for resource_dict values.

The resource dictionary initialization logic is well-implemented, preserving user-provided values while ensuring defaults. However, consider adding type validation for the dictionary values to catch configuration errors early.

Consider adding type validation:

        if resource_dict is None:
            resource_dict = {}
+       # Validate resource types
+       if "cores" in resource_dict and not isinstance(resource_dict["cores"], int):
+           raise TypeError("cores must be an integer")
+       if "cwd" in resource_dict and not (isinstance(resource_dict["cwd"], str) or resource_dict["cwd"] is None):
+           raise TypeError("cwd must be a string or None")
        resource_dict.update(
            {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
        )
tests/test_cache_executor_serial.py (2)

75-75: Consider simplifying the resource specification.

The test case doesn't appear to require specific resource constraints, yet it sets both cores and cwd. Consider whether this level of specification is necessary for this test case.

-                "resource_dict": {"cores": 1, "cwd": None},
+                "resource_dict": {},

Line range hint 33-33: Consider adding tests for resource_dict edge cases.

The test suite would benefit from additional test cases that verify:

  1. Invalid resource_dict values
  2. Resource inheritance between dependent tasks
  3. Resource override scenarios

Would you like me to provide example test cases for these scenarios?

executorlib/interactive/executor.py (2)

178-180: Documentation needs clarification and consistency fixes.

The resource_dict documentation has the following issues:

  1. The description for gpus_per_core still references "per worker" which contradicts the new parameter name
  2. The rationale for changing from worker-based to core-based resource allocation isn't explained

Consider updating the documentation:

                               - cores (int): number of MPI cores to be used for each function call
-                              - gpus_per_core (int): number of GPUs per worker - defaults to 0
+                              - gpus_per_core (int): number of GPUs allocated per core - defaults to 0

Line range hint 209-209: Add validation for required resource_dict keys.

The direct access to resource_dict["cores"] could raise a KeyError if the key is missing. Consider adding validation for required keys.

+    required_keys = ["cores", "threads_per_core", "gpus_per_core", "cwd", "openmpi_oversubscribe", "slurm_cmd_args"]
+    missing_keys = [key for key in required_keys if key not in resource_dict]
+    if missing_keys:
+        raise ValueError(f"Missing required keys in resource_dict: {missing_keys}")
     cores_per_worker = resource_dict["cores"]
executorlib/__init__.py (1)

Line range hint 176-183: LGTM with a minor suggestion for improvement.

The implementation correctly handles the new resource_dict keys and maintains proper default values. Consider extracting the default_resource_dict to a module-level constant to make it reusable and easier to maintain.

+# At module level
+DEFAULT_RESOURCE_DICT = {
+    "cores": 1,
+    "threads_per_core": 1,
+    "gpus_per_core": 0,
+    "cwd": None,
+    "openmpi_oversubscribe": False,
+    "slurm_cmd_args": [],
+}

 def __new__(
     cls,
     ...
 ):
-    default_resource_dict = {
-        "cores": 1,
-        "threads_per_core": 1,
-        "gpus_per_core": 0,
-        "cwd": None,
-        "openmpi_oversubscribe": False,
-        "slurm_cmd_args": [],
-    }
     if resource_dict is None:
         resource_dict = {}
     resource_dict.update(
-        {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
+        {k: v for k, v in DEFAULT_RESOURCE_DICT.items() if k not in resource_dict}
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between ceda4c0 and 6c139bb.

📒 Files selected for processing (7)
  • executorlib/__init__.py (1 hunks)
  • executorlib/cache/executor.py (4 hunks)
  • executorlib/cache/shared.py (4 hunks)
  • executorlib/interactive/executor.py (1 hunks)
  • tests/test_cache_executor_mpi.py (1 hunks)
  • tests/test_cache_executor_pysqa_flux.py (1 hunks)
  • tests/test_cache_executor_serial.py (4 hunks)
🔇 Additional comments (10)
tests/test_cache_executor_mpi.py (1)

32-32: LGTM! Correct implementation of resource_dict parameter.

The change from cores_per_worker=2 to resource_dict={"cores": 2} aligns with the PR objective of unifying resource specification.

tests/test_cache_executor_pysqa_flux.py (2)

34-38: LGTM! Test implementation is correct.

The test correctly validates the parallel execution behavior with the new resource_dict parameter while maintaining the existing test logic for verifying MPI functionality.


35-35: Verify resource_dict parameter consistency across test files.

The change from cores_per_worker to resource_dict={"cores": 2} aligns with the PR objective. Let's verify this pattern is consistently applied across all test files.

✅ Verification successful

Parameter change is consistently implemented across test files

The verification shows that:

  • No remaining usage of cores_per_worker parameter exists in test files
  • resource_dict with cores parameter is consistently used across all test files
  • The implementation in test_cache_executor_pysqa_flux.py aligns with the pattern used in other test files
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent usage of resource_dict parameter across test files
# Expected: All FileExecutor instantiations should use resource_dict instead of cores_per_worker

# Search for any remaining usage of cores_per_worker in test files
echo "Checking for any remaining cores_per_worker usage:"
rg "cores_per_worker" "tests/"

# Search for resource_dict usage pattern in test files
echo -e "\nVerifying resource_dict usage pattern:"
rg "resource_dict.*cores.*\d+" "tests/"

Length of output: 1976

executorlib/cache/executor.py (2)

17-17: LGTM: Constructor signature change is well-implemented.

The new resource_dict parameter is properly typed as Optional[dict] and effectively consolidates resource-related parameters.


59-59: Verify execute_tasks_h5 compatibility with resource_dict.

The resource_dict is correctly passed to execute_tasks_h5, but we should verify that the function has been updated to handle this parameter correctly.

Let's verify the execute_tasks_h5 implementation:

✅ Verification successful

execute_tasks_h5 is properly updated to handle resource_dict parameter

The verification shows that execute_tasks_h5 in executorlib/cache/shared.py is correctly defined with resource_dict as a required parameter in its signature. The function's documentation also indicates it expects a dictionary of resources. The test files demonstrate proper usage of the parameter by passing resource dictionaries with expected keys.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if execute_tasks_h5 function is updated to handle resource_dict

# Search for execute_tasks_h5 function definition and its usage of resource_dict
ast-grep --pattern 'def execute_tasks_h5($$$) { $$$ resource_dict $$$ }'

Length of output: 74


Script:

#!/bin/bash
# Let's try a different approach to find the execute_tasks_h5 function and its implementation

# First, let's find the file containing execute_tasks_h5
rg "execute_tasks_h5" -l

# Then get the full context of the function
rg "execute_tasks_h5" -B 5 -A 15

Length of output: 9445

tests/test_cache_executor_serial.py (1)

51-51: LGTM! Verify resource_dict format consistency.

The change to use resource_dict for working directory specification aligns with the PR's objective of unified resource management.

Let's verify the resource_dict format consistency across the codebase:

✅ Verification successful

Resource dictionary usage is consistent across the codebase

The verification shows that:

  • resource_dict is an optional parameter in the FileExecutor class
  • All instances of FileExecutor in the codebase either:
    • Use no resources (default constructor)
    • Properly use the resource_dict parameter with specific resources ("cwd", "cores")
  • The format is consistent across different test files (test_cache_executor_serial.py, test_cache_executor_pysqa_flux.py, test_cache_executor_mpi.py)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for all FileExecutor instantiations to ensure consistent resource_dict usage
ast-grep --pattern 'FileExecutor($$$resource_dict=$$$)'

Length of output: 57


Script:

#!/bin/bash
# Let's try a broader search for FileExecutor instantiations
rg "FileExecutor\(" -A 2

# Also search for the class definition to understand the parameters
rg "class FileExecutor"

Length of output: 1532


Script:

#!/bin/bash
# Let's check the FileExecutor class implementation to understand the resource_dict parameter
rg -A 10 "class FileExecutor" executorlib/cache/executor.py

Length of output: 434

executorlib/interactive/executor.py (1)

Line range hint 209-267: Standardize resource handling across backends.

The resource dictionary handling differs across backends:

  • Local backend removes threads_per_core and gpus_per_core
  • Flux backend removes openmpi_oversubscribe and slurm_cmd_args
  • Slurm backend keeps all keys

This inconsistency could lead to confusion and maintenance issues.

Consider:

  1. Standardizing which keys are required vs optional for each backend
  2. Documenting backend-specific requirements in the docstring
  3. Implementing a validation function for each backend's resource requirements

Let's check if this inconsistency exists in other executor implementations:
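One way to implement point 3 is a small allow-list of keys per backend. The backend names and key sets below are purely illustrative — they mirror the inconsistency described above and are not part of executorlib's actual API:

```python
# Hypothetical per-backend allow-lists for resource_dict keys.
# These sets are illustrative, derived from the per-backend key
# removal described in the review comment, not from executorlib itself.
BACKEND_KEYS = {
    "local": {"cores", "cwd", "openmpi_oversubscribe", "slurm_cmd_args"},
    "flux": {"cores", "threads_per_core", "gpus_per_core", "cwd"},
    "slurm": {
        "cores",
        "threads_per_core",
        "gpus_per_core",
        "cwd",
        "openmpi_oversubscribe",
        "slurm_cmd_args",
    },
}


def validate_resource_dict(backend: str, resource_dict: dict) -> None:
    """Raise ValueError if resource_dict contains keys the backend ignores."""
    unknown = set(resource_dict) - BACKEND_KEYS[backend]
    if unknown:
        raise ValueError(
            f"Keys {sorted(unknown)} are not supported by the {backend!r} backend"
        )
```

Failing loudly on unsupported keys is arguably safer than silently dropping them, since a silently ignored setting is hard to debug.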

executorlib/__init__.py (1)

141-143: Verify documentation for breaking changes in resource_dict keys.

The renaming of resource_dict keys from 'cores_per_worker'/'gpus_per_worker' to 'cores'/'gpus_per_core' represents a breaking change that could affect existing code. While the new names are more precise, we should ensure these changes are properly documented.

executorlib/cache/shared.py (2)

122-122: Verify the usage of resource_dict when calling execute_function

At line 122, resource_dict=task_resource_dict is passed to execute_function. Ensure that execute_function can handle cases where task_resource_dict might be empty or missing expected keys.

Confirm that execute_function gracefully handles incomplete resource_dict inputs.


Line range hint 53-57: Ensure all calls to execute_tasks_h5 are updated with the new resource_dict parameter

The function execute_tasks_h5 now requires a resource_dict parameter instead of cores_per_worker and cwd. Please verify that all calls to this function throughout the codebase have been updated to reflect this change.

Run the following script to search for any outdated function calls:

✅ Verification successful

All calls to execute_tasks_h5 are correctly using the new resource_dict parameter

The verification shows that all calls to execute_tasks_h5 have been properly updated:

  • In tests/test_cache_executor_serial.py, all calls use resource_dict with {"cores": 1, "cwd": None}
  • In executorlib/cache/executor.py, the call correctly passes the resource_dict parameter

No outdated calls using cores_per_worker or cwd parameters were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of execute_tasks_h5 with old parameters
# Expectation: No matches should be found

rg 'execute_tasks_h5\([^\)]*(cores_per_worker|cwd)\s*='

Length of output: 57


Script:

#!/bin/bash
# Let's first find all files that contain 'execute_tasks_h5' to analyze its usage
rg -l "execute_tasks_h5"

# Then let's see the actual calls with context
rg "execute_tasks_h5" -B 2 -A 2

Length of output: 2475


Script:

#!/bin/bash
# Let's examine the actual calls in test_cache_executor_serial.py to see the parameters being passed
rg -A 10 "target=execute_tasks_h5" tests/test_cache_executor_serial.py

# Also check the call in executor.py
rg -A 10 "target=execute_tasks_h5" executorlib/cache/executor.py

Length of output: 1839

Comment on lines 32 to 35
with FileExecutor(resource_dict={"cores": 2}) as exe:
fs1 = exe.submit(mpi_funct, 1)
self.assertFalse(fs1.done())
self.assertEqual(fs1.result(), [(1, 2, 0), (1, 2, 1)])
The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider adding test cases for resource_dict validation.

While the basic functionality is tested, consider adding test cases that verify:

  • Invalid resource dictionary formats
  • Missing required resources
  • Additional resource parameters that FileExecutor supports

Example test case structure:

def test_executor_resource_validation(self):
    # Test invalid cores value
    with self.assertRaises(ValueError):
        with FileExecutor(resource_dict={"cores": -1}):
            pass
            
    # Test missing required resources
    with self.assertRaises(KeyError):
        with FileExecutor(resource_dict={}):
            pass

Comment on lines +28 to +30
resource_dict (dict): A dictionary of resources required by the task. With the following keys:
- cores (int): number of MPI cores to be used for each function call
- cwd (str/None): current working directory where the parallel python task is executed
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove documentation for deprecated parameters.

The docstring still contains entries for the removed parameters cores_per_worker and cwd. These should be removed since they've been replaced by resource_dict.

Apply this diff to fix the docstring:

            resource_dict (dict): A dictionary of resources required by the task. With the following keys:
                              - cores (int): number of MPI cores to be used for each function call
                              - cwd (str/None): current working directory where the parallel python task is executed
            execute_function (callable, optional): The function to execute tasks. Defaults to execute_in_subprocess.
-           cores_per_worker (int, optional): The number of CPU cores per worker. Defaults to 1.
            terminate_function (callable, optional): The function to terminate the tasks.
-           cwd (str, optional): current working directory where the parallel python task is executed

Also applies to: 33-34

@@ -113,7 +113,7 @@ def test_executor_function_dependence_kwargs(self):
                 "future_queue": q,
                 "cache_directory": cache_dir,
                 "execute_function": execute_in_subprocess,
-                "cores_per_worker": 1,
+                "resource_dict": {"cores": 1, "cwd": None},
The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Consider refactoring duplicate resource configurations.

The same resource configuration {"cores": 1, "cwd": None} is repeated across multiple test methods. Consider extracting this to a class-level constant or setUp method to improve maintainability.

class TestCacheExecutorSerial(unittest.TestCase):
+    @classmethod
+    def setUpClass(cls):
+        super().setUpClass()
+        cls.default_resource_dict = {"cores": 1, "cwd": None}

     def test_executor_function(self):
         # ... existing code ...
         kwargs={
             "future_queue": q,
             "cache_directory": cache_dir,
             "execute_function": execute_in_subprocess,
-            "resource_dict": {"cores": 1, "cwd": None},
+            "resource_dict": self.default_resource_dict,
             "terminate_function": terminate_subprocess,
         }

Also applies to: 157-157

@@ -115,12 +114,12 @@ def execute_tasks_h5(
             process_dict[task_key] = execute_function(
                 command=_get_execute_command(
                     file_name=file_name,
-                    cores=cores_per_worker,
+                    cores=task_resource_dict["cores"],
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Ensure 'cores' key exists in task_resource_dict before access

At line 117, accessing task_resource_dict["cores"] without checking if the 'cores' key exists may raise a KeyError if it is missing.

Consider providing a default value or adding a check:

Option 1: Provide a default value for 'cores':

-            cores=task_resource_dict["cores"],
+            cores=task_resource_dict.get("cores", 1),

Option 2: Add a check and raise an explicit error if 'cores' is missing:

if "cores" not in task_resource_dict:
    raise KeyError("'cores' key is missing in task_resource_dict")
cores = task_resource_dict["cores"]

Comment on lines +64 to +66
resource_dict (dict): A dictionary of resources required by the task. With the following keys:
- cores (int): number of MPI cores to be used for each function call
- cwd (str/None): current working directory where the parallel python task is executed
The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve docstring formatting for resource_dict parameter

The docstring for resource_dict can be reformatted for clarity and to adhere to standard documentation practices.

Consider applying this change:

-        resource_dict (dict): A dictionary of resources required by the task. With the following keys:
-                              - cores (int): number of MPI cores to be used for each function call
-                              - cwd (str/None): current working directory where the parallel python task is executed
+        resource_dict (dict): A dictionary of resources required by the task with the following keys:
+            - **cores** (int): Number of MPI cores to be used for each function call.
+            - **cwd** (Optional[str]): Current working directory where the parallel Python task is executed.

Comment on lines +100 to +103
task_resource_dict = task_dict["resource_dict"].copy()
task_resource_dict.update(
{k: v for k, v in resource_dict.items() if k not in task_resource_dict}
)
The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Handle missing 'resource_dict' in task_dict to prevent KeyError

At line 100, the code assumes that task_dict always contains a 'resource_dict' key. If this key is missing, it will raise a KeyError.

Apply this diff to safely handle cases where 'resource_dict' might not be present:

-task_resource_dict = task_dict["resource_dict"].copy()
+task_resource_dict = task_dict.get("resource_dict", {}).copy()

Additionally, ensure that the subsequent code accounts for the possibility of an empty task_resource_dict.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
tests/test_cache_hdf.py (1)

19-19: Minor: Consider rephrasing the skip message.

While the message is accurate, it could be more concise.

Consider this alternative:

-    skip_h5py_test, "h5py is not installed, so the h5io tests are skipped."
+    skip_h5py_test, "Skipping HDF tests: h5py package not installed"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 6c139bb and 69f2244.

📒 Files selected for processing (5)
  • executorlib/cache/executor.py (3 hunks)
  • tests/test_cache_executor_mpi.py (2 hunks)
  • tests/test_cache_executor_pysqa_flux.py (1 hunks)
  • tests/test_cache_executor_serial.py (6 hunks)
  • tests/test_cache_hdf.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/test_cache_executor_mpi.py
  • tests/test_cache_executor_pysqa_flux.py
🔇 Additional comments (11)
tests/test_cache_hdf.py (2)

9-9: LGTM! Variable rename improves clarity.

The rename from skip_h5io_test to skip_h5py_test better reflects that we're checking for the availability of the h5py package.

Also applies to: 11-11


Line range hint 22-78: Test coverage looks comprehensive.

The test suite thoroughly covers different argument patterns (mixed, args-only, kwargs-only) for HDF operations. The unchanged test cases continue to provide good coverage of the core functionality.

executorlib/cache/executor.py (3)

23-24: LGTM! Constructor signature changes are well-structured.

The updated signature with resource_dict and new default executor aligns well with the architectural changes.


12-16: Verify pysqa availability in deployment environments.

The change of default executor to execute_with_pysqa with fallback to execute_in_subprocess might affect existing deployments.

✅ Verification successful

pysqa is an optional dependency, fallback mechanism is appropriate

The verification reveals that pysqa is properly configured as an optional dependency in the queue extras group within pyproject.toml. The fallback to execute_in_subprocess is a well-designed pattern that ensures the library remains functional even without pysqa installed. Users who need queue system integration can explicitly install it using pip install executorlib[queue].

Key findings:

  • pysqa is specified as version 0.2.0 in the optional queue dependencies
  • The execute_with_pysqa implementation provides queue system integration when available
  • The fallback mechanism maintains backward compatibility for basic usage
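The optional-dependency fallback described above can be sketched as follows. This is an illustrative pattern, not the library's actual module layout; `execute_with_pysqa` here is a placeholder, and only the `pysqa.QueueAdapter` import is a real API:

```python
import subprocess

try:
    from pysqa import QueueAdapter  # optional: pip install executorlib[queue]

    HAS_PYSQA = True
except ImportError:
    HAS_PYSQA = False


def execute_in_subprocess(command):
    """Minimal local fallback: run the command as a subprocess."""
    return subprocess.run(command, check=True)


if HAS_PYSQA:

    def execute_with_pysqa(command):
        # Placeholder for queue-based submission via pysqa (sketch only).
        raise NotImplementedError
else:
    # Without pysqa, fall back to plain subprocess execution.
    execute_with_pysqa = execute_in_subprocess
```

Either way, callers import a single `execute_with_pysqa` name and get the best available backend.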
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if pysqa is listed as a dependency
# Test: Look for pysqa in requirements files
rg -l 'pysqa' --glob '{requirements,setup.py,pyproject.toml}'

Length of output: 178


Script:

#!/bin/bash
# Check for pysqa related imports and usage across the codebase
rg -l "pysqa"

# Check setup.py specifically for optional dependencies
fd "setup.py" --exec cat {}

# Look for any documentation about optional dependencies
fd "README" --extension md --exec cat {}

Length of output: 8472


Script:

#!/bin/bash
# Check pyproject.toml for pysqa dependency configuration
cat pyproject.toml

# Check the implementation of execute_with_pysqa to understand its role
cat executorlib/standalone/cache/queue.py

Length of output: 3161


63-63: Verify execute_tasks_h5 compatibility with resource_dict.

Ensure that the execute_tasks_h5 function has been updated to handle the new resource_dict parameter.

✅ Verification successful

The execute_tasks_h5 function is properly updated to handle resource_dict parameter

Based on the search results, the execute_tasks_h5 function in executorlib/cache/shared.py is already properly updated to handle the resource_dict parameter:

  1. The function signature includes resource_dict as a required parameter:
def execute_tasks_h5(
    future_queue: queue.Queue,
    cache_directory: str,
    execute_function: callable,
    resource_dict: dict,
    ...
)
  2. The function's docstring properly documents the parameter:
resource_dict (dict): A dictionary of resources required by the task. With the following keys:
                     - cores (int): number of MPI cores to be used for each function call
                     - cwd (str/None): current working directory where the parallel python task is executed
  3. The function correctly uses the parameter in its implementation:
task_resource_dict = task_dict["resource_dict"].copy()
task_resource_dict.update(
    {k: v for k, v in resource_dict.items() if k not in task_resource_dict}
)
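The precedence implemented by that snippet — task-level entries win, executor-level defaults fill the gaps — can be exercised in isolation. `merge_resources` is a hypothetical helper that mirrors the update logic, not a function from the codebase:

```python
def merge_resources(task_resource_dict: dict, resource_dict: dict) -> dict:
    """Task-level settings take precedence; executor defaults fill missing keys."""
    merged = task_resource_dict.copy()
    # Only copy executor-level entries whose keys the task did not set itself.
    merged.update({k: v for k, v in resource_dict.items() if k not in merged})
    return merged


executor_defaults = {"cores": 2, "cwd": "/tmp"}
task_overrides = {"cores": 8}

print(merge_resources(task_overrides, executor_defaults))
# → {'cores': 8, 'cwd': '/tmp'}
```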
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if execute_tasks_h5 is updated to handle resource_dict
ast-grep --pattern 'def execute_tasks_h5($params) {
  $$$
  resource_dict
  $$$
}'

Length of output: 178


Script:

#!/bin/bash
# Search for execute_tasks_h5 function definition and its usage of resource_dict
rg "def execute_tasks_h5" -A 20

Length of output: 1648


Script:

#!/bin/bash
# Search for resource_dict usage in the codebase
rg "resource_dict" -A 3

Length of output: 37378

tests/test_cache_executor_serial.py (6)

7-10: LGTM! Import organization improvement.

The reorganization of imports improves code clarity by grouping related imports together.


17-17: LGTM! More accurate test skip condition naming.

The rename from skip_h5io_test to skip_h5py_test better reflects the actual dependency being checked.

Also applies to: 19-19, 31-31


35-35: LGTM! Explicit executor function specification.

The addition of execute_function=execute_in_subprocess makes the test behavior more explicit and aligns with the executor library changes.

Also applies to: 42-42


51-53: LGTM! Proper resource_dict implementation.

The change from direct cwd parameter to resource_dict format aligns with the new resource management approach while maintaining the same test coverage.


77-77: Duplicate resource configuration issue.

A previous review comment already addresses the duplication of resource configuration across test methods.

Also applies to: 118-118, 159-159


Line range hint 1-173: Verify complete migration to resource_dict.

Let's ensure all resource-related parameters have been migrated to the new resource_dict format.

✅ Verification successful

Let's examine the specific usage of these parameters in the relevant files to determine if there are any unconverted instances.


Migration to resource_dict is properly implemented

The code review shows that the migration to resource_dict is complete and consistent:

  • Core files show proper resource_dict parameter handling
  • Direct core/cwd parameters are only used internally after being extracted from resource_dict
  • All test files and implementation files follow the new resource_dict pattern
  • The remaining direct parameter references are either:
    • Inside the resource_dict itself (e.g., resource_dict["cores"])
    • Documentation examples
    • Internal parameter extraction from the resource_dict
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining direct resource parameter usage
rg -l "cores_per_worker|num_cores|cores=|cwd=" --type py

# Search for resource_dict usage to compare
rg -l "resource_dict" --type py

Length of output: 1333


Script:

#!/bin/bash
# Check actual usage of cores/cwd parameters in key files
rg "cores_per_worker|num_cores|cores=|cwd=" -B 2 -A 2 executorlib/interactive/shared.py executorlib/cache/shared.py executorlib/standalone/cache/spawner.py

# Check resource_dict implementation in the same files
rg "resource_dict" -B 2 -A 2 executorlib/interactive/shared.py executorlib/cache/shared.py executorlib/standalone/cache/spawner.py

Length of output: 12916

Comment on lines +43 to +51
default_resource_dict = {
"cores": 1,
"cwd": None,
}
if resource_dict is None:
resource_dict = {}
resource_dict.update(
{k: v for k, v in default_resource_dict.items() if k not in resource_dict}
)

🛠️ Refactor suggestion

Consider simplifying the resource dictionary initialization.

While the current implementation is correct, it could be more concise.

Consider this simpler implementation:

-        default_resource_dict = {
-            "cores": 1,
-            "cwd": None,
-        }
-        if resource_dict is None:
-            resource_dict = {}
-        resource_dict.update(
-            {k: v for k, v in default_resource_dict.items() if k not in resource_dict}
-        )
+        resource_dict = {
+            "cores": 1,
+            "cwd": None,
+            **(resource_dict or {})
+        }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
default_resource_dict = {
"cores": 1,
"cwd": None,
}
if resource_dict is None:
resource_dict = {}
resource_dict.update(
{k: v for k, v in default_resource_dict.items() if k not in resource_dict}
)
resource_dict = {
"cores": 1,
"cwd": None,
**(resource_dict or {})
}
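Both forms produce the same result; the dict-unpacking version relies on later entries overriding earlier ones, so user-supplied values replace the defaults. A standalone check of that behavior (`with_defaults` is a hypothetical wrapper, not code from the PR):

```python
def with_defaults(resource_dict=None):
    # User-supplied entries, unpacked last, override the defaults.
    return {"cores": 1, "cwd": None, **(resource_dict or {})}


print(with_defaults(None))          # → {'cores': 1, 'cwd': None}
print(with_defaults({"cores": 4}))  # → {'cores': 4, 'cwd': None}
```

Note that `resource_dict or {}` also treats an explicitly passed empty dict as "use the defaults", matching the original behavior.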

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (4)
executorlib/cache/executor.py (2)

18-22: Consider adding type hints for better maintainability.

The PySQA integration with fallback is well-implemented. Consider adding type hints to make the function signature more explicit.

-    execute_with_pysqa = execute_in_subprocess
+    execute_with_pysqa: Callable = execute_in_subprocess

(requires `from typing import Callable`; the built-in `callable` is a function, not a type, and is rejected by static type checkers)

40-45: Enhance resource_dict documentation.

The docstring should document all available resource_dict keys. Add descriptions for:

  • threads_per_core
  • gpus_per_core
  • openmpi_oversubscribe
  • slurm_cmd_args

Also, consider expanding the pysqa_config_directory description to explain its purpose and format.

executorlib/cache/shared.py (2)

53-55: Add type hints for resource_dict parameter

The type hint for resource_dict parameter is missing. Consider adding it to maintain consistency with other parameters:

-    resource_dict: dict,
+    resource_dict: dict[str, Union[int, str, None]],

(requires `from typing import Union`, or use `dict[str, int | str | None]` on Python ≥ 3.10)

Line range hint 100-123: Consider implementing a ResourceManager class

The current implementation handles resources through dictionary operations spread across multiple locations. Consider encapsulating this logic in a dedicated ResourceManager class to:

  1. Centralize resource validation and merging logic
  2. Provide type safety through a well-defined interface
  3. Make it easier to add new resource types in the future
  4. Reduce the risk of KeyError exceptions

Example implementation:

from dataclasses import dataclass
from typing import Optional

@dataclass
class ExecutionResources:
    cores: int = 1
    cwd: Optional[str] = None
    
    @classmethod
    def from_dict(cls, resource_dict: dict) -> 'ExecutionResources':
        return cls(
            cores=resource_dict.get('cores', 1),
            cwd=resource_dict.get('cwd')
        )
    
    def merge(self, other: 'ExecutionResources') -> 'ExecutionResources':
        return ExecutionResources(
            cores=self.cores if self.cores != 1 else other.cores,
            cwd=self.cwd if self.cwd is not None else other.cwd
        )

This would allow you to replace the dictionary operations with:

task_resources = ExecutionResources.from_dict(task_dict.get('resource_dict', {}))
default_resources = ExecutionResources.from_dict(resource_dict)
final_resources = task_resources.merge(default_resources)
🧰 Tools
🪛 Ruff

110-110: Use key not in dict instead of key not in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 69f2244 and 30d0c81.

📒 Files selected for processing (5)
  • executorlib/init.py (7 hunks)
  • executorlib/cache/executor.py (2 hunks)
  • executorlib/cache/shared.py (4 hunks)
  • tests/test_cache_executor_pysqa_flux.py (2 hunks)
  • tests/test_cache_executor_serial.py (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_cache_executor_pysqa_flux.py
🔇 Additional comments (14)
executorlib/cache/executor.py (4)

10-15: LGTM! Well-organized imports for input validation.

The new input validation imports are properly structured and support the enhanced resource validation functionality.


28-32: LGTM! Well-structured constructor signature changes.

The parameter changes align with the PR objectives and follow Python conventions.


82-87: LGTM! Consistent resource management implementation.

The process setup correctly uses the new resource_dict and PySQA configuration parameters.


49-71: Consider preserving validated resource keys.

Currently, several keys are deleted after validation (threads_per_core, gpus_per_core, etc.). This could be confusing as these values might be needed later. Consider either:

  1. Keeping these values in the dictionary for reference
  2. Moving validation-only keys to a separate validation function

Let's verify the validation function implementations:

tests/test_cache_executor_serial.py (3)

7-10: LGTM! Import organization follows best practices.

The imports are properly organized and explicitly specify the required functions.


17-17: LGTM! Skip flag rename improves clarity.

The rename from skip_h5io_test to skip_h5py_test better reflects its purpose of checking h5py library availability.

Also applies to: 19-19, 31-31


35-35: Verify resource configuration completeness.

While the FileExecutor instantiation has been updated to use resource_dict, please verify if all required resources are properly configured. The current instantiations have varying levels of resource configuration:

  • Line 35: No resource configuration
  • Line 42: No resource configuration
  • Line 51-53: Only cwd configured

Also applies to: 42-42, 51-53

executorlib/__init__.py (7)

5-7: Approved: Import of _check_executor function

The code correctly imports _check_executor from executorlib.standalone.inputcheck.


8-10: Approved: Import of _check_nested_flux_executor function

The code correctly imports _check_nested_flux_executor from executorlib.standalone.inputcheck.


105-105: Approved: Added pysqa_config_directory parameter to __init__ method

The pysqa_config_directory parameter is correctly added to the __init__ method signature, enhancing configurability for the pysqa backend.


107-107: Note the change of default block_allocation value from True to False

The default value of block_allocation has been changed to False. This may affect users who rely on the previous default behavior where block_allocation was True.

Action: Ensure this change is clearly documented in release notes or documentation to inform users about the potential impact.


126-126: Approved: Added pysqa_config_directory parameter to __new__ method

The pysqa_config_directory parameter is correctly added to the __new__ method signature for consistent initialization.


128-128: Note the change of default block_allocation value from True to False in __new__ method

The default value of block_allocation in the __new__ method has been changed to False. This may impact existing code that depends on the previous default.

Action: Ensure this change is properly communicated to users to prevent unexpected behavior.


209-231: Approved: Enhanced error handling for pysqa backend configurations

The added error checks provide clear feedback for unsupported configurations when using the pysqa backend, improving robustness and user experience.

Comment on lines 77 to 84
"resource_dict": {
"cores": 1,
"threads_per_core": 1,
"gpus_per_core": 0,
"cwd": None,
"openmpi_oversubscribe": False,
"slurm_cmd_args": [],
},

🛠️ Refactor suggestion

Enhance the resource dictionary configuration to reduce duplication.

The same detailed resource configuration is repeated across multiple test methods. This is an extension of the previously identified duplication issue, now with more complex configuration.

Consider creating a class-level helper method to generate resource dictionaries with optional overrides:

@classmethod
def get_resource_dict(cls, **overrides):
    default_resources = {
        "cores": 1,
        "threads_per_core": 1,
        "gpus_per_core": 0,
        "cwd": None,
        "openmpi_oversubscribe": False,
        "slurm_cmd_args": [],
    }
    return {**default_resources, **overrides}

Then use it in test methods:

resource_dict=self.get_resource_dict()  # for default config
resource_dict=self.get_resource_dict(cwd=custom_cwd)  # for custom values

Also applies to: 125-132, 173-180
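The suggested helper can be exercised standalone; this sketch uses a module-level function rather than the classmethod form shown above, but the merge semantics are identical:

```python
def get_resource_dict(**overrides):
    """Return the default test resources, with any keyword overrides applied."""
    default_resources = {
        "cores": 1,
        "threads_per_core": 1,
        "gpus_per_core": 0,
        "cwd": None,
        "openmpi_oversubscribe": False,
        "slurm_cmd_args": [],
    }
    # Overrides, unpacked last, win over the defaults.
    return {**default_resources, **overrides}


print(get_resource_dict(cwd="/tmp/test")["cwd"])  # → /tmp/test
print(get_resource_dict()["cores"])               # → 1
```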

Comment on lines 122 to 123
resource_dict=task_resource_dict,
config_directory=pysqa_config_directory,

💡 Codebase verification

Parameter naming inconsistency found in multiple files

The codebase shows inconsistent naming between pysqa_config_directory and config_directory. Here are the locations that need attention:

  • executorlib/cache/shared.py: Uses both pysqa_config_directory in the function signature but config_directory in the execute_function call
  • executorlib/cache/executor.py: Uses both naming conventions (pysqa_config_directory in signature, config_directory in dictionary)
  • executorlib/standalone/cache/queue.py and executorlib/standalone/cache/spawner.py: Still use the old config_directory naming

The parameter should be consistently named as pysqa_config_directory across all files since it specifically refers to the pysqa configuration directory.

🔗 Analysis chain

Ensure consistent parameter naming across the codebase

The parameter config_directory has been renamed to pysqa_config_directory to better reflect its purpose. Verify that this renaming is consistently applied across all related files and documentation.


Let me check the actual usage of config_directory in these files to determine if they need to be updated for consistency.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining instances of 'config_directory' that might need updating
rg -l 'config_directory' --type py

Length of output: 194


Script:

#!/bin/bash
# Check the context of config_directory usage in each file
for file in executorlib/__init__.py executorlib/cache/shared.py executorlib/cache/executor.py executorlib/standalone/cache/queue.py executorlib/standalone/cache/spawner.py; do
    echo "=== $file ==="
    rg 'config_directory' -B 2 -A 2 "$file"
done

Length of output: 6346

@@ -47,6 +53,7 @@ class Executor:
flux_executor (flux.job.FluxExecutor): Flux Python interface to submit the workers to flux
flux_executor_pmi_mode (str): PMI interface to use (OpenMPI v5 requires pmix) default is None (Flux only)
flux_executor_nesting (bool): Provide hierarchically nested Flux job scheduler inside the submitted function.
pysqa_config_directory (str, optional): path to the pysqa config directory (only for pysqa based backend).

🛠️ Refactor suggestion

Use bool instead of boolean in docstrings for consistency

In the docstring, the parameter hostname_localhost is documented with the type boolean. For consistency with Python's type annotations and clarity, consider changing the type to bool.


⚠️ Potential issue

Update block_allocation default value in docstring

The default value of block_allocation has changed from True to False in the code, but the docstring does not reflect this change. Please update the docstring to indicate the new default value.

@@ -162,6 +171,7 @@
of the individual function.
init_function (None): optional function to preset arguments for functions which are submitted later
disable_dependencies (boolean): Disable resolving future objects during the submission.
pysqa_config_directory (str, optional): path to the pysqa config directory (only for pysqa based backend).

⚠️ Potential issue

Add pysqa_config_directory to the __new__ method's docstring

The pysqa_config_directory parameter is not documented in the __new__ method's docstring. Please add it to the Args section to maintain comprehensive documentation.

Comment on lines 209 to 231
elif "pysqa_" in backend and not plot_dependency_graph:
if cache_directory is None:
cache_directory = "executorlib_cache"
if max_workers != 1:
raise ValueError(
"The number of workers cannot be controlled with the pysqa based backend."
)
if max_cores != 1:
raise ValueError(
"The number of cores cannot be controlled with the pysqa based backend."
)
if hostname_localhost is not None:
raise ValueError(
"The option to connect to hosts based on their hostname is not available with the pysqa based backend."
)
if block_allocation:
raise ValueError(
"The option block_allocation is not available with the pysqa based backend."
)
if init_function is not None:
raise ValueError(
"The option to specify an init_function is not available with the pysqa based backend."
)

🛠️ Refactor suggestion

Refine error messages for clarity and guidance

Some error messages could be more informative to help users troubleshoot issues effectively. Consider providing explanations or guidance on correct usage.

Example Revision:

  • Original message:

    "The number of cores cannot be controlled with the pysqa based backend."
    
  • Revised message:

    "The 'pysqa' backend manages cores automatically; specifying 'max_cores' is not supported. Please remove 'max_cores' from your configuration when using 'pysqa'."
    

Comment on lines +150 to +143
- cores (int): number of MPI cores to be used for each function call
- threads_per_core (int): number of OpenMP threads to be used for each function call
- gpus_per_worker (int): number of GPUs per worker - defaults to 0
- gpus_per_core (int): number of GPUs per worker - defaults to 0

💡 Codebase verification

The search results show clear inconsistencies in the codebase, so a final assessment can be made.

Documentation and code still contain references to old resource_dict keys

Several files still use the old keys and need to be updated:

  • notebooks/examples.ipynb: Contains multiple references to cores_per_worker and gpus_per_worker in documentation and code examples
  • README.md: Uses cores_per_worker in documentation
  • executorlib/__init__.py: Still documents the parameter as cores_per_worker and gpus_per_worker in docstrings
  • executorlib/standalone/inputcheck.py: Contains function and error messages using gpus_per_worker
  • tests/test_shared_input_check.py: Test cases use gpus_per_worker

While some files like executorlib/interactive/executor.py have been updated to use the new cores and gpus_per_core keys, there are still many places where the old terminology is used, which could lead to confusion.

🔗 Analysis chain

Ensure all references to old resource_dict keys are updated

The keys in resource_dict have been updated from cores_per_worker to cores and from gpus_per_worker to gpus_per_core. Please verify that all references to the old keys have been updated throughout the codebase and documentation to prevent inconsistencies.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for occurrences of deprecated resource_dict keys

# Searching for 'cores_per_worker'
rg 'cores_per_worker'

# Searching for 'gpus_per_worker'
rg 'gpus_per_worker'

Length of output: 3352
