Add type checking with mypy #535

Merged
merged 4 commits into from
Dec 24, 2024
Conversation

jan-janssen (Member) commented Dec 24, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a new GitHub Actions workflow for automated type checking using MyPy.
    • Added a method to the RaisingThread class to access keyword arguments.
    • Added a test method to verify behavior when receiving from a None socket.
  • Bug Fixes

    • Enhanced error handling in various functions to prevent potential runtime errors when dealing with None values.
  • Documentation

    • Improved type hinting across multiple files for better clarity and consistency.
  • Refactor

    • Updated parameter and return types in several functions and classes to use Optional and Callable for enhanced type safety.

coderabbitai bot (Contributor) commented Dec 24, 2024

Walkthrough

This pull request introduces a new GitHub Actions workflow for MyPy type checking and implements extensive type annotation improvements across the executorlib package. The changes focus on enhancing type safety by replacing generic callable types with Callable, adding optional type hints, and improving error handling through more robust type checking. These modifications collectively aim to maintain high-quality type annotations throughout the codebase.

Changes

File | Change Summary
.github/workflows/mypy.yml | New GitHub Actions workflow for MyPy type checking
executorlib/__init__.py | Updated type annotations for __all__ and Executor class parameters
executorlib/backend/* | Type hints added for variables and function parameters
executorlib/base/executor.py | Enhanced type safety for _future_queue and _process attributes
executorlib/cache/* | Improved type annotations for callable parameters and return types
executorlib/interactive/* | Refined type hints for function signatures and error handling
executorlib/standalone/* | Comprehensive type annotation updates across multiple files
tests/test_integration_pyiron_workflow.py | Type hinting improvements in Foo class and as_dynamic_foo function
tests/test_shared_communication.py | New test method test_interface_receive added to TestZMQ class
tests/test_shared_input_check.py | Removed assertion for TypeError in validate_number_of_cores function

Poem

🐰 Type Checking Rabbit's Delight
With Callable types, our code shines bright
MyPy workflows, a type-safe flight
No more callable running wild
Our annotations now reconciled!

Hop hop hooray! 🎉


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 6

🧹 Nitpick comments (24)
executorlib/standalone/thread.py (1)

24-26: Consider adding type hints to get_kwargs.

While the new accessor method is helpful for encapsulation, adding a return type annotation (e.g., -> dict or -> Dict[str, Any]) would improve clarity for both readers and static type checkers.
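
A minimal sketch of what such an annotation could look like, assuming a Thread subclass that keeps its own copy of the keyword arguments. The class name `KwargsThread` and the attribute `_stored_kwargs` are hypothetical and are not taken from executorlib:

```python
from threading import Thread
from typing import Any, Callable, Dict, Optional


class KwargsThread(Thread):
    """Hypothetical stand-in for a thread class that exposes its kwargs."""

    def __init__(
        self,
        target: Optional[Callable[..., Any]] = None,
        kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        # Keep a private copy so the accessor does not rely on Thread internals.
        self._stored_kwargs: Dict[str, Any] = dict(kwargs or {})
        super().__init__(target=target, kwargs=self._stored_kwargs)

    def get_kwargs(self) -> Dict[str, Any]:
        # The explicit return type lets mypy check how callers use the result.
        return self._stored_kwargs
```

With the return type spelled out, a caller that treats the result as something other than a dictionary would be flagged by mypy instead of failing at runtime.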

executorlib/backend/cache_parallel.py (2)

4-4: Remove unused import.

The import Any is not used in this file. Removing it will prevent linting or static analysis warnings such as F401.

-import pickle
-import sys
-import time
-from typing import Any
+import pickle
+import sys
+import time
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


38-41: Optional: inline the if-else statement.

You could simplify this block by replacing the if-else statement with a ternary expression:

apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}

However, this is optional. If clarity is more important than brevity, you can keep the current structure.

🧰 Tools
🪛 Ruff (0.8.2)

38-41: Use ternary operator apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {} instead of if-else-block

Replace if-else-block with apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}

(SIM108)

executorlib/standalone/interactive/backend.py (2)

2-2: Use imports selectively.

Ensure every imported name is used. If there are other members from typing that remain unused, consider removing them to reduce clutter.


17-17: Return type clarity.

Returning Any is appropriate if the function’s output can vary widely. If the return type can be narrowed, consider specifying a more specific type in the future.

executorlib/standalone/serialize.py (1)

4-4: Confirm necessity of Tuple.

Review whether you actually use Tuple from typing. If not, removing it will help maintain a clean import list.

executorlib/standalone/hdf.py (1)

27-33: Suggestion to simplify membership check.
You can streamline the condition by removing .keys():

- if data_key in group_dict.keys():
+ if data_key in group_dict:
🧰 Tools
🪛 Ruff (0.8.2)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
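
For a plain dict the two spellings give the same answer, which a short check can confirm. The dictionary contents below are made up for illustration and are not taken from hdf.py:

```python
# Hypothetical mapping, standing in for the group_dict mentioned above.
group_dict = {"function": "fn", "input_args": "args"}
data_key = "function"

# Membership on the dict itself consults the same keys as .keys().
with_keys = data_key in group_dict.keys()
without_keys = data_key in group_dict
same_result = with_keys == without_keys
```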

executorlib/standalone/plot.py (1)

21-23: Use more specialized type hints for clarity.

Declaring the exact list and dict types (e.g., list[dict] or dict[str, Any]) can improve clarity and help catch errors during static analysis.

executorlib/standalone/interactive/spawner.py (3)

9-14: Improve constructor docstring to reflect Optional cwd.

The docstring says “cwd (str): …” but the code now specifies Optional[str]. Recommend updating the docstring to clarify that None is the default and permissible value.


80-80: Explicitly type subprocess.Popen as needed.

Declaring Optional[subprocess.Popen[bytes]] or Optional[subprocess.Popen[str]] can further improve clarity on which stream encoding is used (if known).


118-122: Order of calls can be optimized for graceful shutdown.

Calling terminate() before communicate() is the more common order, because the process is signaled first and its remaining output is collected afterwards. Consider reversing these calls if the process is still expected to be running.
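
The suggested ordering could be sketched as follows. The helper name `stop_process` and the timeout value are assumptions for illustration; this is not the spawner's actual shutdown code:

```python
import subprocess
import sys
from typing import Optional


def stop_process(process: subprocess.Popen, timeout: float = 5.0) -> Optional[int]:
    """Hypothetical helper: signal the process first, then collect its output."""
    process.terminate()
    try:
        # communicate() waits for exit and drains any open pipes.
        process.communicate(timeout=timeout)
    except subprocess.TimeoutExpired:
        # Escalate if the process ignores the termination request.
        process.kill()
        process.communicate()
    return process.returncode
```

Signaling first means communicate() returns as soon as the process exits, rather than blocking until a long-running process finishes on its own.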

executorlib/cache/executor.py (2)

24-24: Document the rationale for type ignore.

Including a brief comment explaining why type checking should be ignored here helps maintainers understand the trade-offs if pysqa is unavailable.


32-33: Leverage type aliases for commonly used callables.

If the same callable signature is used across multiple classes, consider defining a type alias (e.g., TaskFunction = Callable[..., Any]) to keep code more organized and consistent.
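
A small sketch of the alias idea, assuming the alias lives in a shared module. `TaskFunction` comes from the comment above, while `run_task` is a hypothetical consumer added only to show the alias in a signature:

```python
from typing import Any, Callable

# One shared name for "any callable returning anything".
TaskFunction = Callable[..., Any]


def run_task(fn: TaskFunction, *args: Any, **kwargs: Any) -> Any:
    """Hypothetical consumer that accepts the aliased callable type."""
    return fn(*args, **kwargs)
```

If the signature is later tightened (for example to `Callable[..., dict]`), only the alias needs to change.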

executorlib/base/executor.py (1)

42-48: Simplify membership checks.
According to the static analysis hint (SIM118), consider removing the .keys() call for membership checks:

- if "future_queue" in meta_data_dict.keys():
+ if "future_queue" in meta_data_dict:
🧰 Tools
🪛 Ruff (0.8.2)

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/standalone/inputcheck.py (1)

121-123: Empty error message.
Providing an explicit message when raising ValueError can improve debugging. Consider adding a short, descriptive message:

- raise ValueError("")
+ raise ValueError("Allocation is blocked, but `init_function` is set. This combination is not supported.")
executorlib/cache/shared.py (2)

24-24: Any return type might be too broad.
Consider narrowing the return type if feasible to better reflect the expected data.


77-79: Consider using more specific type hints.
Defining dictionaries as dict is somewhat vague. For example:

memory_dict: Dict[str, Future] = {}
process_dict: Dict[str, Any] = {}
file_name_dict: Dict[str, str] = {}

This aids in maintainability and readability.

executorlib/interactive/executor.py (3)

84-85: Consider more specific type hints instead of dict.
If the keys/values are known, declare them as Dict[K, V] for clarity.


231-234: Use membership tests without .keys().

- if "openmpi_oversubscribe" in resource_dict.keys():
+ if "openmpi_oversubscribe" in resource_dict:
    del resource_dict["openmpi_oversubscribe"]

Similarly for slurm_cmd_args.

🧰 Tools
🪛 Ruff (0.8.2)

231-231: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


233-233: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


301-306: Use membership tests without .keys().
Apply this pattern elsewhere for consistency.

🧰 Tools
🪛 Ruff (0.8.2)

301-301: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


303-303: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


305-305: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/__init__.py (1)

180-180: Using dict type is fine locally, but consider more specific hints.
If known, specify key-value types for clarity, e.g., Dict[str, Union[int, bool]].

executorlib/interactive/shared.py (1)

289-291: Consider using more specific container types.
For instance:

active_task_dict: Dict[Future, int] = {}
process_lst: List[RaisingThread] = []
qtask_lst: List[Queue] = []
.github/workflows/mypy.yml (2)

22-22: Enhance MyPy configuration and fix missing newline

Consider the following improvements:

  1. Add a mypy.ini configuration file to customize type checking settings
  2. Add error codes to the output for better debugging
  3. Add missing newline at EOF
-        run: mypy --ignore-missing-imports ${{ github.event.repository.name }}
+        run: mypy --ignore-missing-imports --show-error-codes ${{ github.event.repository.name }}
+

Would you like me to help create a mypy.ini configuration file with recommended settings for your project?

🧰 Tools
🪛 yamllint (1.35.1)

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)


1-22: Consider adding dependency caching

To improve workflow execution time, consider caching pip dependencies between runs.

Add this step after Python setup:

      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
🧰 Tools
🪛 yamllint (1.35.1)

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf3ecc and 1799b48.

📒 Files selected for processing (22)
  • .github/workflows/mypy.yml (1 hunks)
  • executorlib/__init__.py (5 hunks)
  • executorlib/backend/cache_parallel.py (3 hunks)
  • executorlib/backend/interactive_parallel.py (4 hunks)
  • executorlib/base/executor.py (7 hunks)
  • executorlib/cache/executor.py (5 hunks)
  • executorlib/cache/queue_spawner.py (1 hunks)
  • executorlib/cache/shared.py (4 hunks)
  • executorlib/interactive/executor.py (8 hunks)
  • executorlib/interactive/flux.py (1 hunks)
  • executorlib/interactive/shared.py (18 hunks)
  • executorlib/interactive/slurm.py (1 hunks)
  • executorlib/standalone/__init__.py (1 hunks)
  • executorlib/standalone/hdf.py (4 hunks)
  • executorlib/standalone/inputcheck.py (3 hunks)
  • executorlib/standalone/interactive/backend.py (1 hunks)
  • executorlib/standalone/interactive/communication.py (1 hunks)
  • executorlib/standalone/interactive/spawner.py (3 hunks)
  • executorlib/standalone/plot.py (1 hunks)
  • executorlib/standalone/serialize.py (2 hunks)
  • executorlib/standalone/thread.py (1 hunks)
  • tests/test_integration_pyiron_workflow.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • executorlib/standalone/__init__.py
🧰 Additional context used
🪛 Ruff (0.8.2)
executorlib/backend/cache_parallel.py

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)


38-41: Use ternary operator apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {} instead of if-else-block

Replace if-else-block with apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}

(SIM108)

executorlib/standalone/serialize.py

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/interactive/executor.py

169-169: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


231-231: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


233-233: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


301-301: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


303-303: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


305-305: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/standalone/hdf.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/base/executor.py

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


49-49: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/interactive/shared.py

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 yamllint (1.35.1)
.github/workflows/mypy.yml

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (73)
executorlib/standalone/interactive/backend.py (2)

6-7: Signature looks good with explicit Callable and Any.

Switching from callable to Callable improves type clarity. Returning Any is acceptable if the function truly might return any type.


13-13: Docstring alignment with new type hints.

The docstring accurately reflects the updated parameter type. This consistency is good practice.

executorlib/standalone/serialize.py (1)

38-38: Docstring consistency.

The docstring now reflects that fn is a Callable, which matches the updated signature.

executorlib/backend/interactive_parallel.py (5)

4-4: Good addition of type hints.
Importing Optional here clearly communicates that certain values may be None.


7-7: Dependency import is appropriate.
Introducing zmq aligns with usage of zmq.Context and zmq.Socket types.


29-29: No issues with custom MPI pickle initialization.
The # type: ignore comment is acceptable to override MyPy errors here. Ensure that the relevant code is tested for MPI rank-based operations.


38-39: Proper addition of optional context and socket.
Declaring context: Optional[zmq.Context] = None and socket: Optional[zmq.Socket] = None is a clear improvement in type safety.


54-54: Initialization of the input_dict.
Using input_dict: dict = {} ensures a valid dictionary is always broadcasted.

executorlib/standalone/hdf.py (5)

2-2: Expanded usage of type hints.
Importing Any, Optional, and Tuple aligns well with the updated function signatures below.


18-18: Allowing file_name to be optional.
Changing file_name to Optional[str] handles cases where the file might not be specified.


26-26: Defensive check for None value.
Validating that file_name is not None before file operations helps avoid runtime errors.


63-63: Flexible return type for get_output.
Returning Any for the second tuple element is appropriate if the output can be any Python object.


97-105: Optional file name logic is correct.
Falling back to None when the file name is not provided is a valid design choice.
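
The None check on file_name noted above also matters to the type checker: after the check, mypy narrows `Optional[str]` to `str`. A minimal sketch, assuming a function that should report a missing file name rather than fail; `file_exists` is a hypothetical helper, not a function from hdf.py:

```python
import os
from typing import Optional


def file_exists(file_name: Optional[str]) -> bool:
    """Hypothetical helper showing Optional narrowing before a file operation."""
    if file_name is None:
        # Without this guard, os.path.exists(None) raises a TypeError.
        return False
    # From here on mypy treats file_name as str.
    return os.path.exists(file_name)
```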

executorlib/interactive/slurm.py (1)

78-78: Improved flexibility for cwd.
Accepting None for cwd expands usability in contexts where a working directory isn't strictly required.

executorlib/cache/queue_spawner.py (1)

19-19: Optional queue ID return type.
Changing the return type to Optional[int] accommodates scenarios where a job submission may not occur or fails. Confirm that callers handle a None return.
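
What "callers handle a None return" can mean in practice is sketched below. Both functions and the placeholder queue ID are hypothetical; they only illustrate the calling pattern and do not reflect how queue_spawner.py submits jobs:

```python
from typing import Optional


def submit_job(command: str) -> Optional[int]:
    """Hypothetical submitter: returns a queue ID, or None if nothing was submitted."""
    if not command:
        return None
    return 42  # placeholder queue ID, for illustration only


def describe(queue_id: Optional[int]) -> str:
    # Callers must branch on None before using the ID as an int.
    if queue_id is None:
        return "not submitted"
    return f"queued as job {queue_id}"
```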

executorlib/cache/executor.py (4)

2-2: Good practice importing Callable and Optional explicitly.

This fosters clarity and consistency across the codebase.


46-47: Update docstring to match new default.

Docstring references execute_in_subprocess as the default. Ensure it aligns with the actual default in the code (execute_with_pysqa) unless it is correctly changed later.


84-86: Allowing None for max_workers and max_cores broadens flexibility.

Make sure the rest of the code handles None as intended when scheduling tasks.


96-96: Confirm the constraints on init_function.

The function currently raises an error if init_function is specified. Ensure that’s the intended design or consider implementing the init functionality instead of raising an error.

executorlib/interactive/flux.py (1)

120-126: Prevent attribute errors by checking self._future.

This guard is a welcome improvement, ensuring methods aren’t called on None.

executorlib/base/executor.py (6)

8-8: Imported types look good.
Type imports from typing for Callable, List, Optional, and Union increase clarity and make the code easier to maintain.


30-31: Flexible typing for _future_queue and _process.
Marking these fields as Optional[...] is a good improvement, allowing for clearer checks before usage.


56-56: Consistent return type for future_queue.
Returning Optional[queue.Queue] matches how _future_queue is managed and prevents runtime errors.


100-110: Queue usage verified.
Verifying _future_queue is not None before put is solid and prevents errors.


128-132: Graceful shutdown.
The checks for None before canceling or shutting down maintain robust error handling.


155-158: Returning a qsize or zero.
Returning 0 if _future_queue is None avoids potential AttributeError and is consistent with the type signature.
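
The guard pattern praised in these comments can be sketched in isolation. The class `QueueOwner` and its method names are assumptions made for this example; only the `_future_queue` attribute name is borrowed from the review:

```python
import queue
from typing import Any, Optional


class QueueOwner:
    """Hypothetical class with an optional queue, mirroring the guards described above."""

    def __init__(self, future_queue: Optional[queue.Queue] = None) -> None:
        self._future_queue: Optional[queue.Queue] = future_queue

    def put(self, item: Any) -> bool:
        # Only touch the queue after confirming it exists.
        if self._future_queue is None:
            return False
        self._future_queue.put(item)
        return True

    def pending_count(self) -> int:
        # Fall back to zero instead of raising AttributeError on None.
        if self._future_queue is not None:
            return self._future_queue.qsize()
        return 0
```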

executorlib/standalone/inputcheck.py (3)

103-118: Optional pmi handling is correct.
The updated checks ensure that invalid or missing pmi values raise the appropriate errors.


Line range hint 181-200: Core validation logic.
Using cores_per_worker=1 by default is sensible. The branching structure for max_cores and max_workers looks robust.


203-203: File existence checks are thorough.
Raising ValueError if file_name is None or missing on disk is a clear and concise approach.

executorlib/standalone/interactive/communication.py (3)

178-187: Allowing None sockets for interface_send.
Ensures safe handling if the socket is missing. This prevents potential AttributeError calls on socket.


190-200: Safe fallback for interface_receive.
Returning an empty dict when socket is None avoids runtime errors and is gracefully handled.


203-213: Robust interface_shutdown.
The checks prevent calls on None objects. This gracefully accommodates partial or no socket/context usage.
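
A dependency-free sketch of the "empty dict when the socket is None" behavior follows. It assumes a socket-like object with a `recv()` method and uses `pickle` for decoding; `SupportsRecv`, `receive_dict`, and `FakeSocket` are hypothetical names, and the real functions in communication.py may serialize differently:

```python
import pickle
from typing import Optional, Protocol


class SupportsRecv(Protocol):
    """Hypothetical minimal interface for a socket-like object."""

    def recv(self) -> bytes: ...


def receive_dict(socket: Optional[SupportsRecv]) -> dict:
    # A missing socket yields an empty dict instead of an AttributeError.
    if socket is None:
        return {}
    return pickle.loads(socket.recv())


class FakeSocket:
    """Test double that returns one pickled payload."""

    def __init__(self, payload: dict) -> None:
        self._data = pickle.dumps(payload)

    def recv(self) -> bytes:
        return self._data
```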

tests/test_integration_pyiron_workflow.py (3)

12-12: Switch to Callable import is consistent with type-hinting conventions.


25-25: Constructor type annotation.
Improving from callable to Callable is a good practice for explicit method signatures.


48-48: Decorator signature refined.
The updated as_dynamic_foo(fnc: Callable) approach clarifies usage and aligns with the rest of the codebase.

executorlib/cache/shared.py (3)

6-6: Good use of explicit imports for better type clarity.
No issues found.


52-54: Switch from callable to Callable is a best practice.
This enhances static type checking and clarity.


68-69: Docstring updates reflect the new type annotations accurately.
Everything looks consistent.

executorlib/interactive/executor.py (11)

92-92: # type: ignore may mask useful warnings.
Use it sparingly and justify its presence.


103-103: Well-documented function parameter.
No issues found.


149-149: # type: ignore on exit call.
Please verify if ignoring type checks here is truly needed.


176-176: Switch from Optional[callable] to Optional[Callable] is correct.
No issues found.


221-221: Clarity enhancement: cores_per_worker assignment.
No further issues.


225-227: Good checks for oversubscribe flags.
Logic is correct.


229-229: Retrieving SLURM command args with a default.
No issues found.


250-250: Ensuring threads_per_core has default.
Looks good.


279-279: Similar logic for SLURM.
Matches existing pattern.


297-297: Retrieving GPU data with default zero.
Implementation is correct.


299-299: Consistent approach for slurm_cmd_args.
No issues here.

executorlib/__init__.py (4)

1-1: Imports for Callable and Optional.
This is consistent with updated type usage.


19-19: Explicitly defining __all__ type.
Improves clarity for exported symbols.


103-103: Parameter type updated to Optional[Callable].
Enhances type correctness.


126-126: Same improvement in __new__.
Stays consistent.

executorlib/interactive/shared.py (20)

8-8: Comprehensive type imports.
No issues found.


34-34: Accurate docstring for fn parameter.
Matches the function signature.


53-57: Creating and passing a new Future.
Implementation looks correct.


74-83: Enhanced shutdown logic with queue checks.
Ensures _future_queue is valid. Properly handles wait conditions.


87-87: Type for _set_process method parameter.
Clear usage of List[RaisingThread].


95-96: Starting all processes in a loop.
Good approach. No issues.


137-137: spawner: type[BaseSpawner] = MpiExecSpawner
Type annotation helps clarity.


190-190: Same spawner annotation in InteractiveStepExecutor.
Consistency maintained.


208-208: Matches usage in execute_parallel_tasks.
No issues found.


228-228: Slight docstring refinement describing init_function.
Looks consistent.


265-265: spawner annotation repeated for separate tasks.
Remains consistent.


303-303: Local queue creation for tasks.
No issues found.


461-461: Helper function _update_futures_in_input improved.
Signature is well-defined.


473-473: get_result handles both single Future and list of Futures.
No issues found.


481-481: Inline comprehension is concise and clear.
Implementation looks good.


492-492: Docstring clarifies the expected task dictionary shape.
Enhances maintainability.


518-518: Same approach with typed spawner.
Ensures consistent usage.


528-528: Docstring describing dictionary structure is helpful.
No issues found.


590-590: Docstring details dictionary shape for _execute_task.
Keeps usage clear.


618-618: Docstring clarifies caching approach.
Implementation and usage are correct.

@@ -29,13 +29,13 @@ def cloudpickle_register(ind: int = 2):


 def serialize_funct_h5(
-    fn: callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}
+    fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}

⚠️ Potential issue

Avoid mutable default arguments.

Using mutable types (list or dict) in function defaults can lead to subtle bugs due to shared references. Consider initializing them inside the function:

-def serialize_funct_h5(
-    fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}
-) -> Tuple[str, dict]:
+from typing import Callable, Tuple, Optional, Dict, List

+def serialize_funct_h5(
+    fn: Callable,
+    fn_args: Optional[List] = None,
+    fn_kwargs: Optional[Dict] = None,
+    resource_dict: Optional[Dict] = None,
+) -> Tuple[str, dict]:
+    if fn_args is None:
+        fn_args = []
+    if fn_kwargs is None:
+        fn_kwargs = {}
+    if resource_dict is None:
+        resource_dict = {}
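
The shared-reference problem is easy to reproduce in a few lines. The function names below are hypothetical and exist only to contrast the two patterns:

```python
from typing import List, Optional


def append_shared(item: int, bucket: List[int] = []) -> List[int]:
    # The default list is created once, at definition time, and then reused.
    bucket.append(item)
    return bucket


def append_fresh(item: int, bucket: Optional[List[int]] = None) -> List[int]:
    # A new list is created on every call that omits the argument.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)  # same list object as `first`, now [1, 2]
```

Because `first` and `second` refer to the same list, state leaks between otherwise unrelated calls, which is the behavior Ruff's B006 rule warns about.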
🧰 Tools
🪛 Ruff (0.8.2)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)


32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@@ -62,7 +62,7 @@ def future_queue(self) -> queue.Queue:
"""
return self._future_queue

-    def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+    def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:  # type: ignore

🛠️ Refactor suggestion

Avoid mutable defaults for resource_dict.
It’s a best practice to avoid using mutable objects (dict, list) as default parameters.

-def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future:
    ...
+    if resource_dict is None:
+        resource_dict = {}

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@@ -166,14 +166,14 @@ def create_executor(
backend: str = "local",
max_cores: Optional[int] = None,
cache_directory: Optional[str] = None,
-    resource_dict: Optional[dict] = None,
+    resource_dict: dict = {},

🛠️ Refactor suggestion

Avoid using mutable default argument for resource_dict.
This can lead to subtle bugs. Consider:

- def create_executor(..., resource_dict: dict = {}, ...):
+ def create_executor(..., resource_dict: Optional[dict] = None, ...):
    if resource_dict is None:
        resource_dict = {}

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

169-169: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

@@ -23,15 +23,15 @@


class ExecutorBroker(ExecutorBase):
-    def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+    def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:  # type: ignore

🛠️ Refactor suggestion

Avoid mutable argument default (resource_dict: dict = {}).
This can cause unexpected behavior when reused. Consider:

- def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+ def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future:

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.8.2)

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

Comment on lines +15 to +16
python-version: "3.13"
architecture: x64

⚠️ Potential issue

Consider using a stable Python version

Python 3.13 is a very recent release (3.13.0 shipped in October 2024), and some dependencies may not support it yet. If CI stability matters more than early coverage, consider pinning the previous minor version (3.12) instead.

-          python-version: "3.13"
+          python-version: "3.12"

- name: Checkout
uses: actions/checkout@v4
- name: Install mypy
run: pip install mypy

🛠️ Refactor suggestion

Pin MyPy version for consistent CI results

Installing MyPy without version pinning could lead to inconsistent results as new versions are released.

-        run: pip install mypy
+        run: pip install 'mypy==1.8.0'

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
executorlib/base/executor.py (1)

65-65: ⚠️ Potential issue

Fix mutable default argument in submit method

While the type annotations and null checks are good improvements, the mutable default argument issue remains.

-def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future:
     ...
+    if resource_dict is None:
+        resource_dict = {}

Also applies to: 100-110

🧰 Tools
🪛 Ruff (0.8.2)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🧹 Nitpick comments (2)
executorlib/backend/cache_parallel.py (1)

Line range hint 28-32: Document reason for type ignore

The type ignore comment is valid as MPI libraries often lack proper type hints. However, it would be helpful to document why this type ignore is necessary.

Consider adding a brief comment explaining the type ignore:

-    MPI.pickle.__init__(  # type: ignore
+    # MPI.pickle lacks proper type hints, but this usage is safe.
+    MPI.pickle.__init__(  # type: ignore
executorlib/base/executor.py (1)

42-43: Optimize dictionary key checks

While the change to get_kwargs() improves encapsulation, the dictionary key checks can be simplified.

-if "future_queue" in meta_data_dict.keys():
+if "future_queue" in meta_data_dict:

Also applies to: 48-49

🧰 Tools
🪛 Ruff (0.8.2)

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
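As a small illustration of SIM118, the sketch below compares the two membership tests. The contents of `group_dict` here are made up for the example and are not the mapping used in executorlib.

```python
# Hypothetical mapping, used only for this illustration.
group_dict = {"fn": "function", "args": "input_args", "kwargs": "input_kwargs"}


def known_keys_with_keys(data_dict: dict) -> list:
    # Works, but builds a keys view that adds nothing to the test.
    return [key for key in data_dict if key in group_dict.keys()]


def known_keys_direct(data_dict: dict) -> list:
    # Membership on the dict itself checks its keys directly.
    return [key for key in data_dict if key in group_dict]


sample = {"fn": sum, "args": ([1, 2],), "unknown": None}
with_keys = known_keys_with_keys(sample)
direct = known_keys_direct(sample)
```

Both forms return the same result; the direct form is simply shorter and is the idiom Ruff suggests.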

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1799b48 and 6a73342.

📒 Files selected for processing (4)
  • executorlib/backend/cache_parallel.py (3 hunks)
  • executorlib/base/executor.py (7 hunks)
  • executorlib/standalone/hdf.py (4 hunks)
  • executorlib/standalone/inputcheck.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • executorlib/standalone/inputcheck.py
🧰 Additional context used
🪛 Ruff (0.8.2)
executorlib/backend/cache_parallel.py

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

executorlib/base/executor.py

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


49-49: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)


65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/standalone/hdf.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (9)
executorlib/backend/cache_parallel.py (1)

38-42: LGTM! Type-safe initialization of apply_dict

The initialization of apply_dict as an empty dictionary before the conditional improves type safety by ensuring consistent types across all code paths. This is particularly important for the MPI broadcast operation.

Let's verify that this change doesn't affect MPI behavior:

✅ Verification successful

Initialization of apply_dict as empty dictionary is safe for MPI broadcast

The codebase search reveals consistent patterns of MPI broadcast operations with dictionaries:

  • In interactive_parallel.py, a similar pattern exists where a dictionary is broadcast and then accessed
  • Both cases show the broadcast dictionary being immediately used to access keys/values
  • The empty dictionary initialization won't affect the broadcast operation since:
    1. Only rank 0 populates the actual data
    2. Other ranks receive the populated dictionary through broadcast
    3. The empty dict is overwritten by the broadcast operation
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar MPI broadcast patterns in the codebase to confirm this is a safe change
rg -A 5 "MPI\.COMM_WORLD\.bcast.*dict.*" --type py

Length of output: 1068

executorlib/standalone/hdf.py (3)

2-2: LGTM: Type imports are complete and well-organized

The added type imports properly support the type annotations introduced in this PR.


63-63: LGTM: Improved type precision

The return type change from object to Any better expresses the function's capability to return any type as the second tuple element.
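The practical difference between `object` and `Any` shows up at the call site rather than at runtime. The following sketch uses two hypothetical loader functions (not the ones in `hdf.py`) to show what a caller has to do in each case.

```python
from typing import Any, Tuple


def load_as_object() -> Tuple[bool, object]:
    # Hypothetical loader annotated with object.
    return True, "result"


def load_as_any() -> Tuple[bool, Any]:
    # Hypothetical loader annotated with Any.
    return True, "result"


_, as_object = load_as_object()
_, as_any = load_as_any()

# With object, mypy rejects attribute access until the value is narrowed.
narrowed = as_object.upper() if isinstance(as_object, str) else ""

# With Any, mypy accepts the call without narrowing; correctness is the caller's job.
unchecked = as_any.upper()
```

`Any` is more convenient for callers, at the cost of switching off checking for that value, so it is a trade-off rather than a strict improvement.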


97-102: LGTM: Robust null handling implementation

The changes properly handle the optional input parameter and maintain type safety throughout the function.

Let's verify the callers can handle None return values:

✅ Verification successful

Let me check the handling of the return value in the queue_spawner.py file.


Null handling is properly implemented across the codebase

The verification shows that the callers handle the None return value correctly:

  • In queue_spawner.py, the code explicitly checks if queue_id is None and handles this case appropriately
  • The test cases in test_cache_hdf.py verify the positive case (returning 123)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for get_queue_id calls to ensure proper null handling
rg "get_queue_id\(" -A 2

Length of output: 990


Script:

#!/bin/bash
# Get more context around the queue_spawner usage
rg "queue_id = get_queue_id" -B 3 -A 10 executorlib/cache/queue_spawner.py

Length of output: 602

executorlib/base/executor.py (5)

8-8: LGTM: Appropriate typing imports added

The new imports are correctly specified and necessary for the type hints being added throughout the file.


56-56: LGTM: Return type accurately reflects nullable state

The Optional type annotation correctly indicates that the queue might be None.


128-132: LGTM: Improved null safety and type checking

The added null checks for _future_queue and type check for _process improve the robustness of the shutdown process.


155-158: LGTM: Proper null handling in length calculation

The method now safely handles the case where _future_queue is None by returning 0.


30-31: LGTM: Type annotations accurately reflect nullable states and usage patterns

The type annotations have been improved to:

  • Handle nullable states with Optional
  • Support both single and multiple RaisingThread instances

Let's verify the usage patterns of _process:

✅ Verification successful

Let me gather more information to verify the usage patterns of _process and its type consistency.


Type annotation for _process correctly reflects its usage patterns

The verification confirms that the type annotation Optional[Union[RaisingThread, List[RaisingThread]]] accurately matches the implementation:

  • In executorlib/base/executor.py, _process is used both as a single RaisingThread and as a list of RaisingThread instances:
    • Single thread usage: self._process.start()
    • List usage: isinstance(self._process, list) with iteration over self._process
  • Null state is handled: Multiple self._process = None assignments
  • Type-specific operations are properly guarded:
    • List operations are protected by isinstance(self._process, list)
    • Single thread operations are protected by isinstance(self._process, RaisingThread)

Note: The type annotations in other files (like spawner.py and communication.py) are different as they belong to different classes with different purposes.
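A minimal sketch of the `isinstance` guards described above, using `threading.Thread` as a stand-in for `RaisingThread`. The alias `ProcessType` and the helper `join_all` are hypothetical names introduced for this example.

```python
import threading
from typing import List, Optional, Union

# Stand-in for Optional[Union[RaisingThread, List[RaisingThread]]].
ProcessType = Optional[Union[threading.Thread, List[threading.Thread]]]


def join_all(process: ProcessType) -> int:
    """Join whatever is stored and return how many threads were joined."""
    if isinstance(process, list):
        # Narrowed to List[Thread]: iteration is type-safe here.
        for thread in process:
            thread.join()
        return len(process)
    if isinstance(process, threading.Thread):
        # Narrowed to a single Thread.
        process.join()
        return 1
    # Remaining case: process is None.
    return 0


def start_threads(count: int) -> List[threading.Thread]:
    threads = [threading.Thread(target=lambda: None) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


joined_none = join_all(None)
joined_single = join_all(start_threads(1)[0])
joined_many = join_all(start_threads(3))
```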

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that _process is used consistently with its type annotation
ast-grep --pattern 'self._process = $_' 

Length of output: 978


Script:

#!/bin/bash
# Check the class definition and inheritance for RaisingThread usage
ast-grep --pattern 'class $_ {
  $$$
  _process: $_
  $$$
}'

# Check assignments of list type to _process
ast-grep --pattern 'self._process = [$$$]'

# Get more context around the process assignments
rg "self._process" -B 2 -A 2

Length of output: 6771

@@ -1,6 +1,7 @@
import pickle
import sys
import time
from typing import Any

⚠️ Potential issue

Remove unused import

The Any type from typing module is imported but never used in the code.

Apply this diff to remove the unused import:

-from typing import Any
🧰 Tools
🪛 Ruff (0.8.2)

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

Comment on lines +18 to +33
 def dump(file_name: Optional[str], data_dict: dict) -> None:
     """
     Dump data dictionary into HDF5 file

     Args:
         file_name (str): file name of the HDF5 file as absolute path
         data_dict (dict): dictionary containing the python function to be executed {"fn": ..., "args": (), "kwargs": {}}
     """
-    with h5py.File(file_name, "a") as fname:
-        for data_key, data_value in data_dict.items():
-            if data_key in group_dict.keys():
-                fname.create_dataset(
-                    name="/" + group_dict[data_key],
-                    data=np.void(cloudpickle.dumps(data_value)),
-                )
+    if file_name is not None:
+        with h5py.File(file_name, "a") as fname:
+            for data_key, data_value in data_dict.items():
+                if data_key in group_dict.keys():
+                    fname.create_dataset(
+                        name="/" + group_dict[data_key],
+                        data=np.void(cloudpickle.dumps(data_value)),
+                    )

💡 Codebase verification

Making file_name optional in dump() needs coordinated changes

The verification reveals an inconsistency in the codebase:

  • All current callers of dump() provide a file_name explicitly using keyword arguments
  • The paired load() function still expects file_name: str (non-optional)
  • Making dump() accept None without similar changes to load() creates an asymmetric API

This change could lead to confusion where:

  1. The data would silently not be written when None is passed
  2. The corresponding load() call would fail as it doesn't handle None

Recommendations:

  • Either keep file_name as required str in both functions
  • Or make both dump() and load() handle Optional[str] consistently
  • Document the behavior when None is passed to help users understand the implications
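One way to follow the second recommendation is sketched below. It uses `pickle` and plain files as a stand-in for the h5py and cloudpickle calls in `hdf.py`, and returning an empty dictionary from `load()` for `None` is an assumption made for this example, not the project's documented behavior.

```python
import os
import pickle
import tempfile
from typing import Optional


def dump(file_name: Optional[str], data_dict: dict) -> None:
    # Documented behavior in this sketch: None skips writing entirely.
    if file_name is None:
        return
    with open(file_name, "wb") as handle:
        pickle.dump(data_dict, handle)


def load(file_name: Optional[str]) -> dict:
    # Mirrors dump(): None yields an empty dictionary instead of raising.
    if file_name is None:
        return {}
    with open(file_name, "rb") as handle:
        return pickle.load(handle)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "task.pkl")
    dump(path, {"args": (1, 2), "kwargs": {}})
    round_trip = load(path)

dump(None, {"args": (1, 2)})  # no file is written
nothing = load(None)
```

Keeping both functions on the same convention means a `None` that flows through `dump()` cannot later crash the matching `load()`.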
🔗 Analysis chain

Type safety improvement and minor optimization needed

The type annotation change to Optional[str] is well-implemented with proper null checking. However, there's a minor optimization opportunity in the dictionary key check.

Apply this optimization:

-                if data_key in group_dict.keys():
+                if data_key in group_dict:

Let's verify the impact of making file_name optional:


Let me verify the usage of the load function since it might be affected by the optional file_name parameter change.


Let me try a different approach to check the load function implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct calls to dump() to ensure callers can handle None
rg -l "dump\(" | xargs rg "dump\([^,]+" -A 1

Length of output: 1367


Script:

#!/bin/bash
# Check the load function implementation and its usage
ast-grep --pattern 'def load($$$)'

Length of output: 36


Script:

#!/bin/bash
# Search for load function implementation and its usage
rg "def load" -A 5

Length of output: 362

🧰 Tools
🪛 Ruff (0.8.2)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/test_shared_communication.py (1)

89-90: Use a more explicit assertion for clarity.

Instead of checking only the length, you could verify that the returned object is indeed an empty dictionary. This approach prevents ambiguity if the function's return type changes in the future.

Consider applying this change:

- self.assertEqual(len(interface_receive(socket=None)), 0)
+ self.assertDictEqual(interface_receive(socket=None), {})
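The difference between the two assertions can be seen in a small, self-contained test. `receive_dict` and `receive_list` are hypothetical stand-ins for `interface_receive`; the second one models a future change of return type.

```python
import unittest


def receive_dict() -> dict:
    # Hypothetical stand-in returning the expected empty dictionary.
    return {}


def receive_list() -> list:
    # Hypothetical regression: the return type silently changed to a list.
    return []


class ReceiveAssertions(unittest.TestCase):
    def test_length_check_misses_type_change(self):
        # Passes even though the value is a list, not a dict.
        self.assertEqual(len(receive_list()), 0)

    def test_dict_check_accepts_empty_dict(self):
        self.assertDictEqual(receive_dict(), {})

    def test_dict_check_rejects_list(self):
        # assertDictEqual fails when the first argument is not a dict.
        with self.assertRaises(AssertionError):
            self.assertDictEqual(receive_list(), {})


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ReceiveAssertions)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```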
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a73342 and a8ed438.

📒 Files selected for processing (3)
  • executorlib/standalone/inputcheck.py (2 hunks)
  • tests/test_shared_communication.py (1 hunks)
  • tests/test_shared_input_check.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_shared_input_check.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • executorlib/standalone/inputcheck.py

@jan-janssen jan-janssen merged commit 95c9480 into main Dec 24, 2024
28 checks passed
@jan-janssen jan-janssen deleted the mypy branch December 24, 2024 15:52