Add type checking with mypy #535

jan-janssen · 2024-12-24T14:59:09Z

Summary by CodeRabbit

New Features
- Introduced a new GitHub Actions workflow for automated type checking using MyPy.
- Added a method to the RaisingThread class to access keyword arguments.
- Added a test method to verify behavior when receiving from a None socket.
Bug Fixes
- Enhanced error handling in various functions to prevent potential runtime errors when dealing with None values.
Documentation
- Improved type hinting across multiple files for better clarity and consistency.
Refactor
- Updated parameter and return types in several functions and classes to use Optional and Callable for enhanced type safety.

for more information, see https://pre-commit.ci

coderabbitai · 2024-12-24T14:59:16Z

Walkthrough

This pull request introduces a new GitHub Actions workflow for MyPy type checking and implements extensive type annotation improvements across the executorlib package. The changes focus on enhancing type safety by replacing generic callable types with Callable, adding optional type hints, and improving error handling through more robust type checking. These modifications collectively aim to maintain high-quality type annotations throughout the codebase.

Changes

File	Change Summary
`.github/workflows/mypy.yml`	New GitHub Actions workflow for MyPy type checking
`executorlib/__init__.py`	Updated type annotations for `__all__` and `Executor` class parameters
`executorlib/backend/*`	Type hints added for variables and function parameters
`executorlib/base/executor.py`	Enhanced type safety for `_future_queue` and `_process` attributes
`executorlib/cache/*`	Improved type annotations for callable parameters and return types
`executorlib/interactive/*`	Refined type hints for function signatures and error handling
`executorlib/standalone/*`	Comprehensive type annotation updates across multiple files
`tests/test_integration_pyiron_workflow.py`	Type hinting improvements in `Foo` class and `as_dynamic_foo` function
`tests/test_shared_communication.py`	New test method `test_interface_receive` added to `TestZMQ` class
`tests/test_shared_input_check.py`	Removed assertion for `TypeError` in `validate_number_of_cores` function

Poem

🐰 Type Checking Rabbit's Delight
With Callable types, our code shines bright
MyPy workflows, a type-safe flight
No more callable running wild
Our annotations now reconciled!

Hop hop hooray! 🎉

Possibly related PRs

Cache: Use explicit arguments for serialize_funct_h5() #448: Improvements in type annotations for serialize_funct_h5 function, aligning with the current PR's type checking enhancements.
Update validate_number_of_cores() #488: Modifications to the create_executor function to enhance logic for determining max_workers, relating to type safety and clarity objectives of the main PR.

Tip

CodeRabbit's docstrings feature is now available as part of our Early Access Program! Simply use the command @coderabbitai generate docstrings to have CodeRabbit automatically generate docstrings for your pull request. We would love to hear your feedback on Discord.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 6

🧹 Nitpick comments (24)

executorlib/standalone/thread.py (1)

24-26: Consider adding type hints to get_kwargs.

While the new accessor method is helpful for encapsulation, adding a return type annotation (e.g., -> dict or -> Dict[str, Any]) would improve clarity for both readers and static type checkers.
executorlib/backend/cache_parallel.py (2)
4-4: Remove unused import.

The import Any is not used in this file. Removing it will prevent linting or static analysis warnings such as F401.
-import pickle
-import sys
-import time
-from typing import Any
+import pickle
+import sys
+import time
🧰 Tools

🪛 Ruff (0.8.2)

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

38-41: Optional: inline the if-else statement.

You could simplify this block by replacing the if-else statement with a ternary expression:
apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}
However, this is optional. If clarity is more important than brevity, you can keep the current structure.

🧰 Tools

🪛 Ruff (0.8.2)

38-41: Use ternary operator apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {} instead of if-else-block

Replace if-else-block with apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}

(SIM108)
executorlib/standalone/interactive/backend.py (2)

2-2: Use imports selectively.

Ensure every imported name is used. If there are other members from typing that remain unused, consider removing them to reduce clutter.

17-17: Return type clarity.

Returning Any is appropriate if the function’s output can vary widely. If the return type can be narrowed, consider specifying a more specific type in the future.

executorlib/standalone/serialize.py (1)

4-4: Confirm necessity of Tuple.

Review whether you actually use Tuple from typing. If not, removing it will help maintain a clean import list.
executorlib/standalone/hdf.py (1)
27-33: Suggestion to simplify membership check.
You can streamline the condition by removing .keys():
- if data_key in group_dict.keys():
+ if data_key in group_dict:
🧰 Tools

🪛 Ruff (0.8.2)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
executorlib/standalone/plot.py (1)

21-23: Use more specialized type hints for clarity.

Declaring the exact list and dict types (e.g., list[dict] or dict[str, Any]) can improve clarity and help catch errors during static analysis.

executorlib/standalone/interactive/spawner.py (3)

9-14: Improve constructor docstring to reflect Optional cwd.

The docstring says “cwd (str): …” but the code now specifies Optional[str]. Recommend updating the docstring to clarify that None is the default and permissible value.

80-80: Explicitly type subprocess.Popen as needed.

Declaring Optional[subprocess.Popen[bytes]] or Optional[subprocess.Popen[str]] can further improve clarity on which stream encoding is used (if known).

118-122: Order of calls can be optimized for graceful shutdown.

Calling terminate() before communicate() is usually more common for ensuring the process is signaled before collecting its output. Consider reversing these calls if the process is still expected to be running.

executorlib/cache/executor.py (2)

24-24: Document the rationale for type ignore.

Including a brief comment explaining why type checking should be ignored here helps maintainers understand the trade-offs if pysqa is unavailable.

32-33: Leverage type aliases for commonly used callables.

If the same callable signature is used across multiple classes, consider defining a type alias (e.g., TaskFunction = Callable[..., Any]) to keep code more organized and consistent.
executorlib/base/executor.py (1)
42-48: Simplify membership checks.
According to the static analysis hint (SIM118), consider removing the .keys() call for membership checks:
- if "future_queue" in meta_data_dict.keys():
+ if "future_queue" in meta_data_dict:
🧰 Tools

🪛 Ruff (0.8.2)

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
executorlib/standalone/inputcheck.py (1)
121-123: Empty error message.
Providing an explicit message when raising ValueError can improve debugging. Consider adding a short, descriptive message:
- raise ValueError("")
+ raise ValueError("Allocation is blocked, but `init_function` is set. This combination is not supported.")
executorlib/cache/shared.py (2)
24-24: Any return type might be too broad.
Consider narrowing the return type if feasible to better reflect the expected data.

77-79: Consider using more specific type hints.
Defining dictionaries as dict is somewhat vague. For example:
memory_dict: Dict[str, Future] = {}
process_dict: Dict[str, Any] = {}
file_name_dict: Dict[str, str] = {}
This aids in maintainability and readability.
executorlib/interactive/executor.py (3)
84-85: Consider more specific type hints instead of dict.
If the keys/values are known, declare them as Dict[K, V] for clarity.

231-234: Use membership tests without .keys().
- if "openmpi_oversubscribe" in resource_dict.keys():
+ if "openmpi_oversubscribe" in resource_dict:
    del resource_dict["openmpi_oversubscribe"]
Similarly for slurm_cmd_args.

🧰 Tools

🪛 Ruff (0.8.2)

231-231: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

233-233: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

301-306: Use membership tests without .keys().
Apply this pattern elsewhere for consistency.

🧰 Tools

🪛 Ruff (0.8.2)

301-301: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

303-303: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

305-305: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)
executorlib/__init__.py (1)

180-180: Using dict type is fine locally, but consider more specific hints.
If known, specify key-value types for clarity, e.g., Dict[str, Union[int, bool]].
executorlib/interactive/shared.py (1)
289-291: Consider using more specific container types.
For instance:
active_task_dict: Dict[Future, int] = {}
process_lst: List[RaisingThread] = []
qtask_lst: List[Queue] = []
.github/workflows/mypy.yml (2)
22-22: Enhance MyPy configuration and fix missing newline

Consider the following improvements:

Add a mypy.ini configuration file to customize type checking settings

Add error codes to the output for better debugging

Add missing newline at EOF
-        run: mypy --ignore-missing-imports ${{ github.event.repository.name }}
+        run: mypy --ignore-missing-imports --show-error-codes ${{ github.event.repository.name }}
+
Would you like me to help create a mypy.ini configuration file with recommended settings for your project?

🧰 Tools

🪛 yamllint (1.35.1)

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)

1-22: Consider adding dependency caching

To improve workflow execution time, consider caching pip dependencies between runs.

Add this step after Python setup:
      - name: Cache pip dependencies
        uses: actions/cache@v3
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements*.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
🧰 Tools

🪛 yamllint (1.35.1)

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5cf3ecc and 1799b48.

📒 Files selected for processing (22)

.github/workflows/mypy.yml (1 hunks)
executorlib/__init__.py (5 hunks)
executorlib/backend/cache_parallel.py (3 hunks)
executorlib/backend/interactive_parallel.py (4 hunks)
executorlib/base/executor.py (7 hunks)
executorlib/cache/executor.py (5 hunks)
executorlib/cache/queue_spawner.py (1 hunks)
executorlib/cache/shared.py (4 hunks)
executorlib/interactive/executor.py (8 hunks)
executorlib/interactive/flux.py (1 hunks)
executorlib/interactive/shared.py (18 hunks)
executorlib/interactive/slurm.py (1 hunks)
executorlib/standalone/__init__.py (1 hunks)
executorlib/standalone/hdf.py (4 hunks)
executorlib/standalone/inputcheck.py (3 hunks)
executorlib/standalone/interactive/backend.py (1 hunks)
executorlib/standalone/interactive/communication.py (1 hunks)
executorlib/standalone/interactive/spawner.py (3 hunks)
executorlib/standalone/plot.py (1 hunks)
executorlib/standalone/serialize.py (2 hunks)
executorlib/standalone/thread.py (1 hunks)
tests/test_integration_pyiron_workflow.py (3 hunks)

✅ Files skipped from review due to trivial changes (1)

executorlib/standalone/init.py

🧰 Additional context used

🪛 Ruff (0.8.2)

executorlib/backend/cache_parallel.py

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

38-41: Use ternary operator apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {} instead of if-else-block

Replace if-else-block with apply_dict = backend_load_file(file_name=file_name) if mpi_rank_zero else {}

(SIM108)

executorlib/standalone/serialize.py

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/interactive/executor.py

169-169: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

231-231: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

233-233: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

301-301: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

303-303: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

305-305: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/standalone/hdf.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

executorlib/base/executor.py

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

49-49: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/interactive/shared.py

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🪛 yamllint (1.35.1)

.github/workflows/mypy.yml

[error] 22-22: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (73)

executorlib/standalone/interactive/backend.py (2)

6-7: Signature looks good with explicit Callable and Any.

Switching from callable to Callable improves type clarity. Returning Any is acceptable if the function truly might return any type.

13-13: Docstring alignment with new type hints.

The docstring accurately reflects the updated parameter type. This consistency is good practice.

executorlib/standalone/serialize.py (1)

38-38: Docstring consistency.

The docstring now reflects that fn is a Callable, which matches the updated signature.

executorlib/backend/interactive_parallel.py (5)

4-4: Good addition of type hints.
Importing Optional here clearly communicates that certain values may be None.

7-7: Dependency import is appropriate.
Introducing zmq aligns with usage of zmq.Context and zmq.Socket types.

29-29: No issues with custom MPI pickle initialization.
The # type: ignore comment is acceptable to override MyPy errors here. Ensure that the relevant code is tested for MPI rank-based operations.

38-39: Proper addition of optional context and socket.
Declaring context: Optional[zmq.Context] = None and socket: Optional[zmq.Socket] = None is a clear improvement in type safety.

54-54: Initialization of the input_dict.
Using input_dict: dict = {} ensures a valid dictionary is always broadcasted.

executorlib/standalone/hdf.py (5)

2-2: Expanded usage of type hints.
Importing Any, Optional, and Tuple aligns well with the updated function signatures below.

18-18: Allowing file_name to be optional.
Changing file_name to Optional[str] handles cases where the file might not be specified.

26-26: Defensive check for None value.
Validating that file_name is not None before file operations helps avoid runtime errors.

63-63: Flexible return type for get_output.
Returning Any for the second tuple element is appropriate if the output can be any Python object.

97-105: Optional file name logic is correct.
Falling back to None when the file name is not provided is a valid design choice.

executorlib/interactive/slurm.py (1)

78-78: Improved flexibility for cwd.
Accepting None for cwd expands usability in contexts where a working directory isn't strictly required.

executorlib/cache/queue_spawner.py (1)

19-19: Optional queue ID return type.
Changing the return type to Optional[int] accommodates scenarios where a job submission may not occur or fails. Confirm that callers handle a None return.

executorlib/cache/executor.py (4)

2-2: Good practice importing Callable and Optional explicitly.

This fosters clarity and consistency across the codebase.

46-47: Update docstring to match new default.

Docstring references execute_in_subprocess as the default. Ensure it aligns with the actual default in the code (execute_with_pysqa) unless it is correctly changed later.

84-86: Allowing None for max_workers and max_cores broadens flexibility.

Make sure the rest of the code handles None as intended when scheduling tasks.

96-96: Confirm the constraints on init_function.

The function currently raises an error if init_function is specified. Ensure that’s the intended design or consider implementing the init functionality instead of raising an error.

executorlib/interactive/flux.py (1)

120-126: Prevent attribute errors by checking self._future.

This guard is a welcome improvement, ensuring methods aren’t called on None.

executorlib/base/executor.py (6)

8-8: Imported types look good.
Type imports from typing for Callable, List, Optional, and Union increase clarity and make the code easier to maintain.

30-31: Flexible typing for _future_queue and _process.
Marking these fields as Optional[...] is a good improvement, allowing for clearer checks before usage.

56-56: Consistent return type for future_queue.
Returning Optional[queue.Queue] matches how _future_queue is managed and prevents runtime errors.

100-110: Queue usage verified.
Verifying _future_queue is not None before put is solid and prevents errors.

128-132: Graceful shutdown.
The checks for None before canceling or shutting down maintain robust error handling.

155-158: Returning a qsize or zero.
Returning 0 if _future_queue is None avoids potential AttributeError and is consistent with the type signature.

executorlib/standalone/inputcheck.py (3)

103-118: Optional pmi handling is correct.
The updated checks ensure that invalid or missing pmi values raise the appropriate errors.

Line range hint 181-200: Core validation logic.
Using cores_per_worker=1 by default is sensible. The branching structure for max_cores and max_workers looks robust.

203-203: File existence checks are thorough.
Raising ValueError if file_name is None or missing on disk is a clear and concise approach.

executorlib/standalone/interactive/communication.py (3)

178-187: Allowing None sockets for interface_send.
Ensures safe handling if the socket is missing. This prevents potential AttributeError calls on socket.

190-200: Safe fallback for interface_receive.
Returning an empty dict when socket is None avoids runtime errors and is gracefully handled.

203-213: Robust interface_shutdown.
The checks prevent calls on None objects. This gracefully accommodates partial or no socket/context usage.

tests/test_integration_pyiron_workflow.py (3)

12-12: Switch to Callable import is consistent with type-hinting conventions.

25-25: Constructor type annotation.
Improving from callable to Callable is a good practice for explicit method signatures.

48-48: Decorator signature refined.
The updated as_dynamic_foo(fnc: Callable) approach clarifies usage and aligns with the rest of the codebase.

executorlib/cache/shared.py (3)

6-6: Good use of explicit imports for better type clarity.
No issues found.

52-54: Switch from callable to Callable is a best practice.
This enhances static type checking and clarity.

68-69: Docstring updates reflect the new type annotations accurately.
Everything looks consistent.

executorlib/interactive/executor.py (11)

92-92: # type: ignore may mask useful warnings.
Use it sparingly and justify its presence.

103-103: Well-documented function parameter.
No issues found.

149-149: # type: ignore on exit call.
Please verify if ignoring type checks here is truly needed.

176-176: Switch from Optional[callable] to Optional[Callable] is correct.
No issues found.

221-221: Clarity enhancement: cores_per_worker assignment.
No further issues.

225-227: Good checks for oversubscribe flags.
Logic is correct.

229-229: Retrieving SLURM command args with a default.
No issues found.

250-250: Ensuring threads_per_core has default.
Looks good.

279-279: Similar logic for SLURM.
Matches existing pattern.

297-297: Retrieving GPU data with default zero.
Implementation is correct.

299-299: Consistent approach for slurm_cmd_args.
No issues here.

executorlib/__init__.py (4)

1-1: Imports for Callable and Optional.
This is consistent with updated type usage.

19-19: Explicitly defining __all__ type.
Improves clarity for exported symbols.

103-103: Parameter type updated to Optional[Callable].
Enhances type correctness.

126-126: Same improvement in __new__.
Stays consistent.

executorlib/interactive/shared.py (20)

8-8: Comprehensive type imports.
No issues found.

34-34: Accurate docstring for fn parameter.
Matches the function signature.

53-57: Creating and passing a new Future.
Implementation looks correct.

74-83: Enhanced shutdown logic with queue checks.
Ensures _future_queue is valid. Properly handles wait conditions.

87-87: Type for _set_process method parameter.
Clear usage of List[RaisingThread].

95-96: Starting all processes in a loop.
Good approach. No issues.

137-137: spawner: type[BaseSpawner] = MpiExecSpawner
Type annotation helps clarity.

190-190: Same spawner annotation in InteractiveStepExecutor.
Consistency maintained.

208-208: Matches usage in execute_parallel_tasks.
No issues found.

228-228: Slight docstring refinement describing init_function.
Looks consistent.

265-265: spawner annotation repeated for separate tasks.
Remains consistent.

303-303: Local queue creation for tasks.
No issues found.

461-461: Helper function _update_futures_in_input improved.
Signature is well-defined.

473-473: get_result handles both single Future and list of Futures.
No issues found.

481-481: Inline comprehension is concise and clear.
Implementation looks good.

492-492: Docstring clarifies the expected task dictionary shape.
Enhances maintainability.

518-518: Same approach with typed spawner.
Ensures consistent usage.

528-528: Docstring describing dictionary structure is helpful.
No issues found.

590-590: Docstring details dictionary shape for _execute_task.
Keeps usage clear.

618-618: Docstring clarifies caching approach.
Implementation and usage are correct.

coderabbitai · 2024-12-24T15:05:05Z

executorlib/standalone/serialize.py

@@ -29,13 +29,13 @@ def cloudpickle_register(ind: int = 2):


 def serialize_funct_h5(
-    fn: callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}
+    fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}


⚠️ Potential issue

Avoid mutable default arguments.

Using mutable types (list or dict) in function defaults can lead to subtle bugs due to shared references. Consider initializing them inside the function:

-def serialize_funct_h5( - fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {} -) -> Tuple[str, dict]: +from typing import Callable, Tuple, Optional, Dict, List +def serialize_funct_h5( + fn: Callable, + fn_args: Optional[List] = None, + fn_kwargs: Optional[Dict] = None, + resource_dict: Optional[Dict] = None, +) -> Tuple[str, dict]: + if fn_args is None: + fn_args = [] + if fn_kwargs is None: + fn_kwargs = {} + if resource_dict is None: + resource_dict = {}

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}

from typing import Callable, Tuple, Optional, Dict, List

def serialize_funct_h5(

fn: Callable,

fn_args: Optional[List] = None,

fn_kwargs: Optional[Dict] = None,

resource_dict: Optional[Dict] = None,

) -> Tuple[str, dict]:

if fn_args is None:

fn_args = []

if fn_kwargs is None:

fn_kwargs = {}

if resource_dict is None:

resource_dict = {}

🧰 Tools

🪛 Ruff (0.8.2)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

32-32: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

coderabbitai · 2024-12-24T15:05:05Z

executorlib/base/executor.py

@@ -62,7 +62,7 @@ def future_queue(self) -> queue.Queue:
        """
        return self._future_queue

-    def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+    def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:  # type: ignore


🛠️ Refactor suggestion

Avoid mutable defaults for resource_dict.
It’s a best practice to avoid using mutable objects (dict, list) as default parameters.

-def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future: +def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future: ... + if resource_dict is None: + resource_dict = {}

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.8.2)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

coderabbitai · 2024-12-24T15:05:05Z

executorlib/interactive/executor.py

@@ -166,14 +166,14 @@ def create_executor(
    backend: str = "local",
    max_cores: Optional[int] = None,
    cache_directory: Optional[str] = None,
-    resource_dict: Optional[dict] = None,
+    resource_dict: dict = {},


🛠️ Refactor suggestion

Avoid using mutable default argument for resource_dict.
This can lead to subtle bugs. Consider:

- def create_executor(..., resource_dict: dict = {}, ...): + def create_executor(..., resource_dict: Optional[dict] = None, ...): if resource_dict is None: resource_dict = {}

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.8.2)

169-169: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

coderabbitai · 2024-12-24T15:05:06Z

executorlib/interactive/shared.py

@@ -23,15 +23,15 @@


 class ExecutorBroker(ExecutorBase):
-    def submit(self, fn: callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+    def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:  # type: ignore


🛠️ Refactor suggestion

Avoid mutable argument default (resource_dict: dict = {}).
This can cause unexpected behavior when reused. Consider:

- def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future: + def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future:

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools

🪛 Ruff (0.8.2)

26-26: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

coderabbitai · 2024-12-24T15:05:06Z

.github/workflows/mypy.yml

+          python-version: "3.13"
+          architecture: x64


⚠️ Potential issue

Consider using a stable Python version

Python 3.13 is currently in development/alpha and not recommended for CI environments. Consider using the latest stable version (3.12) instead.

- python-version: "3.13" + python-version: "3.12"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

python-version: "3.13"

architecture: x64

python-version: "3.12"

architecture: x64

coderabbitai · 2024-12-24T15:05:06Z

.github/workflows/mypy.yml

+      - name: Checkout
+        uses: actions/checkout@v4
+      - name: Install mypy
+        run: pip install mypy


🛠️ Refactor suggestion

Pin MyPy version for consistent CI results

Installing MyPy without version pinning could lead to inconsistent results as new versions are released.

- run: pip install mypy + run: pip install 'mypy==1.8.0'

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

run: pip install mypy

run: pip install 'mypy==1.8.0'

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (1)

executorlib/base/executor.py (1)
65-65: ⚠️ Potential issue

Fix mutable default argument in submit method

While the type annotations and null checks are good improvements, the mutable default argument issue remains.
-def submit(self, fn: Callable, *args, resource_dict: dict = {}, **kwargs) -> Future:
+def submit(self, fn: Callable, *args, resource_dict: Optional[dict] = None, **kwargs) -> Future:
     ...
+    if resource_dict is None:
+        resource_dict = {}
Also applies to: 100-110

🧰 Tools

🪛 Ruff (0.8.2)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

🧹 Nitpick comments (2)

executorlib/backend/cache_parallel.py (1)
Line range hint 28-32: Document reason for type ignore

The type ignore comment is valid as MPI libraries often lack proper type hints. However, it would be helpful to document why this type ignore is necessary.

Consider adding a brief comment explaining the type ignore:
-    MPI.pickle.__init__(  # type: ignore
+    # type: ignore - MPI.pickle lacks proper type hints but this usage is safe
+    MPI.pickle.__init__(
executorlib/base/executor.py (1)
42-43: Optimize dictionary key checks

While the change to get_kwargs() improves encapsulation, the dictionary key checks can be simplified.
-if "future_queue" in meta_data_dict.keys():
+if "future_queue" in meta_data_dict:
Also applies to: 48-49

🧰 Tools

🪛 Ruff (0.8.2)

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1799b48 and 6a73342.

📒 Files selected for processing (4)

executorlib/backend/cache_parallel.py (3 hunks)
executorlib/base/executor.py (7 hunks)
executorlib/standalone/hdf.py (4 hunks)
executorlib/standalone/inputcheck.py (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

executorlib/standalone/inputcheck.py

🧰 Additional context used

🪛 Ruff (0.8.2)

executorlib/backend/cache_parallel.py

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

executorlib/base/executor.py

43-43: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

49-49: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

65-65: Do not use mutable data structures for argument defaults

Replace with None; initialize within function

(B006)

executorlib/standalone/hdf.py

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

🔇 Additional comments (9)

executorlib/backend/cache_parallel.py (1)

38-42: LGTM! Type-safe initialization of apply_dict

The initialization of apply_dict as an empty dictionary before the conditional improves type safety by ensuring consistent types across all code paths. This is particularly important for the MPI broadcast operation.

Let's verify that this change doesn't affect MPI behavior:

✅ Verification successful

Initialization of apply_dict as empty dictionary is safe for MPI broadcast

The codebase search reveals consistent patterns of MPI broadcast operations with dictionaries:

In interactive_parallel.py, a similar pattern exists where a dictionary is broadcast and then accessed
Both cases show the broadcast dictionary being immediately used to access keys/values
The empty dictionary initialization won't affect the broadcast operation since:
1. Only rank 0 populates the actual data
2. Other ranks receive the populated dictionary through broadcast
3. The empty dict is overwritten by the broadcast operation

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for similar MPI broadcast patterns in the codebase to confirm this is a safe change
rg -A 5 "MPI\.COMM_WORLD\.bcast.*dict.*" --type py

Length of output: 1068

executorlib/standalone/hdf.py (3)

2-2: LGTM: Type imports are complete and well-organized

The added type imports properly support the type annotations introduced in this PR.

63-63: LGTM: Improved type precision

The return type change from object to Any better expresses the function's capability to return any type as the second tuple element.

97-102: LGTM: Robust null handling implementation

The changes properly handle the optional input parameter and maintain type safety throughout the function.

Let's verify the callers can handle None return values:

✅ Verification successful

Let me check the handling of the return value in the queue_spawner.py file.

Null handling is properly implemented across the codebase

The verification shows that the callers handle the None return value correctly:

In queue_spawner.py, the code explicitly checks if queue_id is None and handles this case appropriately
The test cases in test_cache_hdf.py verify the positive case (returning 123)

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for get_queue_id calls to ensure proper null handling
rg "get_queue_id\(" -A 2

Length of output: 990

Script:

#!/bin/bash
# Get more context around the queue_spawner usage
rg "queue_id = get_queue_id" -B 3 -A 10 executorlib/cache/queue_spawner.py

Length of output: 602

executorlib/base/executor.py (5)

8-8: LGTM: Appropriate typing imports added

The new imports are correctly specified and necessary for the type hints being added throughout the file.

56-56: LGTM: Return type accurately reflects nullable state

The Optional type annotation correctly indicates that the queue might be None.

128-132: LGTM: Improved null safety and type checking

The added null checks for _future_queue and type check for _process improve the robustness of the shutdown process.

155-158: LGTM: Proper null handling in length calculation

The method now safely handles the case where _future_queue is None by returning 0.

30-31: LGTM: Type annotations accurately reflect nullable states and usage patterns

The type annotations have been improved to:

Handle nullable states with Optional
Support both single and multiple RaisingThread instances

Let's verify the usage patterns of _process:

✅ Verification successful

Let me gather more information to verify the usage patterns of _process and its type consistency.

Type annotation for _process correctly reflects its usage patterns

The verification confirms that the type annotation Optional[Union[RaisingThread, List[RaisingThread]]] accurately matches the implementation:

In executorlib/base/executor.py, _process is used both as a single RaisingThread and as a list of RaisingThread instances:
- Single thread usage: self._process.start()
- List usage: isinstance(self._process, list) with iteration over self._process
Null state is handled: Multiple self._process = None assignments
Type-specific operations are properly guarded:
- List operations are protected by isinstance(self._process, list)
- Single thread operations are protected by isinstance(self._process, RaisingThread)

Note: The type annotations in other files (like spawner.py and communication.py) are different as they belong to different classes with different purposes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that _process is used consistently with its type annotation
ast-grep --pattern 'self._process = $_'

Length of output: 978

Script:

#!/bin/bash
# Check the class definition and inheritance for RaisingThread usage
ast-grep --pattern 'class $_ {
  $$$
  _process: $_
  $$$
}'

# Check assignments of list type to _process
ast-grep --pattern 'self._process = [$$$]'

# Get more context around the process assignments
rg "self._process" -B 2 -A 2

Length of output: 6771

coderabbitai · 2024-12-24T15:35:29Z

executorlib/backend/cache_parallel.py

@@ -1,6 +1,7 @@
 import pickle
 import sys
 import time
+from typing import Any


⚠️ Potential issue

Remove unused import

The Any type from typing module is imported but never used in the code.

Apply this diff to remove the unused import:

-from typing import Any

🧰 Tools

🪛 Ruff (0.8.2)

4-4: typing.Any imported but unused

Remove unused import: typing.Any

(F401)

coderabbitai · 2024-12-24T15:35:29Z

executorlib/standalone/hdf.py

+def dump(file_name: Optional[str], data_dict: dict) -> None:
    """
    Dump data dictionary into HDF5 file

    Args:
        file_name (str): file name of the HDF5 file as absolute path
        data_dict (dict): dictionary containing the python function to be executed {"fn": ..., "args": (), "kwargs": {}}
    """
-    with h5py.File(file_name, "a") as fname:
-        for data_key, data_value in data_dict.items():
-            if data_key in group_dict.keys():
-                fname.create_dataset(
-                    name="/" + group_dict[data_key],
-                    data=np.void(cloudpickle.dumps(data_value)),
-                )
+    if file_name is not None:
+        with h5py.File(file_name, "a") as fname:
+            for data_key, data_value in data_dict.items():
+                if data_key in group_dict.keys():
+                    fname.create_dataset(
+                        name="/" + group_dict[data_key],
+                        data=np.void(cloudpickle.dumps(data_value)),
+                    )


💡 Codebase verification

Making file_name optional in dump() needs coordinated changes

The verification reveals an inconsistency in the codebase:

All current callers of dump() provide a file_name explicitly using keyword arguments

The paired load() function still expects file_name: str (non-optional)

Making dump() accept None without similar changes to load() creates an asymmetric API

This change could lead to confusion where:

Data could be silently not dumped when None is passed

The corresponding load() call would fail as it doesn't handle None

Recommendations:

Either keep file_name as required str in both functions

Or make both dump() and load() handle Optional[str] consistently

Document the behavior when None is passed to help users understand the implications

🔗 Analysis chain

Type safety improvement and minor optimization needed

The type annotation change to Optional[str] is well-implemented with proper null checking. However, there's a minor optimization opportunity in the dictionary key check.

Apply this optimization:

- if data_key in group_dict.keys(): + if data_key in group_dict:

Let's verify the impact of making file_name optional:

Let me verify the usage of the load function since it might be affected by the optional file_name parameter change.

Let me try a different approach to check the load function implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash # Search for direct calls to dump() to ensure callers can handle None rg -l "dump\(" | xargs rg "dump\([^,]+" -A 1

Length of output: 1367

Script:

#!/bin/bash # Check the load function implementation and its usage ast-grep --pattern 'def load($$$)'

Length of output: 36

Script:

#!/bin/bash # Search for load function implementation and its usage rg "def load" -A 5

Length of output: 362

🧰 Tools

🪛 Ruff (0.8.2)

29-29: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tests/test_shared_communication.py (1)
89-90: Use a more explicit assertion for clarity.

Instead of checking only the length, you could verify that the returned object is indeed an empty dictionary. This approach prevents ambiguity if the function's return type changes in the future.

Consider applying this change:
- self.assertEqual(len(interface_receive(socket=None)), 0)
+ self.assertDictEqual(interface_receive(socket=None), {})

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a73342 and a8ed438.

📒 Files selected for processing (3)

executorlib/standalone/inputcheck.py (2 hunks)
tests/test_shared_communication.py (1 hunks)
tests/test_shared_input_check.py (0 hunks)

💤 Files with no reviewable changes (1)

tests/test_shared_input_check.py

🚧 Files skipped from review as they are similar to previous changes (1)

executorlib/standalone/inputcheck.py

jan-janssen and others added 2 commits December 24, 2024 15:58

Add type checking with mypy

1607bb0

[pre-commit.ci] auto fixes from pre-commit.com hooks

1799b48

for more information, see https://pre-commit.ci

coderabbitai bot reviewed Dec 24, 2024

View reviewed changes

Improve coverage

6a73342

coderabbitai bot reviewed Dec 24, 2024

View reviewed changes

more fixes

a8ed438

coderabbitai bot reviewed Dec 24, 2024

View reviewed changes

jan-janssen merged commit 95c9480 into main Dec 24, 2024
28 checks passed

jan-janssen deleted the mypy branch December 24, 2024 15:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add type checking with mypy #535

Add type checking with mypy #535

jan-janssen commented Dec 24, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 24, 2024 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot left a comment

coderabbitai bot Dec 24, 2024

coderabbitai bot Dec 24, 2024

coderabbitai bot left a comment

-    fn: Callable, fn_args: list = [], fn_kwargs: dict = {}, resource_dict: dict = {}
+from typing import Callable, Tuple, Optional, Dict, List
+def serialize_funct_h5(
+    fn: Callable,
+    fn_args: Optional[List] = None,
+    fn_kwargs: Optional[Dict] = None,
+    resource_dict: Optional[Dict] = None,
+) -> Tuple[str, dict]:
+    if fn_args is None:
+        fn_args = []
+    if fn_kwargs is None:
+        fn_kwargs = {}
+    if resource_dict is None:
+        resource_dict = {}

Add type checking with mypy #535

Add type checking with mypy #535

Conversation

jan-janssen commented Dec 24, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

coderabbitai bot commented Dec 24, 2024 • edited Loading

Walkthrough

Changes

Poem

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot Dec 24, 2024

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

jan-janssen commented Dec 24, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 24, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)