Write cache first #492
Conversation
Walkthrough

The changes in this pull request primarily enhance the caching behavior of task execution.
Possibly related PRs
📜 Recent review details

Configuration used: CodeRabbit UI

📒 Files selected for processing (1)

tests/test_local_executor.py

🧰 Additional context used

🪛 Ruff

tests/test_local_executor.py

21-21: `h5py` imported but unused; consider using `importlib.util.find_spec` to test for availability (F401)

🔇 Additional comments (3)

tests/test_local_executor.py (3)
The conditional import of `h5py` is handled appropriately, so the h5py-dependent tests are skipped when the library is unavailable.

Also applies to: 20-25
The tearDown method ensures proper cleanup of the cache directory after each test.
While the current tests cover basic functionality and error handling, consider adding tests for:
- Cache hits (same input executed twice)
- Cache misses (different inputs)
- Cache behavior with errors
- Verification that results are actually coming from cache
The suggestion from the previous review for implementing a cache hit test is still valid. Would you like me to help implement these additional test cases?
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
tests/test_executor_backend_mpi.py (1)
Line range hint `100-124`: **Make time-based assertions more robust.**

The current time-based assertions could be flaky due to system load or timing variations. Consider these improvements:

```diff
 time_2 = time.time()
-self.assertTrue(time_2 - time_1 > 1)
+# Allow for some timing variation but ensure significant delay
+self.assertGreater(time_2 - time_1, 0.9)
 time_3 = time.time()
 fs_2 = exe.submit(mpi_funct_sleep, 1)
 self.assertEqual(fs_2.result(), [(1, 2, 0), (1, 2, 1)])
 self.assertTrue(fs_2.done())
 time_4 = time.time()
-self.assertTrue(time_3 - time_4 < 1)
+# Cached result should be significantly faster
+self.assertLess(time_4 - time_3, 0.1)
```

This change:
- Uses a slightly lower threshold (0.9s) for the initial execution to account for timing variations
- Uses a reasonable upper bound (0.1s) for cached execution
- Fixes the time comparison in the cache check (it was comparing in the wrong order)
executorlib/interactive/shared.py (1)
`627-640`: **Add cache directory validation**

The function assumes the cache directory is writable and has sufficient space. This could lead to runtime errors if the directory is not accessible or the disk is full.

Consider adding validation:

```diff
+def _validate_cache_directory(cache_dir: str) -> None:
+    """Validate cache directory is writable and has sufficient space."""
+    if not os.access(cache_dir, os.W_OK):
+        raise PermissionError(f"Cache directory {cache_dir} is not writable")
+    import shutil
+    if shutil.disk_usage(cache_dir).free < 1024 * 1024:  # 1MB minimum
+        raise OSError(f"Insufficient disk space in cache directory {cache_dir}")
+
 def _execute_task_with_cache(...):
     os.makedirs(cache_directory, exist_ok=True)
+    _validate_cache_directory(cache_directory)
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
executorlib/interactive/shared.py (1 hunks)
tests/test_executor_backend_mpi.py (1 hunks)
🔇 Additional comments (2)
tests/test_executor_backend_mpi.py (1)
`100-100`: **Parameter naming inconsistency detected.**

The change from `max_cores` to `max_workers` in this test method creates an inconsistency with other test methods in the file that still use `max_cores` (e.g., `test_meta_executor_serial`, `test_meta_executor_single`). This suggests an incomplete refactoring.
Let's verify the parameter usage across the codebase:
Consider updating all test methods to use the new `max_workers` parameter for consistency. Here is the list of methods that need to be updated:
- `test_meta_executor_serial`
- `test_meta_executor_single`
- `test_oversubscribe`
- `test_errors`
executorlib/interactive/shared.py (1)
`634-638`: **Verify cleanup on exception**
The error handling path shuts down the interface but doesn't clean up any partially written cache files.
Let's verify if there are any other places in the codebase handling cache cleanup:
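Independently of that search, a hedged sketch of what such cleanup could look like (`cleanup_partial_cache` is a hypothetical helper, not code from the PR):

```python
import os


def cleanup_partial_cache(file_name: str) -> None:
    """Remove a partially written cache file; a missing file is fine.

    Hypothetical helper -- the PR itself does not include it.
    """
    try:
        os.remove(file_name)
    except FileNotFoundError:
        pass  # nothing was written, nothing to clean up
```

Called from the `except` branch before re-raising, this would prevent a crash mid-`dump` from leaving a truncated file that a later run misreads as a cache hit.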
```python
data_dict["output"] = result
dump(file_name=file_name, data_dict=data_dict)
f.set_result(result)
```
🛠️ Refactor suggestion
Consider atomic cache writes
The current implementation writes the cache file directly, which could lead to corrupted cache files if the process is interrupted between lines 631-633.
Consider using atomic writes with a temporary file:
```diff
-data_dict["output"] = result
-dump(file_name=file_name, data_dict=data_dict)
-f.set_result(result)
+data_dict["output"] = result
+temp_file = file_name + '.tmp'
+dump(file_name=temp_file, data_dict=data_dict)
+os.replace(temp_file, file_name)
+f.set_result(result)
```
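The diff above is schematic; as a self-contained illustration of the same write-then-rename pattern, here is a sketch that uses `pickle` as a stand-in for executorlib's `dump` helper (`atomic_dump` is a hypothetical name):

```python
import os
import pickle
import tempfile


def atomic_dump(file_name: str, data_dict: dict) -> None:
    """Write data_dict to file_name without ever exposing a partial file."""
    dir_name = os.path.dirname(os.path.abspath(file_name))
    # Create the temporary file in the target directory so the final
    # rename stays on one filesystem and os.replace remains atomic.
    fd, temp_file = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(data_dict, handle)
        os.replace(temp_file, file_name)
    except BaseException:
        os.remove(temp_file)  # discard the partial file on any failure
        raise
```

`os.replace` is atomic on both POSIX and Windows as long as source and destination are on the same filesystem, which is why the temporary file is created next to the target rather than in the system temp directory.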
Committable suggestion skipped: line range outside the PR's diff.
```python
f = task_dict.pop("future")
if f.set_running_or_notify_cancel():
    try:
        result = interface.send_and_receive_dict(input_dict=task_dict)
        data_dict["output"] = result
        dump(file_name=file_name, data_dict=data_dict)
        f.set_result(result)
    except Exception as thread_exception:
        interface.shutdown(wait=True)
        future_queue.task_done()
        f.set_exception(exception=thread_exception)
        raise thread_exception
    else:
        future_queue.task_done()
```
Potential race condition in cache writing
The cache writing operation happens after getting the result but before setting the future result. This could lead to a race condition if multiple processes try to write to the same cache file simultaneously.
Consider adding file locking mechanism:
```diff
 try:
     result = interface.send_and_receive_dict(input_dict=task_dict)
     data_dict["output"] = result
+    import fcntl
+    with open(file_name, 'wb') as f:
+        fcntl.flock(f.fileno(), fcntl.LOCK_EX)
         dump(file_name=file_name, data_dict=data_dict)
+        fcntl.flock(f.fileno(), fcntl.LOCK_UN)
     f.set_result(result)
```
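Note that the diff above takes the lock on a handle that `dump` does not write through; a variant that serializes writers via a sidecar lock file may be more robust. A sketch under the assumptions that `pickle` stands in for `dump` and that `fcntl` (Unix-only) is acceptable:

```python
import fcntl  # Unix-only advisory locking
import pickle
from contextlib import contextmanager


@contextmanager
def locked(file_name: str):
    """Hold an exclusive advisory lock on a sidecar .lock file."""
    with open(file_name + ".lock", "w") as lock_file:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)


def dump_with_lock(file_name: str, data_dict: dict) -> None:
    """Serialize concurrent writers to the same cache file."""
    with locked(file_name):
        with open(file_name, "wb") as handle:
            pickle.dump(data_dict, handle)
```

Advisory locks only coordinate processes that also take the lock, so every writer (and ideally every reader) of the cache file would need to go through the same helper.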
Committable suggestion skipped: line range outside the PR's diff.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
tests/test_local_executor.py (2)
`20-25`: **Consider using consistent dependency checking approach**

The code uses different approaches for checking library availability: `importlib.util.find_spec` for mpi4py, and try-except for h5py.

Consider using the same approach for consistency:

```diff
-try:
-    import h5py
-
-    skip_h5py_test = False
-except ImportError:
-    skip_h5py_test = True
+skip_h5py_test = importlib.util.find_spec("h5py") is None
```

🧰 Tools
🪛 Ruff
21-21: `h5py` imported but unused; consider using `importlib.util.find_spec` to test for availability (F401)
`485-488`: **Consider adding a setUp method**

The test class has a tearDown method to clean the cache directory but lacks a setUp method to ensure the directory exists and is clean before each test. This could lead to test failures if the directory doesn't exist.

Consider adding:

```diff
+def setUp(self):
+    os.makedirs("./cache", exist_ok=True)
+
 def tearDown(self):
     shutil.rmtree("./cache")
```
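An alternative worth weighing (an assumption, not part of the PR): give each test its own temporary cache directory via `tempfile`, which avoids collisions between tests or parallel runs sharing a hard-coded `./cache` (`CacheTestBase` is an illustrative name):

```python
import shutil
import tempfile
import unittest


class CacheTestBase(unittest.TestCase):
    def setUp(self):
        # A fresh, uniquely named directory per test avoids collisions
        # between tests (or parallel test runs) sharing "./cache".
        self.cache_directory = tempfile.mkdtemp(prefix="cache-")

    def tearDown(self):
        # ignore_errors guards against a test that already removed it
        shutil.rmtree(self.cache_directory, ignore_errors=True)
```

Tests would then pass `self.cache_directory` wherever they currently hard-code `"./cache"`.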
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
tests/test_local_executor.py (3 hunks)
🧰 Additional context used
🪛 Ruff
tests/test_local_executor.py
21-21: `h5py` imported but unused; consider using `importlib.util.find_spec` to test for availability (F401)
```python
@unittest.skipIf(
    skip_h5py_test, "h5py is not installed, so the h5py tests are skipped."
)
def test_execute_task_cache(self):
    f = Future()
    q = Queue()
    q.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f})
    q.put({"shutdown": True, "wait": True})
    cloudpickle_register(ind=1)
    execute_parallel_tasks(
        future_queue=q,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    self.assertEqual(f.result(), 1)
    q.join()
```
🛠️ Refactor suggestion
Enhance cache testing coverage
The current test only verifies basic functionality. Consider adding tests for:
- Cache hits (same input executed twice)
- Cache misses (different inputs)
- Cache behavior with errors
- Verification that results are actually coming from cache
Here's a suggested test to verify cache hits:
```python
def test_execute_task_cache_hit(self):
    # First execution
    f1 = Future()
    q1 = Queue()
    q1.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f1})
    q1.put({"shutdown": True, "wait": True})
    cloudpickle_register(ind=1)
    execute_parallel_tasks(
        future_queue=q1,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    result1 = f1.result()
    q1.join()
    # Second execution (should hit cache)
    f2 = Future()
    q2 = Queue()
    q2.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f2})
    q2.put({"shutdown": True, "wait": True})
    execute_parallel_tasks(
        future_queue=q2,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    result2 = f2.result()
    q2.join()
    self.assertEqual(result1, 1)
    self.assertEqual(result2, 1)
    # Verify cache file exists
    self.assertTrue(os.path.exists("./cache"))
```
Would you like me to generate additional test cases or open a GitHub issue to track this enhancement?
Summary by CodeRabbit

**Bug Fixes**

**New Features**

**Tests**
- Renamed the test parameter `max_cores` to `max_workers` to better reflect executor instantiation changes.