Write cache first #492

Merged
jan-janssen merged 6 commits into main from write_cache_first on Nov 11, 2024

Conversation

@jan-janssen (Member) commented Nov 11, 2024

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced error handling and control flow for task execution with caching, improving reliability.
  • New Features

    • Introduced a new test class for validating task execution with caching.
  • Tests

    • Updated test configuration from max_cores to max_workers to better reflect executor instantiation changes.
    • Added tests for caching functionality and conditional execution based on library availability.
    • Implemented cleanup for cache directories post-test execution.

coderabbitai bot (Contributor) commented Nov 11, 2024

Walkthrough

The changes in this pull request primarily enhance the _execute_task_with_cache function in executorlib/interactive/shared.py by integrating error handling and control flow directly within the function, rather than relying on a separate function call. This includes improved management of task execution and exception handling. Additionally, the test file tests/test_executor_backend_mpi.py has been updated to change the parameter name from max_cores to max_workers in the TestExecutorBackendCache class, reflecting a shift in configuration focus from cores to workers. New tests related to caching and the h5py library have also been added in tests/test_local_executor.py.

Changes

  • executorlib/interactive/shared.py: Modified _execute_task_with_cache to integrate execution logic and error handling directly, simplifying control flow.
  • tests/test_executor_backend_mpi.py: Changed the parameter max_cores to max_workers in the test_meta_executor_parallel_cache method of TestExecutorBackendCache.
  • tests/test_local_executor.py: Added a TestFuturePoolCache class with a test_execute_task_cache method covering caching functionality and handling of h5py.


🐰 "In the land of code where rabbits play,
The tasks now run a smoother way.
With workers set to lead the race,
Errors handled with a friendly face.
Caching tasks, oh what a delight,
Hopping through code, everything's right!" 🐇


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d740d06 and c3190a9.

📒 Files selected for processing (1)
  • tests/test_local_executor.py (3 hunks)
🧰 Additional context used
🪛 Ruff
tests/test_local_executor.py

21-21: h5py imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

🔇 Additional comments (3)
tests/test_local_executor.py (3)

5-5: LGTM! Clean dependency management.

The conditional import of h5py with a skip flag follows best practices for optional dependencies. The shutil import is appropriately added for cleanup operations.

Also applies to: 20-25
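
For reference, the pattern being described looks roughly like this (a sketch; the exact surrounding lines in tests/test_local_executor.py may differ):

import shutil  # used by tearDown to remove the cache directory

try:
    import h5py  # optional dependency required by the caching backend

    skip_h5py_test = False
except ImportError:
    skip_h5py_test = True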


486-487: LGTM! Proper cleanup in tearDown.

The tearDown method ensures proper cleanup of the cache directory after each test.


489-524: Consider adding more comprehensive cache tests.

While the current tests cover basic functionality and error handling, consider adding tests for:

  1. Cache hits (verifying results come from cache on repeated executions)
  2. Cache misses (different inputs)
  3. Cache behavior with errors

The suggestion from the previous review for implementing a cache hit test is still valid. Would you like me to help implement these additional test cases?



@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
tests/test_executor_backend_mpi.py (1)

Line range hint 100-124: Make time-based assertions more robust.

The current time-based assertions could be flaky due to system load or timing variations. Consider these improvements:

             time_2 = time.time()
-            self.assertTrue(time_2 - time_1 > 1)
+            # Allow for some timing variation but ensure significant delay
+            self.assertGreater(time_2 - time_1, 0.9)
             time_3 = time.time()
             fs_2 = exe.submit(mpi_funct_sleep, 1)
             self.assertEqual(fs_2.result(), [(1, 2, 0), (1, 2, 1)])
             self.assertTrue(fs_2.done())
             time_4 = time.time()
-            self.assertTrue(time_3 - time_4 < 1)
+            # Cached result should be significantly faster
+            self.assertLess(time_4 - time_3, 0.1)

This change:

  1. Uses a slightly lower threshold (0.9s) for the initial execution to account for timing variations
  2. Uses a reasonable upper bound (0.1s) for cached execution
  3. Fixes the time comparison in the cache check (was comparing in wrong order)
executorlib/interactive/shared.py (1)

627-640: Add cache directory validation

The function assumes the cache directory is writable and has sufficient space. This could lead to runtime errors if the directory is not accessible or the disk is full.

Consider adding validation:

+def _validate_cache_directory(cache_dir: str) -> None:
+    """Validate cache directory is writable and has sufficient space."""
+    if not os.access(cache_dir, os.W_OK):
+        raise PermissionError(f"Cache directory {cache_dir} is not writable")
+    import shutil
+    if shutil.disk_usage(cache_dir).free < 1024 * 1024:  # 1MB minimum
+        raise OSError(f"Insufficient disk space in cache directory {cache_dir}")

 def _execute_task_with_cache(...):
     os.makedirs(cache_directory, exist_ok=True)
+    _validate_cache_directory(cache_directory)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 78062fd and 6eb096f.

📒 Files selected for processing (2)
  • executorlib/interactive/shared.py (1 hunks)
  • tests/test_executor_backend_mpi.py (1 hunks)
🔇 Additional comments (2)
tests/test_executor_backend_mpi.py (1)

100-100: ⚠️ Potential issue

Parameter naming inconsistency detected.

The change from max_cores to max_workers in this test method creates an inconsistency with other test methods in the file that still use max_cores (e.g., test_meta_executor_serial, test_meta_executor_single). This suggests an incomplete refactoring.

Let's verify the parameter usage across the codebase:
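
For example, a quick script along these lines can list where each parameter name still appears in the test file (a rough sketch; adjust the path as needed):

import re
from pathlib import Path

# show the line numbers on which each parameter name occurs
source = Path("tests/test_executor_backend_mpi.py").read_text()
for name in ("max_cores", "max_workers"):
    hits = [i + 1 for i, line in enumerate(source.splitlines()) if re.search(rf"\b{name}\b", line)]
    print(f"{name}: {len(hits)} line(s) {hits}")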

Consider updating all test methods to use the new max_workers parameter for consistency. Here's the list of methods that need to be updated:

  • test_meta_executor_serial
  • test_meta_executor_single
  • test_oversubscribe
  • test_errors
executorlib/interactive/shared.py (1)

634-638: Verify cleanup on exception

The error handling path shuts down the interface but doesn't clean up any partially written cache files.

Let's verify if there are any other places in the codebase handling cache cleanup:
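
One possible remedy is to delete a partially written cache entry in the exception path, for example (a sketch; whether file_name can already exist at this point depends on how dump() writes the file):

 except Exception as thread_exception:
     interface.shutdown(wait=True)
+    # remove a partially written cache entry so a later run does not load corrupt data
+    if os.path.exists(file_name):
+        os.remove(file_name)
     future_queue.task_done()
     f.set_exception(exception=thread_exception)
     raise thread_exception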

Comment on lines +631 to +633
data_dict["output"] = result
dump(file_name=file_name, data_dict=data_dict)
f.set_result(result)

🛠️ Refactor suggestion

Consider atomic cache writes

The current implementation writes the cache file directly, which could lead to corrupted cache files if the process is interrupted between lines 631-633.

Consider using atomic writes with a temporary file:

-                data_dict["output"] = result
-                dump(file_name=file_name, data_dict=data_dict)
-                f.set_result(result)
+                data_dict["output"] = result
+                temp_file = file_name + '.tmp'
+                dump(file_name=temp_file, data_dict=data_dict)
+                os.replace(temp_file, file_name)
+                f.set_result(result)

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +627 to +640
f = task_dict.pop("future")
if f.set_running_or_notify_cancel():
    try:
        result = interface.send_and_receive_dict(input_dict=task_dict)
        data_dict["output"] = result
        dump(file_name=file_name, data_dict=data_dict)
        f.set_result(result)
    except Exception as thread_exception:
        interface.shutdown(wait=True)
        future_queue.task_done()
        f.set_exception(exception=thread_exception)
        raise thread_exception
else:
    future_queue.task_done()

⚠️ Potential issue

Potential race condition in cache writing

The cache writing operation happens after getting the result but before setting the future result. This could lead to a race condition if multiple processes try to write to the same cache file simultaneously.

Consider adding a file-locking mechanism:

 try:
     result = interface.send_and_receive_dict(input_dict=task_dict)
     data_dict["output"] = result
-    dump(file_name=file_name, data_dict=data_dict)
+    import fcntl
+    # lock a sidecar file: opening the cache file itself for writing here would
+    # truncate it before dump() runs, and reusing the name f would shadow the Future
+    with open(file_name + ".lock", "w") as lock_handle:
+        fcntl.flock(lock_handle.fileno(), fcntl.LOCK_EX)
+        dump(file_name=file_name, data_dict=data_dict)
     f.set_result(result)

Committable suggestion skipped: line range outside the PR's diff.

@coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
tests/test_local_executor.py (2)

20-25: Consider using consistent dependency checking approach

The code uses different approaches for checking library availability:

  • importlib.util.find_spec for mpi4py
  • try-except for h5py

Consider using the same approach for consistency:

-try:
-    import h5py
-
-    skip_h5py_test = False
-except ImportError:
-    skip_h5py_test = True
+skip_h5py_test = importlib.util.find_spec("h5py") is None
🧰 Tools
🪛 Ruff

21-21: h5py imported but unused; consider using importlib.util.find_spec to test for availability

(F401)


485-488: Consider adding setUp method

The test class has a tearDown method to clean the cache directory but lacks a setUp method to ensure the directory exists and is clean before each test. This could lead to test failures if the directory doesn't exist.

Consider adding:

+def setUp(self):
+    os.makedirs("./cache", exist_ok=True)
+
 def tearDown(self):
     shutil.rmtree("./cache")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6eb096f and d740d06.

📒 Files selected for processing (1)
  • tests/test_local_executor.py (3 hunks)
🧰 Additional context used
🪛 Ruff
tests/test_local_executor.py

21-21: h5py imported but unused; consider using importlib.util.find_spec to test for availability

(F401)

Comment on lines +489 to +506
@unittest.skipIf(
    skip_h5py_test, "h5py is not installed, so the h5py tests are skipped."
)
def test_execute_task_cache(self):
    f = Future()
    q = Queue()
    q.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f})
    q.put({"shutdown": True, "wait": True})
    cloudpickle_register(ind=1)
    execute_parallel_tasks(
        future_queue=q,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    self.assertEqual(f.result(), 1)
    q.join()

🛠️ Refactor suggestion

Enhance cache testing coverage

The current test only verifies basic functionality. Consider adding tests for:

  1. Cache hits (same input executed twice)
  2. Cache misses (different inputs)
  3. Cache behavior with errors
  4. Verification that results are actually coming from cache

Here's a suggested test to verify cache hits:

def test_execute_task_cache_hit(self):
    # First execution
    f1 = Future()
    q1 = Queue()
    q1.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f1})
    q1.put({"shutdown": True, "wait": True})
    cloudpickle_register(ind=1)
    execute_parallel_tasks(
        future_queue=q1,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    result1 = f1.result()
    q1.join()

    # Second execution (should hit cache)
    f2 = Future()
    q2 = Queue()
    q2.put({"fn": calc, "args": (), "kwargs": {"i": 1}, "future": f2})
    q2.put({"shutdown": True, "wait": True})
    execute_parallel_tasks(
        future_queue=q2,
        cores=1,
        openmpi_oversubscribe=False,
        spawner=MpiExecSpawner,
        cache_directory="./cache",
    )
    result2 = f2.result()
    q2.join()

    self.assertEqual(result1, 1)
    self.assertEqual(result2, 1)
    # Verify cache file exists
    self.assertTrue(os.path.exists("./cache"))

Would you like me to generate additional test cases or open a GitHub issue to track this enhancement?

@jan-janssen merged commit d8f84a4 into main on Nov 11, 2024
26 checks passed
@jan-janssen deleted the write_cache_first branch on November 11, 2024 at 20:46
@coderabbitai bot mentioned this pull request on Dec 17, 2024