Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(torii-core): parallelization #2423

Merged
merged 21 commits into from
Sep 17, 2024
Merged

Conversation

Larkooo
Copy link
Collaborator

@Larkooo Larkooo commented Sep 13, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a max_concurrent_tasks command-line argument for enhanced configurability.
    • Added a new method merge to the Sql struct for combining instances.
    • Implemented a new asynchronous method set in the ModelCache for inserting models.
  • Improvements

    • Enhanced concurrency handling and resource management in the application.
    • Updated event processing logic to allow for more efficient concurrent event handling.
    • Transitioned to shared ownership of processor instances, improving memory safety and concurrency.
    • Enhanced querying functionality in tests with flexible key comparisons for consistent results.
    • Improved test reliability by ensuring database state is fully updated before assertions.
    • Ensured sequential execution of asynchronous tasks for better control flow in tests.

@Larkooo Larkooo marked this pull request as ready for review September 14, 2024 00:28
Copy link

coderabbitai bot commented Sep 14, 2024

Walkthrough

Ohayo, sensei! This pull request introduces a new command-line argument, max_concurrent_tasks, enhancing the application's configurability and concurrency capabilities. It modifies the Engine struct to incorporate a semaphore for managing concurrent tasks and updates various processor traits to support shared ownership with Arc. Additionally, the Sql struct receives a custom Clone implementation and a new merge method for better instance management.

Changes

File Path Change Summary
bin/torii/src/main.rs Added max_concurrent_tasks to Args struct for user-defined limits on concurrent tasks. Updated Engine initialization to include this argument.
crates/torii/core/src/engine.rs Introduced a semaphore in Engine to limit concurrent tasks. Changed world and processors fields to use Arc. Refactored event processing logic for improved efficiency and concurrency.
crates/torii/core/src/processors/mod.rs Added Send + Sync bounds to EventProcessor, BlockProcessor, and TransactionProcessor traits. Updated generate_event_processors_map to accept Arc instead of Box, enhancing shared ownership.
crates/torii/core/src/sql.rs Defined a custom Clone implementation for Sql. Added a merge method for combining Sql instances, with a warning for differing world_address values.
crates/torii/core/src/sql_test.rs Transitioned processor instances from Box::new to Arc::new for improved concurrency. Modified bootstrap_engine to require provider to implement Clone.
crates/torii/graphql/src/tests/entities_test.rs Updated test assertions to allow flexible comparisons for entity keys, accommodating nondeterministic execution order.
crates/torii/graphql/src/tests/mod.rs Wrapped JsonRpcClient in Arc for shared ownership. Updated processor instantiation to use Arc instead of Box.
crates/torii/core/src/cache.rs Added an asynchronous set method to ModelCache for inserting models with thread-safe access.

Possibly related PRs


Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 3e297f9 and 663ed6a.

Files selected for processing (1)
  • crates/torii/core/src/engine.rs (13 hunks)
Additional comments not posted (9)
crates/torii/core/src/engine.rs (9)

28-31: Ohayo sensei! The 'static lifetime bound looks good to me.

Adding the 'static lifetime bound to the generic type P is necessary to store the Processors struct in the Engine struct, which requires all its fields to have a 'static lifetime.


55-55: Ohayo sensei! The new max_concurrent_tasks field looks good.

Adding this field to the EngineConfig struct will allow configuring the maximum number of concurrent tasks for the parallelization feature.


92-98: Ohayo sensei! The ParallelizedEvent struct looks good to me.

The struct contains all the necessary fields to represent an event that will be processed in parallel, including the block details and the event itself.


109-109: Ohayo sensei! The tasks field looks good.

The tasks field is a HashMap that will allow grouping events into tasks based on an identifier, where each task contains a vector of events to be processed in parallel.


475-513: Ohayo sensei! The process_tasks method looks great!

The method efficiently processes the parallelized events using the following techniques:

  • A semaphore limits the number of concurrent tasks based on the max_concurrent_tasks configuration.
  • Cloning the necessary data for each task allows them to be processed independently and in parallel.
  • Using a local database for each task and merging the results back into the main database helps avoid contention and improves performance.
  • The JoinSet allows waiting for all the tasks to complete before continuing.

Overall, this implementation should significantly improve the performance of event processing.


Line range hint 654-719: Ohayo sensei! The changes to the process_event method look great!

The updates to the process_event method efficiently support parallelization:

  • Computing a task_identifier based on the event key and data allows grouping similar events together for parallel processing.
  • Pushing events with a non-zero task_identifier into the tasks HashMap enables them to be processed in parallel by the process_tasks method.
  • Processing events with a zero task_identifier immediately ensures that events without a specific grouping are still processed sequentially.

These changes should significantly improve the performance of event processing while maintaining correctness.


594-594: Ohayo sensei! Removing the transaction_hash argument looks good.

Commenting out the transaction_hash argument in the process_event method call is consistent with the change made in the process_event method itself. This confirms that the transaction_hash is no longer needed for event processing due to the introduction of parallelization.


424-426: Ohayo sensei! The call to process_tasks looks good.

Calling the process_tasks method within process_pending ensures that the parallelized events stored in the tasks field will be processed after the pending events are processed.

To verify the placement of the process_tasks call, run the following script:

Verification successful

Ohayo sensei! The placement of process_tasks is verified and looks great!

The process_tasks method is consistently called in both process_pending and process_range methods, right before setting the head and executing database changes. This ensures that all parallelized tasks are processed before finalizing the state. The implementation also uses a semaphore for managing concurrent tasks, which is a nice touch for resource management.

Keep up the good work, sensei! Your code structure is as harmonious as a well-arranged bonsai tree. 🌳

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `process_tasks` is called after processing pending events.

# Test: Search for the `process_tasks` call within `process_pending`. 
# Expect: The call should be after the loop that processes pending transactions.
rg --type rust -A 10 $'process_tasks\(\)' crates/torii/core/src/engine.rs

Length of output: 853


544-544: Ohayo sensei! Please verify the removal of the transaction_hash argument.

The transaction_hash argument has been commented out in the process_event method. This change is likely related to the introduction of parallelization, where events are processed independently of their associated transactions.

To verify the impact of removing the transaction_hash argument, run the following script:

Verification successful

Ohayo sensei! The removal of the transaction_hash argument looks good, but there's more to clean up!

The transaction_hash argument has been successfully removed from the process_event method call, which is consistent with the parallelization efforts. However, there are still some loose ends to tie up:

  • Several commented-out sections still reference transaction_hash
  • The transaction_hash is still used in other parts of the file

Consider the following actions:

  • Clean up the commented-out code if it's no longer needed
  • Review the remaining uses of transaction_hash to ensure they're still necessary
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the removal of `transaction_hash` does not break event processing.

# Test: Search for uses of `transaction_hash` within the `process_event` method.
# Expect: No uses of `transaction_hash` should be found, confirming that it is safe to remove.
rg --type rust -A 5 $'transaction_hash' crates/torii/core/src/engine.rs

Length of output: 6085


Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai generate interesting stats about this repository and render them as a table.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Larkooo Larkooo force-pushed the parallelization-tasks branch from bb8b2fc to ff2d7a3 Compare September 16, 2024 17:04
@Larkooo Larkooo changed the title feat: first pass at parallelization feat(torii-core): parallelization Sep 16, 2024
Copy link

codecov bot commented Sep 16, 2024

Codecov Report

Attention: Patch coverage is 83.43195% with 28 lines in your changes missing coverage. Please review.

Project coverage is 68.20%. Comparing base (2d6e5de) to head (663ed6a).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
bin/torii/src/main.rs 0.00% 19 Missing ⚠️
crates/torii/core/src/engine.rs 91.30% 6 Missing ⚠️
crates/torii/core/src/sql.rs 93.33% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2423      +/-   ##
==========================================
+ Coverage   68.05%   68.20%   +0.15%     
==========================================
  Files         364      365       +1     
  Lines       47864    48002     +138     
==========================================
+ Hits        32573    32742     +169     
+ Misses      15291    15260      -31     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@glihm glihm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We didn't switch the journal mode to WAL though, may be done in other PR to first evaluate performances as such. But this should be a good improvement.

Regarding the DB queue merging, we should ensure it's not an actual bottleneck.

@glihm glihm merged commit 91a0fd0 into dojoengine:main Sep 17, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants