
perf(katana): optimisations + benchmark setup #2900

Open
wants to merge 4 commits into base: main

Conversation

remybar
Contributor

@remybar remybar commented Jan 13, 2025

Description

Parallelize parts of block processing and add benchmarks to monitor the performance of sequential vs. parallelized commits.

Related issue

#2719

Tests

  • Yes
  • No, because they aren't needed
  • No, because I need help

Added to documentation?

  • README.md
  • Dojo Book
  • No documentation needed

Checklist

  • I've formatted my code (scripts/prettier.sh, scripts/rust_fmt.sh, scripts/cairo_fmt.sh)
  • I've linted my code (scripts/clippy.sh, scripts/docs.sh)
  • I've commented my code
  • I've requested a review after addressing the comments

Summary by CodeRabbit

  • New Features

    • Enhanced system performance with parallel processing capabilities for commit operations.
    • Added performance benchmarking tests to evaluate processing under various modes.
  • Refactor

    • Updated dependency and workspace configurations to integrate performance profiling tools and improve overall efficiency.

@glihm glihm changed the title katana: optimisations + benchmark setup perf(katana): optimisations + benchmark setup Jan 13, 2025
@remybar remybar force-pushed the katana-core-parallelized branch from e2dacee to 5519e6b Compare January 21, 2025 10:57
Member

@kariy kariy left a comment


This benchmark includes data setup within the measured function (i.e. Uncommitted::commit). We should move the preparation of the test data (all the arguments for Uncommitted::new) outside the benchmark routine so that we measure only the commit operation's performance, without the setup overhead.

We can use the Arbitrary trait to derive random values for the test data.

I'm not sure what the best block composition is, but maybe we can start with something like 20 transactions, each with 2 events, and a state update of size 100.
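The setup-vs-measurement split suggested here can be sketched with std only (criterion's iter_batched does this for real; the Uncommitted type and its commit body below are illustrative placeholders, not the actual katana API):

```rust
use std::time::Instant;

// Hypothetical stand-in for the real Uncommitted block type; the actual
// struct carries the header, transactions, receipts, and state updates.
#[derive(Clone)]
struct Uncommitted {
    data: Vec<u64>,
}

impl Uncommitted {
    // Placeholder for the real commitment computation.
    fn commit(self) -> u64 {
        self.data.iter().sum()
    }
}

fn main() {
    // Setup happens once, outside the measured region...
    let block = Uncommitted { data: (0..1000).collect() };
    // ...and per-iteration inputs are prepared up front, mimicking
    // criterion's iter_batched, so the cloning is not timed either.
    let inputs: Vec<Uncommitted> = (0..100).map(|_| block.clone()).collect();

    let mut checksum = 0u64;
    let start = Instant::now();
    for input in inputs {
        // Only the commit itself runs inside the timed loop.
        checksum = checksum.wrapping_add(input.commit());
    }
    println!("100 commits in {:?} (checksum {})", start.elapsed(), checksum);
}
```

In criterion terms, the `|| block.clone()` setup closure of iter_batched is excluded from the measurement the same way the pre-built inputs vector is here.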

@kariy
Member

kariy commented Jan 21, 2025

I think what we should do is have two commit methods - the serial and parallelized versions (i.e. .commit and .commit_parallel) - and run two benchmarks against the same test vectors, one using each method.
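The serial/parallel method pair can be sketched like this (std::thread::scope stands in for the rayon scope used in the PR; the Block struct and field names are assumptions, not the real types):

```rust
use std::thread;

// Simplified stand-in for the block's commitment work: two independent
// computations (e.g. receipt and event commitments) that can run in
// parallel. Names and shapes here are illustrative, not the real API.
struct Block {
    receipts: Vec<u64>,
    events: Vec<u64>,
}

impl Block {
    // Serial version: compute both commitments one after the other.
    fn commit(&self) -> (u64, u64) {
        let r: u64 = self.receipts.iter().sum();
        let e: u64 = self.events.iter().sum();
        (r, e)
    }

    // Parallel version: each independent commitment gets its own scoped
    // thread. Each closure writes to a distinct variable, so there is
    // no shared mutable state between the two tasks.
    fn commit_parallel(&self) -> (u64, u64) {
        let mut r: u64 = 0;
        let mut e: u64 = 0;
        thread::scope(|s| {
            s.spawn(|| r = self.receipts.iter().sum());
            s.spawn(|| e = self.events.iter().sum());
        });
        (r, e)
    }
}

fn main() {
    let block = Block { receipts: (1..=10).collect(), events: (1..=5).collect() };
    // Both methods must agree so the benchmarks compare like for like.
    assert_eq!(block.commit(), block.commit_parallel());
    println!("both methods agree: {:?}", block.commit());
}
```

Benchmarking both methods against identical inputs is what makes the serial/parallel comparison meaningful.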

@remybar remybar force-pushed the katana-core-parallelized branch from 5519e6b to 3728e6d Compare January 24, 2025 16:03
@remybar
Contributor Author

remybar commented Jan 24, 2025

Hi @kariy !

So, I've kept two versions of the optimized functions (the original one and a parallel one) to be able to create the commit and commit_parallel benches.

I used the Arbitrary trait to generate random values (not sure if I did it the idiomatic way, but...) and set the vector sizes to 20 for txs and receipts and 100 for the state update data.
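The random-data generation can be approximated without the arbitrary crate; here is a std-only sketch using a tiny deterministic LCG (the generator and the vector sizes mirror what is described above, but this is not the PR's actual code):

```rust
// Std-only stand-in for the `arbitrary` crate used in the PR: a tiny
// deterministic LCG that fills test vectors of configurable sizes.
struct Lcg(u64);

impl Lcg {
    // Constants from Knuth's MMIX LCG.
    fn next(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }

    fn vec(&mut self, len: usize) -> Vec<u64> {
        (0..len).map(|_| self.next()).collect()
    }
}

fn main() {
    let mut rng = Lcg(42);
    // Mirrors the sizes used in the PR: 20 txs/receipts, 100 state entries.
    let txs = rng.vec(20);
    let receipts = rng.vec(20);
    let state_updates = rng.vec(100);
    println!(
        "generated {} txs, {} receipts, {} state entries",
        txs.len(),
        receipts.len(),
        state_updates.len()
    );
}
```

A fixed seed keeps benchmark inputs reproducible across runs, which matters when comparing serial and parallel numbers.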

Maybe it would be a good idea to have two sets of data for these benches:

  • a first one with a tiny block, to make sure that parallelizing block processing does not add any overhead,
  • a second one with a fairly big block, as you proposed, to check that the performance improvement is worth it.

WDYT?
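The two proposed data sets could be expressed through the BlockConfig struct mentioned in the PR; the field names and exact sizes below are assumptions for illustration:

```rust
// Hypothetical shape of the BlockConfig used by the benches; field
// names and the "small"/"big" sizes are illustrative assumptions.
#[derive(Clone, Copy)]
struct BlockConfig {
    n_txs: usize,
    events_per_tx: usize,
    state_update_size: usize,
}

// Tiny block: checks that parallelism adds no overhead.
const SMALL: BlockConfig = BlockConfig { n_txs: 1, events_per_tx: 1, state_update_size: 1 };
// Big block: checks that the parallel speedup is worth it.
const BIG: BlockConfig = BlockConfig { n_txs: 20, events_per_tx: 2, state_update_size: 100 };

fn main() {
    for (name, cfg) in [("small", SMALL), ("big", BIG)] {
        println!(
            "{}: {} txs x {} events, state size {}",
            name, cfg.n_txs, cfg.events_per_tx, cfg.state_update_size
        );
    }
}
```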

@remybar remybar force-pushed the katana-core-parallelized branch 2 times, most recently from e9aaea6 to 347b9b0 Compare January 24, 2025 16:11

codecov bot commented Jan 24, 2025

Codecov Report

Attention: Patch coverage is 1.51515% with 65 lines in your changes missing coverage. Please review.

Project coverage is 57.04%. Comparing base (d11a7ec) to head (046ecc6).

Files with missing lines Patch % Lines
crates/katana/core/src/backend/mod.rs 1.51% 65 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2900      +/-   ##
==========================================
- Coverage   57.10%   57.04%   -0.07%     
==========================================
  Files         429      429              
  Lines       56868    56933      +65     
==========================================
+ Hits        32475    32476       +1     
- Misses      24393    24457      +64     


Member

@kariy kariy left a comment


Awesome work @remybar! It's looking good. Can we also include a benchmark for a block with only 1 tx and very small state updates, to mimic a small block (e.g. on instant mining when there's only 1 tx in a block)?

});
}

criterion_group!(benches, commit_benchmark);
Member


Suggested change
criterion_group!(benches, commit_benchmark);
criterion_group! {
    name = benches;
    config = Criterion::default().with_profiler(PProfProfiler::new(100, Output::Flamegraph(None)));
    targets = commit_benchmark
}

let block =
UncommittedBlock::new(header, transactions, receipts.as_slice(), &state_updates, provider);

c.bench_function("commit", |b| {
Member


nit:

Suggested change
c.bench_function("commit", |b| {
c.bench_function("Commit.Big.Serial", |b| {

b.iter_batched(|| block.clone(), |input| commit(black_box(input)), BatchSize::SmallInput);
});

c.bench_function("commit_parallel", |b| {
Member


nit:

Suggested change
c.bench_function("commit_parallel", |b| {
c.bench_function("Commit.Big.Parallel", |b| {

@remybar remybar force-pushed the katana-core-parallelized branch from 347b9b0 to 82a0abb Compare February 7, 2025 10:42
@remybar
Contributor Author

remybar commented Feb 7, 2025

Awesome work @remybar! It's looking good. Can we also include a benchmark for a block with only 1 tx and very small state updates, to mimic a small block (e.g. on instant mining when there's only 1 tx in a block)?

Oops, I didn't receive any notification for your message 😕
Sorry for the delay, I've updated this PR with your suggestions + a small block benchmark 👍

@remybar remybar marked this pull request as ready for review February 7, 2025 11:04

coderabbitai bot commented Feb 7, 2025

Ohayo sensei!

Walkthrough

This pull request updates dependency management and introduces benchmarking and parallel processing enhancements. It adds and modifies Cargo.toml entries across multiple packages by including the pprof, rayon, and criterion crates, and updates the katana-primitives dependency with new features. A new benchmarking module is added to measure block commit performance, while the backend logic now supports parallel commit methods using rayon for concurrent processing of events and receipts.

Changes

  • Cargo.toml (root), crates/katana/.../Cargo.toml, crates/katana/executor/Cargo.toml: Added/updated dependencies: pprof (version/features → workspace), rayon (workspace), criterion (dev-dependency), and updated katana-primitives (added the "arbitrary" feature). A new benchmark entry was also introduced in katana/core.
  • crates/katana/.../benches/commit.rs: Added a new benchmarking module including BlockConfig, functions for commit and parallel commit (commit, commit_parallel, build_block, commit_benchmark) as well as random data generator helpers.
  • crates/katana/.../mod.rs: Introduced parallel processing methods: commit_parallel, compute_receipt_commitment_parallel, and compute_event_commitment_parallel, leveraging rayon for concurrent execution.

Sequence Diagram(s)

sequenceDiagram
    participant Bench as Benchmark Suite
    participant Block as UncommittedBlock
    participant Backend as Backend Processing

    Bench->>Block: build_block(config)
    Bench->>Block: commit() / commit_parallel()
    alt Serial Commit
        Block->>Backend: commit()
    else Parallel Commit
        Block->>Backend: commit_parallel()
        Backend->>Backend: compute_event_commitment_parallel()
        Backend->>Backend: compute_receipt_commitment_parallel()
    end


Suggested reviewers

  • kariy



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
crates/katana/core/src/backend/mod.rs (2)

367-367: Ohayo sensei! Clarify this comment or remove if it’s unneeded.
“optimisation 1” can be more descriptive. Consider including why this optimization is important.


484-503: Ohayo sensei! Great use of par_bridge for events.
Spawning parallel tasks like this can handle many events effectively. Just be mindful of overhead if event counts are small.
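The data-parallel event processing referred to here can be approximated without rayon; a std-only sketch splitting events into chunks, one scoped thread per chunk (hash_event and the worker count are hypothetical placeholders, and rayon's par_bridge is what the PR actually uses):

```rust
use std::thread;

// Hypothetical stand-in for per-event hashing.
fn hash_event(e: u64) -> u64 {
    e.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

// Splits the events into roughly equal chunks and processes each chunk
// on its own scoped thread, then combines the partial results. This is
// a dependency-free approximation of rayon's parallel iteration.
fn process_events_parallel(events: &[u64], workers: usize) -> u64 {
    let chunk = events.len().div_ceil(workers).max(1);
    let mut partials = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = events
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().copied().map(hash_event).fold(0u64, u64::wrapping_add)))
            .collect();
        for h in handles {
            partials.push(h.join().unwrap());
        }
    });
    partials.into_iter().fold(0u64, u64::wrapping_add)
}

fn main() {
    let events: Vec<u64> = (0..1000).collect();
    let serial = events.iter().copied().map(hash_event).fold(0u64, u64::wrapping_add);
    // The parallel result must match the serial one exactly.
    assert_eq!(process_events_parallel(&events, 4), serial);
    println!("parallel result matches serial");
}
```

As the review note says, thread (or rayon task) startup cost can dominate for small event counts, which is exactly what the small-block benchmark is meant to catch.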

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d11a7ec and 046ecc6.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (5)
  • Cargo.toml (1 hunks)
  • crates/katana/core/Cargo.toml (3 hunks)
  • crates/katana/core/benches/commit.rs (1 hunks)
  • crates/katana/core/src/backend/mod.rs (5 hunks)
  • crates/katana/executor/Cargo.toml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: ensure-windows
  • GitHub Check: test
🔇 Additional comments (14)
crates/katana/core/src/backend/mod.rs (3)

24-24: Ohayo sensei! Nice addition of the rayon prelude.
This import sets the stage for parallel iterators, which supercharges concurrency.


398-441: Ohayo sensei! Bravo on the parallel commit method.
Using rayon’s scope with carefully separated variables is safe here. Each closure writes to a distinct variable, eliminating data races. If you expect many large blocks, this concurrency should boost throughput.


453-457: Ohayo sensei! Parallel receipt commitment looks good.
Parallelizing over receipts can reduce total computation time significantly.

crates/katana/core/benches/commit.rs (4)

16-24: Ohayo sensei! BlockConfig is a neat approach.
Using a small struct to bundle block parameters keeps things organized and testable.


46-52: Ohayo sensei! Good separation of serial vs. parallel commit.
Explicitly having two methods (commit vs. commit_parallel) makes benchmark comparisons more transparent.


94-128: Ohayo sensei! Great job building random transactions and receipts.
This is a succinct way to generate test data for the benchmarks. Remember to watch out for any potential fuzz issues if the random values might cause edge cases.


130-195: Ohayo sensei! Solid benchmark definitions for small and big blocks.
These distinct setups will help confirm that parallel processing overhead remains minimal for small inputs, while yielding performance gains for large ones.

crates/katana/executor/Cargo.toml (1)

39-40: Ohayo sensei! Transitioning pprof and rayon to workspace dependencies.
This unifies version management, simplifies maintenance, and helps ensure consistency across crates. Nicely done!

crates/katana/core/Cargo.toml (5)

14-14: Ohayo sensei: Enhanced Dependency for katana-primitives
Including the "arbitrary" feature in the katana-primitives dependency is a solid move to support random value generation in benchmarks and tests. Just double-check its compatibility with other components relying on this dependency.


29-29: Ohayo sensei: Introducing Rayon for Parallelism
Adding rayon.workspace = true is a welcome enhancement to enable parallel processing. This change directly supports the new parallel commit implementations and should help meet performance goals.


52-52: Ohayo sensei: Criterion for Benchmarking
The new criterion.workspace = true dev-dependency is an excellent addition for setting up performance benchmarks. It will help you precisely measure the benefits of parallelized processing.


57-57: Ohayo sensei: pprof for Profiling
Integrating pprof.workspace = true as a development tool supports detailed performance profiling and works in tandem with Criterion. Pretty neat!


62-65: Ohayo sensei: Benchmark Configuration Added
The new benchmark definition with name = "commit" and harness = false is well configured and aligns with the goal of comparing sequential versus parallelized commits. Ensure that the corresponding benchmark implementation in crates/katana/core/benches/commit.rs correctly exercises both commit methods.

Cargo.toml (1)

261-263: Ohayo sensei: pprof Dependency for Profiling in Root Cargo.toml
The introduction of pprof = { version = "0.13.0", features = [ "criterion", "flamegraph" ] } in the root Cargo.toml reinforces profiling support across the project. Please verify that this version meshes well with Criterion and your overall benchmarking setup.

3 participants