Feat/Msm with gpu (metal shader language) on laptop #150

FoodChain1028 · 2024-06-02T23:21:27Z

The accumulation phase of the MSM algorithm is implemented in Metal at mopro-core/src/middleware/gpu_explorations/metal/.
Points and operations were first integrated for the BLS12-381 curve and then converted to BN254.
Current report

Results of metal_msm on 2^10 (1024) randomly chosen points and scalars:

Run metal msm benchmarking
    ```bash
    cargo test --release --package mopro-core --lib -- middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm --exact --nocapture
    ```
    
    Result:
    ```
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.059988125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.037813333s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 6.948694125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 6.983621s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.005291583s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.019227875s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.026516125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.037779959s
    test middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm has been running for over 60 seconds
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.017979667s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.042247291s
    Done running benchmark: Ok([7.059988125s, 7.037813333s, 6.948694125s, 6.983621s, 7.005291583s, 7.019227875s, 7.026516125s, 7.037779959s, 7.017979667s, 7.042247291s])
    test middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm ... ok
    ```
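To summarize the ten runs in the log above, here is a minimal sketch that averages the reported timings. The helper names `mean_duration` and `reported_runs` are made up for illustration and are not part of mopro-core; the numbers are copied from the log, and their mean comes out at roughly 7.02 s.

```rust
use std::time::Duration;

/// Arithmetic mean of a slice of durations; `None` for an empty slice.
/// (Hypothetical helper, not taken from the PR.)
fn mean_duration(samples: &[Duration]) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let total: Duration = samples.iter().sum();
    Some(total / samples.len() as u32)
}

/// The ten timings printed in the benchmark log, in seconds.
fn reported_runs() -> Vec<Duration> {
    [
        7.059988125, 7.037813333, 6.948694125, 6.983621, 7.005291583,
        7.019227875, 7.026516125, 7.037779959, 7.017979667, 7.042247291,
    ]
    .iter()
    .copied()
    .map(Duration::from_secs_f64)
    .collect()
}
```

Calling `mean_duration(&reported_runs())` gives a single figure that is easier to compare across commits than ten separate lines.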

cloudflare-workers-and-pages · 2024-06-03T00:16:33Z

Deploying mopro with Cloudflare Pages

Latest commit:	`e99f439`
Status:	✅ Deploy successful!
Preview URL:	https://7b727b4d.mopro.pages.dev
Branch Preview URL:	https://feat-msm-with-gpu-on-laptop.mopro.pages.dev

vivianjeng

thank you for this PR
please try to rebase the branch and fix these warnings

Warnings

warning: unused import: `G1Affine as GAffine`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:1:36
  |
1 | use ark_bn254::{Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                                    ^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `AffineRepr`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:2:14
  |
2 | use ark_ec::{AffineRepr, VariableBaseMSM};
  |              ^^^^^^^^^^

warning: unused import: `ark_ff::BigInt`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:3:5
  |
3 | use ark_ff::BigInt;
  |     ^^^^^^^^^^^^^^

warning: unused import: `ark_serialize::CanonicalDeserialize`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:4:5
  |
4 | use ark_serialize::CanonicalDeserialize;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: `CurveGroup`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:2:26
  |
2 | use ark_ec::{AffineRepr, CurveGroup, Group, VariableBaseMSM};
  |                          ^^^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`, `UniformRand`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:4:18
  |
4 |     biginteger::{BigInteger, BigInteger256},
  |                  ^^^^^^^^^^  ^^^^^^^^^^^^^
5 |     PrimeField, UniformRand,
  |                 ^^^^^^^^^^^

warning: unused imports: `One`, `rand`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:7:30
  |
7 | use ark_std::{cfg_into_iter, rand, vec::Vec, One, Zero};
  |                              ^^^^            ^^^

warning: unused import: `ark_serialize::CanonicalDeserialize`
  --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:10:5
   |
10 | use ark_serialize::CanonicalDeserialize;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`
  --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_bn254.rs:10:22
   |
10 |         biginteger::{BigInteger, BigInteger256},
   |                      ^^^^^^^^^^  ^^^^^^^^^^^^^

warning: unused import: `FqConfig`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:3:25
  |
3 |     use ark_bn254::{Fq, FqConfig, Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                         ^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:6:22
  |
6 |         biginteger::{BigInteger, BigInteger256},
  |                      ^^^^^^^^^^  ^^^^^^^^^^^^^

warning: unused import: `One`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:9:50
  |
9 |     use ark_std::{cfg_into_iter, rand, vec::Vec, One, Zero};
  |                                                  ^^^

warning: unused import: `G1Projective as G`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:1:57
  |
1 | use ark_bn254::{Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                                                         ^^^^^^^^^^^^^^^^^

warning: unused import: `Field`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:2:14
  |
2 | use ark_ff::{Field, PrimeField};
  |              ^^^^^

warning: unused import: `Rng`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:5:12
  |
5 |     rand::{Rng, RngCore},
  |            ^^^

warning: unused variable: `n`
   --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_bn254.rs:347:29
    |
347 |             fn rand_point()(n in any::<u8>()) -> G {
    |                             ^ help: if this is intentional, prefix it with an underscore: `_n`
    |
    = note: `#[warn(unused_variables)]` on by default

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:151:9
    |
151 |         writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)");
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: `#[warn(unused_must_use)]` on by default
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:166:17
    |
166 | /                 writeln!(
167 | |                     output_file,
168 | |                     "{},{},{}",
169 | |                     result.instance_size, result.num_instance, result.avg_processing_time
170 | |                 );
    | |_________________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: call to `.clone()` on a reference in this situation does nothing
  --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:47:47
   |
47 |     let scalars_limbs = cfg_into_iter!(scalars.clone())
   |                                               ^^^^^^^^ help: remove this redundant call
   |
   = note: the type `[ark_ff::Fp<MontBackend<FrConfig, 4>, 4>]` does not implement `Clone`, so calling `clone` on `&[ark_ff::Fp<MontBackend<FrConfig, 4>, 4>]` copies the reference, which does not do anything and can be removed
   = note: `#[warn(noop_method_call)]` on by default

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:320:9
    |
320 |         writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)");
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:335:17
    |
335 | /                 writeln!(
336 | |                     output_file,
337 | |                     "{},{},{}",
338 | |                     result.instance_size, result.num_instance, result.avg_processing_time
339 | |                 );
    | |_________________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: `mopro-core` (lib test) generated 21 warnings (run `cargo fix --lib -p mopro-core --tests` to apply 17 suggestions)
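Most of the warnings above go away by deleting unused imports, but the `unused Result` ones need a code change. One common way to handle them is sketched below: return `io::Result<()>` from the writer and propagate errors with `?`. The struct and function names are assumptions for illustration; only the field names and the CSV header come from the warning output.

```rust
use std::io::{self, Write};

/// One row of benchmark output. Field names mirror the warnings above;
/// the concrete types are assumptions made for this sketch.
struct BenchmarkResult {
    instance_size: u32,
    num_instance: u32,
    avg_processing_time: f64,
}

/// Writes the CSV header and rows, propagating I/O errors with `?`
/// instead of silently dropping the `Result` returned by `writeln!`.
fn write_benchmark_csv<W: Write>(mut out: W, results: &[BenchmarkResult]) -> io::Result<()> {
    writeln!(out, "msm_size,num_msm,avg_processing_time(ms)")?;
    for result in results {
        writeln!(
            out,
            "{},{},{}",
            result.instance_size, result.num_instance, result.avg_processing_time
        )?;
    }
    Ok(())
}
```

Taking a generic `W: Write` rather than a `File` also lets a test write into a `Vec<u8>` and inspect the output without touching the filesystem.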

mopro-core/Cargo.toml

vivianjeng · 2024-06-04T06:47:10Z

mopro-core/gpu_explorations/README.md

+1. cd to the `mopro/` directory.
+2. run `./scripts/build_ios.sh config-example.toml` (remember to change your ios_device_type `simulator`/`device`) to build and update the bindings.
+3. open `mopro-ios/MoproKit/Example/MoproKit.xcworkspace` in Xcode.
+4. choose your simulator/mobile device and build the project (can also use `cmd + R` as hot key).
+5. choose `MSMBenchmark` and choose the algorithms and click the button below you want to start benchmark.


We are going to remove the mopro-ios directory,
so you can move the result to research/gpu-exploration-app

@FoodChain1028 would you like to help with this change

vivianjeng · 2024-06-04T06:48:12Z

mopro-core/gpu_explorations/README.md

+To run the benchmarks of the instance size of $2^{16}$ on BLS12_377 curve in `mopro-core`, replace `<algorithm_you_want_to_test>` with the algorithm name listed above.
+
+```bash
+cargo test --release --features gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::<algorithm_you_want_to_test>::tests::test_run_benchmark --exact --nocapture


Is the trapdoortech_zprize_msm algo commented out now?

Since trapdoortech_zprize_msm is not compatible with the BN254 curve, we commented it out for now. Do you think we should remove the unused MSMs, such as trapdoor's and halo2's?

Sorry, I just saw the message.
If you will use this structure to develop for BN254, you can keep it here.
Do you expect switching curves to require a lot of changes?

vivianjeng · 2024-06-04T06:50:06Z

mopro-core/gpu_explorations/benchmarks/halo2curve_multicore_msm_benchmark.txt

@@ -0,0 +1,7 @@
+msm_size,num_msm,avg_processing_time(ms)


I'm curious why halo2curve is benchmarked.

Haha, TLDR: Carlos suggested that Halo2curve's MSM is state-of-the-art, so we benchmarked it.

As you can see below, Halo2curve does perform well with the "asm" feature. However, this acceleration feature is only compatible with the x86_64 architecture, so we didn't take it as a reference for our future work.

Will they support the acceleration feature in the future, or is it not possible on ARM?
So it will be a legacy benchmark?

We talked to the team: the "asm" feature is actually possible on ARM, but they have no plans to support it at the moment.

mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs

moven0831

Complete chores for the reviews

mopro-core/Cargo.toml

moven0831 · 2024-06-06T03:07:20Z

mopro-core/gpu_explorations/README.md

+To run the benchmarks of the instance size of $2^{16}$ on BLS12_377 curve in `mopro-core`, replace `<algorithm_you_want_to_test>` with the algorithm name listed above.
+
+```bash
+cargo test --release --features gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::<algorithm_you_want_to_test>::tests::test_run_benchmark --exact --nocapture


Since trapdoortech_zprize_msm is not compatible with the BN254 curve, we commented it out for now. Do you think we should remove the unused MSMs, such as trapdoor's and halo2's?

moven0831 · 2024-06-06T03:09:31Z

mopro-core/gpu_explorations/README.md

+1. cd to the `mopro/` directory.
+2. run `./scripts/build_ios.sh config-example.toml` (remember to change your ios_device_type `simulator`/`device`) to build and update the bindings.
+3. open `mopro-ios/MoproKit/Example/MoproKit.xcworkspace` in Xcode.
+4. choose your simulator/mobile device and build the project (can also use `cmd + R` as hot key).
+5. choose `MSMBenchmark` and choose the algorithms and click the button below you want to start benchmark.


@FoodChain1028 would you like to help with this change

moven0831 · 2024-06-06T03:23:14Z

mopro-core/gpu_explorations/benchmarks/halo2curve_multicore_msm_benchmark.txt

@@ -0,0 +1,7 @@
+msm_size,num_msm,avg_processing_time(ms)


Haha, TLDR: Carlos suggested that Halo2curve's MSM is state-of-the-art, so we benchmarked it.

As you can see below, Halo2curve does perform well with the "asm" feature. However, this acceleration feature is only compatible with the x86_64 architecture, so we didn't take it as a reference for our future work.

moven0831

I've listed a few directions for future work

moven0831 · 2024-06-06T05:46:17Z

mopro-core/src/middleware/gpu_explorations/metal/msm.rs

+    // flatten scalar and base to Vec<u32> for GPU usage
+    let scalars_limbs = cfg_into_iter!(scalars)
+        .map(|s| s.into_bigint().to_u32_limbs())
+        .flatten()
+        .collect::<Vec<u32>>();
+    let bases_limbs = cfg_into_iter!(points)
+        .map(|b| {
+            let b = b.into_group();
+            b.x.to_u32_limbs()
+                .into_iter()
+                .chain(b.y.to_u32_limbs())
+                .chain(b.z.to_u32_limbs())
+                .collect::<Vec<_>>()
+        })
+        .flatten()
+        .collect::<Vec<u32>>();
+    let buckets_matrix_limbs = {
+        // buckets_size * num_windows is for parallelism on windows (variable-based MSM)
+        let matrix = vec![zero; buckets_size * window_starts.len()];
+        cfg_into_iter!(matrix)
+            .map(|b| {
+                b.x.to_u32_limbs()
+                    .into_iter()
+                    .chain(b.y.to_u32_limbs())
+                    .chain(b.z.to_u32_limbs())
+                    .collect::<Vec<_>>()
+            })
+            .flatten()
+            .collect::<Vec<u32>>()
+    };


Possible optimization direction: speed up input preparation for the GPU

can you open a new issue?
I think it can be outside of this PR
(and because this PR is too big)
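As a rough illustration of the input-preparation idea raised in this thread, the sketch below flattens 256-bit values into one preallocated `u32` buffer instead of building a small `Vec` per element and flattening afterwards. It is an assumption-laden stand-in: `u64_limbs_to_u32` and `flatten_limbs` are hypothetical names, the values are plain `[u64; 4]` arrays rather than arkworks field elements, and the low-half-first limb order may not match the layout used by the PR's `to_u32_limbs`.

```rust
/// Splits four 64-bit limbs into eight 32-bit limbs, low half first.
/// NOTE: this limb order is an assumption for the sketch; the PR's
/// `to_u32_limbs` may use a different layout.
fn u64_limbs_to_u32(limbs: &[u64; 4]) -> [u32; 8] {
    let mut out = [0u32; 8];
    for (i, limb) in limbs.iter().enumerate() {
        out[2 * i] = *limb as u32; // low 32 bits
        out[2 * i + 1] = (*limb >> 32) as u32; // high 32 bits
    }
    out
}

/// Flattens many 256-bit values into a single buffer that is allocated
/// once up front, avoiding one heap allocation per element.
fn flatten_limbs(values: &[[u64; 4]]) -> Vec<u32> {
    let mut flat = Vec::with_capacity(values.len() * 8);
    for v in values {
        flat.extend_from_slice(&u64_limbs_to_u32(v));
    }
    flat
}
```

Whether this actually helps would need measuring against the existing `cfg_into_iter!` path, since that path can run in parallel when the corresponding feature is enabled.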

moven0831 · 2024-06-06T05:48:53Z

mopro-core/src/middleware/gpu_explorations/metal/msm.rs

+    let buckets_matrix = {
+        let raw_limbs = MetalState::retrieve_contents::<u32>(&buckets_matrix_buffer);
+        cfg_into_iter!(raw_limbs)
+            .chunks(24)
+            .map(|chunk| {
+                G::new_unchecked(
+                    Fq::from_u32_limbs(&chunk[0..8]),
+                    Fq::from_u32_limbs(&chunk[8..16]),
+                    Fq::from_u32_limbs(&chunk[16..24]),
+                )
+            })
+            .collect::<Vec<_>>()
+    };
+
+    // include the last window idx
+    let bucket_starts: Vec<usize> = (0..buckets_matrix.len() + buckets_size)
+        .step_by(buckets_size)
+        .collect();
+    let window_sums: Vec<_> = cfg_into_iter!(window_starts.clone())
+        .enumerate()
+        .map(|(idx, w_start)| {
+            // only process unit scalars once in the first window.
+            let mut res = zero;
+            if w_start == 0 {
+                for i in 0..instances_size {
+                    if scalars[i] == one {
+                        res += points[i];
+                    }
+                }
+            }
+
+            let buckets = buckets_matrix[bucket_starts[idx]..bucket_starts[idx + 1]].to_vec();
+            let mut running_sum = zero;
+            buckets.into_iter().rev().for_each(|b| {
+                running_sum += &b;
+                res += &running_sum;
+            });
+            res
+        })
+        .collect();
+
+    let lowest = *window_sums.first().unwrap();
+
+    Ok(lowest
+        + &window_sums[1..]
+            .iter()
+            .rev()
+            .fold(zero, |mut total, sum_i| {
+                total += sum_i;
+                for _ in 0..window_size {
+                    total = total.double();
+                }
+                total
+            }))
+}


Possible optimization direction: move all of this msm computation to GPU using metal
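For readers following the hunk above, here is a toy sketch of the same bucket reduction and window combination, with wrapping `u64` addition standing in for elliptic-curve point addition. It is not the PR's code: `toy_msm` is a hypothetical name, scalars are plain 64-bit integers, and the special case for unit scalars in the first window is omitted. The running-sum loop and the fold over higher windows follow the structure of the diff.

```rust
/// Toy Pippenger-style MSM over wrapping `u64` addition (a stand-in for
/// curve points). Computes the sum of `scalars[i] * points[i]` mod 2^64.
fn toy_msm(points: &[u64], scalars: &[u64], window_size: usize) -> u64 {
    assert_eq!(points.len(), scalars.len());
    assert!(window_size >= 1 && window_size <= 16);
    let buckets_size = (1usize << window_size) - 1;
    let mask = (1u64 << window_size) - 1;
    let window_starts: Vec<usize> = (0..64).step_by(window_size).collect();

    let window_sums: Vec<u64> = window_starts
        .iter()
        .map(|&w_start| {
            // Accumulation phase: bucket k-1 collects points whose digit is k.
            let mut buckets = vec![0u64; buckets_size];
            for (p, s) in points.iter().zip(scalars) {
                let digit = ((*s >> w_start) & mask) as usize;
                if digit != 0 {
                    buckets[digit - 1] = buckets[digit - 1].wrapping_add(*p);
                }
            }
            // Reduction: walking buckets from high to low, the running sum is
            // added once per step, so bucket k-1 ends up counted k times.
            let mut running_sum = 0u64;
            let mut res = 0u64;
            for b in buckets.iter().rev() {
                running_sum = running_sum.wrapping_add(*b);
                res = res.wrapping_add(running_sum);
            }
            res
        })
        .collect();

    // Combine windows from most to least significant, doubling
    // `window_size` times after each higher window is added.
    let lowest = window_sums[0];
    let higher = window_sums[1..].iter().rev().fold(0u64, |mut total, sum_i| {
        total = total.wrapping_add(*sum_i);
        for _ in 0..window_size {
            total = total.wrapping_add(total); // "double"
        }
        total
    });
    lowest.wrapping_add(higher)
}
```

In this toy version the per-window closure is independent of the other windows, which is the property that makes the accumulation phase a candidate for GPU offloading; the final fold is sequential but only touches one value per window.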

moven0831 · 2024-06-06T06:03:29Z

Hi @vivianjeng, I've rebased onto main and addressed the code review comments above. After @FoodChain1028's review, we can consider merging it back into main.

vivianjeng

Could you also add some benchmark tests to CI?
(They can be very simple, as long as they work.)
You can add a new .yml file if you think it should be separate from mopro.

vivianjeng · 2024-06-06T05:10:30Z

mopro-core/gpu_explorations/benchmarks/halo2curve_multicore_msm_benchmark.txt

@@ -0,0 +1,7 @@
+msm_size,num_msm,avg_processing_time(ms)


will they support acceleration feature in the future? or it is not possible in arm?
so it will be a legacy benchmark?

mopro-core/src/middleware/gpu_explorations/halo2curve_msm.rs

vivianjeng · 2024-06-06T06:29:19Z

mopro-core/src/middleware/gpu_explorations/trapdoortech_zprize_msm/mod.rs

@@ -62,7 +62,6 @@ where
        println!(
            "Average time to execute MSM with {} points and {} scalars in {} iterations is: {:?}",
            points.len(),
-            scalars.len(),


should be added?

mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs

mopro-ffi/src/lib.rs

* Fixed (untested) issue with IOS debug mode not able to build * Script `install_deps.sh` typo * If a path with Mopro IOS App contains a space the `mopro build --platforms ios` fails as spaces in address confuse the `cd` command. * If a path PROJECT_DIR contains a space, it no longer bugs on a `cd` command due to escaping with columns * (feat: core) added support for proving / verifying the sample circuit in morpo-core * (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument) * (feat: template) added Halo2CircuitView for IOS * (feat: example) Renamed Halo2 example module to `halo2-example` * (feat: example) Renamed example crate to `halo2-circuit` to be used by all * (feat: core) Circom is only compiled when halo2 flag is absent * (feat: core) Major rework to build script: 1. Seperated Circom and Halo2 build dependencies. 2. Added Halo2 build dependancies - Check that all keys have been generated. - Replace the placeholder `example` project with the user specified project by using ``paths`` override in ``.cargo/config.toml`. - Check that the user provided project meets requirements - Is a valid cargo crate - Is named as `halo2-circuit` * (feat: cfg) an example of config for Halo2 as well as updated other configs. * (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation. Accidentally commited script changes. Reverted later. 
* (feat: core) added the autogenerated `.cargo/config.toml` to gitignore * (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stud in case a different proof system is used) * (fix: script) reverted changes to script * (feat: tmpl) added kind to template * (feat: tmpl) added Halo2 View to IOS template * (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from * (feat: script) updated the scripts to support Halo2 circuits * (feat: script) added a sample Halo2 circuit to the template * (feat: script) added a sample Halo2 circuit config to the template config * (feat: lock) stable Halo2 version * (feat: example) added conditional compilation for circom example * (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic * (feat: core) added deserialisation for circuit inputs * (feat: ffi) updated ffi to be compatible with core changes * (fix: core) warning cleanup * (fix: core) build.rs warning cleanup * (fix: ffi) fixed circom compilation issue * (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there * (fix: example) remove workspace option as it fails build * (fix: ios-template) update IOS template files to work with new Halo2 circuit inputs * (feat: halo2-template) updated halo2 template with up-to-date halo2 example * (fix: lock) updated lock file * (feat: docs) added documentation on how to use Halo2 circuits * fix: fix toolchain version * fix: fix mopro test * chore: remove unused config file, simplify prepare script * feat: enable parallel in gpu-benchmarks * fix: fix println * style nav and footer * add intro elements and text * import features svgs and text * features layout * build with yarn * remove package-lock.json * fix: fix yarn.lock file * change link colors * set yarn version 3.6.3 * fix arrow image on Docs pages * restore h1 class to default, update px to rem * remove comments * 
fix mobile header * fix landing page for mobile * fix: remove yarnrc * fix: fix ci error * fix: fix mopro test * (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). * (fix: example) Commited the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing. * (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). * (fix: linter) Fixed all linter errors for `cargo fmt --all` * (fix: ffi) Circom tests are only to be run when circom flag is enabled. * (fix: examples) Removed multiply example (to be added as a different circuit example). * (fix: ffi) Resolved issue with indent. * (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2`to `..._halo2_proof` * Feat/Msm with gpu (metal shader language) on laptop (#150) * feat(msm_benchmark): integrate zprize2022 TrapdoorTech msm algo on Rust * refactor(msm_benchmark): separate arkworks_pippenger as baseline from others * refactor(benchmark): rewrite scalars and points gen to preprocess * refactor(baseline): rewrite benchmark method to 2^10 x 2^16 instance size * refactor: modify benchmark standard to match zprize works * feat(baseline): adopt zprize benchmarking method and enable multi benchmarking at once * feat(ffi): integrate trapdoor tech msm in mopro-ffi * feat(ffi): add test of trapdoor tech msm * fix(ffi): modify input for trapdoor msm * fix: update arkowrks pippenger input/ ouput * fix: update trapdoor tech zprize msm input/ ouput * fix: add the feature flag back * fix: modify msm functions input * fix: lint * feat(benchmark data): accelerate the benchmark data generation. 2^20 x 10 benchmark data can be gen in 5 min * feat: add a README file for gpu-exploration * feat(gpu-explorations): benchmark msm on BN254 curve, which leads to approx. 
30% faster * feat(gpu-explorations): integrated halo2curve's msm and benchmarks * refactor(msm): disable other msm's except the arkworks 0.4 msm for offloading dependencies requirement * feat(metal): provide basic structure of metal backend and rust wrapper for msm * fix: compile pathway error * chore: fix shader path config and identify parallel part of arkworks' pippenger as metal wrapper * feat(metal): draft msm wrapper in Rust for metal backend * doc: added reference for bls-12-381 and bn254 * chore(metal): add python helper to compute BN254 params * feat(metal): introduce u256 type implementation and bn254 params * docs(metal): generate abstract addition chain instructions for further implementation * feat: add instruction for bn254 addchain * test(metal): add fixed-params tests for bn254 operations * test(metal): update u256 type and focus on add test * test(metal): fix To and From BigInt format, provide better view on add_test result * fix: make the path root-compatible * chore: added error test * feat: compiled metal lib * chore(metal): update test log for better view on the bug * fix(metal): correct logic of {to, from}_u32_limbs and addition logic in metal of [low, high, 0, 0] * fix(metal): correct data repr logic between metal and arkworks and complete uint test * fix(metal): update bn254 tests and fix logic * fix(metal): update bn254 neg test * fix(metal): use larger result arrays * test(metal): add & sub fuzzing test for Fq_bn254 * fix(metal): correct the Montgomery Mul. 
Constant and complete mul test * test(metal): add pow test for bn254 base field * fix(metal): fix logic for msm usage on bn254 * fix(metal): correct the metal buffer index * refactor(metal): utils module for data format between metal (GPU) and arkworks (CPU) * fix(metal): correct encode/decode logic * test(metal): add test for msm accumulation phase to ensure correctness of metal result * test(metal): add test to bn254 points arithmetics * fix: modified double_in_place * refactor(metal): add limbs_conversion module for to/from metal * feat(metal): implement arkworks msm accumulation logic in metal * refactor(metal): add Fq conversion to/from limbs for metal * feat(metal): compute msm bucket in window-wise fashion * test(metal): add msm wrapper test on metal implementation * feat(metal): implement msm with enabling GPU computation on accumulation phase * refactor(metal): update paths for metal shader files * feat: integrate metal msm into mopro-ffi * feat(metal): add mont_reduction module for gpu result conversion * test(metal): enable latest limb conversion and remove unused module * feat(metal): optimize msm bucket computation with window-wise accumulation * feat(metal): Rust wrapper for latest metal msm accumulation * chore: update the instanceSize and numInstance in metal to make consistency * chore: remove commented code for bls12_377 curve parsing * fix: correct warning for GPU explorations code * chore(gpu-benchmarks): correct minor changes --------- Co-authored-by: moven0831 <[email protected]> * (feat: ci) Added an option to run CI on when a PR is ready for review, opened, re-opened and synchronized. * Script `install_deps.sh` typo * If a path with Mopro IOS App contains a space the `mopro build --platforms ios` fails as spaces in address confuse the `cd` command. 
* If a path PROJECT_DIR contains a space, it no longer bugs on a `cd` command due to escaping with columns * (feat: core) added support for proving / verifying the sample circuit in morpo-core * (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument) * (feat: template) added Halo2CircuitView for IOS * (feat: example) Renamed Halo2 example module to `halo2-example` * (feat: example) Renamed example crate to `halo2-circuit` to be used by all * (feat: core) Circom is only compiled when halo2 flag is absent * (feat: core) Major rework to build script: 1. Seperated Circom and Halo2 build dependencies. 2. Added Halo2 build dependancies - Check that all keys have been generated. - Replace the placeholder `example` project with the user specified project by using ``paths`` override in ``.cargo/config.toml`. - Check that the user provided project meets requirements - Is a valid cargo crate - Is named as `halo2-circuit` * (feat: cfg) an example of config for Halo2 as well as updated other configs. * (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation. Accidentally commited script changes. Reverted later. 
* (feat: core) added the autogenerated `.cargo/config.toml` to gitignore * (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stud in case a different proof system is used) * (fix: script) reverted changes to script * (feat: tmpl) added kind to template * (feat: tmpl) added Halo2 View to IOS template * (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from * (feat: script) updated the scripts to support Halo2 circuits * (feat: script) added a sample Halo2 circuit to the template * (feat: script) added a sample Halo2 circuit config to the template config * (feat: lock) stable Halo2 version * (feat: example) added conditional compilation for circom example * (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic * (feat: core) added deserialisation for circuit inputs * (feat: ffi) updated ffi to be compatible with core changes * (fix: core) warning cleanup * (fix: core) build.rs warning cleanup * (fix: ffi) fixed circom compilation issue * (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there * (fix: example) remove workspace option as it fails build * (fix: ios-template) update IOS template files to work with new Halo2 circuit inputs * (feat: halo2-template) updated halo2 template with up-to-date halo2 example * (fix: lock) updated lock file * (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). * (fix: example) Commited the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing. * (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). * (fix: linter) Fixed all linter errors for `cargo fmt --all` * (fix: ffi) Circom tests are only to be run when circom flag is enabled. 
* (fix: examples) Removed multiply example (to be added as a different circuit example). * (fix: ffi) Resolved issue with indent. * (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2`to `..._halo2_proof` * (merge: ffi) Merged changes from GPU explorations to the FFI file, moving them from `lib.rs` to `circom.rs` * (fix: ffi) Fix a compilation error in `circom.rs` * chore: update cargo.lock * (feat: web) Added a Halo2 page to the docs. * (fix: conf) Fixed a GitHub conflict issue in config * (chore: halo2) optimised use statement conditional compilation by grouping them together * (fix: examples) fixed the halo2 fibonacci example README to be up-to-date with the content of the example crate * (fix: template) moved updated fibonacci halo2 example to the template * Script `install_deps.sh` typo * If a path with Mopro IOS App contains a space the `mopro build --platforms ios` fails as spaces in address confuse the `cd` command. * If a path PROJECT_DIR contains a space, it no longer bugs on a `cd` command due to escaping with columns * (feat: core) added support for proving / verifying the sample circuit in morpo-core * (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument) * (feat: template) added Halo2CircuitView for IOS * (feat: example) Renamed Halo2 example module to `halo2-example` * (feat: example) Renamed example crate to `halo2-circuit` to be used by all * (feat: core) Circom is only compiled when halo2 flag is absent * (feat: core) Major rework to build script: 1. Seperated Circom and Halo2 build dependencies. 2. Added Halo2 build dependancies - Check that all keys have been generated. - Replace the placeholder `example` project with the user specified project by using ``paths`` override in ``.cargo/config.toml`. - Check that the user provided project meets requirements - Is a valid cargo crate - Is named as `halo2-circuit` * (feat: cfg) an example of config for Halo2 as well as updated other configs. 
* (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation. Accidentally commited script changes. Reverted later. * (feat: core) added the autogenerated `.cargo/config.toml` to gitignore * (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stud in case a different proof system is used) * (fix: script) reverted changes to script * (feat: tmpl) added kind to template * (feat: tmpl) added Halo2 View to IOS template * (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from * (feat: script) updated the scripts to support Halo2 circuits * (feat: script) added a sample Halo2 circuit to the template * (feat: script) added a sample Halo2 circuit config to the template config * (feat: lock) stable Halo2 version * (feat: example) added conditional compilation for circom example * (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic * (feat: core) added deserialisation for circuit inputs * (feat: ffi) updated ffi to be compatible with core changes * (fix: core) warning cleanup * (fix: core) build.rs warning cleanup * (fix: ffi) fixed circom compilation issue * (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there * (fix: example) remove workspace option as it fails build * (fix: ios-template) update IOS template files to work with new Halo2 circuit inputs * (feat: halo2-template) updated halo2 template with up-to-date halo2 example * (fix: lock) updated lock file * (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). * (fix: example) Commited the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing. * (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro). 
* (fix: linter) Fixed all linter errors for `cargo fmt --all`
* (fix: ffi) Circom tests are only to be run when circom flag is enabled.
* (fix: examples) Removed multiply example (to be added as a different circuit example).
* (fix: ffi) Resolved issue with indent.
* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2` to `..._halo2_proof`
* (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument)
* (feat: script) added a sample Halo2 circuit to the template
* (feat: ffi) updated ffi to be compatible with core changes
* (fix: example) Committed the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing.
* (fix: examples) Removed multiply example (to be added as a different circuit example).
* (fix: ffi) Resolved issue with indent.
* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2` to `..._halo2_proof`
* (merge: ffi) Merged changes from GPU explorations to the FFI file, moving them from `lib.rs` to `circom.rs`
* (fix: ffi) Fix a compilation error in `circom.rs`
* (feat: web) Added a Halo2 page to the docs.
* (chore: halo2) optimised use statement conditional compilation by grouping them together
* (fix: examples) fixed the halo2 fibonacci example README to be up-to-date with the content of the example crate
* (fix: template) moved updated fibonacci halo2 example to the template

---------

Co-authored-by: Ya-wen, Jeng <[email protected]>
Co-authored-by: CJ Rose <[email protected]>
Co-authored-by: FoodChain <[email protected]>
Co-authored-by: moven0831 <[email protected]>

vivianjeng reviewed Jun 4, 2024

moven0831 reviewed Jun 6, 2024

moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from df4f48f to 002fcfb Compare June 6, 2024 04:37

moven0831 and others added 26 commits June 6, 2024 12:39

feat(msm_benchmark): integrate zprize2022 TrapdoorTech msm algo on Rust

c760b2c

refactor(msm_benchmark): separate arkworks_pippenger as baseline from others

01b036d

refactor(benchmark): rewrite scalars and points gen to preprocess

2f6ce16

refactor(baseline): rewrite benchmark method to 2^10 x 2^16 instance size

51828ae

refactor: modify benchmark standard to match zprize works

bcc0ebe

feat(baseline): adopt zprize benchmarking method and enable multi benchmarking at once

4e19e96

feat(ffi): integrate trapdoor tech msm in mopro-ffi

3344167

feat(ffi): add test of trapdoor tech msm

af11bd4

fix(ffi): modify input for trapdoor msm

e36cab6

fix: update arkworks pippenger input/output

47a2872

fix: update trapdoor tech zprize msm input/output

6c955cf

fix: add the feature flag back

6de77ea

fix: modify msm functions input

b385344

fix: lint

0ee47bc

feat(benchmark data): accelerate the benchmark data generation. 2^20 x 10 benchmark data can be generated in 5 min

6127e1e

feat: add a README file for gpu-exploration

75f1b73

feat(gpu-explorations): benchmark msm on BN254 curve, which leads to approx. 30% faster

86ace53

feat(gpu-explorations): integrated halo2curve's msm and benchmarks

1a714da

refactor(msm): disable other msm's except the arkworks 0.4 msm

a446f0a

for offloading dependencies requirement

feat(metal): provide basic structure of metal backend and rust wrapper for msm

69a39f0

fix: compile pathway error

79447ab

chore: fix shader path config and identify parallel part of arkworks' pippenger as metal wrapper

b3a401c

feat(metal): draft msm wrapper in Rust for metal backend

52dee44

doc: added reference for bls-12-381 and bn254

92cdb5f

chore(metal): add python helper to compute BN254 params

e233fe9

feat(metal): introduce u256 type implementation and bn254 params

2438bdc

moven0831 and others added 19 commits June 6, 2024 12:39

refactor(metal): utils module for data format between metal (GPU) and arkworks (CPU)

d9a7279

fix(metal): correct encode/decode logic

2270ad4

test(metal): add test for msm accumulation phase to ensure correctness of metal result

3bc0b75

test(metal): add test to bn254 points arithmetics

d5a62e7

fix: modified double_in_place

e83098b

refactor(metal): add limbs_conversion module for to/from metal

71cde1e

feat(metal): implement arkworks msm accumulation logic in metal

3a6d791

refactor(metal): add Fq conversion to/from limbs for metal

c4fdea6

feat(metal): compute msm bucket in window-wise fashion

a9f73a9

test(metal): add msm wrapper test on metal implementation

8fef337

feat(metal): implement msm with enabling GPU computation on accumulation phase

edb21d6

refactor(metal): update paths for metal shader files

909e0b2

feat: integrate metal msm into mopro-ffi

c1e61e0

feat(metal): add mont_reduction module for gpu result conversion

f0a5449

test(metal): enable latest limb conversion and remove unused module

9659393

feat(metal): optimize msm bucket computation with window-wise accumulation

cfe112a

feat(metal): Rust wrapper for latest metal msm accumulation

0dd64fa

chore: update the instanceSize and numInstance in metal to make consistency

a3003d3

chore: remove commented code for bls12_377 curve parsing

8eed163

moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from 002fcfb to 8eed163 Compare June 6, 2024 04:40

fix: correct warning for GPU explorations code

bf0ae37

moven0831 reviewed Jun 6, 2024

vivianjeng reviewed Jun 6, 2024

chore(gpu-benchmarks): correct minor changes

e99f439

moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from 5652d53 to e99f439 Compare June 7, 2024 12:19

vivianjeng merged commit 7e59a47 into main Jun 8, 2024
7 checks passed

vivianjeng deleted the feat/msm-with-gpu-on-laptop branch June 8, 2024 03:15

moven0831 mentioned this pull request Jun 11, 2024

feat(metal): execute whole msm in metal #155

Merged
