Feat/Msm with gpu (metal shader language) on laptop #150

Merged: 66 commits into main from feat/msm-with-gpu-on-laptop on Jun 8, 2024

@FoodChain1028 FoodChain1028 commented Jun 2, 2024

  • The MSM accumulation-phase algorithm is implemented in Metal at mopro-core/src/middleware/gpu_explorations/metal/.
  • Points and operations for the BLS12-381 curve were integrated first and then converted to BN254.
  • Current report

Results of running metal_msm on 2^10 (1024) randomly chosen points and scalars:

Run the Metal MSM benchmark:
    ```bash
    cargo test --release --package mopro-core --lib -- middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm --exact --nocapture
    ```
    
    Result:
    ```
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.059988125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.037813333s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 6.948694125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 6.983621s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.005291583s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.019227875s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.026516125s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.037779959s
    test middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm has been running for over 60 seconds
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.017979667s
    Average time to execute MSM with 1024 points and scalars in 1 iterations is: 7.042247291s
    Done running benchmark: Ok([7.059988125s, 7.037813333s, 6.948694125s, 6.983621s, 7.005291583s, 7.019227875s, 7.026516125s, 7.037779959s, 7.017979667s, 7.042247291s])
    test middleware::gpu_explorations::metal::msm::tests::test_benchmark_msm ... ok
    ```
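
To summarize the ten runs in the log above, here is a minimal sketch (not part of the PR) that computes the mean, minimum, and maximum of the reported durations. The values are copied from the log; the helper names `mean_and_spread` and `reported_runs` are hypothetical.

```rust
use std::time::Duration;

/// Returns (mean, min, max) of a non-empty slice of durations.
fn mean_and_spread(runs: &[Duration]) -> (Duration, Duration, Duration) {
    let total: Duration = runs.iter().sum();
    let mean = total / runs.len() as u32;
    let min = *runs.iter().min().expect("at least one run");
    let max = *runs.iter().max().expect("at least one run");
    (mean, min, max)
}

/// The ten durations printed by the benchmark log above, in seconds.
fn reported_runs() -> Vec<Duration> {
    [
        7.059988125, 7.037813333, 6.948694125, 6.983621, 7.005291583,
        7.019227875, 7.026516125, 7.037779959, 7.017979667, 7.042247291,
    ]
    .iter()
    .map(|secs| Duration::from_secs_f64(*secs))
    .collect()
}
```

Calling `mean_and_spread(&reported_runs())` gives one summary triple for the whole benchmark instead of ten separate lines.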

cloudflare-workers-and-pages bot commented Jun 3, 2024

Deploying mopro with Cloudflare Pages

Latest commit: e99f439
Status: ✅  Deploy successful!
Preview URL: https://7b727b4d.mopro.pages.dev
Branch Preview URL: https://feat-msm-with-gpu-on-laptop.mopro.pages.dev


@vivianjeng vivianjeng left a comment

Thank you for this PR.
Please rebase the branch and fix these warnings.

Warnings
warning: unused import: `G1Affine as GAffine`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:1:36
  |
1 | use ark_bn254::{Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                                    ^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: unused import: `AffineRepr`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:2:14
  |
2 | use ark_ec::{AffineRepr, VariableBaseMSM};
  |              ^^^^^^^^^^

warning: unused import: `ark_ff::BigInt`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:3:5
  |
3 | use ark_ff::BigInt;
  |     ^^^^^^^^^^^^^^

warning: unused import: `ark_serialize::CanonicalDeserialize`
 --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:4:5
  |
4 | use ark_serialize::CanonicalDeserialize;
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused import: `CurveGroup`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:2:26
  |
2 | use ark_ec::{AffineRepr, CurveGroup, Group, VariableBaseMSM};
  |                          ^^^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`, `UniformRand`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:4:18
  |
4 |     biginteger::{BigInteger, BigInteger256},
  |                  ^^^^^^^^^^  ^^^^^^^^^^^^^
5 |     PrimeField, UniformRand,
  |                 ^^^^^^^^^^^

warning: unused imports: `One`, `rand`
 --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:7:30
  |
7 | use ark_std::{cfg_into_iter, rand, vec::Vec, One, Zero};
  |                              ^^^^            ^^^

warning: unused import: `ark_serialize::CanonicalDeserialize`
  --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:10:5
   |
10 | use ark_serialize::CanonicalDeserialize;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`
  --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_bn254.rs:10:22
   |
10 |         biginteger::{BigInteger, BigInteger256},
   |                      ^^^^^^^^^^  ^^^^^^^^^^^^^

warning: unused import: `FqConfig`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:3:25
  |
3 |     use ark_bn254::{Fq, FqConfig, Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                         ^^^^^^^^

warning: unused imports: `BigInteger256`, `BigInteger`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:6:22
  |
6 |         biginteger::{BigInteger, BigInteger256},
  |                      ^^^^^^^^^^  ^^^^^^^^^^^^^

warning: unused import: `One`
 --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_msm.rs:9:50
  |
9 |     use ark_std::{cfg_into_iter, rand, vec::Vec, One, Zero};
  |                                                  ^^^

warning: unused import: `G1Projective as G`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:1:57
  |
1 | use ark_bn254::{Fr as ScalarField, G1Affine as GAffine, G1Projective as G};
  |                                                         ^^^^^^^^^^^^^^^^^

warning: unused import: `Field`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:2:14
  |
2 | use ark_ff::{Field, PrimeField};
  |              ^^^^^

warning: unused import: `Rng`
 --> mopro-core/src/middleware/gpu_explorations/utils/preprocess.rs:5:12
  |
5 |     rand::{Rng, RngCore},
  |            ^^^

warning: unused variable: `n`
   --> mopro-core/src/middleware/gpu_explorations/metal/tests/test_bn254.rs:347:29
    |
347 |             fn rand_point()(n in any::<u8>()) -> G {
    |                             ^ help: if this is intentional, prefix it with an underscore: `_n`
    |
    = note: `#[warn(unused_variables)]` on by default

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:151:9
    |
151 |         writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)");
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: `#[warn(unused_must_use)]` on by default
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs:166:17
    |
166 | /                 writeln!(
167 | |                     output_file,
168 | |                     "{},{},{}",
169 | |                     result.instance_size, result.num_instance, result.avg_processing_time
170 | |                 );
    | |_________________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: call to `.clone()` on a reference in this situation does nothing
  --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:47:47
   |
47 |     let scalars_limbs = cfg_into_iter!(scalars.clone())
   |                                               ^^^^^^^^ help: remove this redundant call
   |
   = note: the type `[ark_ff::Fp<MontBackend<FrConfig, 4>, 4>]` does not implement `Clone`, so calling `clone` on `&[ark_ff::Fp<MontBackend<FrConfig, 4>, 4>]` copies the reference, which does not do anything and can be removed
   = note: `#[warn(noop_method_call)]` on by default

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:320:9
    |
320 |         writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)");
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: unused `Result` that must be used
   --> mopro-core/src/middleware/gpu_explorations/metal/msm.rs:335:17
    |
335 | /                 writeln!(
336 | |                     output_file,
337 | |                     "{},{},{}",
338 | |                     result.instance_size, result.num_instance, result.avg_processing_time
339 | |                 );
    | |_________________^
    |
    = note: this `Result` may be an `Err` variant, which should be handled
    = note: this warning originates in the macro `writeln` (in Nightly builds, run with -Z macro-backtrace for more info)

warning: `mopro-core` (lib test) generated 21 warnings (run `cargo fix --lib -p mopro-core --tests` to apply 17 suggestions)
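
Besides the unused imports (which the `cargo fix` command suggested above can remove), two recurring warnings are the ignored `Result` from `writeln!` and the no-op `.clone()` on a slice reference. A minimal sketch of both fixes follows. It assumes a hypothetical `BenchmarkResult` struct whose field names are taken from the warning output, a generic writer in place of the PR's `output_file`, and plain `u64` values in place of the scalar field type.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the benchmark result type in the PR;
/// only the field names come from the warning output above.
struct BenchmarkResult {
    instance_size: u32,
    num_instance: u32,
    avg_processing_time: f64,
}

/// Writes the CSV header and rows, propagating I/O errors with `?`
/// instead of dropping the `Result` returned by `writeln!`.
fn write_csv<W: Write>(mut out: W, results: &[BenchmarkResult]) -> io::Result<()> {
    writeln!(out, "msm_size,num_msm,avg_processing_time(ms)")?;
    for result in results {
        writeln!(
            out,
            "{},{},{}",
            result.instance_size, result.num_instance, result.avg_processing_time
        )?;
    }
    Ok(())
}

/// Iterates the slice reference directly. Calling `.clone()` on `&[T]`
/// would only copy the reference, so it is simply omitted.
fn sum_scalars(scalars: &[u64]) -> u64 {
    scalars.iter().sum()
}
```

Where propagating the error is not practical (for example inside a test), `.unwrap()` or `.expect("...")` on the `writeln!` result also silences the warning while still surfacing failures.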

Comment on lines 57 to 61
1. cd to the `mopro/` directory.
2. run `./scripts/build_ios.sh config-example.toml` (remember to change your ios_device_type `simulator`/`device`) to build and update the bindings.
3. open `mopro-ios/MoproKit/Example/MoproKit.xcworkspace` in Xcode.
4. choose your simulator/mobile device and build the project (can also use `cmd + R` as hot key).
5. choose `MSMBenchmark` and choose the algorithms and click the button below you want to start benchmark.

We are going to remove the mopro-ios directory, so you can move the results to research/gpu-exploration-app.

@FoodChain1028 would you like to help with this change?

To run the benchmarks of the instance size of $2^{16}$ on BLS12_377 curve in `mopro-core`, replace `<algorithm_you_want_to_test>` with the algorithm name listed above.

```bash
cargo test --release --features gpu-benchmarks --package mopro-core --lib -- middleware::gpu_explorations::<algorithm_you_want_to_test>::tests::test_run_benchmark --exact --nocapture
```

Is the trapdoortech_zprize_msm algorithm commented out now?

Since trapdoortech_zprize_msm is not compatible with the BN254 curve, we commented it out for now. Do you think we should remove the unused MSM implementations, such as Trapdoor's and halo2's?

Sorry, I just saw the message.
If you will use this structure to develop for BN254, you can keep it here.
Do you expect switching curves to require a lot of changes?

@@ -0,0 +1,7 @@
msm_size,num_msm,avg_processing_time(ms)

I'm curious why halo2curve is benchmarked.

Haha, TL;DR: Carlos suggested that Halo2curve's MSM is state-of-the-art, so we benchmarked it.

As you can see below, Halo2curve does perform well with the "asm" feature. However, this acceleration feature is only compatible with the x86_64 architecture, so we didn't take it as a reference for our future work.

(image: photo_2024-06-06 11 11 09)

Will they support the acceleration feature in the future, or is it not possible on ARM?
If not, will this become a legacy benchmark?

We've talked to the team: the "asm" feature is actually possible on ARM, but supporting it is not in their plans at the moment.
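
To illustrate the architecture constraint discussed in this thread, here is a small sketch of how a benchmark harness could pick a backend per target architecture. The `MsmBackend` enum and `select_backend` function are hypothetical and do not reflect how Halo2curve actually gates its "asm" feature.

```rust
/// Hypothetical label for the MSM backend a benchmark would use.
#[derive(Debug, PartialEq)]
enum MsmBackend {
    /// Assembly-accelerated path, assumed to exist only on x86_64.
    Asm,
    /// Pure-Rust fallback for every other architecture (e.g. aarch64).
    Portable,
}

/// Chooses the backend at compile time based on the target architecture.
fn select_backend() -> MsmBackend {
    if cfg!(target_arch = "x86_64") {
        MsmBackend::Asm
    } else {
        MsmBackend::Portable
    }
}
```

On an Apple Silicon laptop this sketch would select the portable path, which is why the x86_64-only numbers are hard to use as a reference there.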


@moven0831 moven0831 left a comment


Complete chores for the reviews

@moven0831 moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from df4f48f to 002fcfb Compare June 6, 2024 04:37
moven0831 and others added 26 commits June 6, 2024 12:39
@moven0831 moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from 002fcfb to 8eed163 Compare June 6, 2024 04:40

@moven0831 moven0831 left a comment

I've listed a few directions for future work.

Comment on lines +42 to +71
    // flatten scalar and base to Vec<u32> for GPU usage
    let scalars_limbs = cfg_into_iter!(scalars)
        .map(|s| s.into_bigint().to_u32_limbs())
        .flatten()
        .collect::<Vec<u32>>();
    let bases_limbs = cfg_into_iter!(points)
        .map(|b| {
            let b = b.into_group();
            b.x.to_u32_limbs()
                .into_iter()
                .chain(b.y.to_u32_limbs())
                .chain(b.z.to_u32_limbs())
                .collect::<Vec<_>>()
        })
        .flatten()
        .collect::<Vec<u32>>();
    let buckets_matrix_limbs = {
        // buckets_size * num_windows is for parallelism on windows (variable-based MSM)
        let matrix = vec![zero; buckets_size * window_starts.len()];
        cfg_into_iter!(matrix)
            .map(|b| {
                b.x.to_u32_limbs()
                    .into_iter()
                    .chain(b.y.to_u32_limbs())
                    .chain(b.z.to_u32_limbs())
                    .collect::<Vec<_>>()
            })
            .flatten()
            .collect::<Vec<u32>>()
    };

Possible optimization direction: speed up the preparation of inputs for the GPU.

Can you open a new issue?
I think it can be handled outside of this PR
(and this PR is already too big).
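
One way the input-preparation step above might be sped up is to write all limbs into a single preallocated buffer instead of building a temporary `Vec` per element and flattening afterwards. The sketch below is an assumption-laden illustration, not the PR's code: `U256` is a hypothetical stand-in for a 4-limb big integer, and the little-endian limb order is assumed (the PR's `to_u32_limbs` helper may order limbs differently).

```rust
/// A 256-bit integer as four little-endian u64 limbs
/// (hypothetical stand-in for a 4-limb big integer).
type U256 = [u64; 4];

/// Flattens values into u32 limbs (8 per value) using one preallocated
/// buffer, avoiding a temporary Vec for every element.
fn flatten_to_u32_limbs(values: &[U256]) -> Vec<u32> {
    let mut out = Vec::with_capacity(values.len() * 8);
    for value in values {
        for limb in value {
            out.push(*limb as u32); // low 32 bits
            out.push((*limb >> 32) as u32); // high 32 bits
        }
    }
    out
}
```

Whether this actually helps would need to be measured against the parallel `cfg_into_iter!` version, since the sketch trades parallelism for fewer allocations.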

Comment on lines +112 to +166
    let buckets_matrix = {
        let raw_limbs = MetalState::retrieve_contents::<u32>(&buckets_matrix_buffer);
        cfg_into_iter!(raw_limbs)
            .chunks(24)
            .map(|chunk| {
                G::new_unchecked(
                    Fq::from_u32_limbs(&chunk[0..8]),
                    Fq::from_u32_limbs(&chunk[8..16]),
                    Fq::from_u32_limbs(&chunk[16..24]),
                )
            })
            .collect::<Vec<_>>()
    };

    // include the last window idx
    let bucket_starts: Vec<usize> = (0..buckets_matrix.len() + buckets_size)
        .step_by(buckets_size)
        .collect();
    let window_sums: Vec<_> = cfg_into_iter!(window_starts.clone())
        .enumerate()
        .map(|(idx, w_start)| {
            // only process unit scalars once in the first window.
            let mut res = zero;
            if w_start == 0 {
                for i in 0..instances_size {
                    if scalars[i] == one {
                        res += points[i];
                    }
                }
            }

            let buckets = buckets_matrix[bucket_starts[idx]..bucket_starts[idx + 1]].to_vec();
            let mut running_sum = zero;
            buckets.into_iter().rev().for_each(|b| {
                running_sum += &b;
                res += &running_sum;
            });
            res
        })
        .collect();

    let lowest = *window_sums.first().unwrap();

    Ok(lowest
        + &window_sums[1..]
            .iter()
            .rev()
            .fold(zero, |mut total, sum_i| {
                total += sum_i;
                for _ in 0..window_size {
                    total = total.double();
                }
                total
            }))
}

Possible optimization direction: move all of this MSM computation to the GPU using Metal.
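
For readers unfamiliar with the two CPU-side phases in the reviewed code (bucket reduction by running sums, then combining window sums by repeated doubling), here is a toy sketch over plain integers. It is not the PR's implementation: `u64` addition stands in for curve-point addition, multiplication by 2 stands in for point doubling, the function names are hypothetical, and the special handling of unit scalars is omitted.

```rust
/// Sums buckets so that bucket k (0-based) is counted (k + 1) times,
/// using the same running-sum trick as the reviewed code.
fn reduce_buckets(buckets: &[u64]) -> u64 {
    let mut running_sum = 0u64;
    let mut res = 0u64;
    for b in buckets.iter().rev() {
        running_sum += b;
        res += running_sum;
    }
    res
}

/// Combines per-window sums (lowest window first) by repeated doubling.
fn combine_windows(window_sums: &[u64], window_size: u32) -> u64 {
    let lowest = window_sums[0];
    lowest
        + window_sums[1..].iter().rev().fold(0u64, |mut total, sum_i| {
            total += sum_i;
            for _ in 0..window_size {
                total *= 2; // stands in for point doubling
            }
            total
        })
}

/// Toy MSM over integers: computes the sum of scalars[i] * points[i].
fn toy_msm(scalars: &[u64], points: &[u64], window_size: u32, num_windows: u32) -> u64 {
    let num_buckets = (1usize << window_size) - 1;
    let mask = (1u64 << window_size) - 1;
    let window_sums: Vec<u64> = (0..num_windows)
        .map(|w| {
            // Accumulation phase: this is the part the PR runs on the GPU.
            let mut buckets = vec![0u64; num_buckets];
            for (s, p) in scalars.iter().zip(points) {
                let digit = (s >> (w * window_size)) & mask;
                if digit != 0 {
                    buckets[(digit - 1) as usize] += p;
                }
            }
            reduce_buckets(&buckets)
        })
        .collect();
    combine_windows(&window_sums, window_size)
}
```

In this toy version, `reduce_buckets` and `combine_windows` correspond to the work that still runs on the CPU after the GPU accumulation, which is what the suggestion above proposes moving to Metal.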

@moven0831

Hi @vivianjeng, I've rebased onto main and applied the changes from the code reviews above. After @FoodChain1028's review, we can consider merging it into main.


@vivianjeng vivianjeng left a comment

Can you also add some benchmark tests to CI?
(They can be very simple, as long as they work.)
You can add a new .yml file if you think it should be separate from mopro.

@@ -62,7 +62,6 @@ where
println!(
"Average time to execute MSM with {} points and {} scalars in {} iterations is: {:?}",
points.len(),
scalars.len(),

should be added?

@moven0831 moven0831 force-pushed the feat/msm-with-gpu-on-laptop branch from 5652d53 to e99f439 Compare June 7, 2024 12:19
@vivianjeng vivianjeng merged commit 7e59a47 into main Jun 8, 2024
7 checks passed
@vivianjeng vivianjeng deleted the feat/msm-with-gpu-on-laptop branch June 8, 2024 03:15
vivianjeng added a commit that referenced this pull request Jun 16, 2024
* Fixed (untested) issue with IOS debug mode not able to build

* Script `install_deps.sh` typo

* If a path with Mopro IOS App contains a space the `mopro build --platforms ios` fails as spaces in address confuse the `cd` command.

* If a path PROJECT_DIR contains a space, it no longer bugs on a `cd` command due to  escaping with columns

* (feat: core) added support for proving / verifying the sample circuit in morpo-core

* (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument)

* (feat: template) added Halo2CircuitView for IOS

* (feat: example) Renamed Halo2 example module to `halo2-example`

* (feat: example) Renamed example crate to `halo2-circuit` to be used by all

* (feat: core) Circom is only compiled when halo2 flag is absent

* (feat: core) Major rework to build script:
1. Seperated Circom and Halo2 build dependencies.
2. Added Halo2 build dependancies
   - Check that all keys have been generated.
   - Replace the placeholder `example` project with the user specified project by using ``paths`` override in ``.cargo/config.toml`.
   - Check that the user provided project meets requirements
       - Is a valid cargo crate
       - Is named as `halo2-circuit`

* (feat: cfg) an example of config for Halo2 as well as updated other configs.

* (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation.

Accidentally commited script changes. Reverted later.

* (feat: core) added the autogenerated `.cargo/config.toml` to gitignore

* (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stud in case a different proof system is used)

* (fix: script) reverted changes to script

* (feat: tmpl) added kind to template

* (feat: tmpl) added Halo2 View to IOS template

* (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from

* (feat: script) updated the scripts to support Halo2 circuits

* (feat: script) added a sample Halo2 circuit to the template

* (feat: script) added a sample Halo2 circuit config to the template config

* (feat: lock) stable Halo2 version

* (feat: example) added conditional compilation for circom example

* (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic

* (feat: core) added deserialisation for circuit inputs

* (feat: ffi) updated ffi to be compatible with core changes

* (fix: core) warning cleanup

* (fix: core) build.rs warning cleanup

* (fix: ffi) fixed circom compilation issue

* (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there

* (fix: example) remove workspace option as it fails build

* (fix: ios-template) update IOS template files to work with new Halo2 circuit inputs

* (feat: halo2-template) updated halo2 template with up-to-date halo2 example

* (fix: lock) updated lock file

* (feat: docs) added documentation on how to use Halo2 circuits

* fix: fix toolchain version

* fix: fix mopro test

* chore: remove unused config file, simplify prepare script

* feat: enable parallel in gpu-benchmarks

* fix: fix println

* style nav and footer

* add intro elements and text

* import features svgs and text

* features layout

* build with yarn

* remove package-lock.json

* fix: fix yarn.lock file

* change link colors

* set yarn version 3.6.3

* fix arrow image on Docs pages

* restore h1 class to default, update px to rem

* remove comments

* fix mobile header

* fix landing page for mobile

* fix: remove yarnrc

* fix: fix ci error

* fix: fix mopro test

* (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: example) Commited the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing.

* (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: linter) Fixed all linter errors for `cargo fmt --all`

* (fix: ffi) Circom tests are only to be run when circom flag is enabled.

* (fix: examples) Removed multiply example (to be added as a different circuit example).

* (fix: ffi) Resolved issue with indent.

* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2`to `..._halo2_proof`

* Feat/Msm with gpu (metal shader language) on laptop (#150)

* feat(msm_benchmark): integrate zprize2022 TrapdoorTech msm algo on Rust

* refactor(msm_benchmark): separate arkworks_pippenger as baseline from others

* refactor(benchmark): rewrite scalars and points gen to preprocess

* refactor(baseline): rewrite benchmark method to 2^10 x 2^16 instance size

* refactor: modify benchmark standard to match zprize works

* feat(baseline): adopt zprize benchmarking method and enable multi benchmarking at once

* feat(ffi): integrate trapdoor tech msm in mopro-ffi

* feat(ffi): add test of trapdoor tech msm

* fix(ffi): modify input for trapdoor msm

* fix: update arkowrks pippenger input/ ouput

* fix: update trapdoor tech zprize msm input/ ouput

* fix: add the feature flag back

* fix: modify msm functions input

* fix: lint

* feat(benchmark data): accelerate the benchmark data generation. 2^20 x 10 benchmark data can be gen in 5 min

* feat: add a README file for gpu-exploration

* feat(gpu-explorations): benchmark msm on BN254 curve, which leads to approx. 30% faster

* feat(gpu-explorations): integrated halo2curve's msm and benchmarks

* refactor(msm): disable other msm's except the arkworks 0.4 msm
for offloading dependencies requirement

* feat(metal): provide basic structure of metal backend and rust wrapper for msm

* fix: compile pathway error

* chore: fix shader path config and identify parallel part of arkworks' pippenger as metal wrapper

* feat(metal): draft msm wrapper in Rust for metal backend

* doc: added reference for bls-12-381 and bn254

* chore(metal): add python helper to compute BN254 params

* feat(metal): introduce u256 type implementation and bn254 params

* docs(metal): generate abstract addition chain instructions for further implementation

* feat: add instruction for bn254 addchain

* test(metal): add fixed-params tests for bn254 operations

* test(metal): update u256 type and focus on add test

* test(metal): fix To and From BigInt format, provide better view on add_test result

* fix: make the path root-compatible

* chore: added error test

* feat: compiled metal lib

* chore(metal): update test log for better view on the bug

* fix(metal): correct logic of {to, from}_u32_limbs and addition logic in metal of [low, high, 0, 0]

* fix(metal): correct data repr logic between metal and arkworks and complete uint test

* fix(metal): update bn254 tests and fix logic

* fix(metal): update bn254 neg test

* fix(metal): use larger result arrays

* test(metal): add & sub fuzzing test for Fq_bn254

* fix(metal): correct the Montgomery Mul. Constant and complete mul test

* test(metal): add pow test for bn254 base field

* fix(metal): fix logic for msm usage on bn254

* fix(metal): correct the metal buffer index

* refactor(metal): utils module for data format between metal (GPU) and arkworks (CPU)

* fix(metal): correct encode/decode logic

* test(metal): add test for msm accumulation phase to ensure correctness of metal result

* test(metal): add test to bn254 points arithmetics

* fix: modified double_in_place

* refactor(metal): add limbs_conversion module for to/from metal

* feat(metal): implement arkworks msm accumulation logic in metal

* refactor(metal): add Fq conversion to/from limbs for metal

* feat(metal): compute msm bucket in window-wise fashion

* test(metal): add msm wrapper test on metal implementation

* feat(metal): implement msm with enabling GPU computation on accumulation phase

* refactor(metal): update paths for metal shader files

* feat: integrate metal msm into mopro-ffi

* feat(metal): add mont_reduction module for gpu result conversion

* test(metal): enable latest limb conversion and remove unused module

* feat(metal): optimize msm bucket computation with window-wise accumulation

* feat(metal): Rust wrapper for latest metal msm accumulation

* chore: update the instanceSize and numInstance in metal to make consistency

* chore: remove commented code for bls12_377 curve parsing

* fix: correct warning for GPU explorations code

* chore(gpu-benchmarks): correct minor changes

---------

Co-authored-by: moven0831 <[email protected]>

* (feat: ci) Added an option to run CI on when a PR is ready for review, opened, re-opened and synchronized.

* Script `install_deps.sh` typo

* If a path with Mopro IOS App contains a space the `mopro build --platforms ios` fails as spaces in address confuse the `cd` command.

* If a path PROJECT_DIR contains a space, it no longer bugs on a `cd` command due to  escaping with columns

* (feat: core) added support for proving / verifying the sample circuit in morpo-core

* (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument)

* (feat: template) added Halo2CircuitView for IOS

* (feat: example) Renamed Halo2 example module to `halo2-example`

* (feat: example) Renamed example crate to `halo2-circuit` to be used by all

* (feat: core) Circom is only compiled when halo2 flag is absent

* (feat: core) Major rework to build script:
1. Seperated Circom and Halo2 build dependencies.
2. Added Halo2 build dependancies
   - Check that all keys have been generated.
   - Replace the placeholder `example` project with the user specified project by using ``paths`` override in ``.cargo/config.toml`.
   - Check that the user provided project meets requirements
       - Is a valid cargo crate
       - Is named as `halo2-circuit`

* (feat: cfg) an example of config for Halo2 as well as updated other configs.

* (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation.

Accidentally commited script changes. Reverted later.

* (feat: core) added the autogenerated `.cargo/config.toml` to gitignore

* (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stud in case a different proof system is used)

* (fix: script) reverted changes to script

* (feat: tmpl) added kind to template

* (feat: tmpl) added Halo2 View to IOS template

* (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from

* (feat: script) updated the scripts to support Halo2 circuits

* (feat: script) added a sample Halo2 circuit to the template

* (feat: script) added a sample Halo2 circuit config to the template config

* (feat: lock) stable Halo2 version

* (feat: example) added conditional compilation for circom example

* (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic

* (feat: core) added deserialisation for circuit inputs

* (feat: ffi) updated ffi to be compatible with core changes

* (fix: core) warning cleanup

* (fix: core) build.rs warning cleanup

* (fix: ffi) fixed circom compilation issue

* (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there

* (fix: example) remove workspace option as it fails build

* (fix: ios-template) update IOS template files to work with new Halo2 circuit inputs

* (feat: halo2-template) updated halo2 template with up-to-date halo2 example

* (fix: lock) updated lock file

* (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: example) Commited the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing.

* (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: linter) Fixed all linter errors for `cargo fmt --all`

* (fix: ffi) Circom tests are only to be run when circom flag is enabled.

* (fix: examples) Removed multiply example (to be added as a different circuit example).

* (fix: ffi) Resolved issue with indent.

* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2` to `..._halo2_proof`

* (merge: ffi) Merged changes from GPU explorations to the FFI file, moving them from `lib.rs` to `circom.rs`

* (fix: ffi) Fix a compilation error in `circom.rs`

* chore: update cargo.lock

* (feat: web) Added a Halo2 page to the docs.

* (fix: conf) Fixed a GitHub conflict issue in config

* (chore: halo2) optimised use statement conditional compilation by grouping them together

* (fix: examples) fixed the halo2 fibonacci example README to be up-to-date with the content of the example crate

* (fix: template) moved updated fibonacci halo2 example to the template

* Script `install_deps.sh` typo

* If the path to the Mopro iOS app contains a space, `mopro build --platforms ios` fails because the space in the path confuses the `cd` command.

* If the PROJECT_DIR path contains a space, the `cd` command no longer fails, because the path is now escaped
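The failure mode here is shell word splitting: an unquoted path containing a space is seen by `cd` as two arguments. As an illustration only (the actual fix lives in the shell scripts, and both helper names below are hypothetical), this sketch applies the standard POSIX single-quote escaping to a path before it is placed into a command line.

```rust
/// Wraps `s` in single quotes for a POSIX shell. An embedded single quote
/// is written as: close quote, escaped quote, reopen quote ('\'').
fn shell_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "'\\''"))
}

/// Builds a `cd` command line that survives spaces in `dir`.
fn cd_command(dir: &str) -> String {
    format!("cd {}", shell_quote(dir))
}
```

When the command is launched from Rust rather than from a script, `std::process::Command::current_dir` avoids the shell entirely and needs no quoting.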

* (feat: core) added support for proving / verifying the sample circuit in mopro-core

* (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument)

* (feat: template) added Halo2CircuitView for iOS

* (feat: example) Renamed Halo2 example module to `halo2-example`

* (feat: example) Renamed example crate to `halo2-circuit` to be used by all

* (feat: core) Circom is only compiled when halo2 flag is absent

* (feat: core) Major rework to build script:
1. Separated Circom and Halo2 build dependencies.
2. Added Halo2 build dependencies
   - Check that all keys have been generated.
   - Replace the placeholder `example` project with the user-specified project by using a `paths` override in `.cargo/config.toml`.
   - Check that the user-provided project meets requirements
       - Is a valid cargo crate
       - Is named as `halo2-circuit`
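The two project requirements listed above (a valid Cargo crate, named `halo2-circuit`) could be checked along the lines of the sketch below. This is an assumption about the shape of the check, not the code in `build.rs`: the function names are hypothetical, and the manifest is scanned line by line instead of being parsed with a TOML library, so only simple `name = "..."` entries are recognised.

```rust
use std::fs;
use std::path::Path;

/// Crate name required by the build script, per the commit message.
const REQUIRED_NAME: &str = "halo2-circuit";

/// Returns the `name` value from the `[package]` table, if present.
/// Naive line scan; assumes `name = "..."` sits on a single line.
fn package_name(manifest: &str) -> Option<String> {
    let mut in_package = false;
    for line in manifest.lines() {
        let line = line.trim();
        if line.starts_with('[') {
            // Track which table we are in; only [package] is relevant.
            in_package = line == "[package]";
            continue;
        }
        if !in_package {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "name" {
                return Some(value.trim().trim_matches('"').to_string());
            }
        }
    }
    None
}

/// Checks that `dir` holds a Cargo manifest whose package is named
/// `halo2-circuit`.
fn check_circuit_crate(dir: &Path) -> Result<(), String> {
    let manifest = dir.join("Cargo.toml");
    let text = fs::read_to_string(&manifest)
        .map_err(|e| format!("cannot read {}: {}", manifest.display(), e))?;
    let name = package_name(&text).ok_or("no `name` key found in [package]")?;
    if name == REQUIRED_NAME {
        Ok(())
    } else {
        Err(format!("crate is named `{}`, expected `{}`", name, REQUIRED_NAME))
    }
}
```

Returning a descriptive `Err` lets the build script fail early with a message that tells the user which requirement was not met.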

* (feat: cfg) an example of config for Halo2 as well as updated other configs.

* (fix: core) issue of build.rs not compiling due to function declaration hidden with conditional compilation.

Accidentally committed script changes. Reverted later.

* (feat: core) added the autogenerated `.cargo/config.toml` to gitignore

* (feat: ffi) added support for halo2 feature and conditional compilation for all functions (with a stub in case a different proof system is used)

* (fix: script) reverted changes to script

* (feat: tmpl) added kind to template

* (feat: tmpl) added Halo2 View to iOS template

* (feat: core) added debug prints for build script to know where the SRS, PK and VK are read from

* (feat: script) updated the scripts to support Halo2 circuits

* (feat: script) added a sample Halo2 circuit to the template

* (feat: script) added a sample Halo2 circuit config to the template config

* (feat: lock) stable Halo2 version

* (feat: example) added conditional compilation for circom example

* (feat: core+example) moved circuit specific logic to halo2-circuit crate, making middleware generic

* (feat: core) added deserialisation for circuit inputs

* (feat: ffi) updated ffi to be compatible with core changes

* (fix: core) warning cleanup

* (fix: core) build.rs warning cleanup

* (fix: ffi) fixed circom compilation issue

* (feat: example) created a folder for halo2 example circuits and moved example fibonacci circuit there

* (fix: example) remove workspace option as it fails build

* (fix: ios-template) update iOS template files to work with new Halo2 circuit inputs

* (feat: halo2-template) updated halo2 template with up-to-date halo2 example

* (fix: lock) updated lock file

* (fix: general) Changed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: example) Committed the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing.

* (fix: general) Changed missed "kind" parameter in config to "adapter" to be consistent with [docs](https://zkmopro.org/docs/intro).

* (fix: linter) Fixed all linter errors for `cargo fmt --all`

* (fix: ffi) Circom tests are only to be run when circom flag is enabled.

* (fix: examples) Removed multiply example (to be added as a different circuit example).

* (fix: ffi) Resolved issue with indent.

* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2` to `..._halo2_proof`

* (feat: ffi) implemented call interfaces for Halo2 proofs (single input argument)

* (feat: script) added a sample Halo2 circuit to the template

* (feat: ffi) updated ffi to be compatible with core changes

* (fix: example) Committed the Halo2 Fibonacci circuit to `mopro-core/examples`, as before it was missing.

* (fix: examples) Removed multiply example (to be added as a different circuit example).

* (fix: ffi) Resolved issue with indent.

* (refactor: common) Renamed `verify_halo2_proof2` and `prove_halo2_proof2` to `..._halo2_proof`

* (merge: ffi) Merged changes from GPU explorations to the FFI file, moving them from `lib.rs` to `circom.rs`

* (fix: ffi) Fix a compilation error in `circom.rs`

* (feat: web) Added a Halo2 page to the docs.

* (chore: halo2) optimised use statement conditional compilation by grouping them together

* (fix: examples) fixed the halo2 fibonacci example README to be up-to-date with the content of the example crate

* (fix: template) moved updated fibonacci halo2 example to the template

---------

Co-authored-by: Ya-wen, Jeng <[email protected]>
Co-authored-by: CJ Rose <[email protected]>
Co-authored-by: FoodChain <[email protected]>
Co-authored-by: moven0831 <[email protected]>