Feat/Msm with gpu (metal shader language) on laptop #150

Merged
merged 66 commits into from
Jun 8, 2024
Changes from 65 commits
c760b2c
feat(msm_benchmark): integrate zprize2022 TrapdoorTech msm algo on Rust
moven0831 Mar 24, 2024
01b036d
refactor(msm_benchmark): separate arkworks_pippenger as baseline from…
moven0831 Mar 24, 2024
2f6ce16
refactor(benchmark): rewrite scalars and points gen to preprocess
moven0831 Mar 31, 2024
51828ae
refactor(baseline): rewrite benchmark method to 2^10 x 2^16 instance …
moven0831 Apr 8, 2024
bcc0ebe
refactor: modify benchmark standard to match zprize works
moven0831 Apr 11, 2024
4e19e96
feat(baseline): adopt zprize benchmarking method and enable multi ben…
moven0831 Apr 11, 2024
3344167
feat(ffi): integrate trapdoor tech msm in mopro-ffi
FoodChain1028 Apr 11, 2024
af11bd4
feat(ffi): add test of trapdoor tech msm
FoodChain1028 Apr 11, 2024
e36cab6
fix(ffi): modify input for trapdoor msm
FoodChain1028 Apr 12, 2024
47a2872
fix: update arkowrks pippenger input/ ouput
FoodChain1028 Apr 15, 2024
6c955cf
fix: update trapdoor tech zprize msm input/ ouput
FoodChain1028 Apr 15, 2024
6de77ea
fix: add the feature flag back
FoodChain1028 Apr 15, 2024
b385344
fix: modify msm functions input
FoodChain1028 Apr 15, 2024
0ee47bc
fix: lint
FoodChain1028 Apr 15, 2024
6127e1e
feat(benchmark data): accelerate the benchmark data generation. 2^20 …
moven0831 Apr 16, 2024
75f1b73
feat: add a README file for gpu-exploration
FoodChain1028 Apr 16, 2024
86ace53
feat(gpu-explorations): benchmark msm on BN254 curve, which leads to …
moven0831 Apr 21, 2024
1a714da
feat(gpu-explorations): integrated halo2curve's msm and benchmarks
moven0831 Apr 23, 2024
a446f0a
refactor(msm): disable other msm's except the arkworks 0.4 msm
moven0831 Apr 30, 2024
69a39f0
feat(metal): provide basic structure of metal backend and rust wrappe…
moven0831 May 6, 2024
79447ab
fix: compile pathway error
FoodChain1028 May 7, 2024
b3a401c
chore: fix shader path config and identify parallel part of arkworks'…
moven0831 May 8, 2024
52dee44
feat(metal): draft msm wrapper in Rust for metal backend
moven0831 May 14, 2024
92cdb5f
doc: added reference for bls-12-381 and bn254
FoodChain1028 May 15, 2024
e233fe9
chore(metal): add python helper to compute BN254 params
moven0831 May 19, 2024
2438bdc
feat(metal): introduce u256 type implementation and bn254 params
moven0831 May 19, 2024
6001bc8
docs(metal): generate abstract addition chain instructions for furthe…
moven0831 May 19, 2024
9d1684f
feat: add instruction for bn254 addchain
FoodChain1028 May 20, 2024
79c28e6
test(metal): add fixed-params tests for bn254 operations
moven0831 May 20, 2024
7f98e76
test(metal): update u256 type and focus on add test
moven0831 May 20, 2024
bea50c8
test(metal): fix To and From BigInt format, provide better view on ad…
moven0831 May 21, 2024
f7c32f6
fix: make the path root-compatible
FoodChain1028 May 22, 2024
ac23566
chore: added error test
FoodChain1028 May 22, 2024
27967fa
feat: compiled metal lib
FoodChain1028 May 22, 2024
36730e1
chore(metal): update test log for better view on the bug
moven0831 May 24, 2024
0e5d8e5
fix(metal): correct logic of {to, from}_u32_limbs and addition logic …
moven0831 May 25, 2024
e7c1239
fix(metal): correct data repr logic between metal and arkworks and co…
moven0831 May 25, 2024
519596d
fix(metal): update bn254 tests and fix logic
FoodChain1028 May 25, 2024
67ae3f1
fix(metal): update bn254 neg test
FoodChain1028 May 25, 2024
255802a
fix(metal): use larger result arrays
FoodChain1028 May 25, 2024
554968a
test(metal): add & sub fuzzing test for Fq_bn254
moven0831 May 27, 2024
5b8c994
fix(metal): correct the Montgomery Mul. Constant and complete mul test
moven0831 May 27, 2024
f71cbd2
test(metal): add pow test for bn254 base field
moven0831 May 27, 2024
cfca30a
fix(metal): fix logic for msm usage on bn254
moven0831 May 27, 2024
c8b1c4f
fix(metal): correct the metal buffer index
FoodChain1028 May 27, 2024
d9a7279
refactor(metal): utils module for data format between metal (GPU) and…
moven0831 May 28, 2024
2270ad4
fix(metal): correct encode/decode logic
moven0831 May 28, 2024
3bc0b75
test(metal): add test for msm accumulation phase to ensure correctnes…
moven0831 May 30, 2024
d5a62e7
test(metal): add test to bn254 points arithmetics
moven0831 May 31, 2024
e83098b
fix: modified double_in_place
FoodChain1028 Jun 1, 2024
71cde1e
refactor(metal): add limbs_conversion module for to/from metal
moven0831 Jun 1, 2024
3a6d791
feat(metal): implement arkworks msm accumulation logic in metal
moven0831 Jun 1, 2024
c4fdea6
refactor(metal): add Fq conversion to/from limbs for metal
moven0831 Jun 2, 2024
a9f73a9
feat(metal): compute msm bucket in window-wise fashion
moven0831 Jun 2, 2024
8fef337
test(metal): add msm wrapper test on metal implementation
moven0831 Jun 2, 2024
edb21d6
feat(metal): implement msm with enabling GPU computation on accumulat…
moven0831 Jun 2, 2024
909e0b2
refactor(metal): update paths for metal shader files
moven0831 Jun 3, 2024
c1e61e0
feat: integrate metal msm into mopro-ffi
FoodChain1028 Jun 4, 2024
f0a5449
feat(metal): add mont_reduction module for gpu result conversion
moven0831 Jun 5, 2024
9659393
test(metal): enable latest limb conversion and remove unused module
moven0831 Jun 5, 2024
cfe112a
feat(metal): optimize msm bucket computation with window-wise accumul…
moven0831 Jun 5, 2024
0dd64fa
feat(metal): Rust wrapper for latest metal msm accumulation
moven0831 Jun 5, 2024
a3003d3
chore: update the instanceSize and numInstance in metal to make consi…
FoodChain1028 Jun 5, 2024
8eed163
chore: remove commented code for bls12_377 curve parsing
moven0831 Jun 6, 2024
bf0ae37
fix: correct warning for GPU explorations code
moven0831 Jun 6, 2024
e99f439
chore(gpu-benchmarks): correct minor changes
moven0831 Jun 7, 2024
387 changes: 172 additions & 215 deletions Cargo.lock

Large diffs are not rendered by default.

7 changes: 0 additions & 7 deletions Cargo.toml
@@ -7,10 +7,3 @@ exclude = ["mopro-example-app"]
 # NOTE: Forked wasmer to work around memory limits
 # See https://github.com/wasmerio/wasmer/commit/09c7070
 wasmer = { git = "https://github.com/oskarth/wasmer.git", rev = "09c7070" }
-
-# NOTE: For gpu exploration on zprize works, will only compile when `gpu-benchmarks` feature is enabled
-ark-bls12-377-3 = { git = 'https://github.com/arkworks-rs/curves.git', package = 'ark-bls12-377', tag = 'v0.3.0', optional = true}
-ark-ec-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ec', tag = 'v0.3.0', features = ["parallel"], optional = true }
-ark-ff-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ff', tag = 'v0.3.0', features = ["parallel"], optional = true }
-ark-serialize-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-serialize', tag = 'v0.3.0', optional = true }
-ark-std-3 = { git = 'https://github.com/arkworks-rs/std.git', package = 'ark-std', tag = 'v0.3.0', optional = true }
5 changes: 4 additions & 1 deletion mopro-core/.gitignore
@@ -14,4 +14,7 @@ Cargo.lock
 *.pdb

 # GPU exploration - preprocessed vectors
-src/middleware/gpu_explorations/utils/vectors
+src/middleware/gpu_explorations/utils/vectors
+
+# GPU exploration - proptest generated files
+proptest-regressions
28 changes: 7 additions & 21 deletions mopro-core/Cargo.toml
@@ -8,7 +8,7 @@ edition = "2021"
 [features]
 default = ["wasmer/dylib"]
 dylib = [] # NOTE: can probably remove this if we use env config instead
-gpu-benchmarks = ["ark-bls12-377", "ark-bls12-381", "ark-ed-on-bls12-377", "ark-ed-on-bls12-381", "ark-poly", "ark-poly-commit", "ark-sponge", "duration-string", "rand", "rand_chacha", "lazy_static", "ark-ec-3", "ark-ff-3", "ark-serialize-3", "ark-std-3", "ark-bls12-377-3", "parallel"]
+gpu-benchmarks = ["ark-ff", "ark-bls12-381", "metal", "objc", "proptest", "parallel"]
 calc-native-witness = ["witness"] # experimental feature to calculate witness with witness graph
 build-native-witness = ["witness/build-witness"] # only enable build-native-witness feature when building the witness graph
 parallel = ["rayon", "ark-std/parallel"]
@@ -48,27 +48,13 @@ color-eyre = "=0.6.2"
 criterion = "=0.3.6"

 # GPU explorations
-ark-bls12-377 = { version = "0.4", optional = true }
-ark-bls12-381 = { version = "0.3", optional = true }
-ark-ed-on-bls12-377 = { version = "0.3", optional = true }
-ark-ed-on-bls12-381 = { version = "0.3", optional = true }
-ark-poly = { version = "0.3", optional = true }
-ark-poly-commit = { version = "0.3", optional = true }
-ark-sponge = { version = "0.3", optional = true }
-duration-string = { version = "0.0.6", optional = true }
-rand = { version = "0.8.0", optional = true }
-rand_chacha = { version = "0.3.1", optional = true }
-lazy_static = { version = "1.4.0", optional = true }
-ark-ff = { version = "=0.4.1", default-features = false, features = [
+ark-ff = { version = "=0.4.1", default-features = false, optional = true, features = [
     "parallel",
 ] }
-
-# GPU explorations from mopro/Cargo.toml patch
-ark-bls12-377-3 = { git = 'https://github.com/arkworks-rs/curves.git', package = 'ark-bls12-377', tag = 'v0.3.0', optional = true}
-ark-ec-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ec', tag = 'v0.3.0', features = ["parallel"], optional = true}
-ark-ff-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-ff', tag = 'v0.3.0', features = ["parallel"], optional = true }
-ark-serialize-3 = { git = 'https://github.com/arkworks-rs/algebra.git', package = 'ark-serialize', tag = 'v0.3.0', optional = true }
-ark-std-3 = { git = 'https://github.com/arkworks-rs/std.git', package = 'ark-std', tag = 'v0.3.0', optional = true }
+ark-bls12-381 ={ version = "=0.4.0", optional = true }
+metal = { version = "=0.28.0", optional = true }
+objc ={ version = "=0.2.4", optional = true }
+proptest ={ version = "1.4.0", optional = true }

 [build-dependencies]
 color-eyre = "0.6"
@@ -81,4 +67,4 @@ witness = { git = "https://github.com/philsippl/circom-witness-rs.git", optional

 [dependencies.rayon]
 version = "1"
-optional = true
+optional = true
Original file line number Diff line number Diff line change
@@ -1,9 +1,7 @@
 msm_size,num_msm,avg_processing_time(ms)
-8,5,3.302555666666667
-8,10,1.7059585000000002
-12,5,11.544680666666668
-12,10,11.7898874
-16,5,128.50465099999997
-16,10,139.1740167
-18,5,472.9359916
-18,10,477.0808459000001
+8,10,1.3042915
+12,10,8.4526458
+16,10,82.19201659999999
+18,10,307.24009179999996
+20,10,1140.8793625
+22,10,4160.4375503
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
+msm_size,num_msm,avg_processing_time(ms)
Collaborator

I'm curious: why was halo2curves benchmarked?

Collaborator

Haha, TL;DR: Carlos suggested that halo2curves' MSM is state of the art, so we benchmarked it.

As the image below shows, halo2curves does perform well with the "asm" feature. However, that acceleration feature is only compatible with the x86_64 architecture, so we didn't use it as a reference for our future work.

[Image: photo_2024-06-06 11 11 09]

Collaborator

Will they support the acceleration feature in the future, or is it not possible on ARM?
So will this be a legacy benchmark?

Collaborator

We've talked to the team: the "asm" feature is actually possible on ARM, but they don't plan to support it at the moment.

+8,10,2.72165
+12,10,12.6313959
+16,10,116.79077500000001
+18,10,410.9840459
+20,10,1544.3454166
+22,10,5254.7052669
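
The review thread above notes that the "asm" acceleration is limited to x86_64. As a rough illustration of how such an architecture restriction can be expressed in Rust, here is a minimal sketch using the built-in `cfg!(target_arch = ...)` check. The `MsmBackend` enum and `select_backend` function are hypothetical names made up for this example; they are not part of halo2curves or mopro.

```rust
/// Hypothetical backend choice, for illustration only.
#[derive(Debug, PartialEq, Eq)]
enum MsmBackend {
    /// Hand-written assembly path, assumed to exist only on x86_64.
    X86Asm,
    /// Portable Rust path usable on any target (e.g. aarch64 laptops).
    Portable,
}

/// Picks a backend from the compile-time target architecture.
fn select_backend() -> MsmBackend {
    if cfg!(target_arch = "x86_64") {
        MsmBackend::X86Asm
    } else {
        MsmBackend::Portable
    }
}

fn main() {
    println!("selected backend: {:?}", select_backend());
}
```

On an ARM machine this sketch always falls back to the portable path, which mirrors why an x86_64-only result is a weak reference point for ARM-targeted work.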
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
+msm_size,num_msm,avg_processing_time(ms)
+8,10,5.959329
+12,10,39.155546
+16,10,464.353125
+18,10,1649.2290374
+20,10,6154.858971
+22,10,20962.7212291
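
To compare rows across sizes, it can help to normalize the average time by the number of points. The sketch below parses rows in the `msm_size,num_msm,avg_processing_time(ms)` layout and computes microseconds per point. It assumes `msm_size` is the base-2 logarithm of the instance size (so `16` means 2^16 points), which matches the `2^16` wording in the commit messages but is not stated in the CSV itself. `Row`, `parse_rows`, and `micros_per_point` are illustrative names, not part of the PR.

```rust
/// One row of the benchmark CSV.
#[derive(Debug, PartialEq)]
struct Row {
    msm_size: u32,
    num_msm: u32,
    avg_ms: f64,
}

/// Parses CSV text with a single header line; malformed rows are skipped.
fn parse_rows(csv: &str) -> Vec<Row> {
    csv.lines()
        .skip(1) // skip the header line
        .filter_map(|line| {
            let mut fields = line.trim().split(',');
            Some(Row {
                msm_size: fields.next()?.parse().ok()?,
                num_msm: fields.next()?.parse().ok()?,
                avg_ms: fields.next()?.parse().ok()?,
            })
        })
        .collect()
}

/// Average microseconds per point, ASSUMING msm_size = log2(number of points).
fn micros_per_point(row: &Row) -> f64 {
    row.avg_ms * 1000.0 / (1u64 << row.msm_size) as f64
}

fn main() {
    // Rows copied from the table above.
    let csv = "msm_size,num_msm,avg_processing_time(ms)\n\
               8,10,5.959329\n\
               16,10,464.353125\n\
               22,10,20962.7212291\n";
    for row in parse_rows(csv) {
        println!(
            "2^{} points, {} instances: {:.3} us/point",
            row.msm_size,
            row.num_msm,
            micros_per_point(&row)
        );
    }
}
```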
74 changes: 42 additions & 32 deletions mopro-core/src/middleware/gpu_explorations/arkworks_pippenger.rs
@@ -1,7 +1,5 @@
-use ark_bls12_377::{Fr as ScalarField, G1Affine, G1Projective};
-// use ark_bn254::{Fr as ScalarField, FrConfig, G1Affine as GAffine, G1Projective as G};
+use ark_bn254::{Fr as ScalarField, G1Projective as G};
 use ark_ec::VariableBaseMSM;
-use ark_ff::BigInt;
 use std::time::{Duration, Instant};

 use crate::middleware::gpu_explorations::utils::{benchmark::BenchmarkResult, preprocess};
@@ -17,28 +15,16 @@ where

     for instance in instances {
         let points = &instance.0;
-        let scalars = &instance.1;
-        let mut parsed_points = Vec::<G1Affine>::new();
-        let mut parsed_scalars = Vec::<ScalarField>::new();
-
-        // parse points and scalars from arkworks 0.3 compatible format to 0.4 compatible
-        for p in points {
-            let new_p =
-                G1Affine::new_unchecked(BigInt::new(p.x.0 .0).into(), BigInt::new(p.y.0 .0).into());
-            parsed_points.push(new_p);
-        }
-
-        for s in scalars {
-            let new_s = ScalarField::new(BigInt::new(s.0));
-            parsed_scalars.push(new_s);
-        }
-
+        // map each scalar to a ScalarField
+        let scalars = &instance
+            .1
+            .iter()
+            .map(|s| ScalarField::new(*s))
+            .collect::<Vec<ScalarField>>();
         let mut instance_total_duration = Duration::ZERO;
         for _i in 0..iterations {
             let start = Instant::now();
-            let _result =
-                <G1Projective as VariableBaseMSM>::msm(&parsed_points[..], &parsed_scalars[..])
-                    .unwrap();
+            let _result = <G as VariableBaseMSM>::msm(&points[..], &scalars[..]).unwrap();

             instance_total_duration += start.elapsed();
         }
@@ -97,12 +83,18 @@ mod tests {

     const INSTANCE_SIZE: u32 = 16;
     const NUM_INSTANCE: u32 = 10;
-    const UTILSPATH: &str = "../mopro-core/src/middleware/gpu_explorations/utils/vectors";
-    const BENCHMARKSPATH: &str = "../mopro-core/gpu_explorations/benchmarks";
+    const UTILSPATH: &str = "mopro-core/src/middleware/gpu_explorations/utils/vectors";
+    const BENCHMARKSPATH: &str = "mopro-core/gpu_explorations/benchmarks";

     #[test]
     fn test_benchmark_msm() {
-        let dir = format!("{}/{}x{}", UTILSPATH, INSTANCE_SIZE, NUM_INSTANCE);
+        let dir = format!(
+            "{}/{}/{}x{}",
+            preprocess::get_root_path(),
+            UTILSPATH,
+            INSTANCE_SIZE,
+            NUM_INSTANCE
+        );

         // Check if the vectors have been generated
         match preprocess::FileInputIterator::open(&dir) {
@@ -121,29 +113,47 @@ mod tests {

     #[test]
     fn test_run_benchmark() {
-        let utils_path = format!("{}/{}x{}", &UTILSPATH, INSTANCE_SIZE, NUM_INSTANCE);
+        let utils_path = format!(
+            "{}/{}/{}x{}",
+            preprocess::get_root_path(),
+            &UTILSPATH,
+            INSTANCE_SIZE,
+            NUM_INSTANCE
+        );
         let result = run_benchmark(INSTANCE_SIZE, NUM_INSTANCE, &utils_path).unwrap();
         println!("Benchmark result: {:#?}", result);
     }

     #[test]
     fn test_run_multi_benchmarks() {
-        let output_path = format!("{}/{}_benchmark.txt", &BENCHMARKSPATH, "arkworks_pippenger");
+        let output_path = format!(
+            "{}/{}/{}_benchmark.txt",
+            preprocess::get_root_path(),
+            &BENCHMARKSPATH,
+            "arkworks_pippenger"
+        );
         let mut output_file = File::create(output_path).expect("output file creation failed");
-        writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)");
+        writeln!(output_file, "msm_size,num_msm,avg_processing_time(ms)").unwrap();

-        let instance_size = vec![8, 12, 16, 18, 20];
-        let num_instance = vec![5, 10];
+        let instance_size = vec![8, 12, 16, 18, 20, 22];
+        let num_instance = vec![10];
         for size in &instance_size {
             for num in &num_instance {
-                let utils_path = format!("{}/{}x{}", &UTILSPATH, *size, *num);
+                let utils_path = format!(
+                    "{}/{}/{}x{}",
+                    preprocess::get_root_path(),
+                    &UTILSPATH,
+                    *size,
+                    *num
+                );
                 let result = run_benchmark(*size, *num, &utils_path).unwrap();
                 println!("{}x{} result: {:#?}", *size, *num, result);
                 writeln!(
                     output_file,
                     "{},{},{}",
                     result.instance_size, result.num_instance, result.avg_processing_time
-                );
+                )
+                .unwrap();
             }
         }
     }
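
The benchmark loop in this file accumulates `start.elapsed()` into a `Duration` and later reports an average in milliseconds. A minimal, standalone sketch of that timing pattern follows, using only the standard library. The `average_ms` helper and the summation workload are placeholders invented for this example; in the PR the timed call is the arkworks `VariableBaseMSM::msm`, and the exact averaging code is not shown in the rendered diff.

```rust
use std::time::{Duration, Instant};

/// Runs `work` `iterations` times and returns the mean wall-clock time in ms.
/// Only the call to `work` is timed; setup done by the caller is excluded.
/// Note: `iterations == 0` yields NaN, so callers should pass at least 1.
fn average_ms<F: FnMut()>(iterations: u32, mut work: F) -> f64 {
    let mut total = Duration::ZERO;
    for _ in 0..iterations {
        let start = Instant::now();
        work();
        total += start.elapsed();
    }
    total.as_secs_f64() * 1000.0 / f64::from(iterations)
}

fn main() {
    // Placeholder workload standing in for the MSM call (an assumption).
    let avg = average_ms(10, || {
        let sum: u64 = (0..100_000u64).sum();
        // Keep the compiler from optimizing the workload away.
        std::hint::black_box(sum);
    });
    println!("average: {:.4} ms", avg);
}
```

Keeping input conversion (such as mapping raw scalars to field elements) outside the timed closure, as the diff does, means the reported average reflects the MSM call alone.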