diff --git a/barretenberg/cpp/src/barretenberg/common/thread.cpp b/barretenberg/cpp/src/barretenberg/common/thread.cpp index 04a71f5746b..6d3007c8739 100644 --- a/barretenberg/cpp/src/barretenberg/common/thread.cpp +++ b/barretenberg/cpp/src/barretenberg/common/thread.cpp @@ -9,11 +9,11 @@ * The first implementation was `parallel_for_spawning`. You can read a description of each implementation in the * relevant source file, but parallel_for_spawning is the simplest approach imaginable. * Once WASM was working, I checked its performance in native code by running it against the polynomials benchmarks. - * In doing so, OMP outperformed it significantly (at least for FFT algorithims). This set me on a course to try + * In doing so, OMP outperformed it significantly (at least for FFT algorithms). This set me on a course to try * and understand why and to provide a suitable alternative. Ultimately I found solutions that compared to OMP with * "moody" and "atomic_pool" solutions, although they were not *quite* as fast as OMP. However interestingly, when it * comes to actual "real world" testing (with proof construction), rather than raw benchmarking, most of the solutions - * performaed about the same, with OMP *actually slightly worse*. So maybe all this effort was a bit redundant. + * performed about the same, with OMP *actually slightly worse*. So maybe all this effort was a bit redundant. * Remember to always do real world testing... * * My theory as to why OMP performs so much better in benchmarks is because it runs the tests in a very tight loop, @@ -85,4 +85,4 @@ void parallel_for(size_t num_iterations, const std::function<void(size_t)>& func // parallel_for_queued(num_iterations, func); #endif #endif -} \ No newline at end of file +} diff --git a/barretenberg/cpp/src/barretenberg/ecc/groups/wnaf.hpp b/barretenberg/cpp/src/barretenberg/ecc/groups/wnaf.hpp index c6dbee10ead..846d11d64c0 100644 --- a/barretenberg/cpp/src/barretenberg/ecc/groups/wnaf.hpp +++ b/barretenberg/cpp/src/barretenberg/ecc/groups/wnaf.hpp @@ -239,12 +239,12 @@ inline uint64_t get_num_scalar_bits(const uint64_t* scalar) * * N.B. IN OUR STDLIB ALGORITHMS THE SKEW VALUE REPRESENTS AN ADDITION NOT A SUBTRACTION (i.e. we add +1 at the end of * the scalar mul algo we don't sub 1) (this is to eliminate situations which could produce the point at infinity as an - * output as our circuit logic cannot accomodate this edge case). + * output as our circuit logic cannot accommodate this edge case). * * Credits: Zac W. * * @param scalar Pointer to the 128-bit non-montgomery scalar that is supposed to be transformed into wnaf - * @param wnaf Pointer to output array that needs to accomodate enough 64-bit WNAF entries + * @param wnaf Pointer to output array that needs to accommodate enough 64-bit WNAF entries * @param skew_map Reference to output skew value, which if true shows that the point should be added once at the end of * computation * @param wnaf_round_counts Pointer to output array specifying the number of points participating in each round @@ -497,4 +497,4 @@ inline void fixed_wnaf_with_restricted_first_slice(uint64_t* scalar, // } } // namespace barretenberg::wnaf -// NOLINTEND(readability-implicit-bool-conversion) \ No newline at end of file +// NOLINTEND(readability-implicit-bool-conversion) diff --git a/barretenberg/cpp/src/barretenberg/ecc/pippenger.md b/barretenberg/cpp/src/barretenberg/ecc/pippenger.md index 18f8b85941f..a7463663216 100644 --- a/barretenberg/cpp/src/barretenberg/ecc/pippenger.md +++ b/barretenberg/cpp/src/barretenberg/ecc/pippenger.md @@ -26,7 +26,7 @@ So, for example, if the most significant 6 bits of a scalar are `011001` (25), w At the end of each round, we then 'concatenate' all of the buckets into a sum. Let's represent each bucket accumulator in an array `A[num_buckets]`. The concatenation phase will compute `A[0] + 2A[1] + 3A[2] + 4A[3] + 5A[4] + ... = Sum`. -Finally, we add each `Sum` point into an overall accumulator. For example, for a set of 254 bit scalars, if we evaluate the most 6 significant bits of each scalar and accumulate the resulting point into `Sum`, we actually need `(2^{248}).Sum` to accomodate for the bit shift. +Finally, we add each `Sum` point into an overall accumulator. For example, for a set of 254 bit scalars, if we evaluate the most 6 significant bits of each scalar and accumulate the resulting point into `Sum`, we actually need `(2^{248}).Sum` to accommodate for the bit shift. This final step is similar to the 'double and add' algorithm in a traditional scalar multiplication algorithm - we start at the most significant bit slice and work our way down to minimize the number of doublings. At each round, we multiply the overall accumulator by 2^{bit slice size} - by the time we iterate over every round, we will have performed the total required number of doublings for every 'bit slice' that we add into our accumulator. @@ -38,7 +38,7 @@ Total run time = 16 * 2^{18} + 16 * 2^{15} = 18 * 2^{18}. So the aggregate numbe ## The problem with Pippenger's algorithm -As it is currently implemented, each round will iterate over the points to be added, and add each point into one of the round's buckets. Whilst point access is sequential in memory, bucket access is very much not. In fact, if the points being multiplied are from a zero-knowlege proof, bucket access is literally uniformly randomly distributed and therefore presents the worst-case scenario. +As it is currently implemented, each round will iterate over the points to be added, and add each point into one of the round's buckets. Whilst point access is sequential in memory, bucket access is very much not. In fact, if the points being multiplied are from a zero-knowledge proof, bucket access is literally uniformly randomly distributed and therefore presents the worst-case scenario. This makes it difficult to parallelize. It is not possible to simply assign threads a section of points to iterate over, because of race conditions when two threads access the same bucket. @@ -69,8 +69,8 @@ Drawbacks of this approach: ## Summary -By restructuring the memory heirarchy of our pippenger algorithm, we can create a parallelizable version of pippenger. This will significantly simplify the logic of our PLONK prover (instead of allocating threads for batches of multi-exponentations, we can multi-thread individual multi-exponentiations, simplifying our thread logic). +By restructuring the memory hierarchy of our pippenger algorithm, we can create a parallelizable version of pippenger. This will significantly simplify the logic of our PLONK prover (instead of allocating threads for batches of multi-exponentations, we can multi-thread individual multi-exponentiations, simplifying our thread logic). This will concretely reduce the number of pippenger rounds of our multi-exponentiations by approximately 1, giving a theoretical 15% speed-up. Some of this will be eaten by the run-time of the radix sort. -Longer term, this parallelizable algorithm will be significantly easier to adapt for GPUs, using OpenCL. \ No newline at end of file +Longer term, this parallelizable algorithm will be significantly easier to adapt for GPUs, using OpenCL. diff --git a/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/generalized_permutation.hpp b/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/generalized_permutation.hpp index fbf08b9c102..7741d113b06 100644 --- a/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/generalized_permutation.hpp +++ b/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/generalized_permutation.hpp @@ -27,7 +27,7 @@ inline void compute_gen_permutation_lagrange_base_single(barretenberg::polynomia // here, 'index' refers to an element of our subgroup H // we can almost use permutation[i] to directly index our `roots` array, which contains our subgroup elements // we first have to mask off the 2 high bits, which describe which wire polynomial our permutation maps to (we'll - // deal with that in a bit) we then have to accomodate for the fact that, `roots` only contains *half* of our + // deal with that in a bit) we then have to accommodate for the fact that, `roots` only contains *half* of our // subgroup elements. this is because w^{n/2} = -w and we don't want to perform redundant work computing roots of // unity diff --git a/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/permutation.hpp b/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/permutation.hpp index 5f3a78d1a11..7d5565251b9 100644 --- a/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/permutation.hpp +++ b/barretenberg/cpp/src/barretenberg/plonk/proof_system/utils/permutation.hpp @@ -65,7 +65,7 @@ inline void compute_permutation_lagrange_base_single(barretenberg::polynomial& o // `permutation[i]` will specify the 'index' that this wire value will map to. // Here, 'index' refers to an element of our subgroup H. // We can almost use `permutation[i]` to directly index our `roots` array, which contains our subgroup elements. - // We first have to accomodate for the fact that `roots` only contains *half* of our subgroup elements. This is + // We first have to accommodate for the fact that `roots` only contains *half* of our subgroup elements. This is // because ω^{n/2} = -ω and we don't want to perform redundant work computing roots of unity. size_t raw_idx = permutation[i].subgroup_index; diff --git a/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/eccvm/eccvm_circuit_builder.hpp b/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/eccvm/eccvm_circuit_builder.hpp index 6bee55c0d4c..9468df3aa6e 100644 --- a/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/eccvm/eccvm_circuit_builder.hpp +++ b/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/eccvm/eccvm_circuit_builder.hpp @@ -390,7 +390,7 @@ template <typename Flavor> class ECCVMCircuitBuilder { } } for (size_t i = 0; i < precompute_table_state.size(); ++i) { - // first row is always an empty row (to accomodate shifted polynomials which must have 0 as 1st + // first row is always an empty row (to accommodate shifted polynomials which must have 0 as 1st // coefficient). All other rows in the precompute_table_state represent active wnaf gates (i.e. // precompute_select = 1) polys.precompute_select[i] = (i != 0) ? 1 : 0; diff --git a/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/ultra_circuit_builder.cpp b/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/ultra_circuit_builder.cpp index 047e3e8489e..42ef656b807 100644 --- a/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/ultra_circuit_builder.cpp +++ b/barretenberg/cpp/src/barretenberg/proof_system/circuit_builder/ultra_circuit_builder.cpp @@ -701,7 +701,7 @@ std::vector<uint32_t> UltraCircuitBuilder_<Arithmetization>::decompose_into_defa * This is not strictly required iff num_bits <= target_range_bitnum. * However, this produces an edge-case where a variable is range-constrained but NOT present in an arithmetic gate. * This in turn produces an unsatisfiable circuit (see `create_new_range_constraint`). We would need to check for - * and accomodate/reject this edge case to support not adding addition gates here if not reqiured + * and accommodate/reject this edge case to support not adding addition gates here if not reqiured * if (num_bits <= target_range_bitnum) { * const uint64_t expected_range = (1ULL << num_bits) - 1ULL; * create_new_range_constraint(variable_index, expected_range); @@ -3454,4 +3454,4 @@ template class UltraCircuitBuilder_<arithmetization::UltraHonk<barretenberg::fr> // To enable this we need to template plookup // template class UltraCircuitBuilder_<grumpkin::fr>; -} // namespace proof_system \ No newline at end of file +} // namespace proof_system diff --git a/barretenberg/cpp/src/barretenberg/proof_system/composer/permutation_lib.hpp b/barretenberg/cpp/src/barretenberg/proof_system/composer/permutation_lib.hpp index 27dac50a025..90a7f0152f0 100644 --- a/barretenberg/cpp/src/barretenberg/proof_system/composer/permutation_lib.hpp +++ b/barretenberg/cpp/src/barretenberg/proof_system/composer/permutation_lib.hpp @@ -347,7 +347,7 @@ inline void compute_standard_plonk_lagrange_polynomial(barretenberg::polynomial& // `permutation[i]` will specify the 'index' that this wire value will map to. // Here, 'index' refers to an element of our subgroup H. // We can almost use `permutation[i]` to directly index our `roots` array, which contains our subgroup elements. - // We first have to accomodate for the fact that `roots` only contains *half* of our subgroup elements. This is + // We first have to accommodate for the fact that `roots` only contains *half* of our subgroup elements. This is // because ω^{n/2} = -ω and we don't want to perform redundant work computing roots of unity. size_t raw_idx = permutation[i].row_index; diff --git a/barretenberg/cpp/src/barretenberg/proof_system/plookup_tables/keccak/keccak_rho.hpp b/barretenberg/cpp/src/barretenberg/proof_system/plookup_tables/keccak/keccak_rho.hpp index d29f4009b05..158c6119d61 100644 --- a/barretenberg/cpp/src/barretenberg/proof_system/plookup_tables/keccak/keccak_rho.hpp +++ b/barretenberg/cpp/src/barretenberg/proof_system/plookup_tables/keccak/keccak_rho.hpp @@ -40,7 +40,7 @@ namespace keccak_tables { * * We need multiple Rho tables in order to efficiently range-constrain our input slices. * - * The maximum number of bits we can accomodate in this lookup table is MAXIMUM_MULTITABLE_BITS (assume this is 8) + * The maximum number of bits we can accommodate in this lookup table is MAXIMUM_MULTITABLE_BITS (assume this is 8) * For example take a left-rotation by 1 bit. The right-slice will be a 63-bit integer. * 63 does not evenly divide 8. i.e. an 8-bit table cannot correctly range-constrain the input slice and we would need * additional range constraints. @@ -49,7 +49,7 @@ namespace keccak_tables { * We can stitch together a lookup table sequence that correctly range constrains the left/right slices for any of our * 25 rotation values * - * @tparam TABLE_BITS The number of bits each lookup table can accomodate + * @tparam TABLE_BITS The number of bits each lookup table can accommodate * @tparam LANE_INDEX Required by get_rho_output_table to produce the correct MultiTable */ template <size_t TABLE_BITS = 0, size_t LANE_INDEX = 0> class Rho { diff --git a/barretenberg/cpp/src/barretenberg/smt_verification/README.md b/barretenberg/cpp/src/barretenberg/smt_verification/README.md index fdb6d1c2072..8dfe2fc1548 100644 --- a/barretenberg/cpp/src/barretenberg/smt_verification/README.md +++ b/barretenberg/cpp/src/barretenberg/smt_verification/README.md @@ -71,7 +71,7 @@ To store it on the disk just do `FFTerm` - the symbolic value that simulates finite field elements. - `FFTerm` - the symbolic value that simulates integer elements which behave like finite field ones. Usefull, when you want to create range constraints or perform operations like XOR. + `FFTerm` - the symbolic value that simulates integer elements which behave like finite field ones. Useful, when you want to create range constraints or perform operations like XOR. `Bool` - simulates the boolean values and mostly will be used only to simulate complex `if` statements if needed. @@ -250,4 +250,4 @@ void model_variables(Circuit<smt_terms::FFTerm>& c, Solver* s, FFTerm& evaluatio } ``` -More examples can be found in *.test.cpp files \ No newline at end of file +More examples can be found in *.test.cpp files diff --git a/barretenberg/cpp/src/barretenberg/stdlib/primitives/uint/logic.cpp b/barretenberg/cpp/src/barretenberg/stdlib/primitives/uint/logic.cpp index 72cc1dee046..0effaca5ed4 100644 --- a/barretenberg/cpp/src/barretenberg/stdlib/primitives/uint/logic.cpp +++ b/barretenberg/cpp/src/barretenberg/stdlib/primitives/uint/logic.cpp @@ -110,7 +110,7 @@ uint<Builder, Native> uint<Builder, Native>::operator>>(const size_t shift) cons * * We have a special selector configuration in our arithmetic widget that extracts 6.b_x from given the two * relevant accumulators. The factor of 6 is for efficiency reasons. We need to scale our other gate - * coefficients by 6 to accomodate this. + * coefficients by 6 to accommodate this. **/ if ((shift & 1) == 0) {