Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bench): memory usage depending on k_table_size #272

Open
cyphersnake opened this issue May 23, 2024 · 21 comments
Open

feat(bench): memory usage depending on k_table_size #272

cyphersnake opened this issue May 23, 2024 · 21 comments
Assignees
Labels
perf Only applies to optimizations and performance acceleration

Comments

@cyphersnake
Copy link
Collaborator

You need to do the same thing we did in #249, but with memory

@cyphersnake cyphersnake self-assigned this May 23, 2024
@cyphersnake cyphersnake added the perf Only applies to optimizations and performance acceleration label May 23, 2024
cyphersnake added a commit that referenced this issue May 23, 2024
@cyphersnake
Copy link
Collaborator Author

Ran it on the server along with dhat for all the same k as in #249

@cyphersnake
Copy link
Collaborator Author

cyphersnake commented May 23, 2024

For PRIMARY_K_TABLE_SIZE=17.

If this information is not enough for you, please give me any step, I will make you an itemization as per your request

| k     | repeat_count   | total_bytes    | total_blocks   | t_gmax        | t_gmax_blocks   | t_end   | t_end_blocks    |
| ---   | -------------- | -------------  | -------------- | --------      | --------------- | ------- | --------------- |
| 4.58  | 1              | 21,103,360,561 | 8,379,587      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 14.55 | 1000           | 21,149,395,860 | 8,607,137      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 15.55 | 2000           | 21,193,222,448 | 8,806,406      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.14 | 3000           | 21,242,455,990 | 9,068,519      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.55 | 4000           | 21,290,778,894 | 9,315,603      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.87 | 5000           | 21,335,068,150 | 9,518,476      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
  • k: The log2 from filled by poseidon step-circuit rows
  • repeat_count: The poseidon hash the repetition count.
  • total_bytes: The total number of bytes allocated over the entire execution.
  • total_blocks: The total number of heap blocks allocated over the entire execution.
  • t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.
  • t_gmax_blocks: The number of heap blocks alive at the time when the heap size reached its global maximum.
  • t_end: The number of bytes alive at the end of execution (i.e., bytes not explicitly freed).
  • t_end_blocks: The number of heap blocks alive at the end of execution (i.e., blocks not explicitly freed).

@cyphersnake
Copy link
Collaborator Author

mem_profiling.tar.gz
Raw data, including dhat logs, which we can also analyze with https://nnethercote.github.io/dh_view/dh_view.html

@cyphersnake
Copy link
Collaborator Author

If you want, I can choose some other way to double-check the correctness of the result. Still, the utility I used is not production-ready

@drouyang
Copy link
Contributor

I observed 21GB usage for all settings. Does it mean same memory usage for different circuit, step_size etc? I was expecting larger circuit, more memory needed.

@cyphersnake
Copy link
Collaborator Author

Also ran benchmark where primamry & secondary k circuit size grows from 17 to 21

@cyphersnake
Copy link
Collaborator Author

cyphersnake commented May 23, 2024

I observed 21GB usage for all settings. Does it mean same memory usage for different circuit, step_size etc? I was expecting larger circuit, more memory needed.

I ran with different size of filled step circuit rows as I had earlier with benchmark.

Now I also added a test with different size of step circuit table. As soon as the result is ready, I will provide it here

@cyphersnake
Copy link
Collaborator Author

In this table {PRIMARY,SECONDARY}_CIRCUIT_K_TABLE_SIZE is equal to k, otherwise the algorithms are identical - fold 1 step

I used COMMITMENT_KEY_SIZE=27 so that I wouldn't have to change it between cases (it would have affected the size)

For 20, 21 - in process (with a custom allocator, it's going to be a long time)

| k   | total_bytes     | total_blocks   | t_gmax         | t_gmax_blocks   | t_end   | t_end_blocks    |
| --- | -------------   | -------------- | --------       | --------------- | ------- | --------------- |
| 17  | 46,470,334,145  | 8,384,081      | 26,415,304,923 | 15,860          | 958,633 | 1,566           |
| 18  | 66,329,954,655  | 13,391,175     | 27,059,130,587 | 15,860          | 958,633 | 1,566           |
| 19  | 103,941,909,225 | 22,422,684     | 28,346,781,915 | 15,860          | 958,633 | 1,566           |

@chaosma
Copy link
Collaborator

chaosma commented May 24, 2024

o change it between cases (it would have affected the size)

For 20, 21 - in process (with a custom allocator, it's going to be a long time)

I am a little confused about these two

(1)

k repeat_count total_bytes total_blocks t_gmax t_gmax_blocks t_end t_end_blocks
4.58 1 21,103,360,561 8,379,587 1,294,194,541 30,297 958,609 1,563
14.55 1000 21,149,395,860 8,607,137 1,294,194,541 30,297 958,609 1,563
15.55 2000 21,193,222,448 8,806,406 1,294,194,541 30,297 958,609 1,563
16.14 3000 21,242,455,990 9,068,519 1,294,194,541 30,297 958,609 1,563
16.55 4000 21,290,778,894 9,315,603 1,294,194,541 30,297 958,609 1,563
16.87 5000 21,335,068,150 9,518,476 1,294,194,541 30,297 958,609 1,563

And (2)

k total_bytes total_blocks t_gmax t_gmax_blocks t_end t_end_blocks
17 46,470,334,145 8,384,081 26,415,304,923 15,860 958,633 1,566
18 66,329,954,655 13,391,175 27,059,130,587 15,860 958,633 1,566
19 103,941,909,225 22,422,684 28,346,781,915 15,860 958,633 1,566

when k=17, the total bytes of (2) are much larger than table (1). But the table (1) also corresponds to Table size 17. Is that correct?

@chaosma
Copy link
Collaborator

chaosma commented May 24, 2024

I did some roughly memory estimate, when k=17, the memory usage is around 2GB. From your data above, it seems not subtract memory usage from other program? @cyphersnake

@cyphersnake
Copy link
Collaborator Author

when k=17, the total bytes of (2) are much larger than table (1). But the table (1) also corresponds to Table size 17. Is that correct?

You didn't notice that I said that to measure for different circuit sizes, I immediately chose COMMITMENT_KEY=27, which means about 16 GB for commitment key only

Also these are different `k's. In the first case it is filled data, in the second case it is absolute data. I'll correct the names now, so it's not confusing

@cyphersnake
Copy link
Collaborator Author

I did some roughly memory estimate, when k=17, the memory usage is around 2GB. From your data above, it seems not subtract memory usage from other program? @cyphersnake

k-table-size = 17
commitment-key = 21

Total: 19,449,044,966 bytes in 6,417,946 blocks
At t-gmax: 1,293,397,125 bytes in 28,956 blocks
At t-end: 161,193 bytes in 222 blocks

#273 (comment)

@cyphersnake
Copy link
Collaborator Author

If we use the size of the commitment key together with the size of the circuit, the degree of influence of the latter on the result will be too high to estimate the difference in performance, so I made the measurements with COMMITMENT_KEY the maximum of the necessary ones and wrote about it

@cyphersnake
Copy link
Collaborator Author

I will now summarize the results and describe them separately, describing beforehand the design of each experiment and the motivation for that particular design. If you want measurements with any specific parameters, just describe them. The key dimensions are.

  • primary_circuit_k_table_size
  • secondary_circuit_k_table_size
  • commitment_key_sizе - by the way, it can be different too, but we use the same one
  • filled by primary circuit rows count
  • filled by secondary circuit rows count

(the last two don't affect the result much)

@cyphersnake
Copy link
Collaborator Author

cyphersnake commented May 24, 2024

Common terms

  • repeat_count: The poseidon hash the repetition count.
  • total_bytes: The total number of bytes allocated over the entire execution.
  • total_blocks: The total number of heap blocks allocated over the entire execution.
  • t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.
  • t_gmax_blocks: The number of heap blocks alive at the time when the heap size reached its global maximum.
  • t_end: The number of bytes alive at the end of execution (i.e., bytes not explicitly freed).
  • t_end_blocks: The number of heap blocks alive at the end of execution (i.e., blocks not explicitly freed).

Test 1

  • primary circuit - poseidon

  • secondary circuit - trivial

  • primary_circuit_k_table_size = 17

  • secondary_circuit_k_table_size = 17

  • commitment_key_sizе - 21 (64 MB * 2)

  • filled by primary circuit rows count - filled_k; varies for the experiment

  • filled by secondary circuit rows count - 0

| filled_k | repeat_count   | total_bytes    | total_blocks   | t_gmax        | t_gmax_blocks   | t_end   | t_end_blocks    |
| -------- | -------------- | -------------  | -------------- | --------      | --------------- | ------- | --------------- |
| 4.58     | 1              | 21,103,360,561 | 8,379,587      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 14.55    | 1000           | 21,149,395,860 | 8,607,137      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 15.55    | 2000           | 21,193,222,448 | 8,806,406      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.14    | 3000           | 21,242,455,990 | 9,068,519      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.55    | 4000           | 21,290,778,894 | 9,315,603      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
| 16.87    | 5000           | 21,335,068,150 | 9,518,476      | 1,294,194,541 | 30,297          | 958,609 | 1,563           |
  • filled_k: The log2 from filled by poseidon step-circuit rows

Test 2

  • primary circuit - poseidon

  • secondary circuit - trivial

  • primary_circuit_k_table_size = k; varies for the experiment

  • secondary_circuit_k_table_size = k; varies for the experiment

  • commitment_key_sizе - 27 (8 GB * 2)

  • filled by primary circuit rows count - 24

  • filled by secondary circuit rows count - 0

| k   | total_bytes     | total_blocks   | t_gmax         | t_gmax_blocks   | t_end   | t_end_blocks    |
| --- | -------------   | -------------- | --------       | --------------- | ------- | --------------- |
| 17  | 46,470,334,145  | 8,384,081      | 26,415,304,923 | 15,860          | 958,633 | 1,566           |
| 18  | 66,329,954,655  | 13,391,175     | 27,059,130,587 | 15,860          | 958,633 | 1,566           |
| 19  | 103,941,909,225 | 22,422,684     | 28,346,781,915 | 15,860          | 958,633 | 1,566           |
| 20  | 177,592,778,430 | 39,557,793     | 30,922,084,571 | 15,860          | 958,633 | 1,566           |
| 21  | 323,176,881,971 | 71,974,399     | 36,072,689,883 | 15,860          | 958,633 | 1,566           |

NOTE: In this test, it is not the absolute values that are important, as they are skewed by the large COMMITMENT_KEY, but the nature of the growth in memory usage.

Test 3

  • primary_circuit_k_table_size = 17
  • secondary_circuit_k_table_size = 17
  • commitment_key_sizе - 21
  • filled by primary circuit rows count - 24
  • filled by secondary circuit rows count - 0
| k   | total_bytes    | total_blocks   | t_gmax        | t_gmax_blocks   | t_end   | t_end_blocks    |
| --- | -------------  | -------------- | --------      | --------------- | ------- | --------------- |
| 17  | 19,449,044,966 | 6,417,946      | 1,293,397,125 | 28,956          | 161,193 | 222             |

cyphersnake added a commit that referenced this issue May 24, 2024
**Motivaion**
Generalized Implementation #272

**Overview**
Thanks to cli+dhat we can now run any test cases and measure any memory usage
@chaosma
Copy link
Collaborator

chaosma commented May 24, 2024

When table size k = 17 and commitment key size = 21, how do you get 19,449,044,966 which is ~20GB? The commitment key is just 256MB.

@cyphersnake
Copy link
Collaborator Author

cyphersnake commented May 24, 2024

When table size k = 17 and commitment key size = 21, how do you get 19,449,044,966 which is ~20GB? The commitment key is just 256MB.

total_bytes: The total number of bytes allocated over the entire execution.
t_gmax: The number of bytes alive at the time when the heap size reached its global maximum.

Let me draw your attention to the difference. The total blocks is the memory footprint, all the memory that was allocated (not at one moment)
The t_gmax is how much memory was occupied at one time at peak.

@chaosma
Copy link
Collaborator

chaosma commented May 25, 2024

Test 1: k=17, key size is 21. the t_gmax=1,294,194,541 which is 1.2 GB which makes sense.

Test 2: k=17, key size is 27 (16GB), t_gmax=26,415,304,923 which is 24.6GB, after subtract the commitment key in memory which left us 8.6GB.

In both two tests, we have same circuit size but different commitment key. After we subtract the commitment key size from the memory, I would expect them to be similar size, but they are not. I don't understand why. Any idea about it? @cyphersnake

@cyphersnake
Copy link
Collaborator Author

Any idea about it?

Don't forget about serialization of commitment key in PublicParams

@chaosma
Copy link
Collaborator

chaosma commented May 27, 2024

Any idea about it?

Don't forget about serialization of commitment key in PublicParams

The keys of bn256 and grumpkin are total 8GB+8GB=16GB. Why the increase of memory is 8GB but not 0GB or 16GB? How does the serialization of commitment keys internally work?

cyphersnake added a commit that referenced this issue May 28, 2024
**Motivation**
Now we have a default path for time-profling & mem-profiling, but it is
not very convenient to vary the runtime parameters by hand. That's why I
put all runtime parameters in a separate cli example. This allows us to
do time-profiling & mem-profiling on different parameters directly on
the command line
Part of #272 

**Overview**
Fairly simple code that parses parameters with clap and runs circuit.
Covers all our examples

Once mem-profiling is in main, I'll add its description to the README
along with this example

```console
Usage: cli [OPTIONS] [PRIMARY_CIRCUIT] [SECONDARY_CIRCUIT]

Arguments:
  [PRIMARY_CIRCUIT]    [default: poseidon] [possible values: poseidon, trivial]
  [SECONDARY_CIRCUIT]  [default: trivial] [possible values: poseidon, trivial]

Options:
      --primary-circuit-k-table-size <PRIMARY_CIRCUIT_K_TABLE_SIZE>      [default: 17]
      --primary-commitment-key-size <PRIMARY_COMMITMENT_KEY_SIZE>        [default: 21]
      --primary-repeat-count <PRIMARY_REPEAT_COUNT>                      [default: 1]
      --primary-r-f <PRIMARY_R_F>                                        [default: 10]
      --primary-r-p <PRIMARY_R_P>                                        [default: 10]
      --secondary-circuit-k-table-size <SECONDARY_CIRCUIT_K_TABLE_SIZE>  [default: 17]
      --secondary-commitment-key-size <SECONDARY_COMMITMENT_KEY_SIZE>    [default: 21]
      --secondary-repeat-count <SECONDARY_REPEAT_COUNT>                  [default: 1]
      --secondary-r-f <SECONDARY_R_F>                                    [default: 10]
      --secondary-r-p <SECONDARY_R_P>                                    [default: 10]
      --limb-width <LIMB_WIDTH>                                          [default: 32]
      --limbs-count <LIMBS_COUNT>                                        [default: 10]
      --debug-mode
      --fold-step-count <FOLD_STEP_COUNT>                                [default: 1]
      --json-logs
  -h, --help                                                             Print help
  -V, --version                                                          Print version
```
cyphersnake added a commit that referenced this issue May 28, 2024
**Motivaion**
Generalized Implementation #272

**Overview**
Thanks to cli+dhat we can now run any test cases and measure any memory usage
cyphersnake added a commit that referenced this issue May 28, 2024
**Motivaion**
Generalized Implementation #272
Part of #249 

**Overview**
Thanks to cli+dhat we can now run any test cases and measure any memory
usage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf Only applies to optimizations and performance acceleration
Projects
None yet
Development

No branches or pull requests

3 participants