Performance analysis of DATASET_PARENTS=512 #50

Open
chfast opened this issue Sep 10, 2019 · 3 comments
@chfast

chfast commented Sep 10, 2019

The ProgPoW software audit recommends increasing the DATASET_PARENTS Ethash cache parameter from 256 to 512. This directly affects verification performance: the time for a single verification nearly doubles (whereas the ProgPoW verification slowdown over Ethash is only 30-50%).

The DATASET_PARENTS increase makes verification "even more" memory-hard and lowers the instructions-per-cycle ratio from about 1.26 to about 1 (the max being 4).
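To illustrate why verification cost scales with this parameter: in the published Ethash specification, each dataset item is derived by mixing in DATASET_PARENTS pseudo-randomly selected cache items, so doubling the parameter doubles the number of dependent cache reads per item. The Python sketch below follows the structure of the specification's calc_dataset_item; it is not the benchmarked C++ code. Assumptions: hashlib's SHA3-512 stands in for the original Keccak-512 that Ethash uses (so the outputs are not real Ethash values), the cache is a small toy array built by the hypothetical make_toy_cache helper, and the read counter exists only for illustration.

```python
import hashlib
import struct

FNV_PRIME = 0x01000193
WORDS = 16  # a 64-byte cache item holds sixteen 32-bit words


def fnv(a, b):
    # FNV-style mixing step as defined in the Ethash specification.
    return ((a * FNV_PRIME) ^ b) & 0xFFFFFFFF


def hash512(words):
    # Assumption: SHA3-512 is a stand-in for Keccak-512 (different padding).
    data = struct.pack("<16I", *words)
    return list(struct.unpack("<16I", hashlib.sha3_512(data).digest()))


def make_toy_cache(n_items):
    # Hypothetical helper: a small hash chain, NOT real Ethash cache generation.
    item = hash512([0] * WORDS)
    cache = []
    for _ in range(n_items):
        cache.append(item)
        item = hash512(item)
    return cache


def calc_dataset_item(cache, i, dataset_parents):
    # Returns the dataset item and the number of cache items read.
    n = len(cache)
    mix = list(cache[i % n])
    reads = 1
    mix[0] ^= i
    mix = hash512(mix)
    for j in range(dataset_parents):
        # The next index depends on the current mix, so reads cannot overlap.
        cache_index = fnv(i ^ j, mix[j % WORDS])
        parent = cache[cache_index % n]
        reads += 1
        mix = [fnv(a, b) for a, b in zip(mix, parent)]
    return hash512(mix), reads


cache = make_toy_cache(1024)
_, reads_256 = calc_dataset_item(cache, 7, 256)
_, reads_512 = calc_dataset_item(cache, 7, 512)
print(reads_256, reads_512)
```

Because each read address depends on the result of the previous mixing step, the reads form a serial chain, which is consistent with the lower instructions-per-cycle figure when more of them miss the CPU caches.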

ProgPoW verification, DATASET_PARENTS = 256, epoch 0:

cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10825
2019-09-10 14:19:50
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
progpow_hash/0       1960 us       1960 us        347

 Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':

        65 642 783      cache-references                                            
        39 184 374      cache-misses              #   59,693 % of all cache refs    
     5 636 657 996      cycles                                                      
     7 104 679 821      instructions              #    1,26  insn per cycle         

       1,314309256 seconds time elapsed

       1,296116000 seconds user
       0,000000000 seconds sys

ProgPoW verification, DATASET_PARENTS = 512, epoch 0:

cset shield -- perf stat -B -e cache-references,cache-misses,cycles,instructions test/ethash-bench --benchmark_filter=progpow_hash/0
cset: **> 1 tasks are not movable, impossible to move
cset: --> last message, executed args into cpuset "/user", new pid is: 10697
2019-09-10 14:19:26
Running test/ethash-bench
Run on (8 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
------------------------------------------------------
Benchmark               Time           CPU Iterations
------------------------------------------------------
progpow_hash/0       3695 us       3694 us        195

 Performance counter stats for 'test/ethash-bench --benchmark_filter=progpow_hash/0':

        87 073 601      cache-references                                            
        48 426 695      cache-misses              #   55,616 % of all cache refs    
     6 589 826 522      cycles                                                      
     6 898 095 482      instructions              #    1,05  insn per cycle         

       1,534862112 seconds time elapsed

       1,512262000 seconds user
       0,004011000 seconds sys
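
For convenience, the ratios implied by the two runs above can be recomputed from the reported counters. This is only arithmetic on the numbers pasted in this comment; the dictionary layout and the derived helper are illustrative, not part of ethash-bench or perf. Note that the perf counters cover the whole benchmark process, not a single hash.

```python
# Counter values copied from the two perf stat runs above.
runs = {
    256: {
        "time_us": 1960,
        "cache_refs": 65_642_783,
        "cache_misses": 39_184_374,
        "cycles": 5_636_657_996,
        "instructions": 7_104_679_821,
    },
    512: {
        "time_us": 3695,
        "cache_refs": 87_073_601,
        "cache_misses": 48_426_695,
        "cycles": 6_589_826_522,
        "instructions": 6_898_095_482,
    },
}


def derived(run):
    # Instructions per cycle and cache miss rate for one run.
    ipc = run["instructions"] / run["cycles"]
    miss_rate = run["cache_misses"] / run["cache_refs"]
    return ipc, miss_rate


slowdown = runs[512]["time_us"] / runs[256]["time_us"]
print(f"per-hash slowdown: {slowdown:.3f}x")
for parents, run in runs.items():
    ipc, miss_rate = derived(run)
    print(f"DATASET_PARENTS={parents}: IPC {ipc:.2f}, miss rate {miss_rate:.3%}")
```

The per-hash time grows by a factor of about 1.89, slightly less than the 2x growth in cache reads per dataset item.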
@solardiz
Contributor

How about increasing the size of the DAG cache instead, above Ethereum's current curve, at the time of the switch to ProgPoW? Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)

@chfast
Author

chfast commented Sep 10, 2019

> Sizes of a few hundred MB should be acceptable for light verification now, and wouldn't result in significantly slower verification (right?)

That does not seem to be the case. From my observations, verification time depends strictly on the access time of the Ethash cache, and therefore on the CPU's L3 cache size. The larger the part of the Ethash cache that does not fit into L3, the slower verification becomes.
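
As a rough illustration of the sizes involved, the sketch below computes the Ethash cache size for a given epoch using the size rule from the published Ethash specification, and compares it with the 8192K L3 reported in the benchmark output above. The max_l3_fraction helper is hypothetical and gives only an upper bound on how much of the cache could be L3-resident; it is not a model of the real hit rate, which also depends on the access pattern and on other data competing for L3.

```python
HASH_BYTES = 64
CACHE_BYTES_INIT = 2**24
CACHE_BYTES_GROWTH = 2**17
L3_BYTES = 8192 * 1024  # L3 size from the benchmark header above


def is_prime(n):
    # Simple trial division; fast enough for the sizes involved here.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True


def get_cache_size(epoch):
    # Size rule from the Ethash specification: grow linearly per epoch,
    # then step down until the number of 64-byte items is prime.
    size = CACHE_BYTES_INIT + CACHE_BYTES_GROWTH * epoch - HASH_BYTES
    while not is_prime(size // HASH_BYTES):
        size -= 2 * HASH_BYTES
    return size


def max_l3_fraction(epoch, l3_bytes=L3_BYTES):
    # Hypothetical helper: upper bound on the L3-resident share of the cache.
    return min(1.0, l3_bytes / get_cache_size(epoch))


for epoch in (0, 100, 300):
    size = get_cache_size(epoch)
    print(epoch, size, f"{max_l3_fraction(epoch):.2f}")
```

Even at epoch 0 the cache is about twice the size of this CPU's L3, and the share that can stay resident shrinks as the cache grows with each epoch.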

@solardiz
Contributor

I used the word "significantly" specifically to account for the potential slight slowdown from the lower L3 cache hit rate. The DAG cache is already in excess of typical CPUs' L3 cache sizes (although those are increasing as well). In my experience (not with Ethash/ProgPoW, though), while L3 cache is a lot faster than RAM in synthetic benchmarks designed to fit in the cache, it provides little speedup for non-trivial algorithms - e.g., for yescrypt on a typical server platform there's little reduction in bandwidth when going from 16 MiB to higher sizes (even when I tweak it to reduce the amount of computation so that it could potentially use more bandwidth with the lower sizes). I've even seen cases where L3 cache hurt performance, compared to reading non-cached data from RAM, when the data happened to be cached in a CPU in a different socket.
