
uint256: optimize WriteToArray and add PutUint256 #190

Merged · 3 commits into holiman:master from optimize-writetoarray · Nov 29, 2024

Conversation

@minh-bq (Contributor) commented Nov 29, 2024

Make WriteToArray32 and WriteToArray20 use PutUint64 like in Bytes32 and Bytes20 to remove all branches and increase the number of bytes per 1 load/store to 8 bytes.
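
For reference, the branch-free version boils down to four unconditional 8-byte big-endian stores, one per limb. A minimal sketch, assuming the little-endian 64-bit limbs z[0]..z[3] of uint256.Int and encoding/binary's PutUint64 (the merged implementation may differ in detail):

// writeToArray32Sketch is a hypothetical name used for illustration only.
// z[0] is the least-significant limb, so it lands in the last 8 bytes of
// the big-endian output; there are no loops and no branches.
func (z *Int) writeToArray32Sketch(dest *[32]byte) {
	binary.BigEndian.PutUint64(dest[0:8], z[3])   // most-significant limb first
	binary.BigEndian.PutUint64(dest[8:16], z[2])
	binary.BigEndian.PutUint64(dest[16:24], z[1])
	binary.BigEndian.PutUint64(dest[24:32], z[0]) // least-significant limb last
}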

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                 │    old.txt     │                new.txt                │
                 │     sec/op     │    sec/op      vs base                │
WriteToArray20-8   31.8000n ± 15%   0.7819n ± 17%  -97.54% (p=0.000 n=10)
WriteToArray32-8    57.640n ± 22%    1.050n ± 18%  -98.18% (p=0.000 n=10)
geomean              42.81n         0.9059n        -97.88%

By changing Memory.Set32 in go-ethereum to use the new WriteToArray32, we can avoid a memory copy, which makes OpMstore faster.

The patch

diff --git a/core/vm/memory.go b/core/vm/memory.go
index 1ddd8d1ea..58d6f7383 100644
--- a/core/vm/memory.go
+++ b/core/vm/memory.go
@@ -73,8 +73,7 @@ func (m *Memory) Set32(offset uint64, val *uint256.Int) {
                panic("invalid memory: store empty")
        }
        // Fill in relevant bits
-       b32 := val.Bytes32()
-       copy(m.store[offset:], b32[:])
+       val.WriteToArray32((*[32]byte)(m.store[offset:]))
 }
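
The (*[32]byte)(m.store[offset:]) expression relies on Go's slice-to-array-pointer conversion (available since Go 1.17): the resulting pointer aliases the slice's backing array, so WriteToArray32 writes straight into m.store with no intermediate [32]byte value and no copy. The conversion panics if the slice is shorter than 32 bytes, which the size check earlier in Set32 is meant to rule out. A small illustration with made-up variables (val stands for a *uint256.Int):

buf := make([]byte, 64)      // stand-in for m.store
dst := (*[32]byte)(buf[16:]) // would panic if len(buf[16:]) < 32
val.WriteToArray32(dst)      // writes bytes 16..47 of buf in place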

Benchmark result

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
           │   old.txt    │               new.txt                │
           │    sec/op    │    sec/op     vs base                │
OpMstore-8   19.40n ± 27%   14.18n ± 17%  -26.91% (p=0.002 n=10)

@minh-bq force-pushed the optimize-writetoarray branch from 8dedb60 to 412a3be on November 29, 2024 08:38

codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (555918b) to head (bba5c08).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #190   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1666      1675    +9     
=========================================
+ Hits          1666      1675    +9     

@holiman (Owner) left a comment


Very nice, thanks!

@holiman (Owner) commented Nov 29, 2024

Nice work!
I wasn't super-fond of the idea of doing val.WriteToArray32((*[32]byte)(m.store[offset:])), so instead I added a method in the same vein as the binary PutUint64 methods: PutUint256, which just writes into a slice, and it's up to the caller to ensure there's sufficient size in the destination.

WDYT?

// Note: The dest slice must be at least 32 bytes large, otherwise this
// method will panic. The method WriteToSlice, which is slower, should be used
// if the destination slice is smaller or of unknown size.
func (z *Int) PutUint256(dest []byte) {
@minh-bq (Contributor, Author) replied:

I think we should add

_ = dest[31]

so that there is only one bounds check at the beginning; otherwise, there will be a bounds check before each PutUint64.
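
Concretely, a sketch of PutUint256 with the hint in place (the body below is assumed from the snippet above, not copied from the merged code): the early _ = dest[31] read lets the compiler prove len(dest) >= 32 once, so the four stores compile without per-call bounds checks.

func (z *Int) PutUint256(dest []byte) {
	_ = dest[31] // one early bounds check (and panic) instead of one per store
	binary.BigEndian.PutUint64(dest[0:8], z[3])
	binary.BigEndian.PutUint64(dest[8:16], z[2])
	binary.BigEndian.PutUint64(dest[16:24], z[1])
	binary.BigEndian.PutUint64(dest[24:32], z[0])
}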

@holiman (Owner) replied:

Sure. I couldn't see any tangible benefit in a benchmark, but it doesn't hurt.

@minh-bq (Contributor, Author) commented Nov 29, 2024

I just left a comment. Overall, that function looks good to me.

@holiman merged commit 439fbd4 into holiman:master on Nov 29, 2024 (6 of 7 checks passed).
@holiman changed the title from "uint256: optimize WriteToArray" to "uint256: optimize WriteToArray and add PutUint256" on Nov 29, 2024.
@holiman (Owner) commented Nov 29, 2024

It is interesting how much faster the new/optimized methods are versus writing to a slice:

[user@work uint256]$ go test . -run - -bench BenchmarkWriteTo
goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
BenchmarkWriteTo/fixed-20-8             684307134                1.541 ns/op
BenchmarkWriteTo/fixed-32-8             647183281                2.028 ns/op
BenchmarkWriteTo/slice-8                15876942                86.58 ns/op
BenchmarkWriteTo/put256-8               637238838                2.598 ns/op
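
A comparison along these lines can be reproduced roughly with the sketch below (hypothetical benchmark name, written inside package uint256 with the standard testing package; the repository's own BenchmarkWriteTo may be structured differently):

func BenchmarkWriteSketch(b *testing.B) {
	z := new(Int).SetUint64(0xdeadbeef)
	var arr [32]byte
	buf := make([]byte, 32)
	b.Run("fixed-32", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.WriteToArray32(&arr) // branch-free array write
		}
	})
	b.Run("put256", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.PutUint256(buf) // slice write with a single up-front bounds check
		}
	})
	b.Run("slice", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.WriteToSlice(buf) // general-purpose slice write
		}
	})
}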

@minh-bq deleted the optimize-writetoarray branch on December 2, 2024 03:16
minh-bq added a commit to axieinfinity/ronin that referenced this pull request Dec 19, 2024
…ory (#30868) (#650)

commit ethereum/go-ethereum@5c58612.

Updates geth to use uint256 v1.3.2 and uses the faster memory writer to speed up MSTORE.

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
           │   old.txt   │               new.txt               │
           │   sec/op    │   sec/op     vs base                │
OpMstore-8   18.18n ± 8%   12.58n ± 8%  -30.76% (p=0.000 n=10)

Link: holiman/uint256#190

Co-authored-by: Martin HS <[email protected]>