
uint256: optimize WriteToArray and add PutUint256 #190

Merged · 3 commits into holiman:master from optimize-writetoarray · Nov 29, 2024

Conversation

@minh-bq (Contributor) commented Nov 29, 2024

Make WriteToArray32 and WriteToArray20 use PutUint64 like in Bytes32 and Bytes20 to remove all branches and increase the number of bytes per 1 load/store to 8 bytes.
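
For reference, the branch-free version boils down to four unconditional 8-byte big-endian stores, one per limb. A minimal sketch, assuming the little-endian 64-bit limbs z[0]..z[3] of uint256.Int and encoding/binary's PutUint64 (the merged implementation may differ in detail):

// writeToArray32Sketch is a hypothetical name used for illustration only.
// z[0] is the least-significant limb, so it lands in the last 8 bytes of
// the big-endian output; there are no loops and no branches.
func (z *Int) writeToArray32Sketch(dest *[32]byte) {
	binary.BigEndian.PutUint64(dest[0:8], z[3])   // most-significant limb first
	binary.BigEndian.PutUint64(dest[8:16], z[2])
	binary.BigEndian.PutUint64(dest[16:24], z[1])
	binary.BigEndian.PutUint64(dest[24:32], z[0]) // least-significant limb last
}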

goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
                 │    old.txt     │                new.txt                │
                 │     sec/op     │    sec/op      vs base                │
WriteToArray20-8   31.8000n ± 15%   0.7819n ± 17%  -97.54% (p=0.000 n=10)
WriteToArray32-8    57.640n ± 22%    1.050n ± 18%  -98.18% (p=0.000 n=10)
geomean              42.81n         0.9059n        -97.88%

By changing Memory.Set32 in go-ethereum to use the new WriteToArray32, we can avoid a memory copy, which makes OpMstore faster.

The patch

diff --git a/core/vm/memory.go b/core/vm/memory.go
index 1ddd8d1ea..58d6f7383 100644
--- a/core/vm/memory.go
+++ b/core/vm/memory.go
@@ -73,8 +73,7 @@ func (m *Memory) Set32(offset uint64, val *uint256.Int) {
                panic("invalid memory: store empty")
        }
        // Fill in relevant bits
-       b32 := val.Bytes32()
-       copy(m.store[offset:], b32[:])
+       val.WriteToArray32((*[32]byte)(m.store[offset:]))
 }
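
The (*[32]byte)(m.store[offset:]) expression relies on Go's slice-to-array-pointer conversion (available since Go 1.17): the resulting pointer aliases the slice's backing array, so WriteToArray32 writes straight into m.store with no intermediate [32]byte value and no copy. The conversion panics if the slice is shorter than 32 bytes, which the size check earlier in Set32 is meant to rule out. A small illustration with made-up variables (val stands for a *uint256.Int):

buf := make([]byte, 64)      // stand-in for m.store
dst := (*[32]byte)(buf[16:]) // would panic if len(buf[16:]) < 32
val.WriteToArray32(dst)      // writes bytes 16..47 of buf in place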

Benchmark result

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
           │   old.txt    │               new.txt                │
           │    sec/op    │    sec/op     vs base                │
OpMstore-8   19.40n ± 27%   14.18n ± 17%  -26.91% (p=0.002 n=10)

@minh-bq force-pushed the optimize-writetoarray branch from 8dedb60 to 412a3be on November 29, 2024 08:38

codecov bot commented Nov 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (555918b) to head (bba5c08).
Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #190   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines         1666      1675    +9     
=========================================
+ Hits          1666      1675    +9     

@holiman (Owner) left a comment


Very nice, thanks!

@holiman (Owner) commented Nov 29, 2024

Nice work!
I wasn't super-fond of the idea of doing val.WriteToArray32((*[32]byte)(m.store[offset:])), so instead I added a method in the same vein as the binary PutUint64 methods: PutUint256, which just writes into a slice, and it's up to the caller to ensure there's sufficient size in the destination.

WDYT?

// Note: The dest slice must be at least 32 bytes large, otherwise this
// method will panic. The method WriteToSlice, which is slower, should be used
// if the destination slice is smaller or of unknown size.
func (z *Int) PutUint256(dest []byte) {
@minh-bq (Contributor, Author) replied:

I think we should add

_ = dest[31]

so that there is only one bounds check at the beginning; otherwise, there will be a bounds check before each PutUint64.
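
Concretely, a sketch of PutUint256 with the hint in place (the body below is assumed from the snippet above, not copied from the merged code): the early _ = dest[31] read lets the compiler prove len(dest) >= 32 once, so the four stores compile without per-call bounds checks.

func (z *Int) PutUint256(dest []byte) {
	_ = dest[31] // one early bounds check (and panic) instead of one per store
	binary.BigEndian.PutUint64(dest[0:8], z[3])
	binary.BigEndian.PutUint64(dest[8:16], z[2])
	binary.BigEndian.PutUint64(dest[16:24], z[1])
	binary.BigEndian.PutUint64(dest[24:32], z[0])
}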

@holiman (Owner) replied:

Sure. I couldn't see any tangible benefit in a benchmark, but it doesn't hurt.

@minh-bq (Contributor, Author) commented Nov 29, 2024

I just left a comment. Overall, that function looks good to me.

@holiman merged commit 439fbd4 into holiman:master on Nov 29, 2024 (6 of 7 checks passed).
@holiman changed the title from "uint256: optimize WriteToArray" to "uint256: optimize WriteToArray and add PutUint256" on Nov 29, 2024.
@holiman (Owner) commented Nov 29, 2024

It is interesting how much faster the new/optimized methods are versus writing to a slice:

[user@work uint256]$ go test . -run - -bench BenchmarkWriteTo
goos: linux
goarch: amd64
pkg: github.com/holiman/uint256
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
BenchmarkWriteTo/fixed-20-8             684307134                1.541 ns/op
BenchmarkWriteTo/fixed-32-8             647183281                2.028 ns/op
BenchmarkWriteTo/slice-8                15876942                86.58 ns/op
BenchmarkWriteTo/put256-8               637238838                2.598 ns/op
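
A comparison along these lines can be reproduced roughly with the sketch below (hypothetical benchmark name, written inside package uint256 with the standard testing package; the repository's own BenchmarkWriteTo may be structured differently):

func BenchmarkWriteSketch(b *testing.B) {
	z := new(Int).SetUint64(0xdeadbeef)
	var arr [32]byte
	buf := make([]byte, 32)
	b.Run("fixed-32", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.WriteToArray32(&arr) // branch-free array write
		}
	})
	b.Run("put256", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.PutUint256(buf) // slice write with a single up-front bounds check
		}
	})
	b.Run("slice", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			z.WriteToSlice(buf) // general-purpose slice write
		}
	})
}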

@minh-bq deleted the optimize-writetoarray branch on December 2, 2024 03:16
minh-bq added a commit to axieinfinity/ronin that referenced this pull request Dec 19, 2024
…ory (#30868) (#650)

commit ethereum/go-ethereum@5c58612.

Updates geth to use uint256 v1.3.2 and uses the faster memory writer to speed up MSTORE.

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/vm
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
           │   old.txt   │               new.txt               │
           │   sec/op    │   sec/op     vs base                │
OpMstore-8   18.18n ± 8%   12.58n ± 8%  -30.76% (p=0.000 n=10)

Link: holiman/uint256#190

Co-authored-by: Martin HS <[email protected]>