Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement faster method for Size() method in Dec and Int #10331

Closed
4 tasks
ValarDragon opened this issue Oct 10, 2021 · 4 comments · Fixed by #16263
Closed
4 tasks

Implement faster method for Size() method in Dec and Int #10331

ValarDragon opened this issue Oct 10, 2021 · 4 comments · Fixed by #16263
Labels
T: Performance Performance improvements

Comments

@ValarDragon
Copy link
Contributor

Summary

Decimal and Int protobuf serialization time is a large part of Osmosis execution time, and I suspect the same is true for all the SDK chains. Currently, this serialization time is doubled, because the Size estimate is computed by marshalling the data.

cosmos-sdk/types/int.go

Lines 427 to 430 in 3458f64

func (i *Int) Size() int {
bz, _ := i.Marshal()
return len(bz)
}

There is a size estimate method in the internal lib, thats at most off by one. It should be investigated to see if we can use something simple & direct here.
https://cs.opensource.google/go/go/+/refs/tags/go1.17.2:src/math/big/natconv.go;l=276;drc=refs%2Ftags%2Fgo1.17.2

This is much simpler if we switch to using proper binary marshalling for Int's and Dec's in proto.

Problem Definition

Speed up proto serialization for structs across the board. Significantly helps in reducing allocations per marshal, epoch time / init genesis / upgrade time, querying time, and transactions that interact with staking.

Proposal

Either:

  • Make a decision to use binary marshalling of Ints/Dec's and figure out migration plan (cc fix: dec marshal/unmarshal proto #10280 )
  • Add a better Size() method for a 2x speedup to serialization of Ints & Dec's

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@catShaark
Copy link
Contributor

catShaark commented Aug 17, 2022

@ValarDragon, is this still relevant ? I was thinking of having a map of 10, 100, 1000,...., 1e77 in math.Int so that we can compare with the Int we're calling Size() on, to handle the off by one problem. So for example, with the formula you linked above we calculate that an certain Int is either 9 or 8 decimal digits ( we don't know if it's off by 1 or not ), we can compare that int with 1e9 we saved on the map.

@odeke-em
Copy link
Collaborator

odeke-em commented May 11, 2023

@ValarDragon should we still work on this? I can confirm the Mathematics with one tweak that we need to encounter for the closest value of the base that fits within the bits so essentially after

        res := float64(i.i.BitLen()) / log2Of10
        ires := int(res)
        if res-float64(ires) > 0 { // There are other digits past the bitMask in the bits available hence add 1
                size += 1
        }

so the method could look like this

var maxUint64 = new(big.Int).SetUint64(uint64(stdmath.MaxUint64))
var log2Of10 = stdmath.Log2(10)

func (i *Int) Size() (size int) {
        if i.IsZero() {
                // To avoid log10(0) which is undefined.
                return 1
        }
        if i.IsNegative() {
                size += 1
        }

        // 1. For the common case, check if it is less than maxUint64
        if i.i.Cmp(maxUint64) <= 0 {
                return size + 1 + int(stdmath.Log10(float64(i.i.Uint64())))
        }

        // For other cases find the number of times that the value fits within
        // the max bitMask of bitLen but on the boundary of 9999*
        res := float64(i.i.BitLen()) / log2Of10
        ires := int(res)
        if res-float64(ires) > 0 { // There are other digits past the bitMask in the bits available hence add 1
                size += 1
        }
        return size + ires
}

and the performance win is there

 benchstat before.txt after.txt 
name       old time/op    new time/op    delta
IntSize-8    5.74µs ± 3%    3.94µs ± 1%  -31.33%  (p=0.008 n=5+5)

name       old alloc/op   new alloc/op   delta
IntSize-8    1.74kB ± 0%    1.14kB ± 0%  -34.10%  (p=0.008 n=5+5)

name       old allocs/op  new allocs/op  delta
IntSize-8      59.0 ± 0%      41.0 ± 0%  -30.51%  (p=0.008 n=5+5)

/cc @elias-orijtech

@elias-orijtech
Copy link
Contributor

Good analysis. Two drive-by comments while I'm looking through issues marked performance.

First, I'm surprised the Log10 case,

        // 1. For the common case, check if it is less than maxUint64
        if i.i.Cmp(maxUint64) <= 0 {
                return size + 1 + int(stdmath.Log10(float64(i.i.Uint64())))
        }

is faster than the generic BitLen/log2Of10 path.

Second, I'm always skeptical of using floats for integer problems. Can

        res := float64(i.i.BitLen()) / log2Of10
        ires := int(res)
        if res-float64(ires) > 0 { // There are other digits past the bitMask in the bits available hence add 1
                size += 1
        }

be changed to something like

        bits := i.i.BitLen() // Perhaps uint(i.i.BitLen()) may help the compiler to determine that the result is always positive.
        res := bits / log2Of10
        mod := bits % log2Of10
        if mod > 0 { // There are other digits past the bitMask in the bits available hence add 1
                size += 1
        }

? I believe the division and modulus is done in one machine instruction, but haven't checked.

@ValarDragon
Copy link
Contributor Author

Didn't check the details of the size estimation, but this is still showing up in benchmarks and would be valuable to lower!

odeke-em added a commit that referenced this issue May 25, 2023
…ytes)

This change computes Int.Size() by checking bit lengths and translating
those to base 10 values. This is instead of firstly invoking .Marshal
to check the length, yet .MarshalTo requires invoking .Size() then
.Marshal which is double work.

The results show improvements:

```shell
$ benchstat before.txt after.txt
name       old time/op    new time/op    delta
IntSize-8    23.9µs ± 3%    20.3µs ± 1%  -15.15%  (p=0.000 n=9+9)

name       old alloc/op   new alloc/op   delta
IntSize-8    6.62kB ± 0%    5.09kB ± 0%  -23.19%  (p=0.000 n=10+10)

name       old allocs/op  new allocs/op  delta
IntSize-8       186 ± 0%       177 ± 0%   -4.84%  (p=0.000 n=10+10)
```

Fixes #10331
odeke-em added a commit that referenced this issue May 27, 2023
…ytes)

This change computes Int.Size() by checking bit lengths and translating
those to base 10 values. This is instead of firstly invoking .Marshal
to check the length, yet .MarshalTo requires invoking .Size() then
.Marshal which is double work.

The results show improvements:

```shell
$ benchstat before.txt after.txt
name       old time/op    new time/op    delta
IntSize-8    23.9µs ± 3%    20.3µs ± 1%  -15.15%  (p=0.000 n=9+9)

name       old alloc/op   new alloc/op   delta
IntSize-8    6.62kB ± 0%    5.09kB ± 0%  -23.19%  (p=0.000 n=10+10)

name       old allocs/op  new allocs/op  delta
IntSize-8       186 ± 0%       177 ± 0%   -4.84%  (p=0.000 n=10+10)
```

Fixes #10331
odeke-em added a commit that referenced this issue May 30, 2023
…ytes)

This change computes Int.Size() by checking bit lengths and translating
those to base 10 values. This is instead of firstly invoking .Marshal
to check the length, yet .MarshalTo requires invoking .Size() then
.Marshal which is double work.

The results show improvements:

```shell
$ benchstat before.txt after.txt
name       old time/op    new time/op    delta
IntSize-8    23.9µs ± 3%    20.3µs ± 1%  -15.15%  (p=0.000 n=9+9)

name       old alloc/op   new alloc/op   delta
IntSize-8    6.62kB ± 0%    5.09kB ± 0%  -23.19%  (p=0.000 n=10+10)

name       old allocs/op  new allocs/op  delta
IntSize-8       186 ± 0%       177 ± 0%   -4.84%  (p=0.000 n=10+10)
```

Fixes #10331
odeke-em added a commit that referenced this issue Jun 6, 2023
…ytes)

This change computes Int.Size() by checking bit lengths and translating
those to base 10 values. This is instead of firstly invoking .Marshal
to check the length, yet .MarshalTo requires invoking .Size() then
.Marshal which is double work.

The results show improvements:

```shell
$ benchstat before.txt after.txt
name       old time/op    new time/op    delta
IntSize-8    23.9µs ± 3%    20.3µs ± 1%  -15.15%  (p=0.000 n=9+9)

name       old alloc/op   new alloc/op   delta
IntSize-8    6.62kB ± 0%    5.09kB ± 0%  -23.19%  (p=0.000 n=10+10)

name       old allocs/op  new allocs/op  delta
IntSize-8       186 ± 0%       177 ± 0%   -4.84%  (p=0.000 n=10+10)
```

Fixes #10331
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: Performance Performance improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants