Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: store path in stack to avoid copying context #438

Closed
wants to merge 1 commit into from

Conversation

yoelsusanto
Copy link

Fixes #325

This PR is created to dramatically improve performance for go-yaml library especially when parsing larger YAML files.

This is the methodology that I used to arrive at the current solution:

  1. I use go test tool to create a CPU profile when doing parser.ParseBytes.
  2. When I analyzed the CPUProfile, I found that withChild and withIndex are the two functions that dominated the CPU utilization.
  3. Reading the source code, I found that both functions call parser.copy. This operation is performed repeatedly when parsing the YAML tokens.
  4. Taking a deeper look, the essence of this copy operation is to pass the correct node path when parsing child node.
  5. We can achieve the same result by managing a stack. We can push when parsing child node and pop after we finish.
  6. This way, we improve performance by avoiding excessive memory allocation caused by copying the context.tokens.

I realized the implementation can be refactored further to adjust the way of using context, but I aim for minimum change in this PR. Please let me know if you have any suggestions 🙏 . Thank you!

Checklist:

  • Describe the purpose for which you created this PR.
  • Create test code that corresponds to the modification

@codecov-commenter
Copy link

Codecov Report

Merging #438 (2018c1a) into master (4653a1b) will decrease coverage by 0.09%.
The diff coverage is 96.96%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #438      +/-   ##
==========================================
- Coverage   76.02%   75.94%   -0.09%     
==========================================
  Files          13       13              
  Lines        4692     4701       +9     
==========================================
+ Hits         3567     3570       +3     
- Misses        866      870       +4     
- Partials      259      261       +2     

@yoelsusanto
Copy link
Author

Benchmark

package main

import (
	"bytes"
	"fmt"
	"testing"

	"github.com/goccy/go-yaml"
	goyaml2 "gopkg.in/yaml.v2"
	goyaml3 "gopkg.in/yaml.v3"
)

func prepareSampleData(line int) []byte {
	var data bytes.Buffer

	for i := 0; i < line/5; i++ {
		data.WriteString(`- propertyA: valueA
  propertyB: 2
  propertyC: test
  propertyD: example
  propertyE: example2
`)
	}

	return data.Bytes()
}

type sampleData struct {
	PropertyA string `yaml:"propertyA"`
	PropertyB string `yaml:"propertyB"`
	PropertyC string `yaml:"propertyC"`
	PropertyD string `yaml:"propertyD"`
	PropertyE string `yaml:"propertyE"`
}

func BenchmarkGoccyYAML(b *testing.B) {
	lines := []int{100, 1000, 10000}

	for _, line := range lines {
		sample := prepareSampleData(line)

		b.Run(fmt.Sprintf("%d lines", line), func(b *testing.B) {
			b.Run("github.com/goccy/go-yaml", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := yaml.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})

			b.Run("gopkg.in/yaml.v2", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := goyaml2.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})

			b.Run("gopkg.in/yaml.v3", func(b *testing.B) {
				var datas []sampleData

				for i := 0; i < b.N; i++ {
					err := goyaml3.Unmarshal(sample, &datas)
					if err != nil {
						b.Fatal(err)
					}
				}
			})
		})
	}
}

To compare before and after the fix, I have to use replace github.com/goccy/go-yaml => github.com/yoelsusanto/go-yaml v0.0.0-20240324162521-2018c1ab915b directive.

Result:

goos: darwin
goarch: arm64
pkg: github.com/yoelsusanto/benchmark-go-yaml
                                                  │ original.txt  │             fixed.txt              │
                                                  │    sec/op     │   sec/op     vs base               │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12       411.7µ ± 1%   215.4µ ± 7%  -47.68% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12               69.28µ ± 0%   69.21µ ± 1%        ~ (p=0.240 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v3-12               91.12µ ± 0%   92.08µ ± 1%   +1.05% (p=0.004 n=6)
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     29.976m ± 3%   2.328m ± 1%  -92.23% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12              697.8µ ± 0%   691.1µ ± 0%   -0.95% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12              910.7µ ± 1%   919.5µ ± 1%        ~ (p=0.065 n=6)
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12   3541.11m ± 1%   24.41m ± 1%  -99.31% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12             7.591m ± 1%   7.596m ± 1%        ~ (p=0.937 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12             9.754m ± 1%   9.882m ± 1%   +1.31% (p=0.002 n=6)
geomean                                               2.864m        1.157m       -59.60%

                                                  │  original.txt   │               fixed.txt               │
                                                  │      B/op       │     B/op      vs base                 │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12       1585.8Ki ± 0%   187.2Ki ± 0%  -88.19% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12                49.48Ki ± 0%   49.48Ki ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/100_lines/gopkg.in/yaml.v3-12                64.36Ki ± 0%   64.36Ki ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     137.455Mi ± 0%   1.828Mi ± 0%  -98.67% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12               449.7Ki ± 0%   449.7Ki ± 0%        ~ (p=0.364 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12               585.1Ki ± 0%   585.1Ki ± 0%        ~ (p=0.545 n=6)
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12   13025.59Mi ± 0%   19.00Mi ± 0%  -99.85% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12              4.369Mi ± 0%   4.369Mi ± 0%        ~ (p=0.069 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12              5.677Mi ± 0%   5.677Mi ± 0%        ~ (p=0.498 n=6)
geomean                                                3.345Mi        809.1Ki       -76.38%
¹ all samples are equal

                                                  │ original.txt │              fixed.txt               │
                                                  │  allocs/op   │  allocs/op   vs base                 │
GoccyYAML/100_lines/github.com/goccy/go-yaml-12      6.381k ± 0%   5.338k ± 0%  -16.35% (p=0.002 n=6)
GoccyYAML/100_lines/gopkg.in/yaml.v2-12              1.037k ± 0%   1.037k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/100_lines/gopkg.in/yaml.v3-12              1.258k ± 0%   1.258k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/github.com/goccy/go-yaml-12     63.73k ± 0%   52.88k ± 0%  -17.02% (p=0.002 n=6)
GoccyYAML/1000_lines/gopkg.in/yaml.v2-12             10.04k ± 0%   10.04k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/1000_lines/gopkg.in/yaml.v3-12             12.24k ± 0%   12.24k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/10000_lines/github.com/goccy/go-yaml-12    649.2k ± 0%   529.9k ± 0%  -18.38% (p=0.002 n=6)
GoccyYAML/10000_lines/gopkg.in/yaml.v2-12            100.0k ± 0%   100.0k ± 0%        ~ (p=1.000 n=6) ¹
GoccyYAML/10000_lines/gopkg.in/yaml.v3-12            122.0k ± 0%   122.0k ± 0%        ~ (p=1.000 n=6) ¹
geomean                                              20.02k        18.80k        -6.12%
¹ all samples are equal

From the results, we can see:

  • Lower CPU usage
  • Lower number of allocations
  • Lower number of bytes per operation

Let me know if you need anything else 🙏

@mcdee
Copy link

mcdee commented Mar 27, 2024

This looks like a good change to me. Comparing with 1.11.3 using a complex ~6MB sample:

goos: linux
goarch: amd64
pkg: github.com/wealdtech/yamltest
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
                 │     1.11.3     │              proposed              │
                 │     sec/op     │   sec/op     vs base               │
Decode/Decode-12   174099.6m ± 1%   798.8m ± 2%  -99.54% (p=0.002 n=6)

                 │     1.11.3      │              proposed               │
                 │      B/op       │     B/op      vs base               │
Decode/Decode-12   257761.9Mi ± 0%   494.6Mi ± 0%  -99.81% (p=0.002 n=6)

                 │   1.11.3    │             proposed              │
                 │  allocs/op  │  allocs/op   vs base              │
Decode/Decode-12   4.989M ± 0%   4.590M ± 0%  -8.00% (p=0.002 n=6)

And it looks okay compared to the last good version, 1.9.2:

goos: linux
goarch: amd64
pkg: github.com/wealdtech/yamltest
cpu: 12th Gen Intel(R) Core(TM) i7-1265U
                 │    1.9.2    │             proposed              │
                 │   sec/op    │   sec/op     vs base              │
Decode/Decode-12   750.5m ± 1%   798.8m ± 2%  +6.43% (p=0.002 n=6)

                 │    1.9.2     │              proposed              │
                 │     B/op     │     B/op      vs base              │
Decode/Decode-12   486.8Mi ± 0%   494.6Mi ± 0%  +1.60% (p=0.002 n=6)

                 │    1.9.2    │             proposed              │
                 │  allocs/op  │  allocs/op   vs base              │
Decode/Decode-12   4.181M ± 0%   4.590M ± 0%  +9.77% (p=0.002 n=6)

@fzipi
Copy link

fzipi commented Apr 7, 2024

Looks good to us also. See benchmarks in the related issue.

@yoelsusanto
Copy link
Author

@goccy would you mind to have a look on the PR? 🙏 Thank you!

charithe added a commit to charithe/cerbos that referenced this pull request May 2, 2024
Update our fork with the performance patch from
goccy/go-yaml#438.

```
goos: linux
goarch: amd64
pkg: github.com/cerbos/cerbos/internal/parser
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
                 │ parser_before.txt │          parser_after.txt           │
                 │      sec/op       │   sec/op     vs base                │
Unmarshal/10-16         3399.0µ ± 3%   847.2µ ± 4%  -75.08% (p=0.000 n=10)
Unmarshal/50-16         50.093m ± 7%   3.847m ± 2%  -92.32% (p=0.000 n=10)
Unmarshal/100-16       191.559m ± 6%   7.504m ± 3%  -96.08% (p=0.000 n=10)
Unmarshal/500-16       4674.56m ± 3%   35.09m ± 4%  -99.25% (p=0.000 n=10)
geomean                  111.1m        5.412m       -95.13%

                 │ parser_before.txt │             parser_after.txt              │
                 │        B/s        │      B/s        vs base                   │
Unmarshal/10-16         1.030Mi ± 3%     4.134Mi ± 4%    +301.39% (p=0.000 n=10)
Unmarshal/50-16         341.8Ki ± 6%    4423.8Ki ± 2%   +1194.29% (p=0.000 n=10)
Unmarshal/100-16        175.8Ki ± 6%    4506.8Ki ± 3%   +2463.89% (p=0.000 n=10)
Unmarshal/500-16        39.06Ki ± 0%   4794.92Ki ± 4%  +12175.00% (p=0.000 n=10)
geomean                 223.1Ki          4.380Mi        +1910.85%

                 │ parser_before.txt │           parser_after.txt           │
                 │       B/op        │     B/op      vs base                │
Unmarshal/10-16        4258.8Ki ± 0%   384.6Ki ± 0%  -90.97% (p=0.000 n=10)
Unmarshal/50-16        91.248Mi ± 0%   1.813Mi ± 0%  -98.01% (p=0.000 n=10)
Unmarshal/100-16      358.297Mi ± 0%   3.596Mi ± 0%  -99.00% (p=0.000 n=10)
Unmarshal/500-16      7944.51Mi ± 0%   18.98Mi ± 0%  -99.76% (p=0.000 n=10)
geomean                 181.3Mi        2.611Mi       -98.56%

                 │ parser_before.txt │          parser_after.txt           │
                 │     allocs/op     │  allocs/op   vs base                │
Unmarshal/10-16         10.595k ± 0%   8.978k ± 0%  -15.26% (p=0.000 n=10)
Unmarshal/50-16          50.64k ± 0%   42.93k ± 0%  -15.23% (p=0.000 n=10)
Unmarshal/100-16        100.83k ± 0%   85.38k ± 0%  -15.32% (p=0.000 n=10)
Unmarshal/500-16         505.9k ± 0%   425.6k ± 0%  -15.87% (p=0.000 n=10)
geomean                  72.33k        61.17k       -15.42%
```

Signed-off-by: Charith Ellawala <[email protected]>
charithe added a commit to cerbos/cerbos that referenced this pull request May 2, 2024
Update our fork with the performance patch from
goccy/go-yaml#438.

```
goos: linux
goarch: amd64
pkg: github.com/cerbos/cerbos/internal/parser
cpu: 13th Gen Intel(R) Core(TM) i7-1360P
                 │ parser_before.txt │          parser_after.txt           │
                 │      sec/op       │   sec/op     vs base                │
Unmarshal/10-16         3399.0µ ± 3%   847.2µ ± 4%  -75.08% (p=0.000 n=10)
Unmarshal/50-16         50.093m ± 7%   3.847m ± 2%  -92.32% (p=0.000 n=10)
Unmarshal/100-16       191.559m ± 6%   7.504m ± 3%  -96.08% (p=0.000 n=10)
Unmarshal/500-16       4674.56m ± 3%   35.09m ± 4%  -99.25% (p=0.000 n=10)
geomean                  111.1m        5.412m       -95.13%

                 │ parser_before.txt │             parser_after.txt              │
                 │        B/s        │      B/s        vs base                   │
Unmarshal/10-16         1.030Mi ± 3%     4.134Mi ± 4%    +301.39% (p=0.000 n=10)
Unmarshal/50-16         341.8Ki ± 6%    4423.8Ki ± 2%   +1194.29% (p=0.000 n=10)
Unmarshal/100-16        175.8Ki ± 6%    4506.8Ki ± 3%   +2463.89% (p=0.000 n=10)
Unmarshal/500-16        39.06Ki ± 0%   4794.92Ki ± 4%  +12175.00% (p=0.000 n=10)
geomean                 223.1Ki          4.380Mi        +1910.85%

                 │ parser_before.txt │           parser_after.txt           │
                 │       B/op        │     B/op      vs base                │
Unmarshal/10-16        4258.8Ki ± 0%   384.6Ki ± 0%  -90.97% (p=0.000 n=10)
Unmarshal/50-16        91.248Mi ± 0%   1.813Mi ± 0%  -98.01% (p=0.000 n=10)
Unmarshal/100-16      358.297Mi ± 0%   3.596Mi ± 0%  -99.00% (p=0.000 n=10)
Unmarshal/500-16      7944.51Mi ± 0%   18.98Mi ± 0%  -99.76% (p=0.000 n=10)
geomean                 181.3Mi        2.611Mi       -98.56%

                 │ parser_before.txt │          parser_after.txt           │
                 │     allocs/op     │  allocs/op   vs base                │
Unmarshal/10-16         10.595k ± 0%   8.978k ± 0%  -15.26% (p=0.000 n=10)
Unmarshal/50-16          50.64k ± 0%   42.93k ± 0%  -15.23% (p=0.000 n=10)
Unmarshal/100-16        100.83k ± 0%   85.38k ± 0%  -15.32% (p=0.000 n=10)
Unmarshal/500-16         505.9k ± 0%   425.6k ± 0%  -15.87% (p=0.000 n=10)
geomean                  72.33k        61.17k       -15.42%
```

Signed-off-by: Charith Ellawala <[email protected]>

---------

Signed-off-by: Charith Ellawala <[email protected]>
@fzipi
Copy link

fzipi commented Sep 22, 2024

@goccy can you please take a look at this PR?

@goccy goccy mentioned this pull request Oct 31, 2024
@goccy
Copy link
Owner

goccy commented Oct 31, 2024

@yoelsusanto Thank you for your contribution. Also, Sorry for the late reply.
I'd like to improve the process of pop from the stack, so I created a PR on my side #494

@goccy goccy closed this Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

There is a bad performance hit when upgrading to >= 1.9.2
5 participants