
cmd/cue: serious performance regression #2243

Closed
nxcc opened this issue Feb 3, 2023 · 5 comments
Labels: evalv3-win (Issues resolved by switching from evalv2 to evalv3), NeedsInvestigation, v0.5 hangover (Serious bugs that did not get fixed in v0.5.0)

Comments


nxcc commented Feb 3, 2023

What version of CUE are you using (cue version)?

cue version v0.5.0-beta.5

go version go1.19.3
       -compiler gc
       -trimpath true
     CGO_ENABLED 0
          GOARCH amd64
            GOOS linux
         GOAMD64 v1

Does this issue reproduce with the latest release?

yes

What did you do?

cue eval ./testcase (see testcase.zip)

What did you expect to see?

the expected output after a fraction of a second (<0.2s), as with cue v0.5.0-beta.2

What did you see instead?

the expected output after more than 12 seconds

@nxcc nxcc added NeedsInvestigation Triage Requires triage/attention labels Feb 3, 2023
@nxcc nxcc changed the title massive performance regression using the api massive performance regression Feb 3, 2023
@nxcc nxcc changed the title massive performance regression cmd/cue: massive performance regression Feb 3, 2023
@nxcc nxcc changed the title cmd/cue: massive performance regression cmd/cue: serious performance regression Feb 3, 2023

mvdan commented Feb 3, 2023

Thanks for reporting. v0.5.0-beta.2 is roughly as fast as v0.4.3, so it must have been a recent change that was included in v0.5.0-beta.5.

I bisected between the two, which pointed me at https://review.gerrithub.io/c/cue-lang/cue/+/549247. Not entirely surprising:

This change more aggressively early-evaluates conjuncts to prevent them from being missed if a node is in finalization mode. This fix most likely will not cover all cases, but hopefully enough to hold over until v0.6.

I imagine the more aggressive early evaluation might be causing worse performance in your case.

@mvdan mvdan removed the Triage Requires triage/attention label Feb 3, 2023
@mvdan mvdan added this to the v0.5.0 comprehension rework milestone Feb 3, 2023

mvdan commented Feb 3, 2023

Reduced it down by quite a bit:

import (
	"strings"
	"crypto/sha512"
	"crypto/sha256"
	"crypto/md5"
	"encoding/base64"
	"encoding/hex"
	"regexp"
	"list"
)

secret: this={
	(#metadataName & {data: this.data}).out
	... // evalv2 did not correctly point out a closedness issue here
}

#metadataName: {
	DATA=data: {[string]: string | bytes, ...}
	out: metadata: name: "-" + (#shortHash & {data: DATA}).out
}

#shortHash: {
	data: {[string]: string | bytes, ...}

	_hashLen:       6
	_sortedKeys:    list.SortStrings([ for k, _ in data {k}])
	_joinedKV:      strings.Join([ for _, k in _sortedKeys {data[k]}], "\n")
	_joinedKVBytes: '\(_joinedKV)'
	_md5:           md5.Sum(_joinedKVBytes)
	_sha256:        sha256.Sum256(_joinedKVBytes)
	_sha512:        sha512.Sum512(_joinedKVBytes)
	_base64:        base64.Encode(null, _sha512+_sha256+_md5)
	_base36:        regexp.ReplaceAll("[^a-z0-9]", _base64, "")
	_fragments:     strings.SplitN(_base36, "", _hashLen+1)
	_truncated:     strings.Join(_fragments[0:_hashLen], "")

	out: _truncated
}

secret: data: {
	d1: _
	d2: _
	d3: _
	d4: _
	d5: _
}
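As a side note, the `#shortHash` pipeline above is ordinary string hashing; a rough Python equivalent (the function name `short_hash` and the sketch below are mine, not part of the report) makes the computation explicit:

```python
import base64
import hashlib
import re

def short_hash(data: dict, hash_len: int = 6) -> str:
    # Join the values in sorted-key order, separated by newlines,
    # mirroring _sortedKeys and _joinedKV in the CUE reproducer.
    joined = "\n".join(data[k] for k in sorted(data)).encode()
    # Concatenate the sha512, sha256, and md5 digests, then base64-encode,
    # as in _base64: base64.Encode(null, _sha512+_sha256+_md5).
    digest = (hashlib.sha512(joined).digest()
              + hashlib.sha256(joined).digest()
              + hashlib.md5(joined).digest())
    b64 = base64.b64encode(digest).decode()
    # Keep only lowercase letters and digits, then truncate to hash_len
    # characters (the SplitN/Join dance in CUE is just a prefix slice).
    return re.sub(r"[^a-z0-9]", "", b64)[:hash_len]
```

The point of the reproducer is not the hash itself but that each entry re-evaluates this whole chain of computed fields.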

With this shorter config, v0.5.0-beta.5 takes over a second, and v0.4.3 barely takes ten milliseconds.

$ cue version
cue version v0.0.0-20230202180031-576d0e461a99

go version devel go1.21-88a36c9e9a Thu Feb 2 20:23:27 2023 +0000
      -buildmode exe
       -compiler gc
     CGO_ENABLED 1
          GOARCH amd64
            GOOS linux
         GOAMD64 v3
             vcs git
    vcs.revision 576d0e461a990ddcdab9da7a10375a1c6d87a865
        vcs.time 2023-02-02T18:00:31Z
    vcs.modified false
$ time cue eval new.cue >stdout

real	0m1.281s
user	0m1.606s
sys	0m0.040s
$ time cue-v0.4.3 eval new.cue >stdout-stable

real	0m0.018s
user	0m0.018s
sys	0m0.009s
$ diff -u stdout-stable stdout
$


mvdan commented Feb 7, 2023

I believe @tmm1 is running into the same performance regression; cue export test.cue in https://github.com/tmm1/taxes.cue takes about five seconds on v0.4.3, but appears to spin CPU for a very long time on v0.5.0-beta.5. I ran it for multiple minutes and gave up; it uses one full CPU while it runs.

Worth noting that beta.2 fails, partially because of #2246, and it's unclear whether the other errors are caused by #2246 as well. But at least the export finishes in about ten seconds instead of taking a very long time.

Small reproducer:

import "list"

out: #qualifiedDividendsAndCapitalGainTax | *"incomplete"

#qualifiedDividendsAndCapitalGainTax: {
	_in: {
		_f1040: _
		_filingStatus: _f1040.filingStatus
		_form1040: {
			l3a: _f1040.qualifiedDividends
			l15: _f1040.taxableIncome
			l7:  _f1040.capitalGainOrLoss
		}
	}
	out: _sheet.l17
	_sheet: {
		l1:  _in._form1040.l15
		l2:  _in._form1040.l3a
		l3:  _in._form1040.l7
		l4:  l2 + l3
		l5:  list.Max([0, l1 - l4])
		l6:  40_400
		l7:  list.Min([l1, l6])
		l8:  list.Min([l5, l7])
		l9:  l7 - l8
		l10: list.Min([l1, l4])
		l11: l9
		l12: l10 - l11
		l13: 445_850
		l14: list.Min([l1, l13])
		l15: l5 + l9
		l16: list.Max([0, l14 - l15])
		l17: list.Min([l12, l16])
	}
}
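For context, the worksheet only chains additions and min/max comparisons; a plain Python transcription (with hypothetical concrete inputs, since the reproducer leaves _f1040 unresolved) shows how little actual computation each field performs:

```python
def worksheet_l17(taxable_income: int, qualified_dividends: int,
                  capital_gain: int) -> int:
    # Direct transcription of the _sheet fields from the CUE reproducer.
    l1 = taxable_income
    l2 = qualified_dividends
    l3 = capital_gain
    l4 = l2 + l3
    l5 = max(0, l1 - l4)
    l7 = min(l1, 40_400)       # l6 = 40_400
    l8 = min(l5, l7)
    l9 = l7 - l8
    l10 = min(l1, l4)
    l11 = l9
    l12 = l10 - l11
    l14 = min(l1, 445_850)     # l13 = 445_850
    l15 = l5 + l9
    l16 = max(0, l14 - l15)
    return min(l12, l16)       # l17

print(worksheet_l17(100_000, 5_000, 10_000))  # 15000
```

Evaluating this in Python is instantaneous, which underlines that the slowdown comes from how the evaluator re-processes conjuncts across the nested fields, not from the arithmetic itself.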

That smaller reproducer takes 0.08s on v0.4.3, 0.25s on v0.5.0-beta.1, 0.30s on v0.5.0-beta.2, and 4.63s on v0.5.0-beta.5. CUE master (02e19c8) also takes nearly five seconds. I believe it is the same performance regression as the one reported here, because both regressed between beta.2 and beta.5, and both involve multiple levels of fields which do computation on top of each other, e.g. strings.Join or list.Min.

I bisected between beta.2 and beta.5 and the wall time jumps from 0.3s to 4.5s with https://review.gerrithub.io/c/cue-lang/cue/+/549247, which further convinces me that the two users ran into the same performance regression: they both bisected to the same CUE change.


myitcv commented Jun 14, 2023

Marking this as v0.7.0. If we were to address this by backing out the change that caused the regression, we would likely need an additional fix as part of v0.6.0. As things stand, we don't want to delay v0.6.0 if we can avoid it, and instead prefer to address this along with other performance issues in v0.7.0. This has the added benefit of the fix likely being considerably easier, because of refactors in the evaluation engine as part of v0.7.0 (that are not part of v0.6.0, where the cost of making such changes is higher).

@mvdan mvdan self-assigned this Oct 3, 2024
@mvdan mvdan added the evalv3-win Issues resolved by switching from evalv2 to evalv3 label Oct 3, 2024

mvdan commented Oct 9, 2024

With @mpvl we have confirmed that both of the examples above are reasonably fast with CUE_EXPERIMENT=evalv3 as of master (655236e).

First, the example at #2243 (comment), reduced from the code provided by @nxcc:

$ time CUE_EXPERIMENT=evalv3=0 cue eval secret.cue >/dev/null

real	0m0.530s
user	0m0.738s
sys	0m0.030s
$ time CUE_EXPERIMENT=evalv3=1 cue eval secret.cue >/dev/null

real	0m0.019s
user	0m0.011s
sys	0m0.011s

The original code also sees a similar speed-up, after tweaking the top-level secret map to keep the structs open:

$ time CUE_EXPERIMENT=evalv3=0 cue eval . >/dev/null

real	0m8.333s
user	0m9.157s
sys	0m0.195s
$ time CUE_EXPERIMENT=evalv3=1 cue eval . >/dev/null

real	0m0.026s
user	0m0.034s
sys	0m0.006s

As for #2243 (comment), reduced from @tmm1's code:

$ time CUE_EXPERIMENT=evalv3=0 cue eval secret.cue >/dev/null

real	0m2.128s
user	0m2.528s
sys	0m0.033s
$ time CUE_EXPERIMENT=evalv3=1 cue eval secret.cue >/dev/null

real	0m0.018s
user	0m0.018s
sys	0m0.005s

Looking at the original project at https://github.com/tmm1/taxes.cue, and working around the regression bug I filed at #3483, the performance there still isn't great when evaluating the entire package:

$ time CUE_EXPERIMENT=evalv3=0 cue eval .:taxes >/dev/null

real	0m9.807s
user	0m19.758s
sys	0m0.229s
$ time CUE_EXPERIMENT=evalv3=1 cue eval .:taxes >/dev/null

real	0m23.710s
user	1m9.246s
sys	0m0.454s

We are pretty sure that this is caused by increased memory usage; the ~2x wall time corresponds to a ~3x number of Go allocations and peak memory usage. We think that the planned improvements in work reuse and memory allocations (#2854, #2853) will give better results than the old evaluator. The new evaluator is already evaluating fewer unifications and disjunctions than the old one, which is a good sign.

I should note that, on this same taxes.cue project, cue export test.cue continues to take a long time (I stopped after two minutes) on both the old and new evaluators. Here, I again think #2854 will help greatly. We will test the project again once more performance work for the new evaluator has landed.

Since @nxcc's original project is fast now, and @tmm1's performance improvements are already tracked in the issues I listed above, I'm going to close this issue as resolved. Please try the new evaluator on either the latest alpha release or master, and let us know how it works for you :)

@mvdan mvdan closed this as completed Oct 9, 2024