Improve memory usage when reaching diff limits #2990

mrexodia · 2017-11-27T03:04:22Z

Related to #2669, also related to go-gitea/git#93

This change introduces a hand-rolled implementation of input.ReadString() that stops adding to the line buffer once it reaches maxLineCharacters; the rest of an over-long line is still read and discarded.

I thought the performance of this would be terrible, but it actually appears to be ~~slightly faster~~ slower when timing the ParsePatch function (with very loose timing, so it might be worth checking).

For commits that are shown completely, memory usage is slightly better, but for commits where files are hidden because they are too big, memory usage is much, much better (my test was a 100 MB one-liner, which went from 322 MB to 55 MB).

The ParsePatch function does not behave completely identically, because line is now truncated to whatever the user sets as the maximum. I briefly checked all usages and it does not appear to matter (diffs are only shown to the user, and if IsIncomplete is set the line member is not used), but this needs some attention during review.

A possible follow-up is to rewrite the diff functions completely, using things like git diff --numstat to figure out whether diffs have to be truncated and which files are part of a diff, instead of this ugly parser.
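The follow-up is only an idea in this PR, so here is a hedged sketch of what reading `git diff --numstat` output could look like. git prints one line per file as `added<TAB>deleted<TAB>path`, and `-<TAB>-<TAB>path` for binary files. The type `numstatEntry`, the function `parseNumstat`, the sample output, and the `maxLinesPerFile` limit are all hypothetical. Note that numstat reports line counts, not byte sizes, so a 100 MB one-liner shows up as a single added line and would still need a separate size check.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// numstatEntry holds one line of `git diff --numstat` output (hypothetical type).
type numstatEntry struct {
	Added, Deleted int
	Path           string
	Binary         bool // git prints "-\t-\tpath" for binary files
}

// parseNumstat parses plain (non -z) numstat output. Rename entries such as
// "a => b" are kept verbatim in Path.
func parseNumstat(out string) ([]numstatEntry, error) {
	var entries []numstatEntry
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.SplitN(sc.Text(), "\t", 3)
		if len(fields) != 3 {
			continue // skip blank or unexpected lines
		}
		e := numstatEntry{Path: fields[2]}
		if fields[0] == "-" && fields[1] == "-" {
			e.Binary = true
		} else {
			var err error
			if e.Added, err = strconv.Atoi(fields[0]); err != nil {
				return nil, err
			}
			if e.Deleted, err = strconv.Atoi(fields[1]); err != nil {
				return nil, err
			}
		}
		entries = append(entries, e)
	}
	return entries, sc.Err()
}

func main() {
	// Hypothetical sample in the shape numstat prints: added, deleted, path.
	out := "3\t1\tREADME.md\n-\t-\tlogo.png\n1\t0\tdata/one-line.txt\n"
	entries, err := parseNumstat(out)
	if err != nil {
		panic(err)
	}
	const maxLinesPerFile = 1000 // hypothetical per-file limit
	for _, e := range entries {
		tooBig := e.Added+e.Deleted > maxLinesPerFile
		fmt.Printf("%s binary=%v tooBig=%v\n", e.Path, e.Binary, tooBig)
	}
}
```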

codecov-io · 2017-11-27T03:12:35Z

Codecov Report

Merging #2990 into master will decrease coverage by 0.02%.
The diff coverage is 66.66%.

```diff
@@            Coverage Diff             @@
##           master    #2990      +/-   ##
==========================================
- Coverage   33.04%   33.01%   -0.03%     
==========================================
  Files         269      269              
  Lines       39484    39492       +8     
==========================================
- Hits        13047    13039       -8     
- Misses      24584    24603      +19     
+ Partials     1853     1850       -3
```
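As a consistency check on the report, assuming coverage is hits divided by tracked lines (which matches the figures shown), the totals and percentages work out as follows:

$$
\begin{aligned}
\text{master:}\quad & 13047 + 24584 + 1853 = 39484, \qquad 13047 / 39484 \approx 33.044\% \\
\text{\#2990:}\quad & 13039 + 24603 + 1850 = 39492, \qquad 13039 / 39492 \approx 33.017\% \\
\Delta:\quad & 33.017\% - 33.044\% \approx -0.027 \text{ percentage points}
\end{aligned}
$$

Truncated rather than rounded to two decimals, these give 33.04%, 33.01%, and a decrease of 0.02%, matching the header; the -0.03% in the table is the difference of the two displayed values. This is consistent with Codecov truncating, though the report itself does not say so.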

| Impacted Files | Coverage Δ | |
|---|---|---|
| models/git_diff.go | `60.16% <66.66%> (+0.05%)` | ⬆️ |
| models/repo_indexer.go | `45.54% <0%> (-6.44%)` | ⬇️ |
| modules/indexer/repo.go | `60.86% <0%> (-2.61%)` | ⬇️ |
| modules/avatar/avatar.go | `100% <0%> (+18.75%)` | ⬆️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d39b88a...2ade5b3.

mrexodia · 2017-11-27T15:47:56Z

I will refactor this to use io.LimitReader instead...

mrexodia · 2017-11-27T21:18:19Z

Okay, I'm not going to rewrite this with io.LimitReader because weird things start to happen...

mrexodia · 2017-11-28T01:36:50Z

I fixed an issue where it would hard crash when trying to parse https://try.gitea.io/mrexodia/DarkSouls3.TextViewer/commit/629cf9b3d6b295bbcddf76d1f6167259b764d9dc

lunny · 2017-11-28T01:47:42Z

LGTM

ethantkoenig · 2017-11-28T04:05:13Z

models/git_diff.go

```diff
-			} else {
-				return nil, fmt.Errorf("ReadString: %v", err)
+		var linebuf bytes.Buffer
+		for {
```


From what I can tell, we only break from this loop when we reach EOF or a newline. Should we break once linebuf.Len() >= maxLineCharacters? Otherwise it's not clear to me how this helps memory usage.

Never mind

@ethantkoenig I think it's right. See line 272.

Line 272 sets curFile.IsIncomplete to true, but the loop will still run for more iterations.

Yes, so that we can find the next line.

Yeah, at first I used break to stop reading, but that caused https://try.gitea.io/mrexodia/DarkSouls3.TextViewer/commit/629cf9b3d6b295bbcddf76d1f6167259b764d9dc to crash, because the parser would then treat the very next character of the long line as the start of the next diff line.

ethantkoenig

Looks good, I just have one minor suggestion. Since the ParsePatch(..) function is already quite long, could we move the newly-added code to a helper function? Something like

```go
func ReadLineWithMaxLength(reader io.Reader, maxLen int) (string, error) {
...
}
```

mrexodia · 2017-11-28T13:00:34Z

I personally don't mind adding it to a function, but it would become rather ugly:

```go
func ReadLineWithMaxLength(reader io.Reader, maxLineCharacters int) (line string, isEof bool, isTruncated bool) {
...
}
```

I could read maxLineCharacters + 1 and leave the original check at line 298, but that feels a bit hacky...

Signed-off-by: Duncan Ogilvie <[email protected]>

ethantkoenig · 2017-11-28T18:10:44Z

Fair enough, LGTM

lafriks · 2017-11-28T20:57:37Z

@lunny why backport for this?

tboerger added the lgtm/need 2 label (this PR needs two approvals by maintainers to be considered for merging) Nov 27, 2017

lunny added the type/enhancement label (an improvement of existing functionality) Nov 27, 2017

lunny added this to the 1.x.x milestone Nov 27, 2017

mrexodia force-pushed the diff-memory-usage branch from 7441a6e to 46d987a November 27, 2017 18:43

This was referenced Nov 27, 2017

Introduce Blob.DataAsync go-gitea/git#94 (merged)

cannot allocate memory after opening truncated diff (possible memory leak) #2669 (closed)

mrexodia force-pushed the diff-memory-usage branch from 46d987a to 3990d7d November 28, 2017 01:35

tboerger added lgtm/need 1 (this PR needs approval from one additional maintainer to be merged) and removed lgtm/need 2 labels Nov 28, 2017

lunny modified the milestones: 1.x.x, 1.4.0 Nov 28, 2017

lunny added the backport/v1.3 label Nov 28, 2017

ethantkoenig reviewed Nov 28, 2017


ethantkoenig approved these changes Nov 28, 2017


Improve memory usage when reaching diff limits

1031bc2

Signed-off-by: Duncan Ogilvie <[email protected]>

mrexodia force-pushed the diff-memory-usage branch from 3990d7d to 1031bc2 November 28, 2017 13:14

tboerger added lgtm/done (this PR has enough approvals to get merged; there are no important open reservations anymore) and removed lgtm/need 1 labels Nov 28, 2017

lafriks removed the backport/v1.3 label Nov 28, 2017

lafriks added 2 commits November 28, 2017 23:01

Merge branch 'master' into diff-memory-usage

520c5e8

Merge branch 'master' into diff-memory-usage

2ade5b3

lafriks merged commit c80d147 into go-gitea:master Nov 28, 2017

mrexodia deleted the diff-memory-usage branch November 29, 2017 00:04

go-gitea locked and limited conversation to collaborators Nov 23, 2020
