Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve memory usage when reaching diff limits #2990

Merged
merged 3 commits into from
Nov 28, 2017

Conversation

mrexodia
Copy link
Contributor

@mrexodia mrexodia commented Nov 27, 2017

Related to #2669, also related to go-gitea/git#93

This change introduces a hand-rolled implementation of input.ReadString() that stops reading if the line buffer gets bigger than maxLineCharacters.

I thought the performance of this would be terrible, but actually it appears to be slightly faster slower when timing the ParsePatch function (with real loose timing so might be good to check out).

For commits that are shown completely the memory usage is slightly better, but for commits where files are hidden because they are too big, the memory usage is much, much better (my test was a 100mb one-liner which went from 322mb to 55mb).

The ParsePatch function does not behave completely identical because line is now truncated to whatever the user sets as maximum. I checked all usages a little and it does not appear to matter (diffs are only shown to the user and if IsIncomplete is set the line member is not used), however this needs some attention during review.

Possible follow-up for this is to completely rewrite the diff functions to use things like git diff --numstat to figure out if diffs have to be truncated and which files are part of a diff instead of this ugly parser.

@codecov-io
Copy link

codecov-io commented Nov 27, 2017

Codecov Report

Merging #2990 into master will decrease coverage by 0.02%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2990      +/-   ##
==========================================
- Coverage   33.04%   33.01%   -0.03%     
==========================================
  Files         269      269              
  Lines       39484    39492       +8     
==========================================
- Hits        13047    13039       -8     
- Misses      24584    24603      +19     
+ Partials     1853     1850       -3
Impacted Files Coverage Δ
models/git_diff.go 60.16% <66.66%> (+0.05%) ⬆️
models/repo_indexer.go 45.54% <0%> (-6.44%) ⬇️
modules/indexer/repo.go 60.86% <0%> (-2.61%) ⬇️
modules/avatar/avatar.go 100% <0%> (+18.75%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d39b88a...2ade5b3. Read the comment docs.

@tboerger tboerger added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Nov 27, 2017
@lunny lunny added the type/enhancement An improvement of existing functionality label Nov 27, 2017
@lunny lunny added this to the 1.x.x milestone Nov 27, 2017
@mrexodia
Copy link
Contributor Author

I will refactor this to use io.LimitReader instead...

@mrexodia
Copy link
Contributor Author

Okay, I'm not going to rewrite this with io.LimitReader because weird things start to happen...

@mrexodia
Copy link
Contributor Author

I fixed an issue where it would hard crash when trying to parse https://try.gitea.io/mrexodia/DarkSouls3.TextViewer/commit/629cf9b3d6b295bbcddf76d1f6167259b764d9dc

@lunny
Copy link
Member

lunny commented Nov 28, 2017

LGTM

@tboerger tboerger added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Nov 28, 2017
@lunny lunny modified the milestones: 1.x.x, 1.4.0 Nov 28, 2017
} else {
return nil, fmt.Errorf("ReadString: %v", err)
var linebuf bytes.Buffer
for {
Copy link
Member

@ethantkoenig ethantkoenig Nov 28, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell, we only break from this loop when we reach an EOF or new-line. Should we break once linebuf.Len() >= maxLineCharacters? Otherwise it's not clear to me how this helps memory usage.

Never mind

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ethantkoenig I think it's right. See line 272.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 272 sets curFile.IsIncomplete to true, but the loop will still run for more iterations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so that, we could find the next line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, first I used break to stop reading, but that would cause https://try.gitea.io/mrexodia/DarkSouls3.TextViewer/commit/629cf9b3d6b295bbcddf76d1f6167259b764d9dc to crash because it would think the next diff line was at the next character.

Copy link
Member

@ethantkoenig ethantkoenig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I just have one minor suggestion. Since the ParsePatch(..) function is already quite long, could we move the newly-added code to a helper function? Something like

func ReadLineWithMaxLength(reader io.Reader, maxLen int) (string, error) {
...
}

@mrexodia
Copy link
Contributor Author

I personally don't mind adding it to a function, but it would become rather ugly:

func ReadLineWithMaxLength(reader io.Reader, maxLineCharacters int) (line string, isEof bool, isTruncated bool) {
...
}

I could read maxLineCharacters + 1 and leave the original check at 298 but that feels a bit hacky...

@ethantkoenig
Copy link
Member

Fair enough, LGTM

@tboerger tboerger added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Nov 28, 2017
@lafriks
Copy link
Member

lafriks commented Nov 28, 2017

@lunny why backport for this?

@lafriks lafriks merged commit c80d147 into go-gitea:master Nov 28, 2017
@mrexodia mrexodia deleted the diff-memory-usage branch November 29, 2017 00:04
@go-gitea go-gitea locked and limited conversation to collaborators Nov 23, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. type/enhancement An improvement of existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants