Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up JSON chunker. #3825

Merged
merged 2 commits into from
Aug 16, 2019
Merged

Speed up JSON chunker. #3825

merged 2 commits into from
Aug 16, 2019

Conversation

danielmai
Copy link
Contributor

@danielmai danielmai commented Aug 16, 2019

Remove a buffer size allocation in the JSON chunker that was taking up
a lot of time in the CPU profile while live loading, which had a huge
impact to live loading JSON data. I tried loading a JSON array of a
million objects structured like the following:

[{"uid":"0x1", "pred": "val"}, ...]

Before this change:

Number of TXs run            : 1010
Number of N-Quads processed  : 1009929
Time spent                   : 5m22.79806804s
N-Quads processed per second : 3136

After this change:

Number of TXs run            : 1010
Number of N-Quads processed  : 1009929
Time spent                   : 13.396047868s
N-Quads processed per second : 77686

Here's the CPU profile of the 5-minute live loading session 1-million edges (without this change), which shows the expensive GC path with (*jsonChunker).Chunk: cpu.pprof.gz.

go tool pprof -web ./cpu.pprof.gz

cpupprof

(pprof) list \).Chunk
Total: 11.83mins
ROUTINE ======================== github.com/dgraph-io/dgraph/chunker.(*jsonChunker).Chunk in src/github.com/dgraph-io/dgraph/chunker/chunk.go
     350ms   4.11mins (flat, cum) 34.72% of Total
         .          .    174:		return errors.Errorf("JSON file must contain array. Found: %v", ch)
         .          .    175:	}
         .          .    176:	return nil
         .          .    177:}
         .          .    178:
      20ms       20ms    179:func (*jsonChunker) Chunk(r *bufio.Reader) (*bytes.Buffer, error) {
      10ms      290ms    180:	out := new(bytes.Buffer)
         .   4.05mins    181:	out.Grow(1 << 20)
...

This change is Reviewable

Remove a buffer size allocation in the JSON chunker that was taking up
a lot of time in the CPU profile while live loading, which had a huge
impact to live loading JSON data. I tried loading a JSON array of a
million objects structured like the following:

    [{"uid":"0x1", "pred": "val"}, ...]

Before this change:

    live loader throughput: 3,100/sec

After this change:

    live loader throughput: 77,686/sec
@danielmai danielmai requested review from manishrjain and a team as code owners August 16, 2019 06:58
Copy link

@pullrequest pullrequest bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ A review job has been created and sent to the PullRequest network.


Check the status or cancel PullRequest code review here.

@manishrjain
Copy link
Contributor

LGTM. Please merge.

@danielmai danielmai merged commit 14185f5 into master Aug 16, 2019
@danielmai danielmai deleted the danielmai/speedup-json-chunker branch August 16, 2019 14:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants