Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic performance testing #22

Closed
jacobsa opened this issue Mar 24, 2015 · 14 comments
Closed

Basic performance testing #22

jacobsa opened this issue Mar 24, 2015 · 14 comments
Assignees
Milestone

Comments

@jacobsa
Copy link
Contributor

jacobsa commented Mar 24, 2015

We should do some performance testing to identify completely obvious bottlenecks.

  • Run on GCE, on a mid-range machine.
  • Generate random data of various sizes (1 KiB, 1 MiB, 10 MiB, 100 MiB, 1 GiB, 10 GiB?)
  • Use time cp to measure wall time taken to copy from local disk to GCS.
  • Find some way to measure CPU time taken by the gcsfuse process during the operation, too.
  • Look at throughput in terms of bytes per wall second and bytes per CPU second.
  • As a baseline, compare both to gsutil cp. (Both can be measured here by just using time gsutil cp, I think.)
@jacobsa jacobsa self-assigned this Mar 24, 2015
@jacobsa jacobsa added this to the alpha milestone Mar 24, 2015
@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Setting up in us-central1-a on n1-standard-4 with image backports-debian-7-wheezy-v20150320 and standard persistent disk.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Setup notes

Create an instance with a built-in key that can access GCS:

gcloud compute instances create test --scopes storage-full --image debian-7-backports --machine-type n1-standard-4 --project github-jacobsa-gcloud --zone us-central1-a

Copy in key file:

gcloud compute --project github-jacobsa-gcloud copy-files --zone us-central1-a '/Users/jacobsa/Downloads/GCloud GitHub project-bc05fbd617c4.json' test:

ssh in and set up the shell:

gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi

Install and set up fuse:

sudo apt-get update && sudo apt-get install fuse
sudo adduser $USER fuse
sudo chmod g+rw /dev/fuse
sudo chgrp fuse /dev/fuse

Log out and back in so the group change takes effect:

exit
[...]
gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi

Install Go:

wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz

export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$GOPATH/bin

Install Git:

sudo apt-get update && sudo apt-get -y install git-core

Install gcsfuse:

go get -v github.com/googlecloudplatform/gcsfuse

Test notes

Generate some data:

dd if=/dev/urandom of=foo bs=1048576 count=1     # 1 MiB
dd if=/dev/urandom of=foo bs=1048576 count=100   # 100 MiB
dd if=/dev/urandom of=foo bs=1048576 count=1024  # 1 GiB

gcsfuse

Mount gcsfuse:

mkdir -p mp
gcsfuse --bucket jacobsa-gcs-integration-test --key_file GCloud\ GitHub\ project-bc05fbd617c4.json mp &

Copy in the file:

rm -f mp/foo
time cp foo mp/

gsutil

gsutil rm -f gs://jacobsa-gcs-integration-test/foo
time gsutil cp foo gs://jacobsa-gcs-integration-test/

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Results from a few runs

100 MiB

gcsfuse

real    0m8.240s
user    0m0.020s
sys     0m0.952s

real    0m6.245s
user    0m0.016s
sys     0m0.732s

real    0m6.353s
user    0m0.000s
sys     0m0.748s

gsutil

real    0m2.620s
user    0m1.548s
sys     0m0.672s

real    0m3.036s
user    0m1.552s
sys     0m0.700s

real    0m2.965s
user    0m1.384s
sys     0m0.684s

1 GiB

gcsfuse

real    1m5.938s
user    0m0.048s
sys     0m9.492s

real    1m1.468s
user    0m0.040s
sys     0m8.384s

gsutil

real    0m16.926s
user    0m6.420s
sys     0m2.832s

real    0m20.855s
user    0m6.408s
sys     0m2.928s

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

So pretty slow, but not as terribly slow as I had feared. Some things to look at:

  • Is it possible cp is doing something braindead here? Try using dd instead, where we can control the block size.
  • What proportion of the time spent is writing to the local file vs. uploading vs. everything else? Add logging to find out.
  • Confirm that the kernel is sending 4 KiB writes, which is probably terrible, then fix.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Trying with dd:

# <Generate 1 GiB foo>
rm -f mp/foo
time dd if=foo of=mp/foo bs=1048576

which gives:

real    0m58.100s
user    0m0.012s
sys     0m8.796s

which is not much different. Only slightly better with a block size of 10x that, on a single run:

real    0m53.033s
user    0m0.008s
sys     0m7.824s

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Confirmed: writes are 4 KiB.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Hacked together a binary that uses bazilfuse.InitBigWrites in the response to fuse's init request, which appears to result in 32 KiB writes from the kernel.

New results for 1 GiB:

real    0m43.312s
user    0m0.052s
sys     0m2.656s

real    0m37.229s
user    0m0.064s
sys     0m1.980s

Better.

Surprisingly, InitWritebackCache plus big writes appears to make things much worse:

real    1m42.914s
user    0m0.012s
sys     0m0.220s

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Hacked in some logging that shows that with 32 KiB most of the time is spent in ObjectProxy.Sync, which is doing the network write:

2015/03/25 05:05:47.256616 object_proxy.go:244: WriteAt called
2015/03/25 05:05:47.256658 object_proxy.go:412: ensureLocalFile called
2015/03/25 05:05:47.293666 object_proxy.go:423: ensureLocalFile returning
2015/03/25 05:05:47.293758 object_proxy.go:259: WriteAt returning
2015/03/25 05:05:52.133747 object_proxy.go:296: Sync called
2015/03/25 05:06:29.445792 object_proxy.go:342: Sync returning

real    0m42.493s
user    0m0.044s
sys     0m1.908s

Now that's actually a bit odd to me. What is our io.Copy doing that gsutil is not? Serving from a bytes.Reader rather than the file doesn't seem to make a difference.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

FWIW, here is the CPU profile while calling CreateObject:

  flat  flat%   sum%        cum   cum%
8900ms 48.79% 48.79%     8900ms 48.79%  crypto/cipher.(*gcm).mul
2010ms 11.02% 59.81%     2010ms 11.02%  crypto/aes.encryptBlockAsm
1960ms 10.75% 70.56%     1960ms 10.75%  runtime.memmove
 760ms  4.17% 74.73%      760ms  4.17%  crypto/cipher.getUint64
 560ms  3.07% 77.80%    10200ms 55.92%  crypto/cipher.(*gcm).updateBlocks
 540ms  2.96% 80.76%      540ms  2.96%  runtime.futex
 520ms  2.85% 83.61%      560ms  3.07%  syscall.Syscall
 480ms  2.63% 86.24%     2640ms 14.47%  crypto/aes.(*aesCipher).Encrypt
 460ms  2.52% 88.76%     3760ms 20.61%  crypto/cipher.(*gcm).counterCrypt
 310ms  1.70% 90.46%      310ms  1.70%  crypto/cipher.fastXORWords

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

Although gcsfuse CPU usage hovers at about 50%, so that's probably not the bottleneck.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 25, 2015

To do for alpha: Document these numbers, set performance expectations.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 26, 2015

Switched to resumable object uploads in jacobsa/gcloud@3b1af22, which seems to make a bit of a difference to upload speed. As far as I can tell, this effect is responsible for most of the remaining delta with gsutil. (Ignoring the unavoidable double-copy overhead when staging the temporary file in gcsfuse.)

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 26, 2015

To do:

  1. Wait for jacobsa/gcsfuse#27 (increasing kernel write size).
  2. Try the test above again.
  3. Set performance expectations in the docs.
  4. Call it good enough or not. If good enough, close this bug. Either way remove from alpha milestone.

@jacobsa
Copy link
Contributor Author

jacobsa commented Mar 26, 2015

Issue #27 is fixed.

Tried again with 1 GiB:

real    0m27.226s
user    0m0.064s
sys     0m2.152s

real    0m24.109s
user    0m0.040s
sys     0m1.736s

real    0m25.231s
user    0m0.048s
sys     0m1.616s

Much much better, and proabably good enough for now. Let's document that performance is comparable with gsutil, and wait for specific complaints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant