This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

docker client times out pulling large fs layer from S3-backed registry #956

Closed
chaiken-verticloud opened this issue Mar 6, 2015 · 6 comments

At Altiscale, we have been running a standalone docker registry backed by S3. The registry works fine with STORAGE_REDIRECT=true. We're tightening up registry security and bandwidth usage by setting STORAGE_REDIRECT=false, but our docker clients now time out when fetching large fs layers.

Here's the scenario:

  • On the docker client, pull the image with the large fs layer...
[docker_client_host]$ docker pull reghost.altiscale.com/container_name
Pulling repository reghost.altiscale.com/container_name
921a3438ca3d: Pulling image (latest) from reghost.altiscale.com/container_name
921a3438ca3d: Pulling image (latest) from reghost.altiscale.com/container_name, endpoint: https://reghost.altiscale.com/v1/ 
921a3438ca3d: Pulling metadata 
921a3438ca3d: Pulling fs layer 
  • The docker registry receives a request for the layer.
  • This code in depends/docker-registry-core/docker_registry/core/boto.py:Base.stream_read decides to fetch the layer using the ParallelKey class.
        if not bytes_range and key.size > 1024 * 1024:
            # Use the parallel key only if the key size is > 1MB
            # And if bytes_range is not enabled (since ParallelKey is already
            # using bytes range)
            key = ParallelKey(key)
  • This code in depends/docker-registry-core/docker_registry/core/boto.py:ParallelKey.__init__ opens a temporary file:
self._tmpfile = tempfile.NamedTemporaryFile(mode='rb')
  • Each of 5 (CONCURRENCY) threads runs this code in depends/docker-registry-core/docker_registry/core/boto.py:ParallelKey._fetch_part, transferring 1/5 of the object into the temporary file at the corresponding offset (a minimal sketch of this splitting appears after this list):
with open(fname, 'wb') as f:
    f.seek(min_cur)
    brange = 'bytes={0}-{1}'.format(min_cur, max_cur)
    boto_key.get_contents_to_file(f, headers={'Range': brange})
    boto_key.close()
  • The fs layer object in S3 is so large that the docker client times out while these 5 threads are running, and retries.
  • The above process repeats: the registry opens a second temporary file and spawns 5 more threads, which start transferring the object into the second temporary file. At this point, 10 threads are running, transferring into two different temporary files.
  • Eventually one of the first 5 threads completes its call to boto_key.get_contents_to_file and calls ParallelKey._refresh_max_completed_byte, which emits the message "buffering complete at 20.0% of the total transfer; now serving straight from the tempfile". By this point, the client has already timed out on its original request, so the initial fetch of the S3 object into the temporary file was wasted work.
  • This process repeats: each time the threads finish transferring the object, the docker client has already timed out.
  • After its retries fail (two tries in this case), the docker client emits this error:
921a3438ca3d: Error pulling image (latest) from reghost.altiscale.com/container_name, Get https://reghost.altiscale.com/v1/images/921a3438ca3d58320b54f1117a992b5ce2c2f7305ddb354250364a19fb3d37e6/layer: read tcp ip_address_of_reghost:443: i/o timeout 
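
To make the failure mode concrete, here is a minimal Python sketch of the splitting behavior described above. This is not the docker-registry code; the function name, sizes, and remainder handling are illustrative only. The point is that the client cannot be served byte 0 until the thread that owns the first range has fetched its entire slice from S3.

# Illustrative sketch only -- not the docker-registry implementation.
# It mimics the behavior described above: CONCURRENCY threads, each
# responsible for a contiguous 1/CONCURRENCY slice of the object.
CONCURRENCY = 5


def split_ranges(key_size, concurrency=CONCURRENCY):
    """Yield (start, end) byte offsets, one pair per worker thread."""
    chunk = key_size // concurrency
    for i in range(concurrency):
        start = i * chunk
        # The last worker also picks up any remainder bytes.
        end = key_size - 1 if i == concurrency - 1 else (i + 1) * chunk - 1
        yield start, end


# Example: a 10 GiB fs layer becomes five ~2 GiB Range requests. Serving
# to the client can only begin after the worker that owns the first range
# has fetched its entire ~2 GiB slice, which is longer than the docker
# client is willing to wait.
for start, end in split_ranges(10 * 1024 ** 3):
    print('Range: bytes={0}-{1}'.format(start, end))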

I have confirmed the same behavior by running curl directly on the docker registry host. The following curl command reproduces it, except that curl does not time out, so the initial download eventually succeeds. curl does not receive the first byte of the file until the first of the 5 docker registry threads calls ParallelKey._refresh_max_completed_byte; once that first thread completes, curl starts downloading the layer.

[reghost.altiscale.com]$ curl -v -O http://localhost:5000/v1/images/921a3438ca3d58320b54f1117a992b5ce2c2f7305ddb354250364a19fb3d37e6/layer

The next curl command uses the Range header to disable ParallelKey. With this header, the if not bytes_range condition in the code above is false, so ParallelKey never runs, and curl starts downloading bytes immediately.

[reghost.altiscale.com]$  curl -v -O -H 'Range: bytes=0-' http://localhost:5000/v1/images/921a3438ca3d58320b54f1117a992b5ce2c2f7305ddb354250364a19fb3d37e6/layer

Here are some potential solutions for the problem that I would be happy to contribute as code:

  1. Modify boto.py:ParallelKey._fetch_part so that it limits the size of any individual transfer and therefore calls ParallelKey._refresh_max_completed_byte sooner. This change would be pretty easy, but each thread would still work on its own segment of the file, so the client's streaming transfer would be rate-limited by single-thread transfer performance.
  2. Modify boto.py:ParallelKey so that each thread picks up a new range of bytes to transfer as close to the beginning of the object as possible, without overlapping the ranges being transferred by other threads. This change would be more substantial (and harder to get right), but all of the threads would contribute to the rate at which the client can transfer.
  3. Modify the docker-registry configuration code to allow the operator to disable ParallelKey entirely, or whenever an S3 object exceeds a configured size (see the sketch after this list).
  4. Remove the ParallelKey code from boto.py.
  5. Provide the ability to configure the docker client with a longer timeout. This approach would eliminate the need to modify docker-registry code, but it does not improve client performance, because the client still cannot stream the layer in parallel with the docker registry's transfer from S3.
  6. Provide the ability for the docker client to disable ParallelKey by passing the Range HTTP header to the docker registry. This approach would work with the current docker registry code, but it seems like too much of a hack because it depends on the internal implementation of boto.py:Base.stream_read.
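
For option 3, here is a rough sketch of the decision that Base.stream_read could delegate to. This is hypothetical code, not an existing docker-registry API: the parallel_key_enabled and parallel_key_max_size settings are made-up names that only illustrate the shape of the change.

# Hypothetical sketch for option 3 -- these configuration knobs do not
# exist in docker-registry today.


def should_use_parallel_key(key_size, bytes_range,
                            parallel_key_enabled=True,
                            parallel_key_max_size=None):
    """Decide whether stream_read should wrap the boto key in ParallelKey."""
    if not parallel_key_enabled:
        return False
    if bytes_range:
        # ParallelKey already issues its own Range requests.
        return False
    if key_size <= 1024 * 1024:
        # Keep the existing behavior: small objects are streamed directly.
        return False
    if parallel_key_max_size is not None and key_size > parallel_key_max_size:
        # Very large objects are exactly the case that times clients out.
        return False
    return True


# Example: with a 100 MB cap, a 10 GiB layer would be streamed straight
# from S3 instead of being buffered into a temporary file first.
print(should_use_parallel_key(10 * 1024 ** 3, None,
                              parallel_key_max_size=100 * 1024 ** 2))  # False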

Any thoughts on the best way to proceed?

...or maybe I missed an obvious fix? Please let me know if there is a straightforward way to get a standalone registry working with STORAGE_REDIRECT=false and large fs layers stored as S3 objects!

@dmp42 (Contributor) commented Mar 6, 2015

Ok, that sounds bad enough.

Given that development effort has moved to https://github.com/docker/distribution (the new golang implementation), little work is going to be put into the python registry here.

My suggestion is to either remove the parallel key code entirely or, better, make it configurable (and disabled by default).

If you are willing to PR this, that would be awesome...

@chaiken-verticloud (Author)

Thanks for the quick response, Olivier! I also noticed issues with tmp files not being cleaned up, and saw that others had already reported similar problems (like issue #954). It seems like the best thing to do at this point is to simplify the code by removing the ParallelKey class, rather than tracking down the corner cases that lead to orphaned tmpfiles. I'll take care of removing the code and submit a pull request.

@chaiken-verticloud (Author)

Here's a draft of the code, which passes flake8 and seems to work on my test instance:
master...Altiscale:dc_OPS-5907

I'll do a bit more testing, and then submit the pull request.

@dmp42 (Contributor) commented Mar 7, 2015

Fantastic.
Give me a ping as soon as you have a PR, I'll make sure it's reviewed quickly.

@chaiken-verticloud (Author)

I'm moving my performance test results into this document...

https://docs.google.com/spreadsheets/d/1VbdtRL8w8eAf0sKse1qk7vLmd-J10tIiyeUqCTjSfL4/edit?usp=sharing

tl;dr: The pull request fixes the problem and doesn't change performance otherwise. As expected, STORAGE_REDIRECT provides better bandwidth and therefore better latency for multiple simultaneous clients. For access by a single client, performance mostly depends on the networking between the client and the registry, rather than the code running on the registry.

@dmp42 (Contributor) commented Mar 10, 2015

Thanks a lot for your work on this!

dmp42 closed this as completed Mar 10, 2015