Figure out optimal upload/download strategy #1
The basic requirement should be to offload as much of the data transfer as possible to the back-end storage, with clients downloading directly from it, e.g. AWS S3 / GCP GCS / Minio. Otherwise, given the use of synchronous Django and Python, performance may be low, although some optimizations with moderate scalability are possible, e.g. Nginx X-Sendfile.
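As a sketch of the X-Sendfile approach mentioned above (Nginx's variant is the `X-Accel-Redirect` header): the application only authorizes the request, and Nginx streams the file from a matching `internal` location, freeing the Python worker immediately. The function, paths, and location prefix below are illustrative, not from this project.

```python
# Sketch: offloading file delivery to Nginx via X-Accel-Redirect.
# Assumes an Nginx config with an internal location such as
#   location /protected/ { internal; alias /var/lib/registry/; }
# The app decides yes/no; Nginx does the byte transfer.

def build_offload_response(blob_path: str, authorized: bool) -> dict:
    """Return minimal response status/headers for a download request."""
    if not authorized:
        return {"status": "403 Forbidden", "headers": {}}
    return {
        "status": "200 OK",
        "headers": {
            # Nginx intercepts this header and serves the file itself,
            # so the synchronous worker is not tied up streaming bytes.
            "X-Accel-Redirect": f"/protected/{blob_path}",
            "Content-Type": "application/octet-stream",
        },
    }
```

In Django the same idea would be an `HttpResponse` carrying the `X-Accel-Redirect` header instead of a file body; the plain dict here just keeps the sketch framework-free.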
Agreed! For Singularity Hub, for example, we just generate signed URLs for Google Storage and offload there, and the same with the Minio backend for Singularity Registry Server (neither is OCI, however, hence why I'm creating this module). You are correct that Django + Python tends to be a bottleneck; the use case is less a huge industry registry and more a smaller research-oriented one that serves far fewer requests.
From my experience with the Distribution Registry, sudden bursts of high load are easy to trigger even in small environments (a team of 2-3 people), now that CI systems can launch multiple workers simultaneously over high-speed links. For example, it is easy to run 20 parallel jobs in GitHub Actions (see the usage limits of the free plan: https://docs.github.com/en/free-pro-team@latest/actions/reference/usage-limits-billing-and-administration#usage-limits ), each on a separate virtual machine; many of them will need to download data from the registry at the start of their work and push data back to it at the end. Triggering that many jobs only takes a few commits in a short period of time and/or a build matrix that checks various configurations, e.g. different Python versions in the case of a library. I understand that Python and Django are good enough for your needs; I would just like to point out some things that, based on my experience, I did not expect at the beginning.
Thanks for the feedback! If we have a storage backend with signed URLs for upload/download, I don't think the core registry running on Django (but offloading to that storage) would be an issue. But you are totally right that filesystem storage, for example running alongside the action, might be too much.