Can't create archives larger than 1GB #10501

Open
prohtex opened this issue Feb 21, 2024 · 6 comments
Labels
Type:Bug Something isn't working

Comments

@prohtex commented Feb 21, 2024

Hi all, I'm curious to know whether there's a plan to support larger archives in OCIS. In OC10, we frequently grab directories in excess of 50 GB: after clicking the file download, the browser instantly begins downloading the tar file. The behavior in OCIS is odd: even a 700 MB download freezes the browser for quite a while, and increasing max_size results in the browser consuming massive resources and/or hanging. It seems strange that the design would only target archives under 1 GB, a very small limit and a rather severe restriction for use cases involving Dropbox-like sharing of files over the web.

If there are other archiver options or system optimizations that enable large archive creation, I'd be grateful to know about them. For now, this is not working at all:

  archiver:
    insecure: true
    max_size: 10737418240
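
For reference, the default for max_size appears to be 1073741824 bytes, i.e. exactly 1 GiB, which would explain the cutoff I'm seeing. If I'm reading the docs right, the same knob can also be set through the service's environment, along the lines of:

  ARCHIVER_MAX_SIZE=10737418240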
@prohtex added the Type:Bug label Feb 21, 2024
@prohtex changed the title from "Can't create large archives" to "Can't create archives larger than 1GB" Feb 21, 2024
@prohtex (Author) commented Sep 16, 2024

Hey @micbar, @ScharfViktor, @wkloucek, @kulmann: is there any plan to move to a different model for creating archives in Web? As a few users have expressed (owncloud/ocis#9709, etc.), using Web to create large archives is crucial to deployments in the media space (videographers, digital photographers, retouchers, etc.), and while the desktop and mobile clients are wonderful, there are still users who must rely on accessing large files from Web. This, for me, is the single sticking point in a product that is otherwise far superior to the commercial alternatives. Impressive for open source! And thank you in advance for your time.

From what I can ascertain, OC10 used a server-side procedure that I recall being something like this:

  1. The download link redirects to a PHP script that blocks the browser while the server tars the files
  2. Server-side code (probably exec()) creates the tar file
  3. The download script hands the file to the browser as soon as it is ready, and the download begins
  4. Normal browser download UI takes over as the file downloads

This model, while old-school, worked great and allowed archive creation bounded only by OS and web server download limits.
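
If I were to sketch that flow in Go (purely illustrative; the handler, paths, and the shell-out to the tar binary are my assumptions, not OC10's actual code), it would look something like this:

  package main

  import (
    "io"
    "net/http"
    "os"
    "os/exec"
  )

  // downloadHandler follows the old-school model: block while the server
  // materializes the tar on disk, then hand the finished file to the browser.
  func downloadHandler(w http.ResponseWriter, r *http.Request) {
    dir := "/srv/files/shared-folder" // hypothetical folder the user requested

    // Steps 1+2: create the archive in a temp location (the browser waits here).
    tmp, err := os.CreateTemp("", "archive-*.tar")
    if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }
    tmp.Close()
    defer os.Remove(tmp.Name())
    if err := exec.Command("tar", "-cf", tmp.Name(), "-C", dir, ".").Run(); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }

    // Steps 3+4: serve the finished file; the normal download UI takes over.
    f, err := os.Open(tmp.Name())
    if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }
    defer f.Close()
    w.Header().Set("Content-Type", "application/x-tar")
    w.Header().Set("Content-Disposition", `attachment; filename="archive.tar"`)
    io.Copy(w, f)
  }

  func main() {
    http.HandleFunc("/download", downloadHandler)
    http.ListenAndServe(":8080", nil)
  }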

The OCIS approach, on the other hand, is a bit of a mystery! I suppose the reliance on decomposedfs means that creating the tar server-side and sticking it in /tmp to be passed off to the browser is tricky? It almost seems like the tar creation is browser-side, e.g.:

  1. The download link hands the browser a bunch of individual files, which download asynchronously without standard browser UI
  2. The browser places the files in some sort of OS-specific temp location (or in memory?)
  3. The browser uses JS to tar the files, causing both the server and browser to choke and hog memory if the files are large. During this time the browser shows a "pace"-style round-robin loading indicator, rather than the native download UI
  4. Presto! The tar file is just "downloaded" instantly

While this sure is cool and weird and probably uses some bleeding-edge JS magic, it only works well on small files. Is there some way to either (a) fork the codebase for larger files, or (b) contemplate moving to a different model for the OCIS archiver?

Please forgive my assumptions here, and feel free to explain that I am totally wrong! I also wanted to say that if I can contribute some code rather than testing various deployments and sounding off in comments, I'd be so happy to do that. I've written a few HTML5 file downloaders and backend PHP scripts over the years. In the meantime I'm hoping to better understand the approach and limitations.

All of this is expressed with the deepest gratitude to you all for your fine work on this phenomenal and versatile product. Thank you!

@micbar (Contributor) commented Sep 18, 2024

@prohtex you are welcome!

The mechanism in OCIS is quite different: OCIS is a microservice architecture.

So basically every service is a client to the other services.

In this case, we have an archiver service. The archiver has no direct access to the storage or the files themselves (separation of concerns). It needs to ask the storage-users service for the files and download them into the archiver's memory buffer. After that, the archiver creates the archive and hands it to the web client to download.
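
To make the memory issue concrete, here is a minimal sketch of such a buffer-bound archiver (hypothetical function and URLs, not our actual code): the whole payload sits in RAM before the first byte reaches the client.

  package archiver

  import (
    "archive/tar"
    "bytes"
    "io"
    "net/http"
  )

  // buildArchive pulls every file fully into memory before tarring it, so
  // peak memory grows with the total payload size. Hence a size ceiling.
  func buildArchive(w io.Writer, files map[string]string) error {
    var buf bytes.Buffer
    tw := tar.NewWriter(&buf)
    for name, url := range files {
      resp, err := http.Get(url) // fetch from the storage service
      if err != nil {
        return err
      }
      data, err := io.ReadAll(resp.Body) // the whole file lands in RAM
      resp.Body.Close()
      if err != nil {
        return err
      }
      hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
      if err := tw.WriteHeader(hdr); err != nil {
        return err
      }
      if _, err := tw.Write(data); err != nil {
        return err
      }
    }
    if err := tw.Close(); err != nil {
      return err
    }
    _, err := io.Copy(w, &buf) // the client only sees bytes now
    return err
  }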

Possible Improvements

  1. We could implement the archiver directly inside the storage-users service
  2. We could try to stream the bytes directly into the archiver, without a buffer in memory (if that is possible at all? see the sketch below)
  3. Any other ideas? @aduffeck @butonic
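
Option 2 might look roughly like this, assuming the storage service reports a Content-Length for each download (again just a sketch, not a finished design):

  package archiver

  import (
    "archive/tar"
    "fmt"
    "io"
    "net/http"
  )

  // streamArchive wraps the client response in a tar.Writer and copies each
  // file straight from the storage service into it, so memory use stays
  // constant no matter how large the archive gets.
  func streamArchive(w http.ResponseWriter, files map[string]string) error {
    w.Header().Set("Content-Type", "application/x-tar")
    w.Header().Set("Content-Disposition", `attachment; filename="archive.tar"`)
    tw := tar.NewWriter(w) // bytes reach the browser as we write them
    defer tw.Close()
    for name, url := range files {
      resp, err := http.Get(url)
      if err != nil {
        return err
      }
      if resp.ContentLength < 0 {
        resp.Body.Close()
        return fmt.Errorf("%s: unknown size, cannot write tar header", name)
      }
      hdr := &tar.Header{Name: name, Mode: 0o644, Size: resp.ContentLength}
      if err := tw.WriteHeader(hdr); err != nil {
        resp.Body.Close()
        return err
      }
      _, err = io.Copy(tw, resp.Body) // chunked copy, never a full buffer
      resp.Body.Close()
      if err != nil {
        return err
      }
    }
    return nil
  }

The catch is that tar headers need the exact file size up front, so this only works if the storage side can tell us sizes before streaming.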

@kulmann (Member) commented Sep 18, 2024

From the web-ui perspective we could make the process completely async:

  1. trigger creating the archive
  2. the server sends an SSE when the archive is ready for download, including a download URL (rough sketch below)
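
A rough server-side sketch of step 2 (the event name, the simulated job, and the URL are all invented for illustration):

  package main

  import (
    "fmt"
    "net/http"
    "time"
  )

  // events streams a single "archive-ready" server-sent event once the
  // (simulated) background archiving job finishes.
  func events(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
      http.Error(w, "streaming unsupported", http.StatusInternalServerError)
      return
    }
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    done := make(chan string, 1)
    go func() { // stand-in for the real archiver job
      time.Sleep(2 * time.Second)
      done <- "/downloads/archive-42.tar"
    }()

    select {
    case url := <-done:
      fmt.Fprintf(w, "event: archive-ready\ndata: %s\n\n", url)
      flusher.Flush()
    case <-r.Context().Done(): // client navigated away
    }
  }

  func main() {
    http.HandleFunc("/events", events)
    http.ListenAndServe(":8080", nil)
  }

The web client would then listen with an EventSource and start a regular browser download from the URL in the event payload, so the native download UI is back in play.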

@kulmann (Member) commented Sep 18, 2024

Afaik the archiver service is a very naive implementation, just to have one at all... IMO it should be replaced with a proper implementation. :D Funny that it doesn't even do compression...
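
Adding compression would be a small change in Go, by the way: wrap the output in a gzip.Writer before the tar.Writer. Purely illustrative:

  package archiver

  import (
    "archive/tar"
    "compress/gzip"
    "io"
  )

  // newTarGz layers gzip under tar so the stream comes out compressed.
  // Close the tar writer first, then the gzip writer, to flush both footers.
  func newTarGz(out io.Writer) (*tar.Writer, *gzip.Writer) {
    gz := gzip.NewWriter(out)
    return tar.NewWriter(gz), gz
  }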

@prohtex (Author) commented Sep 18, 2024

> @prohtex you are welcome!
>
> The mechanism in OCIS is quite different: OCIS is a microservice architecture.
>
> So basically every service is a client to the other services.
>
> In this case, we have an archiver service. The archiver has no direct access to the storage or the files themselves (separation of concerns). It needs to ask the storage-users service for the files and download them into the archiver's memory buffer. After that, the archiver creates the archive and hands it to the web client to download.
>
> Possible Improvements
>
>   1. We could implement the archiver directly inside the storage-users service
>   2. We could try to stream the bytes directly into the archiver, without a buffer in memory (if that is possible at all?)
>   3. Any other ideas? @aduffeck @butonic

Hi @kulmann, thanks for the thoughtful reply. It did seem implausible that the archive was being created client-side, though I did find a few JS libraries that could do it. I could have just spent some more time in the web inspector to educate myself.

One thing I did notice is that during the download process (the point where the JS "pace"-style indicator displays), my browser became sluggish and consumed a lot more memory than it typically does.

It certainly seems the bottleneck is loading the files into server memory. Hopefully an improvement can be made so the archiver can approach OC10 functionality. I understand this is a much different animal from passing some shell commands to a PHP script that works on local files.

Thanks again!

@prohtex (Author) commented Oct 2, 2024

> Afaik the archiver service is a very naive implementation, just to have one at all... IMO it should be replaced with a proper implementation. :D Funny that it doesn't even do compression...

Tonight I attempted to download a 500 MB archive and had Safari crash on me, which is extraordinarily rare! Looking forward to some kind of improved solution for archives. For now, I'm telling everyone not to use the browser, which is a tricky workflow for some people!
