Can't create archives larger than 1GB #10501

Open
prohtex opened this issue Feb 21, 2024 · 6 comments
Labels
Type:Bug Something isn't working

Comments

@prohtex commented Feb 21, 2024

Hi all, I'm curious to know whether there's a plan to support larger archives in OCIS. In OC10, we frequently grab directories in excess of 50 GB: after clicking the file download, the browser instantly begins downloading the tar file. The behavior in OCIS is odd: even a 700 MB download freezes the browser for quite a while, and increasing max_size results in the browser consuming massive resources and/or hanging. It seems strange that the design would only target archives under 1 GB, a very small limit and a rather severe restriction for use cases involving Dropbox-like sharing of files over the web.

If there are other archiver options or system optimizations that enable large archive creation, I'd be grateful to know about them. For now, this is not working at all:

  archiver:
    insecure: true
    max_size: 10737418240
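
For reference, the default for max_size appears to be 1073741824 bytes, i.e. exactly 1 GiB, which would explain the cutoff I'm seeing. If I'm reading the docs right, the same knob can also be set through the service's environment, along the lines of:

  ARCHIVER_MAX_SIZE=10737418240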
@prohtex added the Type:Bug label Feb 21, 2024
@prohtex changed the title from "Can't create large archives" to "Can't create archives larger than 1GB" Feb 21, 2024
@prohtex (Author) commented Sep 16, 2024

Hey @micbar, @ScharfViktor, @wkloucek, @kulmann: is there any plan to move to a different model for creating archives in Web? As a few users have expressed (owncloud/ocis#9709, etc.), using Web to create large archives is crucial to deployments in the media space (videographers, digital photographers, retouchers, etc.), and while the desktop and mobile clients are wonderful, there are still users who must rely on accessing large files from Web. This, for me, is the single sticking point in a product that is otherwise far superior to the commercial alternatives. Impressive for open source! And thank you in advance for your time.

From what I can ascertain, OC10 used a server-side procedure that I recall being something like this:

  1. The download link redirects to a PHP script that blocks the browser while the server tars the files
  2. Server-side code (probably exec()) creates the tar file
  3. The download script hands the file to the browser as soon as it is ready, and the download begins
  4. Normal browser download UI takes over as the file downloads

This model, while old-school, worked great and allowed archive creation bounded only by OS and web server download limits.
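
If I were to sketch that flow in Go (purely illustrative; the handler, paths, and the shell-out to the tar binary are my assumptions, not OC10's actual code), it would look something like this:

  package main

  import (
    "io"
    "net/http"
    "os"
    "os/exec"
  )

  // downloadHandler follows the old-school model: block while the server
  // materializes the tar on disk, then hand the finished file to the browser.
  func downloadHandler(w http.ResponseWriter, r *http.Request) {
    dir := "/srv/files/shared-folder" // hypothetical folder the user requested

    // Steps 1+2: create the archive in a temp location (the browser waits here).
    tmp, err := os.CreateTemp("", "archive-*.tar")
    if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }
    tmp.Close()
    defer os.Remove(tmp.Name())
    if err := exec.Command("tar", "-cf", tmp.Name(), "-C", dir, ".").Run(); err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }

    // Steps 3+4: serve the finished file; the normal download UI takes over.
    f, err := os.Open(tmp.Name())
    if err != nil {
      http.Error(w, err.Error(), http.StatusInternalServerError)
      return
    }
    defer f.Close()
    w.Header().Set("Content-Type", "application/x-tar")
    w.Header().Set("Content-Disposition", `attachment; filename="archive.tar"`)
    io.Copy(w, f)
  }

  func main() {
    http.HandleFunc("/download", downloadHandler)
    http.ListenAndServe(":8080", nil)
  }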

The OCIS approach, on the other hand, is a bit of a mystery! I suppose the reliance on decomposedfs means that creating the tar server-side and sticking it in /tmp to be passed off to the browser is tricky? It almost seems like the tar creation is browser-side, e.g.:

  1. The download link hands the browser a bunch of individual files, which download asynchronously without standard browser UI
  2. The browser places the files in some sort of OS-specific temp location (or in memory?)
  3. The browser uses JS to tar the files, causing both the server and browser to choke and hog memory if the files are large. During this time the browser shows a "pace"-style round-robin loading indicator, rather than the native download UI
  4. Presto! The tar file is just "downloaded" instantly

While this sure is cool and weird and probably uses some bleeding-edge JS magic, it only works well on small files. Is there some way to either (a) fork the codebase for larger files, or (b) contemplate moving to a different model for the OCIS archiver?

Please forgive my assumptions here, and feel free to explain that I am totally wrong! I also wanted to say that if I can contribute some code rather than testing various deployments and sounding off in comments, I'd be so happy to do that. I've written a few HTML5 file downloaders and backend PHP scripts over the years. In the meantime I'm hoping to better understand the approach and limitations.

All of this is expressed with the deepest gratitude to you all for your fine work on this phenomenal and versatile product. Thank you!

@micbar (Contributor) commented Sep 18, 2024

@prohtex you are welcome!

The mechanism in OCIS is quite different: OCIS is a microservice architecture.

So basically every service is a client to the other services.

In this case, we have an archiver service. The archiver has no direct access to the storage or the files themselves (separation of concerns). It needs to ask the storage-users service for the files and download them into the archiver's memory buffer. After that, the archiver creates the archive and hands it to the web client to download.
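
To make the memory issue concrete, here is a minimal sketch of such a buffer-bound archiver (hypothetical function and URLs, not our actual code): the whole payload sits in RAM before the first byte reaches the client.

  package archiver

  import (
    "archive/tar"
    "bytes"
    "io"
    "net/http"
  )

  // buildArchive pulls every file fully into memory before tarring it, so
  // peak memory grows with the total payload size. Hence a size ceiling.
  func buildArchive(w io.Writer, files map[string]string) error {
    var buf bytes.Buffer
    tw := tar.NewWriter(&buf)
    for name, url := range files {
      resp, err := http.Get(url) // fetch from the storage service
      if err != nil {
        return err
      }
      data, err := io.ReadAll(resp.Body) // the whole file lands in RAM
      resp.Body.Close()
      if err != nil {
        return err
      }
      hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(data))}
      if err := tw.WriteHeader(hdr); err != nil {
        return err
      }
      if _, err := tw.Write(data); err != nil {
        return err
      }
    }
    if err := tw.Close(); err != nil {
      return err
    }
    _, err := io.Copy(w, &buf) // the client only sees bytes now
    return err
  }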

Possible Improvements

  1. We could implement the archiver directly inside the storage-users service
  2. We could try to stream the bytes directly into the archiver, without a buffer in memory (if that is possible at all? see the sketch below)
  3. Any other ideas? @aduffeck @butonic
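
Option 2 might look roughly like this, assuming the storage service reports a Content-Length for each download (again just a sketch, not a finished design):

  package archiver

  import (
    "archive/tar"
    "fmt"
    "io"
    "net/http"
  )

  // streamArchive wraps the client response in a tar.Writer and copies each
  // file straight from the storage service into it, so memory use stays
  // constant no matter how large the archive gets.
  func streamArchive(w http.ResponseWriter, files map[string]string) error {
    w.Header().Set("Content-Type", "application/x-tar")
    w.Header().Set("Content-Disposition", `attachment; filename="archive.tar"`)
    tw := tar.NewWriter(w) // bytes reach the browser as we write them
    defer tw.Close()
    for name, url := range files {
      resp, err := http.Get(url)
      if err != nil {
        return err
      }
      if resp.ContentLength < 0 {
        resp.Body.Close()
        return fmt.Errorf("%s: unknown size, cannot write tar header", name)
      }
      hdr := &tar.Header{Name: name, Mode: 0o644, Size: resp.ContentLength}
      if err := tw.WriteHeader(hdr); err != nil {
        resp.Body.Close()
        return err
      }
      _, err = io.Copy(tw, resp.Body) // chunked copy, never a full buffer
      resp.Body.Close()
      if err != nil {
        return err
      }
    }
    return nil
  }

The catch is that tar headers need the exact file size up front, so this only works if the storage side can tell us sizes before streaming.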

@kulmann (Member) commented Sep 18, 2024

From the web-ui perspective we could make the process completely async:

  1. trigger creating the archive
  2. the server sends an SSE when the archive is ready for download, including a download URL (rough sketch below)
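
A rough server-side sketch of step 2 (the event name, the simulated job, and the URL are all invented for illustration):

  package main

  import (
    "fmt"
    "net/http"
    "time"
  )

  // events streams a single "archive-ready" server-sent event once the
  // (simulated) background archiving job finishes.
  func events(w http.ResponseWriter, r *http.Request) {
    flusher, ok := w.(http.Flusher)
    if !ok {
      http.Error(w, "streaming unsupported", http.StatusInternalServerError)
      return
    }
    w.Header().Set("Content-Type", "text/event-stream")
    w.Header().Set("Cache-Control", "no-cache")

    done := make(chan string, 1)
    go func() { // stand-in for the real archiver job
      time.Sleep(2 * time.Second)
      done <- "/downloads/archive-42.tar"
    }()

    select {
    case url := <-done:
      fmt.Fprintf(w, "event: archive-ready\ndata: %s\n\n", url)
      flusher.Flush()
    case <-r.Context().Done(): // client navigated away
    }
  }

  func main() {
    http.HandleFunc("/events", events)
    http.ListenAndServe(":8080", nil)
  }

The web client would then listen with an EventSource and start a regular browser download from the URL in the event payload, so the native download UI is back in play.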

@kulmann (Member) commented Sep 18, 2024

Afaik the archiver service is a very naive implementation, just to have one at all... IMO it should be replaced with a proper implementation. :D Funny that it doesn't even do compression...
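
Adding compression would be a small change in Go, by the way: wrap the output in a gzip.Writer before the tar.Writer. Purely illustrative:

  package archiver

  import (
    "archive/tar"
    "compress/gzip"
    "io"
  )

  // newTarGz layers gzip under tar so the stream comes out compressed.
  // Close the tar writer first, then the gzip writer, to flush both footers.
  func newTarGz(out io.Writer) (*tar.Writer, *gzip.Writer) {
    gz := gzip.NewWriter(out)
    return tar.NewWriter(gz), gz
  }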

@prohtex (Author) commented Sep 18, 2024

> @prohtex you are welcome!
>
> The mechanism in OCIS is quite different: OCIS is a microservice architecture.
>
> So basically every service is a client to the other services.
>
> In this case, we have an archiver service. The archiver has no direct access to the storage or the files themselves (separation of concerns). It needs to ask the storage-users service for the files and download them into the archiver's memory buffer. After that, the archiver creates the archive and hands it to the web client to download.
>
> Possible Improvements
>
>   1. We could implement the archiver directly inside the storage-users service
>   2. We could try to stream the bytes directly into the archiver, without a buffer in memory (if that is possible at all?)
>   3. Any other ideas? @aduffeck @butonic

Hi @kulmann, thanks for the thoughtful reply. It did seem implausible that the archive was being created client-side, though I did find a few JS libraries that could do it. I could have just spent some more time in the web inspector to educate myself.

One thing I did notice is that during the download process (the point where the JS "pace"-style indicator displays), my browser became sluggish and consumed a lot more memory than it typically does.

It certainly seems the bottleneck is loading the files into server memory. Hopefully an improvement can be made so the archiver can approach OC10 functionality. I understand this is a much different animal from passing some shell commands to a PHP script that works on local files.

Thanks again!

@prohtex (Author) commented Oct 2, 2024

> Afaik the archiver service is a very naive implementation, just to have one at all... IMO it should be replaced with a proper implementation. :D Funny that it doesn't even do compression...

Tonight I attempted to download a 500 MB archive and had Safari crash on me, which is extraordinarily rare! Looking forward to some kind of improved solution for archives. For now, I'm telling everyone not to use the browser, which is a tricky workflow for some people!
