[oCIS FS] clean up aborted uploads #2622
Comments
cc @butonic
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.
I think that's something reva should do. I mean, I don't see any reason this couldn't happen with any other storage backend. Maybe other storage implementations keep the chunks in a different storage area, but that doesn't mean that such an area couldn't fill up. I'm not sure about the specific implementation details in reva, but I think it should provide a service that runs periodically to clean up uploads after 1-2 days (configurable). A possible idea is that reva could have a dedicated thread to clean up the uploads.
Yeah, I think the clean solution would be to implement this in the storage driver.
If it needs to be implemented as a command, then we would either have to implement storage-specific mechanisms in the command or add a generic 'clean up' API to the CS3 API so the underlying storage can be swapped out.
@C0rby I don't really understand how we can do this without a command (or an API). Should we just start a go-routine that cleans up upload data? Can you elaborate on what you have in mind?
Well, it could be the responsibility of every storage driver to implement a clean up mechanism. Then the storage driver itself would run an asynchronous goroutine or something else.
Ah ok. But how would that go-routine know that it has something to do? Would it just sleep for 10 minutes and then check? Then we could also have a command and trigger it via cronjob. A benefit of that would be that we could also run it manually (on low-power systems, for example).
I'd say this depends on the implementation but yeah something like that.
Yeah, I see the benefits. That also means that we need to add a
Basically the upload is cleaned up correctly (in the success and error cases, and when the user aborts the upload). The upload is only left behind if the user pauses the upload (and never restarts it) or maybe on some connection issues. So my point is: I'm not sure this is worth having a go-routine running all the time. CPU time also costs money. But I'm fine with both approaches; both have their benefits.
I see your points. I'd also say that for the decomposedfs it would be enough to have a cronjob running a bash script using standard tools, since decomposedfs uploads are just plain files. But for other storages this might not be sufficient, so we probably end up having to implement an API.
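For the plain-files case, the cron-driven script could be as small as a `find` one-liner. The path and the one-day threshold below are examples, not oCIS defaults:

```shell
# Remove upload artifacts (blob and blob.info files) not modified for
# more than one day. Path and age are illustrative; adjust per deployment.
find /var/lib/ocis/storage/users/uploads -type f -mtime +1 -delete
```

Note that `-mtime +1` uses the file's modification time as the expiry signal, so a paused-but-active upload that is still being written to would be spared.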
I think we can keep this internal to each storage implementation. When the storage starts, a go-routine can start alongside it to perform the cleanup of uploads. This go-routine would behave cron-like: it sleeps most of the time and runs every 15-30 minutes (or something configurable), so its CPU usage should be close to 0 while sleeping. The communication between the storage and the go-routine would be via shared memory: a list / map / custom object is shared between the storage implementation and the go-routine, the required data is written into that shared object when an upload starts, and it is removed when the upload finishes correctly. When the go-routine wakes up, it can check the shared object to see which uploads are still ongoing and, depending on how long an upload has been there, decide to clean it up. The advantages I see are:
There might be some corner cases, but in general it seems a good idea.
Please also keep in mind that we have multiple instances of the storage service running. So we have a good chance of running multiple cleanups in parallel then. Don't know if that could cause problems.
It should be ok. The go-routine will monitor the uploads going through the storage instance that spawned it. If there are multiple instances, they won't overlap. The only problem is a crash-recovery scenario where the storage service crashes with GBs of partial uploads. It might be a good idea for each instance to have its own temporary directory for uploads; that way each instance can either recover or remove its own failed uploads, maybe something like
I'm not sure it should do that. It would need to persist data in case it gets restarted. I would suggest it just goes through all uploads and decides, based on the .info file, whether to remove each one.
I'd prefer to keep that as a recovery mechanism in case the service crashes. Without any limits on uploads or users, there could be thousands of uploads at the same time. Having to scan all the uploads can take a lot of time, so I don't think we should do that by default. By using shared memory (I'm talking about a shared object between the instance and the go-routine, not something like IPC), the go-routine doesn't need to constantly check the FS, and all the info it needs is already in memory and accessible. In addition, as said, the algorithm can be quite efficient, leading to low CPU usage by the go-routine.
Uploads in oCIS should:
Since oCIS is a distributed system, we cannot rely on shared memory. We only have shared storage. All state needs to live on the storage. We cannot keep things in memory. I would go for a manual command first. The cleanup tool needs to inspect the
A command is ok if it's planned as a debugging tool, so if something goes wrong we know what the command is doing step by step. |
Upload expiry is actually the TUS Expiration extension that we should implement:
There is also the TUS Termination extension that allows clients to actively delete uploads. |
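For reference, the Termination extension boils down to the client issuing an HTTP `DELETE` against the upload URL, carrying the usual `Tus-Resumable` version header; the server answers `204 No Content` once the upload is gone. A small Go sketch (the upload URL is a placeholder):

```go
package main

import (
	"fmt"
	"net/http"
)

// newTerminateRequest builds the DELETE request that the TUS
// Termination extension defines for actively removing an unfinished
// upload. The caller would send it with an http.Client.
func newTerminateRequest(uploadURL string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodDelete, uploadURL, nil)
	if err != nil {
		return nil, err
	}
	// every TUS request must announce the protocol version it speaks
	req.Header.Set("Tus-Resumable", "1.0.0")
	return req, nil
}

func main() {
	req, err := newTerminateRequest("https://example.com/files/24e533e0")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Client-driven termination complements server-side expiry: well-behaved clients delete aborted uploads themselves, and the expiry sweep catches everything else.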
Outcome refinement: implement cli command using storage driver directly |
Is your feature request related to a problem? Please describe.
When using oCIS FS as the users' storage, you will have a storage/users/uploads directory in your oCIS data folder. This is an intermediate directory for resumable (TUS) uploads. Each upload consists of a blob file and a blob.info file. If an upload succeeds, the blob file is moved to the target and the blob.info file is deleted. If an upload is aborted, the blob and blob.info files are not (re)moved. Therefore a lot of data can pile up.
Describe the solution you'd like
oCIS FS cleans up expired uploads by itself.
Describe alternatives you've considered
oCIS offers a command to clean up expired uploads, which can then be used for manual maintenance or regular cleanup (e.g. scheduled by cron).
Additional context