
Calculate storage use #6614

Closed
normanrz opened this issue Nov 7, 2022 · 4 comments · Fixed by #6685

Comments

@normanrz
Member

normanrz commented Nov 7, 2022

Detailed Description

In order to implement storage quotas, we need to capture the on-disk storage use of datasets. I would suggest calculating this on import and perhaps also via a regular cron job. I think it would be useful to store the storage use at the mag level.
The aggregated storage use per organization needs to be exposed via an API so that the frontend can display it on the organization page and use it to enforce upload blocks.
Remote datasets and symlinked layers should not count towards the storage quota.
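As an illustration of the API exposure mentioned above, a minimal sketch of a per-organization report (hypothetical names, not the actual webKnossos API):

```scala
// Hypothetical response model: aggregated storage use for one organization,
// as the frontend would need it to display usage and enforce upload blocks.
// Remote datasets and symlinked layers are assumed to be excluded from the sum.
case class OrganizationStorageReport(
    organizationId: String,
    usedStorageBytes: Long
)
```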

@fm3
Member

fm3 commented Nov 14, 2022

Do you think it is fair to rely on du being present on the host system for this feature? There do not seem to be reliable pure-Java APIs for this.

@normanrz
Member Author

I guess du would be okay, since we typically deploy in a Docker container.
However, I think an async/parallelized file walk in Java might be even faster.
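A minimal sketch of what such a walk could look like, assuming java.nio on a Scala 2.13 backend (illustrative only, not the actual webKnossos implementation). Files.walk does not follow symlinks by default, and symlink entries are filtered out, so symlinked layers would not be counted:

```scala
import java.nio.file.{Files, LinkOption, Path}
import scala.concurrent.{ExecutionContext, Future}
import scala.jdk.CollectionConverters._
import scala.util.Using

object StorageUsageSketch {
  // Sums the sizes of all regular files below datasetDir.
  // Symlink entries are excluded so that symlinked layers are not counted.
  def measureBytes(datasetDir: Path)(implicit ec: ExecutionContext): Future[Long] =
    Future {
      Using.resource(Files.walk(datasetDir)) { stream =>
        stream
          .iterator()
          .asScala
          .filter(p => Files.isRegularFile(p, LinkOption.NOFOLLOW_LINKS))
          .map(p => Files.size(p))
          .sum
      }
    }
}
```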

@fm3
Member

fm3 commented Nov 21, 2022

A few more questions have come up:

  • You mention storing the storage use on a per-mag basis. What do we want to do with this fine-grained information? For organization storage use, a by-dataset (or even by-datastore) aggregation seems sufficient, and it would certainly be easier to manage.
  • Do we want to count the contents of the .uploading, .forConversion, .converting (i.e. the worker working directory) and .trash directories?

@normanrz
Member Author

  • You mention storing the storage use on a per-mag basis. What do we want to do with this fine-grained information? For organization storage use, a by-dataset (or even by-datastore) aggregation seems sufficient, and it would certainly be easier to manage.

I don't see why it is harder to manage on a per-mag basis. Aggregating to by-dataset or by-datastore is just a simple SQL query (see the sketch below).

Do we want to count the contents of the .uploading, .forConversion, .converting (i.e. the worker working directory) and .trash directories?

Temp data should not count against the storage quota and does not need to be counted.
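For example, a hypothetical per-mag table (names are illustrative, not the actual webKnossos schema) can be rolled up to the organization level with a single GROUP BY; a sketch, kept as a plain SQL string as one might embed in the Scala backend:

```scala
// Hypothetical: one row per (dataset, mag) with the measured byte count,
// aggregated to the organization level for quota checks.
val usedStorageByOrganization: String =
  """SELECT d.organization_id, SUM(s.used_storage_bytes) AS used_storage_bytes
    |FROM dataset_mag_storage s
    |JOIN datasets d ON d.id = s.dataset_id
    |GROUP BY d.organization_id""".stripMargin
```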

@fm3 fm3 mentioned this issue Dec 7, 2022
@fm3 fm3 self-assigned this Dec 9, 2022
@fm3 fm3 closed this as completed in #6685 Jan 11, 2023