Deliverable: Slow response for datasets with high number of files #29
Sizing:
Note: There's an added step needed:
At that point, this deliverable can be treated like an iceberg and decomposed as a plan takes shape.
Next Steps:
IQSS/dataverse#9683 is another related issue.
2024/04/10:
This is an umbrella to be used for all the issues we are seeing related to the title.
What follows is a rough cut, based on the discussion so far, of what would need to be done to complete this deliverable.
As a "bklog: Deliverable", this is decomposed into smaller issues.
We're talking about this issue in tech hours. Here are some pain points for users:
Other discussion:
@linsherpa thanks for chatting and opening this issue and the ticket.
I'll note that out of the box, Dataverse only allows you to unzip 1000 files at a time from a zip file: https://guides.dataverse.org/en/5.11.1/installation/config.html#multipleuploadfileslimit ... That's the most official statement I could find about how many files are supported in a single dataset... not a very strong one.
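If a particular installation needs a higher limit, the linked guide covers the :MultipleUploadFilesLimit database setting. Here is a minimal sketch of bumping it via the admin settings API, assuming a locally reachable instance; the host and new value below are placeholders, not recommendations:

```python
import requests

# Assumptions: the Dataverse admin settings API accepts a PUT with the raw
# value as the request body, and the instance is reachable at this
# placeholder host. The comment above says the default is 1000.
DATAVERSE_URL = "http://localhost:8080"
NEW_LIMIT = 2000  # placeholder value

resp = requests.put(
    f"{DATAVERSE_URL}/api/admin/settings/:MultipleUploadFilesLimit",
    data=str(NEW_LIMIT),
)
print(resp.status_code, resp.text)
```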
As you mentioned, the practical workaround is probably to double zip the files.
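A minimal sketch of that double-zip workaround, assuming Python's standard zipfile module (the file names here are hypothetical): the outer zip is what gets uploaded, so only that layer is unpacked and the inner archive stays in the dataset as a single file.

```python
import zipfile
from pathlib import Path

def double_zip(inner_zip: str, outer_zip: str = "wrapped.zip") -> str:
    """Wrap an existing zip inside a second zip.

    The outer archive is what gets uploaded, so the inner zip is kept
    as one file instead of being exploded into its contents.
    """
    with zipfile.ZipFile(outer_zip, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.write(inner_zip, arcname=Path(inner_zip).name)
    return outer_zip

if __name__ == "__main__":
    # "mydata.zip" is a hypothetical archive containing many files.
    print(double_zip("mydata.zip"))
```

ZIP_STORED is used for the outer layer since the inner archive is typically already compressed.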
For developers, I'll mention that scripts/search/data/binary/1000files.zip has 1000 small files we can test with (a sketch for generating a similar archive follows below). Finally, here are some open issues related to large numbers of files in a dataset:
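For anyone who wants a larger or differently shaped fixture than 1000files.zip, here is a small sketch, assuming Python's standard zipfile module (the file name and count are placeholders), that builds a zip of many tiny files:

```python
import zipfile

def make_many_file_zip(path: str = "manyfiles.zip", n: int = 1000) -> str:
    """Create a zip containing n tiny text files for many-file testing."""
    with zipfile.ZipFile(path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i in range(n):
            zf.writestr(f"file{i:04d}.txt", f"test file {i}\n")
    return path

if __name__ == "__main__":
    print(make_many_file_zip())
```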