Large amount of queries when getting a dataset with API #9683

ErykKul · 2023-06-28T08:31:05Z

What steps does it take to reproduce the issue?
Retrieve a large dataset (containing thousands of files) through the API

When does this issue occur?
It is the same for all datasets, but is more problematic for larger datasets.

Which page(s) does it occur on?
API calls

What happens?
When we check the query log, we see multiple queries for each file of the dataset.

To whom does it occur (all users, curators, superusers)?
All users.

What did you expect to happen?
One larger query that retrieves all the necessary data at once.
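
To make the difference concrete, here is a minimal sketch of the per-file query pattern next to a single combined query. It uses an in-memory SQLite database. The table and column names (`datafile`, `checksum`, `label`, `value`) are made up for illustration and are not the actual Dataverse schema.

```python
import sqlite3


def build_db(n_files=1000):
    """Create a toy database: one dataset with n_files files (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE datafile (id INTEGER PRIMARY KEY, dataset_id INTEGER, label TEXT);
        CREATE TABLE checksum (file_id INTEGER PRIMARY KEY, value TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO datafile VALUES (?, 1, ?)",
        [(i, f"file_{i}.txt") for i in range(n_files)],
    )
    conn.executemany(
        "INSERT INTO checksum VALUES (?, ?)",
        [(i, f"md5-{i}") for i in range(n_files)],
    )
    return conn


def fetch_per_file(conn, dataset_id):
    """One query for the file list, then one extra query per file."""
    queries = 1
    files = conn.execute(
        "SELECT id, label FROM datafile WHERE dataset_id = ? ORDER BY id",
        (dataset_id,),
    ).fetchall()
    result = []
    for file_id, label in files:
        (value,) = conn.execute(
            "SELECT value FROM checksum WHERE file_id = ?", (file_id,)
        ).fetchone()
        queries += 1
        result.append((file_id, label, value))
    return result, queries


def fetch_combined(conn, dataset_id):
    """A single joined query that returns the same rows."""
    rows = conn.execute(
        """
        SELECT f.id, f.label, c.value
        FROM datafile f JOIN checksum c ON c.file_id = f.id
        WHERE f.dataset_id = ?
        ORDER BY f.id
        """,
        (dataset_id,),
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = build_db(1000)
    _, n_slow = fetch_per_file(conn, 1)
    _, n_fast = fetch_combined(conn, 1)
    print(f"per-file pattern: {n_slow} queries, combined: {n_fast} query")
```

Both functions return identical rows; only the number of round trips to the database differs, and that number grows with the file count in the first pattern.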

Which version of Dataverse are you using?
Develop.

Any related open or closed issues to this bug report?

Performance issues when uploading files to a dataset containing thousands of files #9557

A better solution would be an API call that retrieves only the dataset, with separate API calls for retrieving the file metadata in a paginated way, as proposed by @Kris-LIBIS. However, many existing applications already use the retrieve-dataset API call, which includes all file metadata in one response, so making it more efficient should be beneficial too.
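
As a rough sketch of what the paginated variant could look like from the client side, the generator below pulls file metadata page by page. The endpoint path, the `limit` and `offset` parameter names, and the `data` key in the response are assumptions made for illustration, not a confirmed Dataverse API; check the API guide for the actual shape.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def iter_file_metadata(fetch_page, page_size=100):
    """Yield file metadata entries one page at a time.

    fetch_page(limit, offset) must return a list of entries; a page shorter
    than page_size is treated as the last one.
    """
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)


def make_http_fetcher(base_url, dataset_id, timeout=60):
    """Build a fetch_page function for a hypothetical paginated files endpoint."""

    def fetch_page(limit, offset):
        query = urlencode({"limit": limit, "offset": offset})
        # Hypothetical URL layout and response envelope (assumptions).
        url = f"{base_url}/api/datasets/{dataset_id}/files?{query}"
        with urlopen(url, timeout=timeout) as resp:
            return json.load(resp)["data"]

    return fetch_page
```

A caller that only needs the dataset metadata would never invoke the generator, and one that needs the files could stop early, for example `for entry in iter_file_metadata(make_http_fetcher(base_url, dataset_id)): ...`, where `base_url` and `dataset_id` are placeholders.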

ErykKul · 2023-09-01T08:00:45Z

An API call that retrieves only the dataset metadata, without the file metadata, would be a nice improvement for our tools that use the API. In most cases, we are only interested in the dataset metadata, not the files. This makes the current call too heavy for large datasets with many files; we even have to skip some datasets because of the one-minute timeout.

jggautier · 2023-09-01T14:46:31Z

#9763 is related, right? The solution being described there seems very relevant. Sorry if you're already aware of this and I'm just adding noise. I'm pretty interested in these improvements, too :)

ErykKul · 2023-09-01T15:04:14Z

I was not aware of that issue, but it is related. I reopened the PR and did the merge, so it might be worth considering and testing. It should improve response times for datasets with many files when using the regular call that retrieves the dataset metadata.

ErykKul mentioned this issue Jun 28, 2023

Faster combined query for retrieving datasets via API #9684

Merged

ErykKul mentioned this issue Sep 1, 2023

Performance: Slow response for the versions API call with large number of files or versions #9763

Closed

pdurbin added the labels "Feature: API", "Type: Bug" (a defect), "Feature: Performance & Stability", and "User Role: API User" (Makes use of APIs) Oct 12, 2023

qqmyers mentioned this issue Apr 10, 2024

Deliverable: Slow response for datasets with high number of files IQSS/dataverse-pm#29

Closed

cmbz added this to IQSS Dataverse Project Apr 10, 2024

sekmiller closed this as completed in #9684 May 24, 2024

pdurbin added this to the 6.3 milestone May 28, 2024

cmbz removed this from IQSS Dataverse Project Jul 22, 2024
