doi, "start" parameter using the Search API #4377

cscn · 2017-12-11T19:07:25Z

I'm attempting to use the Dataverse Search API to find the DOIs of all R files on a Dataverse server, and I've run into a couple of interesting behaviors. I'm using ".R" as my query and "file" as the type parameter.

A GET request to https://dataverse.harvard.edu/api/search/ (with search parameters) returns a response containing a "total_count" field within the "data" field. However, setting the "start" parameter beyond the expected number of pages still yields 10 results. How, then, would I determine the appropriate number of pages to fetch?
The Search API doesn't return a dedicated DOI field for each result. Instead, the DOI is embedded in the citation string returned in the "dataset_citation" field, e.g.,
'Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", doi:10.7910/DVN/27462, Harvard Dataverse, V1'
I've been working around this by parsing out the DOI with a regex, but it would be super nice to have it as a dedicated field.
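A minimal Python sketch of the regex workaround follows. It assumes the DOI always appears in "dataset_citation" as `doi:10.<registrant>/<suffix>` and that the suffix ends at the next comma or whitespace. The pattern and the `extract_doi` helper are illustrative choices, not part of the Dataverse API.

```python
import re
from typing import Optional

# Assumption: the citation embeds the DOI as "doi:10.<registrant>/<suffix>",
# with the suffix terminated by a comma or whitespace.
DOI_PATTERN = re.compile(r"doi:(10\.\d{4,9}/[^\s,]+)")


def extract_doi(citation: str) -> Optional[str]:
    """Return the bare DOI from a dataset_citation string, or None if absent."""
    match = DOI_PATTERN.search(citation)
    return match.group(1) if match else None


citation = ('Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", '
            'doi:10.7910/DVN/27462, Harvard Dataverse, V1')
print(extract_doi(citation))  # 10.7910/DVN/27462
```

Citations for datasets that use a Handle or another identifier scheme would return `None` here, so callers should be prepared to skip those results.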

I would also love suggestions on how to scrape all R-file-containing datasets from Dataverse. Currently my process is to find the DOIs of all datasets containing .R files, find the file_ids of all files in each dataset, then download files by file_id using the Data Access API. I originally posted this in the R client repo IQSS/dataverse-client-r#19 (comment). Apologies for the pseudo-repost.
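If each item in a `type=file` search result carries its own "file_id" (an assumption here; check a real response), the dataset-DOI step could be skipped and download URLs built directly. The sketch below assumes the Data Access API serves a file at `/api/access/datafile/{file_id}`; `download_urls` and `download` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable
from urllib.request import urlopen

# Assumption: Data Access API endpoint for a single file, keyed by file_id.
ACCESS_URL = "https://dataverse.harvard.edu/api/access/datafile/{file_id}"


def download_urls(items: Iterable[dict]) -> list[str]:
    """Build one download URL per distinct file_id, keeping first-seen order."""
    seen = set()
    urls = []
    for item in items:
        file_id = item.get("file_id")  # hypothetical key on file-type results
        if file_id is None or file_id in seen:
            continue
        seen.add(file_id)
        urls.append(ACCESS_URL.format(file_id=file_id))
    return urls


def download(url: str, dest: Path) -> None:
    """Fetch one file and write its bytes to dest (no retries or rate limiting)."""
    with urlopen(url) as resp:
        dest.write_bytes(resp.read())
```

De-duplicating by file_id guards against the same file appearing on more than one page if results shift while paging.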


pdurbin · 2017-12-11T20:24:28Z

@cscn thanks for opening this issue. Can you please see if https://dataverse.harvard.edu/api/search?q=fileContentType%3Atype%2Fx-r-syntax helps you find the R files? I mentioned something similar in #3597.
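The query in that URL is simply `fileContentType:type/x-r-syntax`, percent-encoded. A small Python sketch that builds the same URL with the standard library, so extra parameters such as `type` can be appended without hand-encoding (the `search_url` helper is illustrative):

```python
from urllib.parse import urlencode

BASE_URL = "https://dataverse.harvard.edu/api/search"


def search_url(query: str, **params) -> str:
    """Percent-encode the query and any extra parameters into a Search API URL."""
    return f"{BASE_URL}?{urlencode({'q': query, **params})}"


print(search_url("fileContentType:type/x-r-syntax"))
# https://dataverse.harvard.edu/api/search?q=fileContentType%3Atype%2Fx-r-syntax
```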

I haven't yet looked into the bug you're reporting about expected number of pages.

cscn · 2017-12-13T04:27:59Z

Thanks so much! This is exactly what I was looking for. It turns out the number of pages wasn't a bug at all, but rather a misinterpretation of the start parameter. The Search API documentation states that the start parameter is:

A cursor for paging through search results. See iteration example.

This led me to believe that the parameter was essentially a page number, when in fact it refers to the result number at which to return a page of results. The example in the documentation demonstrates this, but it may be helpful to clarify the definition.
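Under that reading, paging means advancing `start` by the page size (0, 10, 20, ...) until it reaches `total_count`, which takes ceil(total_count / per_page) requests. A minimal Python sketch, assuming the response's "data" object holds the page's results in an "items" list alongside "total_count" and that `per_page` sets the page size; the `fetch` argument exists only so the loop can be exercised without network access.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://dataverse.harvard.edu/api/search"


def fetch_page(query: str, start: int, per_page: int) -> dict:
    """GET one page; start is the offset of the first result, not a page number."""
    params = urlencode({"q": query, "start": start, "per_page": per_page})
    with urlopen(f"{BASE_URL}?{params}") as resp:
        return json.load(resp)["data"]  # assumed to contain "items" and "total_count"


def iterate_results(query: str, per_page: int = 10,
                    fetch: Callable[[str, int, int], dict] = fetch_page) -> Iterator[dict]:
    """Yield every result, stepping start by per_page until total_count is reached."""
    start = 0
    while True:
        data = fetch(query, start, per_page)
        items = data["items"]
        yield from items
        start += per_page
        # Also stop on an empty page, in case total_count changes mid-iteration.
        if not items or start >= data["total_count"]:
            break
```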

pdurbin · 2017-12-13T11:48:38Z

@cscn I'm glad your immediate problem is solved. I just flagged this issue to be about improving the API Guide. Are you interested in trying to come up with a different phrasing? The file to edit is https://github.com/IQSS/dataverse/blob/develop/doc/sphinx-guides/source/api/search.rst and I'd be happy to review a pull request. No pressure. 😄

pdurbin added the labels "Feature: API Guide", "User Role: API User" (Makes use of APIs), "Help Wanted: Documentation", and "Mentor: pdurbin" on Dec 13, 2017

pdurbin mentioned this issue Jun 28, 2018 in: Unclear how to find more granularity of files beyond "File Type" (application, tabulardata, data, etc.) #3597 (Closed)

pdurbin mentioned this issue Aug 21, 2019 in: add beginner friendly API documentation #6086 #6107 (Merged)

cscn closed this as completed May 25, 2022
