I'm attempting to use the Dataverse Search API to find the DOIs of all R files on a Dataverse server, and I've run into a couple of interesting behaviors. I'm using ".R" as my query and "file" as the type parameter.
A GET request to https://dataverse.harvard.edu/api/search/ (with search parameters) returns a response containing a "total_count" field within the "data" field. However, specifying a "start" parameter beyond the expected number of pages continues to yield 10 results. How, then, should I determine the appropriate number of pages to request?
The search API doesn't return a dedicated DOI field for each result. Instead, the DOI is embedded in the string returned in the "dataset_citation" field, e.g., 'Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", doi:10.7910/DVN/27462, Harvard Dataverse, V1'
I've been working around this by parsing out the DOI using a regex, but it would be super nice to have it as a dedicated field.
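For anyone needing the same workaround, here is a minimal sketch of the regex approach. It assumes the citation carries the DOI in the `doi:10.xxxx/...` form shown above. The pattern and the `extract_doi` helper are my own illustration, not part of any Dataverse client.

```python
import re

# Assumed format: "doi:" followed by a "10.<registrant>/<suffix>" identifier,
# terminated by a comma or whitespace (as in the citation string above).
DOI_PATTERN = re.compile(r"doi:(10\.\d{4,9}/[^\s,]+)", re.IGNORECASE)


def extract_doi(citation):
    """Return the DOI embedded in a dataset_citation string, or None."""
    match = DOI_PATTERN.search(citation)
    return match.group(1) if match else None


citation = (
    'Marquez, Javier, 2014, "MORENA (Parte 2): El efecto de spoiler", '
    "doi:10.7910/DVN/27462, Harvard Dataverse, V1"
)
print(extract_doi(citation))  # 10.7910/DVN/27462
```

Citations that format the DOI differently would need a different pattern, so treat this as a starting point.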
I'd also love suggestions on how to scrape all R-file-containing datasets from Dataverse. Currently my process is to find the DOIs of all datasets containing .R files, find the file_ids of all files in each dataset, then download the files by file_id using the data access API. I originally posted this in the R client repo IQSS/dataverse-client-r#19 (comment). Apologies for the pseudo-repost.
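The last two steps of that process could be sketched roughly as follows. The endpoint paths (`/api/datasets/:persistentId/` and `/api/access/datafile/{id}`) and the JSON layout (`data.latestVersion.files[].dataFile`) reflect my understanding of the Dataverse native and data access APIs and should be checked against the installation's documentation. The helper names are hypothetical, and unlike the process described above, this sketch keeps only the files whose names end in .R.

```python
BASE_URL = "https://dataverse.harvard.edu"


def dataset_url(doi):
    """URL for fetching dataset metadata by DOI (assumed native API path)."""
    return f"{BASE_URL}/api/datasets/:persistentId/?persistentId=doi:{doi}"


def r_file_ids(dataset_json):
    """Collect ids of files ending in .R from a dataset metadata response.

    Assumes the response nests files under data.latestVersion.files,
    each with a dataFile object holding 'id' and 'filename'.
    """
    files = dataset_json["data"]["latestVersion"]["files"]
    return [
        f["dataFile"]["id"]
        for f in files
        if f["dataFile"]["filename"].lower().endswith(".r")
    ]


def download_url(file_id):
    """URL for downloading one file by id (assumed data access API path)."""
    return f"{BASE_URL}/api/access/datafile/{file_id}"
```

Each URL can then be fetched with any HTTP client. The builders are kept separate from the network calls so the parsing can be checked offline.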
Thanks so much! This is exactly what I was looking for. It turns out the number of pages wasn't a bug at all, but rather my misinterpretation of the start parameter. The search API documentation states that the start parameter is:
A cursor for paging through search results. See iteration example.
This led me to believe that the parameter was essentially a page number, when in fact it is the index of the first result to return on the page. The example in the documentation demonstrates this, but it may be helpful to clarify the definition.
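In case it helps anyone else, here is a minimal sketch of paging with start treated as a result offset rather than a page number. It assumes the response's "data" object also carries an "items" list alongside "total_count", and that a "per_page" parameter is accepted. `fetch_page` and `iter_results` are names I made up for illustration.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://dataverse.harvard.edu/api/search"


def fetch_page(query, start, per_page=10):
    """Fetch one page of file results; returns the response's 'data' object."""
    params = urllib.parse.urlencode(
        {"q": query, "type": "file", "start": start, "per_page": per_page}
    )
    with urllib.request.urlopen(f"{SEARCH_URL}?{params}") as resp:
        return json.load(resp)["data"]


def iter_results(query, fetch=fetch_page, per_page=10):
    """Yield every result, advancing start by per_page (an offset, not a page)."""
    start = 0
    while True:
        data = fetch(query, start, per_page)
        items = data["items"]
        yield from items
        start += per_page
        # Stop once the offset passes total_count or a page comes back empty.
        if not items or start >= data["total_count"]:
            break
```

With this, `for item in iter_results(".R"):` walks the whole result set, and the number of requests is simply total_count divided by per_page, rounded up.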