get_publication 404 errors #108

Open
gnk02 opened this issue Feb 2, 2022 · 1 comment

gnk02 commented Feb 2, 2022

I have a long data frame of authors with their scholar ids. Running get_publications sequentially over this list intermittently produces 404 warnings and returns NA. For example:

get_publications("ri2FkCgAAAAJ")
[1] NA
Warning message:
In get_scholar_resp(url) :
Page 404. Please check whether the provided URL is correct.

Strangely, the error is not tied to a particular author. For example, with a table of about 1000 authors, the first error may appear for the author in row 150, after all previous authors were processed without problems. If I run the code again some time later, the error may appear for a different author.
I also tried introducing random wait times between searches, but the problem persists.
Any ideas?

What I basically want to do is derive, for each author, the number of publications in a given range of years and the number of citations to those publications.
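For reference, here is a minimal sketch of how a failed lookup could be retried with increasing waits instead of being left as NA. It is not established that retrying resolves these 404s (random waits alone did not help). `fetch_with_retry`, `max_tries` and `base_wait` are hypothetical names, not part of the scholar package, and the sketch assumes only what the output above shows: a failed call returns a bare NA rather than a data frame.

```r
# Hypothetical helper (not part of the scholar package): retry a fetch that
# signals failure by returning NA, as in the console output above.
# `fetch` defaults to scholar::get_publications; it is only looked up when
# no other function is supplied.
fetch_with_retry <- function(id, fetch = scholar::get_publications,
                             max_tries = 5, base_wait = 2) {
  for (attempt in seq_len(max_tries)) {
    result <- fetch(id)
    # Assumption: a successful call returns a data frame, a failed one NA.
    if (is.data.frame(result)) {
      return(result)
    }
    if (attempt < max_tries) {
      # Wait longer after each failure: base_wait, 2 * base_wait, 4 * base_wait, ...
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  NULL  # every attempt failed; the caller decides how to record this id
}
```

Returning NULL after the last attempt makes it possible to collect the ids that still fail and rerun only those later.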


gnk02 commented Feb 2, 2022

I am also attaching a CSV file with the list of authors.
author_table_scholar.csv

The code I use to get the number of publications and the number of citations is the following:

library(scholar)  # get_publications
library(dplyr)    # %>%, filter, rowwise, mutate

# function that gets the scholar id of an author and a range in years and
# returns the number of citations of all publications in that range
citations_in_years <- function(id, start_year, end_year) {
  pubs_in_range <- get_publications(id) %>%
    filter(!is.na(year) & year >= start_year & year <= end_year)
  sum(pubs_in_range$cites)
}

# returns the number of papers in a given time range
papers_in_years <- function(id, start_year, end_year) {
  pubs_in_range <- get_publications(id) %>%
    filter(!is.na(year) & year >= start_year & year <= end_year)
  nrow(pubs_in_range)
}

# add them to author_table
author_table <- author_table %>%
  rowwise() %>%
  mutate(num_of_papers_2016_2020 = papers_in_years(scholar_id, 2016, 2020),
         cites_of_papers_2016_2020 = citations_in_years(scholar_id, 2016, 2020))

Unfortunately, this calls get_publications twice per author, but I couldn't find another way to do it with mutate.
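One way the double call might be avoided is to fetch each author's publications once and return both numbers together. The sketch below uses base R so it does not depend on mutate. `summarise_pubs_in_years` and its column names are hypothetical, and it assumes the `year` and `cites` columns used in the code above and that a failed fetch returns NA.

```r
# Hypothetical helper: one fetch per author, both summaries in one row.
# `fetch` defaults to scholar::get_publications; it is only looked up when
# no other function is supplied.
summarise_pubs_in_years <- function(id, start_year, end_year,
                                    fetch = scholar::get_publications) {
  pubs <- fetch(id)
  # Assumption: a failed fetch returns NA; propagate NA for both columns.
  if (!is.data.frame(pubs)) {
    return(data.frame(num_of_papers = NA_integer_, cites_of_papers = NA_real_))
  }
  in_range <- !is.na(pubs$year) & pubs$year >= start_year & pubs$year <= end_year
  data.frame(
    num_of_papers   = sum(in_range),
    cites_of_papers = sum(pubs$cites[in_range])
  )
}

# Possible usage with the table from this issue (not run here):
# stats <- do.call(rbind, lapply(author_table$scholar_id,
#                                summarise_pubs_in_years,
#                                start_year = 2016, end_year = 2020))
# author_table <- cbind(author_table, stats)
```

This halves the number of requests per author, which may or may not affect how often the 404 appears.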
