Issue with date_from and date_until #8
Comments
Having the same issue: only 10% of the returned papers were within the requested date range.
@vgoel38 and @thecheeseontoast: Thank you for raising the issue. The scraper returns two date columns for each record:
If
If it would be something useful, I can slightly modify the behavior to use |
I notice that even some dates in the "updated" section are out of the range.
@ChakreshIITGN That's right. The edit doesn't have to be done by the authors. When arXiv runs a bulk job, it modifies the datestamps.
I am not sure what the best way to proceed is, but I'm considering various options.
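One possible workaround, sketched here under assumptions rather than taken from the scraper itself, is to filter the returned records on the client side by their creation date. The snippet assumes each record is a dict with a `created` field holding a `YYYY-MM-DD` string; that field name and the `filter_by_created` helper are hypothetical and may not match the scraper's actual output.

```python
from datetime import date


def filter_by_created(records, date_from, date_until, key="created"):
    """Keep records whose creation date lies in [date_from, date_until].

    `records` is assumed to be a list of dicts; `key` is a hypothetical
    field name holding an ISO date string such as '2019-03-15'.
    """
    start = date.fromisoformat(date_from)
    end = date.fromisoformat(date_until)
    kept = []
    for record in records:
        try:
            # Only the first 10 characters (YYYY-MM-DD) are parsed.
            created = date.fromisoformat(record[key][:10])
        except (KeyError, TypeError, ValueError):
            continue  # skip records with a missing or malformed date
        if start <= created <= end:
            kept.append(record)
    return kept


# Made-up sample records, for illustration only.
sample = [
    {"id": "a", "created": "2007-04-02"},
    {"id": "b", "created": "2019-03-15"},
    {"id": "c", "created": "2019-05-10"},
]
in_range = filter_by_created(sample, "2019-01-01", "2019-05-10")
print([r["id"] for r in in_range])
```

Both bounds are inclusive here, so a record created exactly on the last requested day is kept. Records without a parseable date are dropped silently, which may or may not be the desired behavior.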
Hey. Great tool, guys! I found a bug with the
Bug reproduction:
import arxivscraper
scraper = arxivscraper.Scraper(category='cs', date_from='2020-06-25', date_until='2020-06-27')
output = scraper.scrape()
@valayDave : Did you use |
I installed with pip, not from source.
@valayDave Sorry |
@valayDave |
@Mahdisadjadi One way to get around this which I thought of was: The |
I copied the following url from the output of the program. The url looks for records between dates 2019-01-01 and 2019-05-10.
URL: http://export.arxiv.org/oai2?verb=ListRecords&from=2019-01-01&until=2019-05-10&metadataPrefix=arXiv&set=cs
But a lot of the records I got lie outside this date range (e.g., the first record, which is from 2007).
Am I missing something? I am not sure whether the issue is with the code or with the arXiv API.
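To make the request easier to inspect, the URL above can be split into its query arguments. This is only a small sketch using the Python standard library. As far as the OAI-PMH protocol describes it, `from` and `until` select records by their repository datestamp (the last change to the record) rather than by the original submission date, which would be consistent with old papers appearing in the results.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the report above.
url = ("http://export.arxiv.org/oai2?verb=ListRecords&from=2019-01-01"
       "&until=2019-05-10&metadataPrefix=arXiv&set=cs")

# parse_qs returns a list of values per key; each key appears once here,
# so the first value is taken.
params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

for name, value in params.items():
    print(f"{name} = {value}")
```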