always 100 unique ids despite the size of returned comments #58

Open
chaee opened this issue Jan 25, 2023 · 2 comments
Comments

@chaee

chaee commented Jan 25, 2023

Hi! I am fetching comments from a subreddit using before and after dates, but I found that the number of unique items per day is always 100. The total number of results varies and seems right, but there are a lot of duplicates. The unique count is always 100, which is also the limit of the Reddit API, so I wonder whether there is a connection. Do I need to specify something additional in the query? I tried adding size or limit, but that didn't solve the problem (other than returning zero results when the limit is too big, as others have pointed out). Below is how I am sending the query now:

import calendar

from pmaw import PushshiftAPI

# since_date and until_date are date/datetime objects defined earlier (not shown)
api = PushshiftAPI()
api_request_generator = list(api.search_comments(subreddit='The_Donald',
                                                            before=calendar.timegm(until_date.timetuple()),
                                                            after=calendar.timegm(since_date.timetuple()),
                                                            safe_exit=True,
                                                            size=500,
                                                            mem_safe=True,
                                                            until=calendar.timegm(until_date.timetuple())
                                                         ))
@SoonBanned

Did you find any way around this? I have the same problem with submissions: I got 19k submissions, but only 200 unique ones, repeated in a loop.

@SoonBanned

Oh, I got it to work! I checked issue #57 and replaced before and after with their new names, and that did the trick :D
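For anyone landing here, a rough sketch of what that change might look like. It assumes the new names are since (replacing after) and until (replacing before); the thread itself does not spell them out, so check issue #57 to confirm. The helpers to_epoch, build_search_kwargs, count_unique_ids and fetch_comments are hypothetical names made up for this example and are not part of pmaw.

```python
import calendar
from datetime import datetime


def to_epoch(dt):
    """Convert a naive UTC date/datetime to a Unix timestamp in seconds."""
    return calendar.timegm(dt.timetuple())


def build_search_kwargs(subreddit, since_date, until_date):
    """Build the query arguments using the renamed date parameters.

    ASSUMPTION: 'since' and 'until' are the new names for 'after' and
    'before'. Verify against issue #57 before relying on this.
    """
    return {
        "subreddit": subreddit,
        "since": to_epoch(since_date),
        "until": to_epoch(until_date),
    }


def count_unique_ids(items):
    """Count distinct 'id' values, to check a result set for duplicates."""
    return len({item["id"] for item in items})


def fetch_comments(subreddit, since_date, until_date):
    """Run the search with pmaw (imported lazily; requires network access)."""
    from pmaw import PushshiftAPI

    api = PushshiftAPI()
    kwargs = build_search_kwargs(subreddit, since_date, until_date)
    return list(api.search_comments(**kwargs))
```

Comparing len(results) with count_unique_ids(results) after a fetch is a quick way to tell whether the duplicates are gone.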
