
Rate limit hits when using searches / conversations / file based input #575

Closed
igorbrigadir opened this issue Dec 3, 2021 · 7 comments

@igorbrigadir
Contributor

When a loop like

for query in infile:

runs, I think it's possible there's less than 1 second between calls when it loops over and processes users / searches, which causes a rate limit hit and a 15 min wait. Maybe, to be on the safe side, an extra wait is needed here.

Example log where this is happening: https://twittercommunity.com/t/inconsistent-rate-limit-academic-research-full-archive-search/162928/14?u=igorbrigadir

@SamHames
Contributor

SamHames commented Dec 3, 2021

Oh yeah, good point, that does need an extra sleep.

@SamHames
Contributor

SamHames commented Dec 3, 2021

Double-checking: the client should already add the sleep even on the final page of the archive search (apparently I thought ahead?): https://github.com/DocNow/twarc/blob/main/twarc/client2.py#L245

I can take a belt-and-suspenders approach, but that 901 seconds is probably coming from this decorator - have they missed a warning?: https://github.com/DocNow/twarc/blob/main/twarc/decorators2.py#L42

@igorbrigadir
Contributor Author

Yeah, the search method and client2.py are fine - that works. The error appears when we're reading a text file and looping over users: because each user triggers a fresh call to the API, the client can potentially make 2 calls within 1 second, e.g.:

2021-12-03 01:12:53,794 WARNING rate limit exceeded: sleeping 901 secs
2021-12-03 01:27:54,894 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'id,conversation_id,author_id,in_reply_to_user_id,referenced_tweets,geo', 'user.fields': 'id,username,name,pinned_tweet_id', 'media.fields': 'media_key', 'poll.fields': 'id', 'place.fields': 'id', 'start_time': '2015-01-01T00:00:00+00:00', 'end_time': '2015-07-01T00:00:00+00:00', 'query': 'from:AnnieRojas_ -is:retweet', 'max_results': 100}}
2021-12-03 01:27:55,502 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,attachments.poll_ids,attachments.media_keys,geo.place_id', 'tweet.fields': 'id,conversation_id,author_id,in_reply_to_user_id,referenced_tweets,geo', 'user.fields': 'id,username,name,pinned_tweet_id', 'media.fields': 'media_key', 'poll.fields': 'id', 'place.fields': 'id', 'start_time': '2015-01-01T00:00:00+00:00', 'end_time': '2015-07-01T00:00:00+00:00', 'query': 'from:sirdickel -is:retweet', 'max_results': 100}}
2021-12-03 01:27:55,553 WARNING rate limit exceeded: sleeping 901 secs

Here, after a full 15 min rate limit sleep, it makes 1 call processing one line, gets no results, then less than a second later processes the next line and hits the limit again. An extra sleep(1) in the for loop that processes the input file should avoid this without slowing things down significantly.
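The suggested fix can be sketched like this (a toy illustration: `run_queries` and the `run_search` callable are hypothetical stand-ins, not twarc's actual code):

```python
import time

def run_queries(infile, run_search):
    """Run one search per input line, pacing the fresh API calls.

    `run_search` stands in for the client call made for each
    user / query read from the file.
    """
    for line in infile:
        query = line.strip()
        if not query:
            continue
        run_search(query)
        time.sleep(1)  # extra wait so two calls can't land within the same second
```

Since the sleep happens once per input line, it only adds one second per query, which is negligible next to a 15 minute rate limit window.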

@SamHames
Contributor

SamHames commented Dec 3, 2021

Oh, empty pages of results, of course!

@SamHames
Contributor

SamHames commented Dec 3, 2021

No wait, I don't think it's empty results at all - there's always at least one item from this iterator.

If those results were empty, they should also be seeing the explicit log message about an empty page on line 243.

I do think the error is in the library though.

@SamHames
Contributor

SamHames commented Dec 4, 2021

Okay yeah, reviewing the original thread again - I think this is because the various command line limit commands don't fully consume the generator of search results, and therefore skip the necessary sleeps. I put in a PR for that.
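The failure mode can be illustrated with toy code (not twarc's actual generator): a pacing sleep placed after each `yield` never runs for the last page a consumer takes, because breaking out of the loop suspends the generator before the line after the `yield` executes.

```python
def pages(sleeps):
    """Toy page generator that records when its pacing step would run."""
    for page in range(3):
        yield page
        sleeps.append(page)  # stands in for time.sleep() after each page

sleeps = []
for page in pages(sleeps):
    break  # a limit-style early exit stops the generator mid-body

# The generator was suspended right after yielding page 0, so the
# pacing step that follows the yield never executed:
print(sleeps)  # -> []
```

So any consumer that stops iterating early (like the command line limit handling) silently skips the sleeps the generator was relying on.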

At some point it's probably worth taking another look at a decorator for that case, and checking the headers that Twitter returns for this particular rate limit.
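A sketch of that decorator idea, assuming a requests-style response object carrying Twitter's documented x-rate-limit-remaining / x-rate-limit-reset headers (the header names are Twitter's; everything else here is a hypothetical illustration, not twarc's decorators2.py):

```python
import functools
import time

def respect_rate_limit(func):
    """Sleep until the reported reset time when no calls remain."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        resp = func(*args, **kwargs)
        remaining = int(resp.headers.get("x-rate-limit-remaining", 1))
        if remaining == 0:
            reset = float(resp.headers.get("x-rate-limit-reset", 0))
            # Pad by a second so a fast follow-up call can't race the reset.
            time.sleep(max(reset - time.time(), 0) + 1)
        return resp
    return wrapper
```

Because the sleep is driven by the headers of the response just received, it would also cover the back-to-back-calls case without needing a fixed sleep in every calling loop.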

@igorbrigadir
Contributor Author

Fixed in #578


2 participants