Hydration Logging / Exception Handling #327

Open
edsu opened this issue May 8, 2020 · 0 comments

edsu commented May 8, 2020

I received an email from a researcher who had to split a large tweet id dataset into multiple chunks, and ended up with files that looked like:

"1" 1123436821189419008
"2" 1123436818471489536
"3" 1123436801736134656
"4" 1123436800796712960
"5" 1123436798468816896

Running twarc hydrate on these took a long time (there were a lot of files) and generated no output, because twarc expects tweet id files to just contain, well, tweet ids. The problem is that twarc doesn't check whether a line actually contains an ID and throws it at Twitter's API anyway. Nothing in the log flags this as an error; it only shows messages like this:

2020-05-05 02:58:07,685 INFO loading None profile from config /rigel/home/inh2102/.twarc
2020-05-05 02:58:07,689 INFO creating http session
2020-05-05 02:58:07,690 INFO getting ('https://api.twitter.com/1.1/account/verify_credentials.json',) {'params': {'tweet_mode': 'extended'}}
2020-05-05 02:58:08,120 INFO hydrating 100 ids
2020-05-05 02:58:08,120 INFO posting ('https://api.twitter.com/1.1/statuses/lookup.json',) {'data': {'id': '"1123436826570760192","1"\t1123436821189419008,"2"\t1123436818471489536,"3"\t1123436801736134656,"4"\t1123436800796712960,"5"\t1123436798468816896,"6"\t1123436791913242624,"7"\t1123436791795728384,
---
(etc.)

Ideally I think twarc should:

  1. inspect each line and, if it doesn't appear to contain a tweet id, report it to the log and move on
  2. never throw things that don't look like IDs at the Twitter API for hydration
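
To make the two points above concrete, here is a minimal sketch of the kind of check I have in mind. It is not twarc's current code: the function name `clean_ids`, the logger name, and the rule that a tweet id is a bare run of 1 to 19 decimal digits are all assumptions made for illustration.

```python
import logging
import re

# Hypothetical logger name, used only in this sketch.
log = logging.getLogger("hydrate-sketch")

# Assumption: a tweet id is a bare run of 1 to 19 decimal digits.
TWEET_ID = re.compile(r"[0-9]{1,19}")


def clean_ids(lines):
    """Yield lines that look like tweet ids; log and skip everything else."""
    for lineno, line in enumerate(lines, start=1):
        candidate = line.strip()
        if not candidate:
            # Blank lines are skipped silently.
            continue
        if TWEET_ID.fullmatch(candidate):
            yield candidate
        else:
            # Point 1: report the bad line and move on.
            # Point 2: because it is never yielded, it never reaches the API.
            log.warning("line %d does not look like a tweet id: %r",
                        lineno, candidate)
```

Wrapping the id file in something like `clean_ids(open(path))` before batching would mean a file like the one above produces a warning per line instead of a silent, empty hydration run.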