Using twarc to pull in the dataset #3

igorbrigadir · 2021-10-28T18:55:26Z

WIP

igorbrigadir · 2021-10-28T19:16:25Z

Created rt_queries.txt from the list of screen names extracted with:

```shell
csvcut -c screen_name congressmember_data.csv > screen_names.txt
```
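The step from screen_names.txt to rt_queries.txt isn't shown here. Below is a hypothetical sketch of one way to do it. It assumes each query uses the Twitter API v2 search operator `retweets_of:`. The helper names `build_rt_queries` and `write_rt_queries` are illustrative, not part of this PR. It also skips the `screen_name` header row that csvcut writes first.

```python
from pathlib import Path


def build_rt_queries(screen_names):
    """Turn screen names into one `retweets_of:` query each (hypothetical helper)."""
    queries = []
    for name in screen_names:
        name = name.strip().lstrip("@")
        # Skip blank lines and the header row that csvcut emits
        if not name or name == "screen_name":
            continue
        queries.append(f"retweets_of:{name}")
    return queries


def write_rt_queries(src="screen_names.txt", dest="rt_queries.txt"):
    """Read screen names from src and write one query per line to dest."""
    names = Path(src).read_text(encoding="utf-8").splitlines()
    Path(dest).write_text("\n".join(build_rt_queries(names)) + "\n", encoding="utf-8")
```

Calling `write_rt_queries()` in the repo directory would produce a query file in the one-query-per-line shape that `twarc2 searches` reads.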

First, I got daily tweet counts for all the queries with:

```shell
twarc2 searches --archive --start-time "2021-01-01" --end-time "2021-10-23" --counts-only --combine-queries --granularity "day" rt_queries.txt rt_queries_counts.csv
```
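As I understand it, `--combine-queries` merges the individual queries in the input file into as few queries as possible. The sketch below illustrates that idea only and is not twarc's actual implementation. It greedily joins queries with `OR` and keeps each combined query under a length limit. The 1024-character default is an assumption based on the Academic Research full-archive query limit.

```python
def combine_queries(queries, max_len=1024):
    """Greedily join queries with OR, keeping each combined query <= max_len chars.

    Illustrative sketch only; twarc's own --combine-queries logic may differ.
    """
    combined = []
    current = ""
    for q in queries:
        # Parenthesize multi-term queries so OR doesn't change their meaning
        part = f"({q})" if " " in q else q
        candidate = f"{current} OR {part}" if current else part
        if len(candidate) <= max_len:
            current = candidate
        else:
            if current:
                combined.append(current)
            # Start a new group; a single over-long query is kept as-is
            current = part
    if current:
        combined.append(current)
    return combined
```

Fewer, longer queries mean fewer count requests. The trade-off is that the counts come back per combined query rather than per person.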

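Then I checked the total number of tweets with:

```python
import pandas as pd

# Load the per-day counts written by `twarc2 searches --counts-only`
df = pd.read_csv("rt_queries_counts.csv")
print(df["day_count"].sum())
```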

The total came to 41,563,787 tweets, which would take too long to download given the monthly tweet cap. Unfortunate.
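A quick back-of-the-envelope check of "too long". This assumes a cap of 10 million tweets per month, the Academic Research limit, and that the whole cap could go to this one download.

```python
import math

TOTAL_TWEETS = 41_563_787  # sum of day_count reported above
MONTHLY_CAP = 10_000_000   # assumption: Academic Research monthly tweet cap

months = TOTAL_TWEETS / MONTHLY_CAP
# prints: 4.16 months of cap, i.e. at least 5 monthly cap periods
print(f"{months:.2f} months of cap, i.e. at least {math.ceil(months)} monthly cap periods")
```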

However, I'd like to use this as a motivating example for DocNow/twarc#566 later, so I'll update the PR here with the new results when I get them!

igorbrigadir · 2021-10-28T19:24:01Z

rt_queries_counts.csv contains counts for the combined search queries, but it may be more useful to see retweet counts for each individual, so I'll add individual_day_counts.csv shortly.
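Once per-individual daily counts are available, total retweets per person can be computed with a pandas groupby. This is a sketch on made-up rows. The column names `query`, `start` and `day_count` are assumptions, and the real columns in individual_day_counts.csv may differ. With the real file you would replace the hand-built DataFrame with `pd.read_csv("individual_day_counts.csv")`.

```python
import pandas as pd

# Hypothetical rows; real column names in individual_day_counts.csv may differ
counts = pd.DataFrame(
    {
        "query": ["retweets_of:RepA", "retweets_of:RepA", "retweets_of:SenB"],
        "start": ["2021-01-01", "2021-01-02", "2021-01-01"],
        "day_count": [120, 80, 45],
    }
)

# Total retweets per individual, largest first
totals = counts.groupby("query")["day_count"].sum().sort_values(ascending=False)
print(totals)
```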

igorbrigadir added 3 commits October 28, 2021 19:52

add twarc requirements · 87a48c7

add queries for twarc · 80928c7

add counts of tweets · ce03f7d

add daily counts of retweets for each individual · 13378a0
