Releases: DocNow/twarc

v2.4.1

11 Aug 23:17
e602393

This release includes support for requesting the new alt_text field for media from Twitter's v2 API:

https://twittercommunity.com/t/media-alt-text-field-now-available-in-twitter-api-v2/157939
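As a rough illustration of what requesting this field involves at the API level (a sketch, not twarc's own implementation): alt text belongs to media objects, so a v2 tweet lookup needs to expand `attachments.media_keys` and list `alt_text` in `media.fields`. The helper name `build_lookup_url` and the tweet id are made up for this example.

```python
from urllib.parse import urlencode


# Hypothetical helper for illustration only; twarc builds its requests differently.
def build_lookup_url(tweet_ids):
    params = {
        "ids": ",".join(tweet_ids),
        # Media objects are only returned when the media keys are expanded.
        "expansions": "attachments.media_keys",
        # alt_text is requested alongside any other media fields of interest.
        "media.fields": "alt_text,type,url",
    }
    return "https://api.twitter.com/2/tweets?" + urlencode(params)


if __name__ == "__main__":
    print(build_lookup_url(["1234567890"]))
```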

v2.4.0

05 Aug 20:17
bd34758

This release includes a new dehydrate command for turning tweets into tweet id datasets.

twarc2 dehydrate tweets.jsonl > ids.txt

It also includes improvements to progress bar and user lookup behavior.
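For readers curious what dehydration amounts to, here is a minimal sketch of the idea, not the command's actual code. It assumes each input line is a JSON object that is either a response page with tweets under a `data` key or a single flattened tweet with a top-level `id`; the function name `dehydrate` simply mirrors the command.

```python
import json


def dehydrate(lines):
    """Yield tweet ids from an iterable of JSONL lines (assumed layout, see above)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        obj = json.loads(line)
        # Assumption: a response page keeps its tweets under "data";
        # otherwise the object itself is treated as a single tweet.
        data = obj.get("data", obj)
        tweets = data if isinstance(data, list) else [data]
        for tweet in tweets:
            if "id" in tweet:
                yield tweet["id"]


if __name__ == "__main__":
    sample = ['{"data": [{"id": "1"}, {"id": "2"}]}', '{"id": "3"}']
    for tweet_id in dehydrate(sample):
        print(tweet_id)
```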

v2.3.12

30 Jul 19:13
93aa8c2

A bugfix release so that start-time is not inferred when searching with --archive and also using --until-id.

v2.3.11

30 Jul 17:37
31d5ecb

This release includes:

  • improved handling of user ids and invalid usernames when reading data using the timelines command
  • progress bar display when using the counts command
  • better date handling when using the --archive option with counts

v2.3.10

11 Jul 15:37
4febda2

This is another attempt at handling exceptions during streaming, this time in a more straightforward way that does not use decorators2.catch_request_exceptions (#505).

v2.3.9

09 Jul 16:01
19e7741

This bugfix release adds some additional handling of exceptions for accessing streaming endpoints when running twarc2 stream and twarc2 sample. See #505 for the details.

v2.3.8

09 Jul 14:32
1ad9a96

This bugfix release reuses twarc.decorators2.catch_request_exceptions in the context of streaming responses (twarc2 sample and twarc2 stream) that use the response.iter_lines method. Hopefully this will address #505, but it will require testing by people who continue to see the error in the wild.
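The difficulty with streaming is that errors surface while the response is being iterated, not when the request function is first called. The sketch below shows one general way a decorator can cover that case by wrapping a generator function. It is not twarc's decorator: `retry_stream` is a hypothetical name, and the built-in `ConnectionError` stands in for whatever exceptions the HTTP library raises.

```python
import functools
import time


def retry_stream(max_retries=3, delay=0.0):
    """Hypothetical decorator: restart a generator function on ConnectionError."""

    def decorator(gen_func):
        @functools.wraps(gen_func)
        def wrapper(*args, **kwargs):
            retries = 0
            while True:
                try:
                    # Iterating inside the try block means errors raised
                    # mid-stream are caught, not only errors at call time.
                    for item in gen_func(*args, **kwargs):
                        yield item
                    return
                except ConnectionError:
                    retries += 1
                    if retries > max_retries:
                        raise
                    time.sleep(delay)

        return wrapper

    return decorator
```

Note that restarting calls the generator function again from the beginning, which for a live stream corresponds to opening a new connection.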

v2.3.7

06 Jul 00:53
de19dba

v2.3.7 includes new functionality that adds progress bars for twarc2 commands like search, hydrate, timeline and more. These visual indications of how much data has been collected and how much there is to go are extremely useful in data collection jobs. Progress bars display by default when you instruct twarc to write output to a file (#490).

Additionally, there is new code to catch 503 Twitter API errors, which have recently been occurring much more regularly (#499). Apparently a big reason for these errors was the load created by requesting 500 tweets from the search/all endpoint while also asking for context annotations. Twitter recently announced that context annotations are no longer available for requests asking for more than 100 tweets. Since it's one of twarc's design principles to maximize the representation of tweets, the search command has been adjusted to default to 100 now instead of 500, at least for the time being (#504).
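A common way to cope with intermittent 503 responses is to retry with an increasing delay. The following is a generic sketch of that pattern and makes no claim about how twarc implements it: `fetch_with_backoff` and its parameters are hypothetical, and `fetch` is assumed to be a zero-argument callable returning a `(status_code, body)` pair.

```python
import time


def fetch_with_backoff(fetch, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-503 status, backing off exponentially."""
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 503:
            return status, body
        # Wait 1x, 2x, 4x, ... the base delay between attempts,
        # but do not sleep after the final failed attempt.
        if attempt < max_tries - 1:
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still receiving 503 after {max_tries} attempts")
```

The `sleep` parameter exists only so the waiting can be replaced in tests; in normal use the default `time.sleep` applies.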

v2.3.6

02 Jul 15:31
9928dd3

A bugfix release for Twarc2.stream so that it reconnects after being instructed to disconnect, and then continues to fetch data from the stream.

v2.3.5

02 Jul 15:30
e9fa3ca

This release disables the stream unit test under GitHub Actions, since it returns a 400 error there.