Code repository for the arXiv preprint "Limits to Predicting Online Speech Using Large Language Models" (https://arxiv.org/abs/2407.12850). We study the predictability of online speech on Twitter. Studying predictability has far-reaching significance: it helps us frame questions about social influence, information diffusion, and the prediction of sensitive author information.
Using 6.25M tweets from more than 5,000 users as the basis of our study, and language models of up to 70B parameters, we find that a user's own history is the most predictive of their future posts. Posts from their social circle, by contrast, consistently contain less predictive information. This result replicates across models and experimental methods (in-context learning as well as finetuning).
We additionally find that online speech remains hard to predict even with state-of-the-art language models. Our observations do not suggest that peers exert an outsized influence on an individual's online posts, and concerns that large language models have made our individual expression predictable are not supported by our findings.
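To make the comparison concrete, below is a minimal sketch of the in-context setup, under stated assumptions: predictive information is measured as the per-token negative log-likelihood (NLL) of a held-out tweet under a causal language model, conditioned either on the user's own past tweets or on tweets from their social circle. The `gpt2` placeholder model, the newline-joined prompt format, and the example data are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: score a held-out tweet under two context sources.
# Assumptions (not taken from the paper): NLL as the predictability
# measure, a newline-joined prompt format, and gpt2 as a stand-in model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder; the paper uses models of up to 70B parameters
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def target_nll(context_tweets: list[str], target_tweet: str) -> float:
    """Mean NLL (nats/token) of `target_tweet` given `context_tweets`."""
    context = "\n".join(context_tweets) + "\n"
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    tgt_ids = tokenizer(target_tweet, return_tensors="pt").input_ids
    input_ids = torch.cat([ctx_ids, tgt_ids], dim=1)
    labels = input_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # ignore context tokens in the loss
    with torch.no_grad():
        loss = model(input_ids, labels=labels).loss  # mean NLL over target
    return loss.item()

# Hypothetical data for one user: compare the two context sources.
own_history = ["my earlier tweet 1", "my earlier tweet 2"]
peer_posts = ["a peer's tweet 1", "a peer's tweet 2"]
target = "the user's next tweet"
print("NLL | own history:", target_nll(own_history, target))
print("NLL | peer posts :", target_nll(peer_posts, target))
```

A lower NLL under a given context source indicates that the source carries more predictive information about the target post.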
Under `data_collection/` you will find the code used to collect the Twitter dataset.

Under `evaluation/` you will find the code used to evaluate how well models predict users' posts from the different context sources.

Under `finetune/` you will find the code used to finetune models (a minimal sketch of such a setup follows below).
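As referenced above, here is a minimal sketch of the finetuning variant, again under assumptions rather than the repository's actual configuration: a small causal LM is adapted on one context source with the standard language-modeling loss, after which held-out tweets can be scored with the same NLL measure as in the in-context sketch.

```python
# Minimal sketch: finetune a causal LM on one context source.
# Assumptions: gpt2 stand-in model, toy data, and untuned hyperparameters.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
optimizer = AdamW(model.parameters(), lr=5e-5)

train_tweets = ["earlier tweet 1", "earlier tweet 2"]  # hypothetical data

model.train()
for epoch in range(3):
    for tweet in train_tweets:
        ids = tokenizer(tweet, return_tensors="pt").input_ids
        loss = model(ids, labels=ids).loss  # standard causal LM loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Held-out tweets can then be scored with the NLL measure sketched above
# to compare finetuning on a user's own history against peer posts.
```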
If you use this work, please cite:

```bibtex
@misc{remeli2024limitspredictingonlinespeech,
  title={Limits to Predicting Online Speech Using Large Language Models},
  author={Mina Remeli and Moritz Hardt and Robert C. Williamson},
  year={2024},
  eprint={2407.12850},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2407.12850},
}
```