
Releases: PlatformGovernanceArchive/pga-corpus

PGA v1 Dataset (2005-2021)

11 Jul 13:08
3cca8a0

This is the final PGA v1 Dataset (2005-2021) release.

Data was collected retrospectively through a combination of automated and manual approaches, building on the Internet Archive’s [Wayback Machine](https://web.archive.org/). It includes policies from four major platforms, going back to their founding years.

Platforms: Facebook, Instagram, Twitter, YouTube

Time Frame: 2005-2021

Project Website: http://platformgovernancearchive.uni-bremen.de/

Using the Data

We are more than happy for you to use our dataset in your research, reporting, and explorations. If you do:

  1. consult the respective data documentation;
  2. reference this project and the actual dataset;
  3. send us a note so that we can list you on our research and output page.

PGA v1 is made available under the [Open Data Commons Attribution License](http://opendatacommons.org/licenses/by/1.0/) (that means what we say above: use it, but reference us).

Cite the Dataset

Katzenbach, C., Kopps, A., Magalhaes, J. C., Redeker, D., Sühr, T. (2023). Platform Governance Archive (PGA) v1. [data set]. DOI: 10.17605/OSF.IO/XSBPT. URL: http://platformgovernancearchive.uni-bremen.de/data/dataset-pga-v1-historical-dataset/.

Cite a Single Document (recommended)

Name of platform. (Date of version). Name of policy. Platform Governance Archive. Direct URL.
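If you cite many policy versions, you can fill in the single-document template programmatically. The following is a minimal sketch. The function name `cite_policy_version` is hypothetical and is not part of the PGA repositories. The sketch assumes you already have the platform name, version date, policy name, and direct URL for each document. The date format is left to the caller, because the template does not prescribe one.

```python
def cite_policy_version(platform: str, version_date: str, policy_name: str, url: str) -> str:
    """Format a single-document citation in the order of the PGA template:
    Name of platform. (Date of version). Name of policy. Platform Governance Archive. Direct URL.

    Hypothetical helper for illustration; not an official PGA tool.
    """
    fields = {
        "platform": platform,
        "version_date": version_date,
        "policy_name": policy_name,
        "url": url,
    }
    # Reject empty fields so that no template slot is silently left blank.
    for name, value in fields.items():
        if not value or not value.strip():
            raise ValueError(f"missing citation field: {name}")

    # Drop trailing periods so the template's own separators are not doubled ("..").
    platform = platform.strip().rstrip(".")
    policy_name = policy_name.strip().rstrip(".")

    return (
        f"{platform}. ({version_date.strip()}). {policy_name}. "
        f"Platform Governance Archive. {url.strip()}"
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only; take the real date and URL from the archive.
    print(cite_policy_version("Twitter", "2019-06-01", "Terms of Service",
                              "https://example.org/pga/twitter-tos"))
```

The helper only reorders and punctuates the fields. Choosing the correct version date and direct URL for a document still depends on the data documentation.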

Data Paper

The full documentation is available as a data paper:

Katzenbach, C., Kopps, A., Magalhaes, J. C., Redeker, D., Sühr, T., Wunderlich, L. (2023). The Platform Governance Archive v1 – A longitudinal dataset to study the governance of communication and interactions by platforms and the historical evolution of platform policies. Centre for Media, Communication and Information Research (ZeMKI), University of Bremen. https://doi.org/10.26092/elib/2331.

The dataset on GitHub

The GitHub repositories of the Platform Governance Archive provide detailed access to the PGA corpus. They also hold the research data and instruments we used in the process, including the URL list, the scripts, and the datasets as they were before data cleaning.

PGA v1 consists of two repositories:

  • /pga-corpus (the dataset in this release, the final corpus of all identified policy versions)
  • /pga-workbench (additional data: the tools and data that we used in the research process)

Please consult the README file or the data paper for full documentation. The more recent datasets (2022-…) are available in the OTA/PGA repository.