Corpus of crowdsourced annotations together with trusted judgments for 4 crowdsourcing tasks:
- medical relation extraction (files `medical_*`)
- Twitter event extraction (files `tweets_*`)
- news event identification (files `events_*`)
- sound interpretation (files `sounds_*`)
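
As a quick orientation, here is a minimal Python sketch that groups the dataset files by these prefixes; it assumes the CSVs have been downloaded into the current working directory:

```python
import glob

# Group the dataset files by task prefix; assumes the CSVs
# sit in the current working directory.
for prefix in ("medical", "tweets", "events", "sounds"):
    files = sorted(glob.glob(f"{prefix}_*.csv"))
    print(f"{prefix}: {files}")
```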
The dataset was used to evaluate the CrowdTruth crowdsourcing aggregation metrics. Details are available in the paper:
- Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips and Lora Aroyo: Empirical Methodology for Crowdsourcing Ground Truth. Semantic Web Journal 2017 (in review).
For each of the 4 tasks, 2 files are given:
- `*_raw.csv` - Contains the judgments of individual workers for each of the tasks.
- `*_aggregated.csv` - Contains the CrowdTruth aggregation of the judgments for each unit in the task, expressed as the media unit - annotation score, as well as the trusted judgment (see the loading sketch below).
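
A minimal sketch of how the two files for one task could be inspected with pandas. The file names follow the pattern above, but since the column layout is not spelled out here, the sketch only prints the schemas rather than assuming particular column names:

```python
import pandas as pd

# Individual worker judgments for the medical relation extraction task.
raw = pd.read_csv("medical_raw.csv")

# CrowdTruth aggregation: one row per unit, with the media unit -
# annotation score and the trusted judgment.
aggregated = pd.read_csv("medical_aggregated.csv")

# Print the schemas first; the exact column names come from the
# dataset itself, so check them before doing any joins or filtering.
print(raw.columns.tolist())
print(aggregated.columns.tolist())
print(aggregated.head())
```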