Corpus of crowdsourced annotations together with trusted judgments for 4 crowdsourcing tasks:
- medical relation extraction (files `medical_*`)
- Twitter event extraction (files `tweets_*`)
- news event identification (files `events_*`)
- sound interpretation (files `sounds_*`)
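
As a quick orientation, here is a minimal Python sketch that groups the dataset files by these prefixes; it assumes the CSVs have been downloaded into the current working directory:

```python
import glob

# Group the dataset files by task prefix; assumes the CSVs
# sit in the current working directory.
for prefix in ("medical", "tweets", "events", "sounds"):
    files = sorted(glob.glob(f"{prefix}_*.csv"))
    print(f"{prefix}: {files}")
```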
The dataset was used to evaluate the CrowdTruth crowdsourcing aggregation metrics. Details are available in the paper:
- Anca Dumitrache, Oana Inel, Benjamin Timmermans, Carlos Ortiz, Robert-Jan Sips and Lora Aroyo: Empirical Methodology for Crowdsourcing Ground Truth. Semantic Web Journal 2017 (in review).
For each of the 4 tasks, 2 files are given:
- `*_raw.csv` - Contains the judgments of individual workers for each of the tasks.
- `*_aggregated.csv` - Contains the CrowdTruth aggregation of the judgments for each unit in the task, expressed as the media unit - annotation score, as well as the trusted judgment (see the loading sketch below).
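
A minimal sketch of how the two files for one task could be inspected with pandas. The file names follow the pattern above, but since the column layout is not spelled out here, the sketch only prints the schemas rather than assuming particular column names:

```python
import pandas as pd

# Individual worker judgments for the medical relation extraction task.
raw = pd.read_csv("medical_raw.csv")

# CrowdTruth aggregation: one row per unit, with the media unit -
# annotation score and the trusted judgment.
aggregated = pd.read_csv("medical_aggregated.csv")

# Print the schemas first; the exact column names come from the
# dataset itself, so check them before doing any joins or filtering.
print(raw.columns.tolist())
print(aggregated.columns.tolist())
print(aggregated.head())
```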