by Dylan Bourgeois, Jérémie Rappaz, Karl Aberer
Accepted for oral presentation at the Alternate Track on Journalism, Misinformation, and Fact-checking at The Web Conference 2018.
News entities have to select and filter the coverage they broadcast through their respective channels, since the set of world events is too large to be treated exhaustively. The subjective nature of this filtering induces biases due to, among other things, resource constraints, editorial guidelines, ideological affinities, or even the fragmented nature of the information at a journalist's disposal. The magnitude and direction of this bias are, however, largely unknown. The absence of ground truth, the sheer size of the event space, and the lack of an exhaustive set of absolute features to measure make it difficult to observe the bias directly, to characterize the nature of the leaning, and to factor it out to ensure neutral coverage of the news.
In this work, we introduce a methodology to capture the latent structure of the media's decision process at a large scale. Our contribution is threefold. First, we show media coverage to be predictable using personalization techniques, and evaluate our approach on a large set of events collected from the GDELT database. We then show that a personalized and parametrized approach not only exhibits higher accuracy in coverage prediction, but also provides an interpretable representation of the selection bias. Last, we propose a method that selects a set of sources by leveraging the latent representation. These selected sources provide more diverse and egalitarian coverage while retaining the most actively covered events.
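To make the idea of predicting coverage from a latent representation more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a binary source-by-event coverage matrix and swaps in plain logistic matrix factorization as a stand-in for the personalized model. The function names, hyperparameters, toy data, and the farthest-point selection heuristic are all illustrative assumptions.

```python
import numpy as np

def factorize_coverage(X, rank=2, lr=0.1, reg=0.01, epochs=2000, seed=0):
    """Fit X[s, e] ~ sigmoid(P[s] . Q[e]) by full-batch gradient descent.

    X is a (sources x events) 0/1 matrix; P and Q are the latent factors.
    This is a generic stand-in model, assumed for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_sources, n_events = X.shape
    P = 0.1 * rng.standard_normal((n_sources, rank))
    Q = 0.1 * rng.standard_normal((n_events, rank))
    for _ in range(epochs):
        pred = 1.0 / (1.0 + np.exp(-(P @ Q.T)))
        err = pred - X  # gradient of the log loss w.r.t. the logits
        grad_P = err @ Q / n_events + reg * P
        grad_Q = err.T @ P / n_sources + reg * Q
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q

def select_diverse_sources(P, k):
    """Greedy farthest-point selection in latent space (hypothetical heuristic).

    Starts from source 0 and repeatedly adds the source whose minimum
    distance to the already selected sources is largest.
    """
    selected = [0]
    while len(selected) < k:
        diffs = P[:, None, :] - P[selected][None, :, :]
        min_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
        selected.append(int(np.argmax(min_dist)))
    return selected

# Toy data: sources 0-1 cover events 0-2, sources 2-3 cover events 3-5.
X = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
], dtype=float)

P, Q = factorize_coverage(X)
pred = 1.0 / (1.0 + np.exp(-(P @ Q.T)))  # predicted coverage probabilities
chosen = select_diverse_sources(P, k=2)  # one source from each latent group
```

In this toy setting, the rows of `P` place sources with similar coverage close together, which is the sense in which a latent representation can expose selection patterns. The selection step here only maximizes spread in latent space; it does not account for retaining the most actively covered events, which the method described above also targets.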
You can view the results here, where you will also find the paper.