This document summarizes the datasets (uploaded to the Stanford Digital Repository) and the code (located in this repository) needed to replicate and reproduce the tables in Hwang et al.'s (2023) paper "Curating Training Data for Reliable Large-Scale Visual Data Analysis: Lessons from Identifying Trash in Street View Imagery". These materials are intended to help users adapt a similar pipeline to other outcomes or to data from other places, or to use our data in other research.
Data are stored in the Stanford Digital Repository.
single_image.csv: This dataset includes every image that received a single-image rating from Mechanical Turk or from coding sessions, plus all images with TrueSkill ratings or ML predictions. Each row corresponds to one image, with columns indicating the answer given in each of these surveys and/or the relevant TrueSkill rating and ML prediction.
pairs.csv: This dataset includes all images that were rated as pairs in Mechanical Turk or coding sessions. Each row corresponds to one pair of images.
For both datasets, the accompanying codebook (SMR_Data_Codebook.xlsx) details each column’s values and their meanings.
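As a quick orientation, a minimal Python sketch for loading both files and listing their columns might look like the following; the file paths are assumptions about where the downloaded data is unpacked, and SMR_Data_Codebook.xlsx remains the authoritative reference for each column's meaning:

```python
# Minimal sketch: load both datasets and inspect their columns.
# File paths are assumptions about where the SDR download is unpacked;
# consult SMR_Data_Codebook.xlsx for the meaning of each column.
import pandas as pd

single_image = pd.read_csv("single_image.csv")  # one row per image
pairs = pd.read_csv("pairs.csv")                # one row per image pair

print(single_image.shape, list(single_image.columns))
print(pairs.shape, list(pairs.columns))
```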
Note: image names do not indicate the location, but users can request the associated location from the authors.
results-input.Rmd uses single_image.csv and pairs.csv to generate all of the tables and figures in the paper.
To apply the pipeline to other input data structured like these datasets, results-anydata.Rmd generates the generic reliability measures.
PCA results and cosine similarities for images with rating discrepancies, as described in the paper, are generated using pca_image_feature_analysis.ipynb.
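The notebook contains the authors' implementation; purely as a general illustration of the two techniques, PCA and pairwise cosine similarity over an image-feature matrix can be computed with scikit-learn as below, where the random matrix X is a stand-in for whatever features the notebook actually extracts:

```python
# Illustrative sketch only -- not the notebook's code. Assumes a feature
# matrix X of shape (n_images, n_features), e.g., CNN embeddings.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 512))          # placeholder image features

pca = PCA(n_components=10)
components = pca.fit_transform(X)        # low-dimensional projection
print(pca.explained_variance_ratio_)     # variance captured per component

sims = cosine_similarity(X)              # pairwise cosine similarity
print(sims.shape)                        # (100, 100)
```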
Class Activation Maps (CAMs) are generated using cams.py.
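cams.py holds the authors' implementation. For readers unfamiliar with the technique, a self-contained sketch of classic CAM (Zhou et al., 2016: a class-specific weighted sum of the final convolutional feature maps, using the classifier weights) on a stock torchvision ResNet might look like this; the model, weights, and image path are placeholders, not the paper's setup:

```python
# Illustrative sketch of the classic CAM technique, not the code in
# cams.py. Model choice and "example.jpg" are placeholder assumptions.
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

feats = {}
model.layer4.register_forward_hook(
    lambda m, i, o: feats.update(maps=o)  # capture last conv feature maps
)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
x = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    logits = model(x)
cls = logits.argmax(dim=1).item()

# CAM = classifier weights for the predicted class applied to the
# final feature maps, then upsampled to the input resolution.
w = model.fc.weight[cls]                               # (512,)
cam = torch.einsum("c,chw->hw", w, feats["maps"][0])   # (7, 7)
cam = F.relu(cam)
cam = F.interpolate(cam[None, None], size=(224, 224), mode="bilinear")[0, 0]
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # scale to [0, 1]
```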
The discrepancies input for the PCA, cosine similarity, and CAM analyses can be generated by the user from single_image.csv.
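For example, if two rating columns are compared to flag disagreements, a discrepancy list could be derived as sketched below; the column names mturk_rating and ml_prediction are hypothetical stand-ins, so substitute the actual columns documented in SMR_Data_Codebook.xlsx:

```python
# Hedged sketch: flag images where two ratings disagree. The columns
# 'mturk_rating' and 'ml_prediction' are hypothetical placeholders;
# use the actual column names from SMR_Data_Codebook.xlsx.
import pandas as pd

single_image = pd.read_csv("single_image.csv")
discrepancies = single_image[
    single_image["mturk_rating"] != single_image["ml_prediction"]
]
discrepancies.to_csv("discrepancies.csv", index=False)
```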