Amazon Mechanical Turk back-end system written in Python for managing HITs for the task of annotating graphical elements (Figures, Tables and Captions) in scientific publications.
It is meant to be hosted internally and used by the task's creators. It can also be used by anyone to review submissions and annotate pages without submitting anything to AMT; however, multiple people should not use it at the same time (see the TODO list at the bottom). It uses Sci-Annot as the front-end for reviewing submissions, and sci-annot-eval for automated comparison of annotation sets.
The tool is unfortunately not very generic, but it should be adaptable to similar annotation tasks.
DISCLAIMER: Since this was a one-man project used over the course of a year, you will have to dig into the code to figure out how things work. This README only gives a basic overview of what the tool is capable of, along with some pointers on how to set it up.
The dashboard shows you the status of all ingested pages, and the distribution of qualification points of all workers. This is useful for tweaking the qualification point threshold.
Review mode allows you to compare, edit, and accept/reject worker submissions quickly.
You will need the following installed:
- python3.9
- pip
- pipenv
- MongoDB v6.0.6
To install the dependencies, run `pipenv install`.
You will need to create the following collections in MongoDB manually (a scripted sketch follows the list):
- hit_types
- pages
- pdfs
- qual_requirements
- workers
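If you would rather script that step, here is a minimal pymongo sketch. The connection URI and the database name `amt_annotations` are assumptions; use whatever your .env points at.

```python
from pymongo import MongoClient

# URI and database name are assumptions; adjust them to your setup.
client = MongoClient("mongodb://localhost:27017")
db = client["amt_annotations"]

# Collection names taken from the list above.
for name in ["hit_types", "pages", "pdfs", "qual_requirements", "workers"]:
    if name not in db.list_collection_names():
        db.create_collection(name)
```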
Every time you want to start work in a new AMT environment (there are essentially only two: the sandbox and production), you have to follow these steps:
- Create at least one .env file (check .env.example; a hypothetical sketch follows this list)
- Create the main HIT type:
python3 manage_HITs.py --env ENVFILE create-hit-type -a
- Ingest PDFs from a parquet file:
python3 manage_HITs.py -vv -e ENVFILE ingest DOWNLOADED_PDFS_PARQ
- Create the qualification types:
python3 manage_HITs.py -vv -e ENVFILE create-qual-types
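For orientation, a hypothetical .env could look like the sketch below. Apart from `rejected_assignment_penalty`, which is referenced later in this README, every key name here is an assumption; .env.example is the authoritative reference. The sandbox endpoint URL is AMT's official one.

```
# Hypothetical keys; only rejected_assignment_penalty is referenced in this README.
aws_access_key_id=YOUR_KEY_ID
aws_secret_access_key=YOUR_SECRET_KEY
# AMT sandbox endpoint; swap for the production endpoint when going live
mturk_endpoint=https://mturk-requester-sandbox.us-east-1.amazonaws.com
mongo_uri=mongodb://localhost:27017
rejected_assignment_penalty=2
```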
To show the usage of this system, run
python3 manage_HITs.py --help
Please check the individual options for each command.
Usually, you would first post qualification pages (one worker per page) in order to build a pool of qualified workers who will then work on the bulk of the HITs.
If you accept a submission, the worker gains one qualification point; if you reject it, they lose the number of points set by `rejected_assignment_penalty` in your .env file.
You can also edit submissions and save your own version, in which case qualification points are neither awarded nor taken away.
You can always change your mind after making a decision; however, monetary rewards cannot be taken back once given.
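As a rough illustration of this bookkeeping (not the tool's actual code), the point update might look like the sketch below, assuming the `workers` collection stores a `qual_points` field and the env keys from the .env sketch above:

```python
import os
from pymongo import MongoClient

# Database and field names here are assumptions, not the tool's actual schema.
db = MongoClient(os.environ["mongo_uri"])["amt_annotations"]

def update_qual_points(worker_id: str, accepted: bool) -> None:
    """Add 1 point on accept; subtract rejected_assignment_penalty points on reject."""
    delta = 1 if accepted else -int(os.environ["rejected_assignment_penalty"])
    db.workers.update_one({"_id": worker_id}, {"$inc": {"qual_points": delta}})
```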
A typical workflow would look like this (qualification stage ignored):
- Publish some HITs:
python3 manage_HITs.py -v -e ENVFILE publish-random 50 -c "Test"
- Fetch the results after some time:
python3 manage_HITs.py -v -e ENVFILE fetch-results
- Automatically evaluate fetched results if possible:
python3 manage_HITs.py -v -e ENVFILE eval-retrieved
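Between those steps, you can sanity-check progress straight in MongoDB. The sketch below counts pages per status; the `status` field name and database name are assumptions, so check the actual schema first:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["amt_annotations"]  # names are assumptions

# Group the pages collection by status and count each group.
for row in db.pages.aggregate([{"$group": {"_id": "$status", "n": {"$sum": 1}}}]):
    print(row["_id"], row["n"])
```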
These parameters change the behavior of the front-end (an example URL follows the list):
crop_whitespace
- Boolean param; shows a cropped version of the bounding boxes, but doesn't actually change the underlying data in the DB.
page_status
- When an action on a page is performed, selects the status of the next page to be loaded. For example, page_status=DEFERRED.
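They are passed as query parameters in the URL. The host, port, and path below are placeholders; only the parameter names come from this README:

http://localhost:5000/?crop_whitespace=true&page_status=DEFERRED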
TODO:
- Write about the security implications of using XML parsers, and how MTurk requesters' servers could be prime targets (they have money, after all)
- Implement race condition handling in the API