Remove data.tar.gz from commit: too large for GitHub
Berj Chilingirian committed Sep 29, 2016
1 parent d0aadf1 commit 63bb806
Showing 12 changed files with 283 additions and 245 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -3,6 +3,12 @@ __pycache__/
*.py[cod]
*$py.class

# macOS
.DS_Store

# PyCharm
.idea

# C extensions
*.so

2 changes: 1 addition & 1 deletion AUTHORS
@@ -1,4 +1,4 @@
Berj K Chilingirian <[email protected]>
Berj K. Chilingirian <[email protected]>
Zara Perumal <[email protected]>
Grahame Bowland <[email protected]>
Ronald L. Rivest <[email protected]>
161 changes: 127 additions & 34 deletions README.md
@@ -1,57 +1,150 @@
# aus-senate-audit
# Overview

The Australian Senate Audit can be run in three different modes.
``aus-senate-audit`` is a command-line tool for auditing the reported outcome of the Australian Senate Election. The tool has two purposes. First, it gives researchers a platform for running simulated audits on electronic representations of paper ballots from the Australian Senate Election. Second, it gives election officials a platform for auditing the paper ballots of the Australian Senate Election.

1. Simulation Mode: Runs a simulated audit on fake data using Borda Count.
``aus-senate-audit`` uses the Bayesian audit, a post-election, ballot-polling audit developed by [Rivest and Shen (2012)](https://www.usenix.org/system/files/conference/evtwote12/rivest_bayes_rev_073112.pdf). This software, however, can be extended to support any post-election, ballot-polling audit.

``aus-senate-audit simulation --seed SEED --num-candidates NUM_CANDIDATES --num-ballots NUM_BALLOTS``
# Getting Started

where
You can download the ``aus-senate-audit`` command-line tool with the following command.

SEED is the starting value of the RNG used by the program.
NUM_CANDIDATES is the number of candidates in the simulation (default: 100).
NUM_BALLOTS is the number of cast ballots in the simulation (default: 1000000).
```
$ pip3 install aus-senate-audit
```

You can upgrade your installation of the ``aus-senate-audit`` package to the latest version with the following command.

2. Quick Mode: Runs a Bayesian audit on real data and automates reading paper ballots.
```
$ pip3 install --upgrade aus-senate-audit
```

``aus-senate-audit quick --seed SEED --state STATE --data DATA``
Once you have installed the package, you can see its usage with the following command.

where
```
$ aus-senate-audit --help
usage: aus-senate-audit [-h] [-s SEED] [--num-ballots NUM_BALLOTS]
[--num-candidates NUM_CANDIDATES]
[--state {ACT,NSW,NT,QLD,SA,TAS,VIC,WA}]
[--selected-ballots SELECTED_BALLOTS] [--data DATA]
[--max-ballots MAX_BALLOTS]
[-f UNPOPULAR_FREQUENCY_THRESHOLD]
[--sample-increment-size SAMPLE_INCREMENT_SIZE]
MODE
SEED is the starting value of the RNG used by the program.
STATE is the abbreviated name of the Australian state to run the audit for (e.g. TAS).
DATA is the file path to all Australian senate election data.
positional arguments:
MODE The mode in which to run the audit.
3. Real Mode: Runs a Bayesian audit on real data.
optional arguments:
-h, --help show this help message and exit
-s SEED, --seed SEED The starting value of the random number generator.
--num-ballots NUM_BALLOTS
The number of ballots cast for a simulated senate
election.
--num-candidates NUM_CANDIDATES
The number of candidates for a simulated senate
election.
--state {ACT,NSW,NT,QLD,SA,TAS,VIC,WA}
The abbreviation of the state name to run the senate
election audit for.
--selected-ballots SELECTED_BALLOTS
The path to the CSV file containing the selected
ballots data.
--data DATA The path to all Australian senate election data.
--max-ballots MAX_BALLOTS
The maximum number of ballots to check for a real
senate election audit.
-f UNPOPULAR_FREQUENCY_THRESHOLD, --unpopular-frequency-threshold UNPOPULAR_FREQUENCY_THRESHOLD
The minimum frequency of trials in a single audit
stage a candidate must be elected in order for the
candidate to be deemed unpopular (only applied on the
last audit stage).
--sample-increment-size SAMPLE_INCREMENT_SIZE
The number of ballots to add to the growing sample
during this audit stage.
```

Running a real audit requires two steps. First, the formal preferences must be sampled using
This package is distributed by the Python Package Index (PyPI) [here](https://pypi.python.org/pypi/aus-senate-audit).

``aus-senate-audit real --seed SEED --state STATE --data DATA``
# Running an Audit

This command will generate a ``selected_ballots.csv`` file containing a
sample of ballots that do not contain formal preferences. The auditor must
use the information in the file to retrieve the paper preferences and enter
them into the CSV file. For example, suppose a line in the ``selected_ballots.csv``
file appears as
``aus-senate-audit`` can be run in three different modes.

``Denison,POSTAL 3,311,19,42,``
## Simulation Mode

The auditor would retrieve that exact ballot and then add the preferences
read from that ballot to the ``selected_ballots.csv``, as shown below.
Simulation mode runs a simulated audit on fake Australian Senate Election data using Borda Count as the social choice function. This differs from the social choice function used by the actual Australian Senate Election, [single transferable vote (STV)](https://en.wikipedia.org/wiki/Single_transferable_vote).
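
To make the Borda Count rule concrete, here is a minimal sketch. It is not the tool's own implementation. It assumes each ballot is a list of candidate identifiers ordered from most to least preferred, and that the candidate in position `i` of a contest with `n` candidates earns `n - 1 - i` points. The function name `borda_winners` and the alphabetical tie-break are hypothetical choices made for illustration.

```python
from collections import Counter


def borda_winners(ballots, candidates, num_seats):
    """Return the `num_seats` candidates with the highest Borda scores.

    Hypothetical helper for illustration only; it is not part of aus-senate-audit.
    """
    num_candidates = len(candidates)
    scores = Counter({candidate: 0 for candidate in candidates})
    for ballot in ballots:
        for position, candidate in enumerate(ballot):
            # First preference earns n - 1 points, second earns n - 2, and so on.
            scores[candidate] += num_candidates - 1 - position
    # Sort by descending score, breaking ties alphabetically (an arbitrary choice).
    ranked = sorted(candidates, key=lambda c: (-scores[c], c))
    return ranked[:num_seats]


ballots = [["A", "B", "C"], ["B", "A", "C"], ["B", "C", "A"]]
print(borda_winners(ballots, ["A", "B", "C"], num_seats=2))  # ['B', 'A']
```

In this toy election `B` scores 5 points, `A` scores 3 and `C` scores 1, so `B` and `A` fill the two seats. Unlike STV, no votes are transferred between candidates.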

``Denison,POSTAL 3,311,19,42,",1,2,3,,6,,,,4,,,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"``
Simulation mode is indicated by the first (and only) positional argument set as ``simulation``, as shown below.

Upon completing the ``selected_ballots.csv``, the auditor should run
```
aus-senate-audit simulation --seed SEED --num-candidates NUM_CANDIDATES --num-ballots NUM_BALLOTS
```

``aus-senate-audit real --seed SEED --state STATE --selected-ballots SELECTED_BALLOTS_FILE --data DATA``
The ``SEED`` option specifies the starting value of the random number generator (RNG) used by the audit to generate all randomness (defaults to 1). The ``NUM_CANDIDATES`` option is the number of candidates in the simulated election (defaults to 100). The ``NUM_BALLOTS`` option is the number of cast ballots in the simulated election (defaults to 1000000).
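
The practical effect of fixing the seed is that the same ballots are drawn on every run, so an audit can be reproduced by anyone who knows the seed. The sketch below illustrates the idea with Python's standard `random` module. Which generator ``aus-senate-audit`` uses internally is not stated here, so treat `draw_sample` and its parameters as hypothetical.

```python
import random


def draw_sample(num_ballots, sample_size, seed):
    """Draw `sample_size` distinct ballot indices from `range(num_ballots)`.

    Hypothetical helper for illustration only; it is not part of aus-senate-audit.
    """
    rng = random.Random(seed)  # A private generator, so the global RNG state is untouched.
    return rng.sample(range(num_ballots), sample_size)


first = draw_sample(num_ballots=1000000, sample_size=5, seed=1)
second = draw_sample(num_ballots=1000000, sample_size=5, seed=1)
print(first == second)  # True: the same seed yields the same sample.
```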

SELECTED_BALLOTS_FILE is the path to the ``selected_ballots.csv`` file.
## Quick Mode

This command will run one audit stage on the sample of ballots audited thus far.
One should continue the audit in this manner (repeating step 3), until the audit
terminates (as will be indicated in the printout by the audit).
Quick mode runs a Bayesian audit on real Australian Senate Election data using STV as the social choice function. This mode simulates a "real" Australian Senate Election audit by inspecting electronic representations of paper ballots.

There are a handful of other options for fine tuning the audit. These can be seen by running
Quick mode is indicated by the first positional argument set as ``quick``, as shown below.

``aus-senate-audit -h``
```
aus-senate-audit quick --seed SEED --state STATE --data DATA
```

As before, the ``SEED`` option specifies the starting value of the RNG used by the audit to generate all randomness (defaults to 1). The ``STATE`` option is the abbreviated name of the Australian state to run the audit for (e.g. ``TAS``). The ``DATA`` option is the file path to all Australian Senate Election data (see [Getting Australian Senate Election Data](#getting-australian-senate-election-data) for details).

## Real Mode

Real mode runs a Bayesian audit on real Australian Senate Election data using STV as the social choice function. Unlike quick mode, however, real mode requires user interaction. This interaction occurs in two repeating steps.

First, a user must provide the software with the Australian Senate Election data, as shown below. Note the first positional argument is set as ``real``.

```
aus-senate-audit real --seed SEED --state STATE --data DATA
```

As before, the ``SEED`` option specifies the starting value of the RNG used by the audit to generate all randomness (defaults to 1). The ``STATE`` option is the abbreviated name of the Australian state to run the audit for (e.g. ``TAS``). The ``DATA`` option is the file path to all Australian Senate Election data (see [Getting Australian Senate Election Data](#getting-australian-senate-election-data) for details).

This command generates a CSV file named ``selected_ballots.csv``. This file contains a random sample of ballots from the Australian Senate Election without the preferences marked on the ballot. The auditor must use the information in the file (i.e. location of the paper ballot) to retrieve the paper ballot and enter its preferences into the missing entry in the CSV file.

For example, suppose a line in the ``selected_ballots.csv`` file appears as

```
Denison,POSTAL 3,311,19,42,
```

The user would retrieve the paper ballot corresponding to the given information and add the preferences read from that paper ballot to the ``selected_ballots.csv``, as shown below.

```
Denison,POSTAL 3,311,19,42,",1,2,3,,6,,,,4,,,5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"
```
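
If the preferences are transcribed programmatically rather than typed by hand, the step above amounts to appending one quoted column to the row. The following is a minimal sketch of that idea, assuming the row still ends with the trailing comma of its empty preferences column and that `marks` holds one entry per box on the ballot paper, with `None` for a blank box. The helper `fill_preferences` is hypothetical and not part of the tool.

```python
def fill_preferences(row, marks):
    """Append the quoted preferences column to a row from selected_ballots.csv.

    Hypothetical helper for illustration only; it is not part of aus-senate-audit.
    `marks` holds one entry per box on the ballot paper, with None for a blank box.
    """
    if not row.endswith(","):
        raise ValueError("expected a row whose preferences column is still empty")
    preferences = ",".join("" if mark is None else str(mark) for mark in marks)
    # The preferences column itself contains commas, so it is wrapped in quotation marks.
    return '{}"{}"'.format(row, preferences)


row = "Denison,POSTAL 3,311,19,42,"
marks = [None, 1, 2, 3, None, 6]  # A shortened ballot, for illustration.
print(fill_preferences(row, marks))  # Denison,POSTAL 3,311,19,42,",1,2,3,,6"
```

A real ballot needs one entry for every box on the paper, as in the full example above, so the list of marks would be much longer than this shortened one.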

Once the user has filled in every row in the ``selected_ballots.csv``, they are ready for the second step, started with the command below.

```
aus-senate-audit real --seed SEED --state STATE --selected-ballots SELECTED_BALLOTS_FILE --data DATA
```

As before, the ``SEED`` option specifies the starting value of the RNG used by the audit to generate all randomness (defaults to 1). The ``STATE`` option is the abbreviated name of the Australian state to run the audit for (e.g. ``TAS``). The ``SELECTED_BALLOTS_FILE`` option is the path to the selected ballots file. The ``DATA`` option is the file path to all Australian Senate Election data (see [Getting Australian Senate Election Data](#getting-australian-senate-election-data) for details).

This command runs one stage of the Bayesian audit on the sample of paper ballots audited thus far.

The user continues these two steps in succession until the audit terminates with one of the two messages below.

```
Audit has looked at all ballots. Done.
```

```
Stopping because audit confirmed outcome:
(28081, 28083, 28085, 28345, 28346, 28348, 28350, 28871, 28873, 28874, 28876, 28877)
Total number of ballots examined: 6116
```

# Getting Australian Senate Election Data

The Australian Senate Election data can be retrieved by downloading the ``data.tar.gz`` file at the top level of this repository.

This can be done in one of two ways:

(Directions to come soon...)
20 changes: 13 additions & 7 deletions aus_senate_audit/audit_recorder.py
@@ -24,14 +24,16 @@ class AuditRecorder(object):
""" Encapsulates utilities for interacting with information about the audit's progress thus far.
:ivar str state: The abbreviated name of the state whose senate election is being audited.
:ivar str audit_dir_name: The name of the directory storing all audit results.
"""
def __init__(self, state, audit_dir=AUDIT_DIR_NAME):
""" Initializes an :class:`AuditResults` object.
def __init__(self, state, audit_dir_name=AUDIT_DIR_NAME):
""" Initializes an :class:`AuditRecorder` object.
:param str state: The abbreviated name of the state whose senate election is being audited.
:param str audit_dir_name: The name of the directory storing all audit results (default: 'audit_{}').
"""
self._state = state
self.audit_dir = audit_dir
self.audit_dir_name = audit_dir_name
if not exists(self.get_audit_dir_name()):
makedirs('{}/{}'.format(self.get_audit_dir_name(), ROUND_DIR_NAME))
self.record_audit_info(0, 0)
@@ -46,15 +48,15 @@ def remove_preferences_from_ballot(ballot):
:returns: The given ballot minus the formal preferences recorded by the ballot.
:rtype: str
"""
return ballot.split('"')[0] # Works because preferences column is wrapped in quotation marks.
return ballot.split('"')[0] # Works because the preferences column is wrapped in quotation marks.

def get_audit_dir_name(self):
""" Returns the audit directory name for the given state.
:returns: The audit directory name for the given state.
:rtype: str
"""
return self.audit_dir.format(self._state)
return self.audit_dir_name.format(self._state)

def get_file_path(self, file_name):
""" Returns the file path for the given file name within the audit directory.
@@ -84,7 +86,7 @@ def add_new_ballots_to_aggregate(self, path_to_selected_ballots_file):
f.write(new_ballot)

def record_audit_info(self, audit_stage, sample_size):
""" Sets information about the audit recored thus far.
""" Sets information about the audit recorded thus far.
:param int audit_stage: The new stage of the audit.
:param int sample_size: The sample size of the audit.
@@ -137,7 +139,11 @@ def record_selected_ballots(self, audit_stage, sample, quick):
# Write the new ballots in the sample to the selected ballots file, without specifying the original preferences.
with open(SELECTED_BALLOTS_FILE_NAME, 'w') as f:
f.write('{}\n'.format(','.join(COLUMN_HEADERS)))
f.write('\n'.join([ballot if quick else self.remove_preferences_from_ballot(ballot) for ballot in sample]) + '\n')
f.write(
'\n'.join(
[ballot if quick else self.remove_preferences_from_ballot(ballot) for ballot in sample]
) + '\n'
)
# Write the new ballots in the sample to the audit round file.
with open(self.get_file_path(AUDIT_ROUND_FILE_NAME.format(ROUND_DIR_NAME, audit_stage)), 'w') as f:
f.write('{}\n'.format(','.join(COLUMN_HEADERS + MATCH_HEADERS)))
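
As a standalone illustration of two idioms in the diff above, the sketch below shows how splitting on the first quotation mark strips the preferences column from a ballot row, and how a `'{}'` placeholder in the audit directory name is filled in with the state. The sample row is shortened, and the template `'audit_{}'` is taken from the docstring's stated default rather than from the actual value of `AUDIT_DIR_NAME`, which is not shown here.

```python
def remove_preferences_from_ballot(ballot):
    """Return the ballot row without its quoted preferences column."""
    # Everything before the first quotation mark is the ballot's identifying information.
    return ballot.split('"')[0]


ballot = 'Denison,POSTAL 3,311,19,42,",1,2,3,,6"'  # Shortened row, for illustration.
print(remove_preferences_from_ballot(ballot))  # Denison,POSTAL 3,311,19,42,

# Assumed template, following the default given in the docstring.
audit_dir_name = 'audit_{}'
print(audit_dir_name.format('TAS'))  # audit_TAS
```

The split only works if no earlier column contains a quotation mark, which is the assumption the inline comment in the diff records.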