gradex-cli

Command line interface to gradex tool

Why?

A certain pandemic pushed all our examinations online at short notice, which meant handling a lot of PDF. In a big higher education department, it's preferable to have straightforward, robust procedures. Some of my thinking on why PDF on its own was not enough for our needs is here.

teaching-matters

Essentially, paper keeps a permanent record of any student work, and any marking, allowing it to be passed from student to marker to moderator to checker without anyone being able to inadvertently delete or lose work from the previous people in the chain. This does the same, except using PDF.

Features

  • Integrated workflow for preparing PDF for marking, moderating and checking.
  • Headers for page listing exam, student anonymous ID, and page number.
  • Colourful sidebars for recording marking, moderating and checking.
  • Flattens (i.e. keeps) all student and staff annotations
  • Flattens (i.e. keeps) all sticky-note comments
  • Uniquely tracks each page, and its processing history, with CRC32-protected data-tags
  • Automatically sorts labelled pages by script or by question
  • Knows where to store files that it receives at each stage
  • Keeps each exam separate
  • Customisable templates
  • Dynamically reconfigurable text
  • Textfields and Comboboxes using acroforms
  • Parallel processing for increased speed

Installation

Prerequisites

Required

Ghostscript downloads can be found here.

For Windows, choose the 64bit version.

You must put the executable on your path.

For testing

ImageMagick must be installed, and on the path, so as to allow visual comparisons of rendered PDFs.

Optional

Logging messages can be read directly from the log file in $GRADEX_CLI_ROOT/var/log/gradex-cli.log. They're in JSON format, one message per line, with the latest message appearing at the bottom of the file.

For debugging and development, I prefer to be able to search the messages, and analyse the log data using Kibana, which can be installed following this guide here. I use Elastic, Logstash, Kibana and Filebeat. The installation is standard, in that Filebeat reads the logfile, passes it to logstash, which is configured with a JSON filter:

input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}"
  }
}

Logstash passes the logging message to elastic, and then you can discover it in Kibana, using the filebeat-* index.

Usage

There are two main workflows: one for marking by script, the other for marking by question.

If it is your first time using gradex-cli, then initialise the file structure by issuing the following command:

gradex-cli ingest

Ingest

In order to mark an exam, the first step is ingesting the raw PDF files with scanned student work. This is best done on a per-exam basis while you get used to the system. The system currently assumes all work is submitted through Blackboard Learn's Box, which provides a receipt as a .txt file in the following format:

Name: Demonstrator Alpha (s0000000)
Assignment: Demo
Date Submitted: Tuesday, 21 April 2020 06:34:13 o'clock BST
Current Mark: Needs Marking

Submission Field:
There is no student submission text data for this assignment.

Comments:
There are no student comments for this assignment.

Files:
	Original filename: something from my computer.pdf
	Filename: demo-a.pdf
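To make the receipt format concrete, here is a minimal Go sketch that pulls out the fields needed to match a submission to a student. It is not the tool's own parser; the Receipt struct and parseReceipt function are hypothetical names, and the sketch assumes the student ID is always in parentheses at the end of the Name line, as in the example above.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Receipt holds the fields this sketch extracts (hypothetical type).
type Receipt struct {
	Name       string
	ID         string
	Assignment string
	Filename   string
}

// parseReceipt scans the receipt line by line and picks out known prefixes.
func parseReceipt(text string) Receipt {
	var r Receipt
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		switch {
		case strings.HasPrefix(line, "Name:"):
			value := strings.TrimSpace(strings.TrimPrefix(line, "Name:"))
			// Assumption: the ID sits in trailing parentheses.
			open := strings.LastIndex(value, "(")
			if open >= 0 && strings.HasSuffix(value, ")") {
				r.ID = value[open+1 : len(value)-1]
				value = strings.TrimSpace(value[:open])
			}
			r.Name = value
		case strings.HasPrefix(line, "Assignment:"):
			r.Assignment = strings.TrimSpace(strings.TrimPrefix(line, "Assignment:"))
		case strings.HasPrefix(line, "Filename:"):
			// "Original filename:" does not match this prefix, so only the
			// Learn-assigned filename is captured.
			r.Filename = strings.TrimSpace(strings.TrimPrefix(line, "Filename:"))
		}
	}
	return r
}

func main() {
	sample := "Name: Demonstrator Alpha (s0000000)\n" +
		"Assignment: Demo\n" +
		"Files:\n" +
		"\tOriginal filename: something from my computer.pdf\n" +
		"\tFilename: demo-a.pdf\n"

	r := parseReceipt(sample)
	fmt.Println(r.ID, r.Assignment, r.Filename)
}
```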

In the Windows demo release there are three demo PDFs and associated "fake" Learn receipts in ./demo-input.

Place these demo files in $GRADEX_CLI_ROOT/ingest and invoke

gradex-cli ingest

Your files should disappear - although if there is something wrong, they will appear back in the ingest directory, as a way of letting you know what has been rejected.

You can see the files have been ingested into $GRADEX_CLI_ROOT\usr\exam\Demo\02-accepted-receipts and $GRADEX_CLI_ROOT\usr\exam\Demo\02-accepted-papers.

We can prepare them for further processing by flattening them, and embedding information that we will use to anonymously track each page.

We require a system for swapping the known identity for an anonymous one. The identity.csv in $GRADEX_CLI_ROOT\etc\identity\ contains two columns, identity and anonymous. The demo system comes with a simple version containing fake data.

You can prepare the pages by

gradex-cli flatten new Demo

You can manually inspect the flattened paper directory to see the newly flattened files

ls $GRADEX_CLI_ROOT/usr/exam/Demo/05-anonymous-papers

Now you can choose whether to mark by script or by question. Let's mark by question. First we need to add labelling side bars so our talented team of labellers (who we shall call X, somewhat mysteriously) can whizz through and tell us which page has what question on it:

gradex-cli label X Demo
gradex-cli export marking X Demo

Then we can send the files we find in $GRADEX_CLI_ROOT/export/Demo-question-ready-X/ to our labellers. Once they send them back, we put them in $GRADEX_CLI_ROOT/ingest and ingest them with

gradex-cli ingest

You can manually inspect them to see that they end up in

ls $GRADEX_CLI_ROOT/usr/exam/Demo/10-question-back/X

Marking

We want to prepare a set of pages for a marker with the initials ABC, so we issue

gradex-cli sort ABC Demo
gradex-cli export marking ABC Demo

We can get the files from export, and send to our marker.

$GRADEX_CLI_ROOT/export/Demo-marker-ready-ABC/

You can try marking these files yourself, and save them directly back to ingest (no need to change the filename, as the system will see from the hidden data what file it is). With the files back in the ingest directory after marking, we ingest again (same command as before).

Processing marked files

We flatten the files to preserve the comments, read the textfields and optical boxes and store the data in the file, then we assemble documents that merge together the relevant pages for each file, by script.

Each page is categorised into exactly one of four categories, listed here in order from lowest to highest priority:

  • skipped - no indication from marker that they saw it
  • seen - page-ok has had a character entered in a textfield, or a stylus mark has been made in more than 2% of the area of the box (or a smaller amount if the box is rectangular)
  • marked - something has been entered in one of the other textfields, by keyboard or stylus
  • bad - the page-bad box has been ticked

Note that the priority is used to resolve what status to use when more than one applies. For example, a marked page that is also bad, is given the status bad. A page that is both marked and seen is given status marked.
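The priority rule can be sketched in a few lines of Go. This is an illustration of the ordering described above, not the tool's internal representation; the Status type and resolve function are hypothetical names.

```go
package main

import "fmt"

// Status values are declared in priority order, lowest first.
type Status int

const (
	Skipped Status = iota
	Seen
	Marked
	Bad
)

func (s Status) String() string {
	return [...]string{"skipped", "seen", "marked", "bad"}[s]
}

// resolve returns the highest-priority status among those that apply.
// With no indications at all, the page counts as skipped.
func resolve(applicable ...Status) Status {
	result := Skipped
	for _, s := range applicable {
		if s > result {
			result = s
		}
	}
	return result
}

func main() {
	fmt.Println(resolve(Marked, Bad))  // a marked page that is also bad
	fmt.Println(resolve(Seen, Marked)) // a page that is both marked and seen
	fmt.Println(resolve())             // nothing detected
}
```

Run as written, this prints bad, marked and skipped, matching the examples in the text.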

Page merge rules

Every page in the script is included at least once, for context. There is a merge summary bar on the side of each page, so you can tell at a glance if you should expect to see a duplicate copy of the page. If no copy of a page is marked, then one of the other copies is chosen. If more than one copy of a page is marked, then all marked copies are included (e.g. if two markers share a question, and there are one or more pages that have material they both ended up marking).
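The selection rule for a single page can be sketched as below. This is a simplified illustration under stated assumptions: the Copy type and selectCopies function are hypothetical, the text does not say which unmarked copy is chosen (the sketch simply takes the first), and any special handling of bad pages is left out.

```go
package main

import "fmt"

// Copy is one returned version of the same page (hypothetical type).
type Copy struct {
	Marker string
	Status string // "skipped", "seen", "marked" or "bad"
}

// selectCopies decides which copies of one page go into the merged script.
func selectCopies(copies []Copy) []Copy {
	var marked []Copy
	for _, c := range copies {
		if c.Status == "marked" {
			marked = append(marked, c)
		}
	}
	if len(marked) > 0 {
		return marked // every marked copy is included
	}
	if len(copies) > 0 {
		// Assumption: with nothing marked, take the first copy so the
		// page still appears once for context.
		return copies[:1]
	}
	return nil
}

func main() {
	shared := []Copy{{"ABC", "marked"}, {"DEF", "marked"}, {"GHI", "skipped"}}
	unmarked := []Copy{{"ABC", "seen"}, {"DEF", "skipped"}}

	fmt.Println(len(selectCopies(shared)))   // both marked copies are kept
	fmt.Println(len(selectCopies(unmarked))) // a single copy is kept
}
```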

Processing adjustments

Textfields are not easily edited by stylus, so we expect markers who use a stylus to annotate by hand. Then we'll get someone to key in the mark later. So as to retain the benefits of automation, we can use "optical" methods to check whether hand annotations have been made in the textfields, and if so, trigger the same actions as would have happened by typing into the page-ok and page-bad boxes.

Background colour for optical boxes

We assume a vanilla background (#ffffff) for the boxes, unless the flag --background-vanilla=false is given, e.g.

gradex-cli flatten marked 'Some Exam' --background-vanilla=false

in which case, the background is assumed to be chocolate (#000000).

Optical Box boundaries

There are some occasions when you get false positives from the optical boxes, which are attributed, though not with 100% certainty, to artefacts from the boundary edges. It's even been the case in testing (before default shrinkage was increased to 6 pixels) that one marker's scripts threw 100% false positives on the page-bad box, but the other marker on that script threw far fewer false positives. If you get a bunch of false positives (no marks in box visually, but pagedata contains "markDetected") then try setting the box shrinkage to a higher number. The number is the number of pixels removed from each edge. A 10mm by 10mm box at 175dpi has 69x69 pixels. The default shrink reduces that to (69-6-6)x(69-6-6) = 57x57 pixels. If you wanted to shrink some more, you could try for (69-10-10)x(69-10-10) = 49x49 pixels with

gradex-cli flatten marked 'Some exam' --box-shrink=10

Either or both flags can be issued in the same command. Note that flags must come AFTER the exam.

Also note the change from an imperative "mark" from the mark command, to the adjective "marked". Just to keep you on your toes, like. The imperative (command) here is "flatten."
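If you want to work out the effect of a given --box-shrink value on a box of a different size, the arithmetic above can be reproduced with a small Go sketch. The function names are hypothetical, and rounding to the nearest pixel is an assumption that happens to agree with the 69-pixel figure quoted above.

```go
package main

import (
	"fmt"
	"math"
)

// mmToPixels converts a length in millimetres to pixels at the given dpi,
// rounding to the nearest whole pixel (assumed rounding rule).
func mmToPixels(mm, dpi float64) int {
	return int(math.Round(mm / 25.4 * dpi))
}

// shrunkSide removes shrink pixels from both ends of a side.
func shrunkSide(side, shrink int) int {
	remaining := side - 2*shrink
	if remaining < 0 {
		return 0 // the box cannot shrink below nothing
	}
	return remaining
}

func main() {
	side := mmToPixels(10, 175)
	fmt.Println(side)                 // 69
	fmt.Println(shrunkSide(side, 6))  // 57
	fmt.Println(shrunkSide(side, 10)) // 49
}
```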

Limitations

This page flattening and merging process should work on the by-question batches (but has not been tested yet for that). Note that the flatten and merge phases of this step are implemented separately behind the scenes (for now), but are always performed at the same time, so the single command "flatten" is used to trigger one after the other.

Moderating

Once our marked work is flattened, we are ready to put on the moderating bars. Since we might be doing this for more than one moderator, we don't link it to the previous step. At this stage of the workflow, both the by-script and by-question processes have returned to the same path (26-marked-ready). With many scripts in this folder, the system automatically splits the set of scripts into a set to be actively moderated, with a green sidebar. The rest get a smaller grey "inactive" sidebar. Let's say we have moderator FFF who will moderate 10% or 10 scripts (whichever is greater) for 'Some Exam':

gradex-cli moderate FFF 'Some Exam'
gradex-cli export moderating FFF 'Some Exam'

Note: we don't currently support any split ratios other than 10% or 10, whichever is bigger, but it is straightforward to add flags to do this if needed.
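To get a feel for how many scripts end up in the active set, here is a Go sketch of the "10% or 10, whichever is greater" rule. Two details are assumptions of mine rather than documented behaviour: the 10% figure is rounded up, and the count is capped at the number of scripts available. The activeCount name is hypothetical.

```go
package main

import "fmt"

// activeCount returns how many scripts would be actively moderated.
func activeCount(total int) int {
	if total <= 0 {
		return 0
	}
	tenPercent := (total + 9) / 10 // integer ceiling of total/10 (assumed rounding)
	n := 10
	if tenPercent > n {
		n = tenPercent
	}
	if n > total {
		n = total // cannot moderate more scripts than exist (assumed cap)
	}
	return n
}

func main() {
	for _, total := range []int{7, 50, 200} {
		fmt.Printf("%d scripts -> %d active\n", total, activeCount(total))
	}
}
```

With these assumptions, 7 scripts gives 7 active, 50 gives 10, and 200 gives 20.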

Entering

For markers who have used a stylus, there is a set of bars that can be added so assistants can key in the stylus marks. This can be done in parallel to moderation.

gradex-cli enter X 'Some-Exam'
gradex-cli export entering X 'Some-Exam'

Note that enter bars will only be added to scripts that have marks in the question boxes, but NO keyed textfield value - so skipped pages are not included, for example.

Checking

After entering, all scripts (including those already keyed) can be prepared for checking.

This step is incomplete - the front cover is currently not yet implemented

gradex-cli ingest
gradex-cli flatten entered 'Some-Exam'
gradex-cli check X 'Some-Exam'
gradex-cli export checking X 'Some-Exam'

Further processing steps

There are further processing steps which are currently partly supported (check bars etc). These will be updated in a future release.

Guidance to markers

Markers need only use Adobe Acrobat Reader (free). The OneDrive PDF app works on iPad, and Master PDF works on Linux. Most other viewers don't implement enough support for acroforms.

Markers:

  • can use a keyboard, or stylus
  • do not need to rename their file

Tech to avoid:

  • Apple Preview (Quartz PDF) trashes the page catalog and prevents unipdf from reading the file
  • Chrome lets you edit, but doesn't save
  • Edge doesn't autosize the text in the boxes, so it is not nice to use
  • Almost everything on Linux

What to use on Linux:

Master PDF which can fill forms without registration being required

Custom templates

Some exams will have different marking requirements. These can be accommodated by offering different layout templates that offer the same stages as the default process flow (mark, moderate-active, moderate-inactive, check). Note that these are intended to be reused for remark, remoderate and recheck, but these are not fully supported yet. This modification offers an alternative 5-questions-per-page mark template via usage of the layout-q5.svg layout at mark stage. You can use a custom template with the mark command by issuing the layout flag at the command line. The template path is relative to $GRADEX_CLI_ROOT/etc/overlay/template. For example, for the five-question markbar, issue:

gradex-cli mark <marker> <exam> --layout "layout-q5.svg"

Note that flags need to come after the exam (one of the positional arguments)

For detailed information on how to customise the templates using Inkscape, see here.

Template information

The template information is readily found by inspecting the template information in the raw svg text - this is easier than sorting through all the anchors that are stacked on each other. For example, for pages, search for inkscape:label="pages" to get to the pages layer, inkscape:label="images" for images and so on for the other layers.

Spreadnames

These are the search strings that ingester.Overlay() feeds to parsesvg.Render() so it can find the elements in the layout file it needs for a given task

  • addition
  • check
  • enter-[active/inactive]
  • final
  • flatten
  • label
  • mark
  • merge
  • moderate-[active/inactive]

Anchors

Note that the svg- prefix causes BOTH an image (.jpg) AND an svg (.svg) element to be included, and it is an error not to provide them. The img prefix causes only a static image to be used, although if labelled previous it is a "special" image for legacy reasons, and it was not factored out for convenience. The previous image labels don't seem fully consistent ...

  • ref-anchor
  • img-previous-check
  • img-previous-enter-active
  • img-previous-entry-inactive
  • img-previous-flatten-processed
  • img-previous-label
  • img-previous-moderate-active
  • svg-addition-boxes ➡ sidebar-312pt-addition-10box
  • svg-addition-header ➡ flatten-header
  • svg-check ➡ sidebar-312pt-check-flow
  • svg-enter-active ➡ sidebar-312pt-enter-flow
  • svg-enter-inactive ➡ sidebar-312pt-enter-flow-inactive
  • svg-final-boxes ➡ sidebar-312pt-final-cover-10box
  • svg-final-header ➡ flatten-header
  • svg-label ➡ sidebar-312pt-label
  • svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment
  • svg-merge-sidebar ➡ merge-sidebar
  • svg-moderate-active ➡ sidebar-312pt-moderate-flow-comment-active
  • svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive

Pages

  • page-dynamic-check
  • page-dynamic-enter-active
  • page-dynamic-enter-inactive
  • page-dynamic-flatten-processed
  • page-dynamic-mark
  • page-dynamic-merge-sidebar
  • page-dynamic-moderate-active
  • page-dynamic-moderate-inactive
  • page-static-addition
  • page-static-final
  • page-static-label

Images

Note there are some inconsistencies here, e.g. use of width, and inconsistent inclusion of active/inactive state for enter

  • image-dynamic-width-previous-enter
  • image-dynamic-previous-enter-inactive
  • image-dynamic-previous-flatten-processed
  • image-dynamic-previous-label
  • image-dynamic-previous-merge-sidebar
  • image-dynamic-previous-moderate-inactive
  • image-dynamic-previous-moderate-active
  • image-dynamic-width-previous-check
  • image-static-previous-mark

Q5 layout

This is an alternative layout with 5 Qs, so it needs different size pages, and different svg names to match up with its design files

Anchors

  • img-previous-merge-sidebar
  • img-previous-enter
  • img-previous-flatten-processed
  • img-previous-moderate-inactive
  • img-previous-check
  • img-previous-moderate-active
  • img-previous-mark
  • img-previous-label
  • ref-anchor
  • svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment-q5
  • svg-enter ➡ sidebar-312pt-enter-q5
  • svg-label ➡ sidebar-312pt-label
  • svg-moderate-active ➡ sidebar-312pt-moderate-active-q5
  • svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive
  • svg-check ➡ sidebar-312pt-check-q5

TODO

  • svg-addition-boxes ➡ sidebar-312pt-addition-10box
  • svg-addition-header ➡ flatten-header
  • svg-enter-active ➡ sidebar-312pt-enter-flow
  • svg-enter-inactive ➡ sidebar-312pt-enter-flow-inactive
  • svg-final-boxes ➡ sidebar-312pt-final-cover-10box
  • svg-final-header ➡ flatten-header
  • svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment
  • svg-merge-sidebar ➡ merge-sidebar
  • svg-moderate-active ➡ sidebar-312pt-moderate-flow-comment-active
  • svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive
  • page-static-addition
  • page-static-final
  • svg-label ➡ sidebar-312pt-label

Pages

  • page-dynamic-check
  • page-dynamic-enter
  • page-dynamic-moderate-active
  • page-static-label
  • page-dynamic-mark
  • page-dynamic-moderate-inactive
  • page-dynamic-flatten-processed
  • page-dynamic-merge-sidebar

Images

  • image-dynamic-previous-moderate-inactive
  • image-dynamic-previous-moderate-active
  • image-static-previous-mark
  • image-dynamic-width-previous-check
  • image-dynamic-previous-label
  • image-dynamic-previous-flatten-processed
  • image-dynamic-previous-merge-sidebar

TODO

A number of major items are now ticked off the list, and our first diet of exam processing is coming to an end, so now concentrating on the features needed to finish off the Boards of Examiners' paperwork

  • Report on audit (graphviz?)

Done

  • integrate optical check box

  • integrate tree view from here

  • handle incoming marked/moderated/checked work

    • merge pages
    • report bad pages detected by markers
    • report results into csv, similar to this

Deferred

Test coverage

comment     coverage: 93.8% of statements
ingester    coverage: 58.1% of statements
optical     coverage: 81.5% of statements
pagedata    coverage: 74.2% of statements
parselearn  coverage: 87.6% of statements
parsesvg    coverage: 81.8% of statements
tree        coverage: 63.4% of statements

Codebase

Now ~16 KLOC ....

--------------------------------------------------------------------------------
 Language             Files        Lines        Blank      Comment         Code
--------------------------------------------------------------------------------
 Go                     100        22524         4939         1592        15993
 Markdown                 9          817          263            0          554
 Plain Text              18          291           72            0          219
 Bourne Shell             3           18            5            6            7
 JSON                     1            1            0            0            1
--------------------------------------------------------------------------------
 Total                  131        23651         5279         1598        16774
--------------------------------------------------------------------------------