Command line interface to gradex tool
A certain pandemic pushed all our examinations online at short notice, which meant handling a lot of PDFs. In a big higher education department, it's preferable to have straightforward, robust procedures. Some of my thinking behind why PDF on its own was not enough for our needs is here.
Essentially, paper keeps a permanent record of any student work, and any marking, allowing it to be passed from student to marker to moderator to checker without anyone being able to inadvertently delete or lose work from the previous people in the chain. This does the same, except using PDF.
- Integrated workflow for preparing PDF for marking, moderating and checking.
- Page headers listing exam, anonymous student ID, and page number.
- Colourful sidebars for recording marking, moderating and checking.
- Flattens (i.e. keeps) all student and staff annotations
- Flattens (i.e. keeps) all sticky-note comments
- Uniquely tracks each page, and its processing history, with CRC32-protected data-tags
- Automatically sorts labelled pages by script or by question
- Knows where to store files that it receives at each stage
- Keeps each exam separate
- Customisable templates
- Dynamically reconfigurable text
- Textfields and Comboboxes using acroforms
- Parallel processing for increased speed
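
As an illustration of the CRC32-protected data-tags mentioned in the list above, here is a minimal sketch of how a checksum can guard a text payload. The tag layout (payload, a `|` separator, then eight hex digits) and the function names `protect` and `verify` are assumptions made for this example, not the format gradex-cli actually writes.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"strconv"
	"strings"
)

// protect appends a CRC32 (IEEE) checksum to the payload.
// The "payload|checksum" layout is hypothetical.
func protect(payload string) string {
	return fmt.Sprintf("%s|%08x", payload, crc32.ChecksumIEEE([]byte(payload)))
}

// verify splits a tag at its last separator and reports whether the
// stored checksum matches the payload.
func verify(tag string) (string, bool) {
	i := strings.LastIndex(tag, "|")
	if i < 0 {
		return "", false
	}
	payload, sum := tag[:i], tag[i+1:]
	want, err := strconv.ParseUint(sum, 16, 32)
	if err != nil {
		return "", false
	}
	return payload, uint32(want) == crc32.ChecksumIEEE([]byte(payload))
}

func main() {
	tag := protect("exam=Demo;page=3")
	payload, ok := verify(tag)
	fmt.Println(payload, ok)
}
```

A corrupted payload fails the check, which is the property that lets a page's processing history be trusted when it comes back from a marker.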
Ghostscript downloads can be found here.
For Windows, choose the 64bit version.
You must put the executable on your path.
ImageMagick must be installed, and on the path, so as to allow visual comparisons of rendered PDFs.
Logging messages can be read directly from the log file at `$GRADEX_CLI_ROOT/var/log/gradex-cli.log`. They're in JSON format, one message per line, with the latest message appearing at the bottom of the file.
For debugging and development, I prefer to be able to search the messages and analyse the log data using Kibana, which can be installed by following the guide here. I use Elastic, Logstash, Kibana and Filebeat. The installation is standard, in that Filebeat reads the logfile and passes it to Logstash, which is configured with a JSON filter:
input {
  beats {
    port => 5044
  }
}
filter {
  json {
    source => "message"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}"
  }
}
Logstash passes the logging message to Elastic, and then you can discover it in Kibana, using the `filebeat-*` index.
There are two main workflows: one for marking by script, the other for marking by question.
If it is your first time using `gradex-cli`, then initialise the file structure by issuing the following command:
gradex-cli ingest
In order to mark an exam, the first step is ingesting the raw PDF files with scanned student work. This is best done on a per-exam basis while you get used to the system. The system currently assumes all work is submitted through Blackboard Learn's Box, which provides a receipt as a `.txt` file in the following format:
Name: Demonstrator Alpha (s0000000)
Assignment: Demo
Date Submitted: Tuesday, 21 April 2020 06:34:13 o'clock BST
Current Mark: Needs Marking
Submission Field:
There is no student submission text data for this assignment.
Comments:
There are no student comments for this assignment.
Files:
Original filename: something from my computer.pdf
Filename: demo-a.pdf
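
To make the receipt format concrete, here is a minimal sketch of pulling a few fields out of such a file. It only relies on the `Key: value` lines visible in the sample above; the `Receipt` type and `parseReceipt` function are names invented for this example and are not the parser gradex-cli uses.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Receipt holds a few fields from a Learn receipt (hypothetical type).
type Receipt struct {
	Name       string
	Assignment string
	Filename   string
}

// parseReceipt scans "Key: value" lines and keeps the fields of interest.
// Lines without a colon, and keys it does not know, are ignored.
func parseReceipt(text string) Receipt {
	var r Receipt
	scanner := bufio.NewScanner(strings.NewReader(text))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		value = strings.TrimSpace(value)
		switch key {
		case "Name":
			r.Name = value
		case "Assignment":
			r.Assignment = value
		case "Filename":
			r.Filename = value
		}
	}
	return r
}

func main() {
	sample := "Name: Demonstrator Alpha (s0000000)\n" +
		"Assignment: Demo\n" +
		"Files:\n" +
		"Original filename: something from my computer.pdf\n" +
		"Filename: demo-a.pdf\n"
	fmt.Printf("%+v\n", parseReceipt(sample))
}
```

Note that `Original filename` and `Filename` are distinct keys, so the name the student used on their own computer does not overwrite the name of the stored file.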
In the Windows demo release there are three demo PDFs and associated "fake" Learn receipts in `./demo-input`.
Place these demo files in `$GRADEX_CLI_ROOT/ingest` and invoke
gradex-cli ingest
Your files should disappear - although if there is something wrong, they will appear back in the ingest directory, as a way of letting you know what has been rejected.
You can see the files have been ingested into `$GRADEX_CLI_ROOT\usr\exam\Demo\02-accepted-receipts` and `$GRADEX_CLI_ROOT\usr\exam\Demo\02-accepted-papers`.
We can prepare them for further processing by flattening them, and embedding information that we will use to anonymously track each page.
We require a system for swapping the known identity for an anonymous one. The `identity.csv` in `$GRADEX_CLI_ROOT\etc\identity\` contains two columns, `identity` and `anonymous`. The demo system comes with a simple version containing fake data.
You can prepare the pages by
gradex-cli flatten new Demo
You can manually inspect the flattened paper directory to see the newly flattened files
ls $GRADEX_CLI_ROOT/usr/exam/Demo/05-anonymous-papers
Now you can choose whether to mark by script or by question. Let's mark by question. First we need to add labelling sidebars so our talented team of labellers (who we shall call `X`, somewhat mysteriously) can whizz through and tell us which page has what question on it:
gradex-cli label X Demo
gradex-cli export marking X Demo
Then we can send the files we find in `$GRADEX_CLI_ROOT/export/Demo-question-ready-X/` to our labellers. Once they send them back, we put them in `$GRADEX_CLI_ROOT/ingest` and ingest them with
gradex-cli ingest
You can manually inspect them to see that they end up in
ls $GRADEX_CLI_ROOT/usr/exam/Demo/10-question-back/X
We want to prepare a set of pages for a marker with the initials ABC, so we issue
gradex-cli sort ABC Demo
gradex-cli export marking ABC Demo
We can get the files from export, and send to our marker.
$GRADEX_CLI_ROOT/export/Demo-marker-ready-ABC/
You can try marking these files yourself, and save direct back to ingest (no need to change the filename, it will see from the hidden data what file it is). With the files back in the ingest directory after marking, we ingest again (same command as before)
We flatten the files to preserve the comments, read the textfields and optical boxes and store the data in the file, then we assemble documents that merge together the relevant pages for each file, by script.
Each page is categorised into exactly one of four categories, listed here in order from lowest to highest priority:
-- ```skipped``` - no indication from marker that they saw it
-- ```seen``` - page-ok has had a character entered in a textfield or a stylus mark has been made in more than 2% of the area of the box (or a smaller amount if the box is rectangular)
-- ```marked``` - something has been entered in one of the other textfields, by keyboard or stylus
-- ```bad``` - the page-bad box has been ticked
Note that the priority is used to resolve what status to use when more than one applies. For example, a `marked` page that is also `bad` is given the status `bad`. A page that is both `marked` and `seen` is given status `marked`.
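
The priority rule can be sketched as taking the maximum over an ordered enumeration. The `Status` type and `resolve` function below are illustrative names, not the identifiers used inside gradex-cli.

```go
package main

import "fmt"

// Status is ordered from lowest to highest priority.
type Status int

const (
	Skipped Status = iota
	Seen
	Marked
	Bad
)

func (s Status) String() string {
	return [...]string{"skipped", "seen", "marked", "bad"}[s]
}

// resolve returns the highest-priority status among those that apply,
// defaulting to Skipped when nothing applies.
func resolve(applicable ...Status) Status {
	result := Skipped
	for _, s := range applicable {
		if s > result {
			result = s
		}
	}
	return result
}

func main() {
	fmt.Println(resolve(Marked, Bad))  // bad
	fmt.Println(resolve(Marked, Seen)) // marked
}
```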
Every page in the script is included at least once, for context. There is a merge summary bar on the side of each page, so you can tell at a glance if you should expect to see a duplicate copy of the page. If there are no `marked` pages, then one of the other pages is chosen. If more than one page is `marked`, then all `marked` pages are included (e.g. if two markers share a question, and there are one or more pages that have material they both ended up marking).
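
As a rough sketch of that selection rule, consider all the copies of one original page that came back from different markers. The `Copy` type is hypothetical, and picking the first copy when none is marked is an assumption: the text only says that one of the other pages is chosen.

```go
package main

import "fmt"

// Copy is one marker's returned version of a single original page.
type Copy struct {
	Marker string
	Status string
}

// selectCopies keeps every marked copy; if none is marked it falls back
// to a single copy (the first one, which is an assumption).
func selectCopies(copies []Copy) []Copy {
	var marked []Copy
	for _, c := range copies {
		if c.Status == "marked" {
			marked = append(marked, c)
		}
	}
	if len(marked) > 0 {
		return marked
	}
	if len(copies) > 0 {
		return copies[:1]
	}
	return nil
}

func main() {
	shared := []Copy{
		{Marker: "ABC", Status: "marked"},
		{Marker: "DEF", Status: "marked"},
		{Marker: "GHI", Status: "seen"},
	}
	fmt.Println(len(selectCopies(shared))) // 2

	unmarked := []Copy{
		{Marker: "ABC", Status: "seen"},
		{Marker: "DEF", Status: "skipped"},
	}
	fmt.Println(len(selectCopies(unmarked))) // 1
}
```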
Textfields are not easily edited by stylus, so for these markers, we expect them to annotate by hand. Then we'll get someone to key in the mark later. So as to retain the benefits of automation, we can use "optical" methods to check whether hand annotations have been made in the textfields, and if so, trigger the same actions as would have happened by typing into the `page-ok` and `page-bad` boxes.
We assume a vanilla background (#ffffff) for the boxes, unless the flag `--background-vanilla=false` is given, e.g.
gradex-cli flatten marked 'Some Exam' --background-vanilla=false
in which case, the background is assumed to be chocolate (#000000).
There are some occasions when you get false positives from the optical boxes, which we attribute (without 100% certainty) to artefacts from the boundary edges. In testing (before the default shrinkage was increased to 6 pixels), one marker's scripts threw 100% false positives on the `page-bad` box, while the other marker on that script threw far fewer. If you get a bunch of false positives (no marks visible in the box, but pagedata contains "markDetected") then try setting the box shrinkage to a higher number. The value is the number of pixels trimmed in each direction. A 10mm by 10mm box at 175dpi has 69x69 pixels. The default shrink reduces that to (69-6-6)x(69-6-6) = 57x57 pixels. If you wanted to shrink some more, you could try for (69-10-10)x(69-10-10) = 49x49 pixels with
gradex-cli flatten marked 'Some exam' --box-shrink=10
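
The arithmetic above can be reproduced for other box sizes and resolutions with a short sketch. Rounding to the nearest pixel is an assumption that happens to match the 69-pixel figure quoted; the function names are illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// boxPixels converts a box side in millimetres to pixels at the given dpi,
// rounding to the nearest pixel (25.4 mm per inch).
func boxPixels(sideMM, dpi float64) int {
	return int(math.Round(sideMM / 25.4 * dpi))
}

// shrunkSide removes shrink pixels from both ends of a side, never going
// below zero.
func shrunkSide(side, shrink int) int {
	s := side - 2*shrink
	if s < 0 {
		return 0
	}
	return s
}

func main() {
	side := boxPixels(10, 175)
	fmt.Println(side)                 // 69
	fmt.Println(shrunkSide(side, 6))  // 57
	fmt.Println(shrunkSide(side, 10)) // 49
}
```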
Either or both flags can be issued in the same command. Note that flags must come AFTER the exam.
Also note the change from an imperative "mark" from the mark command, to the adjective "marked". Just to keep you on your toes, like. The imperative (command) here is "flatten."
This page flattening and merging process should work on the by-question batches (but has not been tested yet for that). Note that the flatten and merge phases of this step are implemented separately behind the scenes (for now), but are always performed at the same time, so the single command "flatten" is used to trigger one after the other.
Once our marked work is flattened, we are ready to put on the moderating bars. Since we might be doing this for more than one moderator, we don't link it to the previous step. At this stage of the workflow, both the by-script and by-question processes have returned to the same path (`26-marked-ready`). With many scripts in this folder, the system automatically splits the set of scripts into a set to be actively moderated, with a green sidebar. The rest get a smaller grey "inactive" sidebar. Let's say we have moderator FFF, who will moderate 10% of the scripts or 10 scripts (whichever is greater) for 'Some Exam':
gradex-cli moderate FFF 'Some Exam'
gradex-cli export moderating FFF 'Some Exam'
Note: we don't currently support any split ratio other than 10% or 10 scripts, whichever is bigger, but it is straightforward to add flags to do this if needed.
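
For a feel of how many scripts end up in the active set, the rule can be sketched as below. Rounding 10% upwards and capping the result at the number of scripts available are both assumptions; the text above only states "10% or 10, whichever is bigger".

```go
package main

import "fmt"

// activeCount returns how many scripts would be actively moderated:
// the larger of 10% of the total (rounded up, an assumption) and 10,
// but never more than the total itself.
func activeCount(total int) int {
	if total <= 0 {
		return 0
	}
	n := (total + 9) / 10 // ceiling of total/10
	if n < 10 {
		n = 10
	}
	if n > total {
		n = total
	}
	return n
}

func main() {
	for _, total := range []int{7, 50, 250} {
		fmt.Println(total, activeCount(total))
	}
}
```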
For markers who have used a stylus, there is a set of bars that can be added so assistants can key in the stylus marks. This can be done in parallel to moderation.
gradex-cli enter X 'Some-Exam'
gradex-cli export entering X 'Some-Exam'
Note that enter bars will only be added to scripts that have marks in the question boxes, but NO keyed textfield value - so skipped pages are not included, for example.
After entering, all scripts (including those already keyed) can be prepared for checking.
This step is incomplete - the front cover is not yet implemented.
gradex-cli ingest
gradex-cli flatten entered 'Some-Exam'
gradex-cli check X 'Some-Exam'
gradex-cli export checking X 'Some-Exam'
There are further processing steps which are currently partly supported (check bars etc). These will be updated in a future release.
Markers need only use Adobe Acrobat Reader (free). The OneDrive PDF app works on iPad, and Master PDF works on Linux. Most other viewers don't implement enough support for acroforms.
- can use a keyboard, or stylus
- do not need to rename their file
-- Apple Preview (Quartz PDF) trashes the page catalog and prevents unipdf from reading the file
-- Chrome lets you edit, but doesn't save
-- Edge doesn't autosize the text in the boxes so it is not nice to use
-- Almost everything on Linux, except Master PDF, which can fill forms without registration being required
Some exams will have different marking requirements. These can be accommodated by offering different layout templates that offer the same stages as the default process flow (mark, moderate-active, moderate-inactive, check). Note that these are intended to be reused for remark, remoderate and recheck, but these are not fully supported yet. This modification offers an alternative 5-questions-per-page mark template via the `layout-q5.svg` layout at the mark stage. You can use a custom template with the mark command by issuing the layout flag at the command line. The template path is relative to `$GRADEX_CLI_ROOT/etc/overlay/template`. For example, for the five-question markbar, issue:
gradex-cli mark <marker> <exam> --layout "layout-q5.svg"
Note that flags need to come after the exam (one of the positional arguments).
For detailed information on how to customise the templates using Inkscape, see here.
The template information is readily found by inspecting the template information in the raw svg text - this is easier than sorting through all the anchors that are stacked on each other.
For example, for pages, search for `inkscape:label="pages"` to get to the pages layer, `inkscape:label="images"` for images, and so on for the other layers.
These are the search strings that `ingester.Overlay()` feeds to `parsesvg.Render()` so it can find the elements in the layout file it needs for a given task:
- addition
- check
- enter-[active/inactive]
- final
- flatten
- label
- mark
- merge
- moderate-[active/inactive]
Note that the `svg-` prefix causes BOTH an image (.jpg) AND an svg (.svg) element to be included, and it is an error not to provide them.
The `img` prefix causes only a static image to be used, although if labelled `previous` it is a "special" image for legacy reasons, and it was not factored out for convenience. The previous image labels don't seem fully consistent ...
- ref-anchor
- img-previous-check
- img-previous-enter-active
- img-previous-entry-inactive
- img-previous-flatten-processed
- img-previous-label
- img-previous-moderate-active
- svg-addition-boxes ➡ sidebar-312pt-addition-10box
- svg-addition-header ➡ flatten-header
- svg-check ➡ sidebar-312pt-check-flow
- svg-enter-active ➡ sidebar-312pt-enter-flow
- svg-enter-inactive ➡ sidebar-312pt-enter-flow-inactive
- svg-final-boxes ➡ sidebar-312pt-final-cover-10box
- svg-final-header ➡ flatten-header
- svg-label ➡ sidebar-312pt-label
- svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment
- svg-merge-sidebar ➡ merge-sidebar
- svg-moderate-active ➡ sidebar-312pt-moderate-flow-comment-active
- svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive
- page-dynamic-check
- page-dynamic-enter-active
- page-dynamic-enter-inactive
- page-dynamic-flatten-processed
- page-dynamic-mark
- page-dynamic-merge-sidebar
- page-dynamic-moderate-active
- page-dynamic-moderate-inactive
- page-static-addition
- page-static-final
- page-static-label
Note there are some inconsistencies here, e.g. use of width, and inconsistent inclusion of active/inactive state for enter
- image-dynamic-width-previous-enter
- image-dynamic-previous-enter-inactive
- image-dynamic-previous-flatten-processed
- image-dynamic-previous-label
- image-dynamic-previous-merge-sidebar
- image-dynamic-previous-moderate-inactive
- image-dynamic-previous-moderate-active
- image-dynamic-width-previous-check
- image-static-previous-mark
This is an alternative layout with 5 Qs, so it needs different size pages, and different svg names to match up with its design files
- img-previous-merge-sidebar
- img-previous-enter
- img-previous-flatten-processed
- img-previous-moderate-inactive
- img-previous-check
- img-previous-moderate-active
- img-previous-moderate-active
- img-previous-mark
- img-previous-label
- ref-anchor
- svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment-q5
- svg-enter ➡ sidebar-312pt-enter-q5
- svg-label ➡ sidebar-312pt-label
- svg-moderate-active ➡ sidebar-312pt-moderate-active-q5
- svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive
- svg-check ➡ sidebar-312pt-check-q5
- svg-addition-boxes ➡ sidebar-312pt-addition-10box
- svg-addition-header ➡ flatten-header
- svg-enter-active ➡ sidebar-312pt-enter-flow
- svg-enter-inactive ➡ sidebar-312pt-enter-flow-inactive
- svg-final-boxes ➡ sidebar-312pt-final-cover-10box
- svg-final-header ➡ flatten-header
- svg-mark-ladder ➡ sidebar-312pt-mark-ladder-flow-comment
- svg-merge-sidebar ➡ merge-sidebar
- svg-moderate-active ➡ sidebar-312pt-moderate-flow-comment-active
- svg-moderate-inactive ➡ sidebar-312pt-moderate-inactive
- page-static-addition
- page-static-final
- svg-label ➡ sidebar-312pt-label
- page-dynamic-check
- page-dynamic-enter
- page-dynamic-moderate-active
- page-static-label
- page-dynamic-mark
- page-dynamic-moderate-inactive
- page-dynamic-flatten-processed
- page-dynamic-merge-sidebar
- image-dynamic-previous-moderate-inactive
- image-dynamic-previous-moderate-active
- image-static-previous-mark
- image-dynamic-width-previous-check
- image-dynamic-previous-label
- image-dynamic-previous-flatten-processed
- image-dynamic-previous-merge-sidebar
A number of major items are now ticked off the list, and our first diet of exam processing is coming to an end, so we are now concentrating on the features needed to finish off the Boards of Examiners' paperwork.
- Report on audit (graphviz?)
- integrate optical check box
- integrate tree view from here
- handle incoming marked/moderated/checked work
- merge pages
- report bad pages detected by markers
- report results into csv, similar to this
- integrate optical handwriting recognition
- live marking tool to show staff running averages/totals/percentage completion
comment coverage: 93.8% of statements
ingester coverage: 58.1% of statements
optical coverage: 81.5% of statements
pagedata coverage: 74.2% of statements
parselearn coverage: 87.6% of statements
parsesvg coverage: 81.8% of statements
tree coverage: 63.4% of statements
Now ~16 KLOC ....
--------------------------------------------------------------------------------
Language Files Lines Blank Comment Code
--------------------------------------------------------------------------------
Go 100 22524 4939 1592 15993
Markdown 9 817 263 0 554
Plain Text 18 291 72 0 219
Bourne Shell 3 18 5 6 7
JSON 1 1 0 0 1
--------------------------------------------------------------------------------
Total 131 23651 5279 1598 16774
--------------------------------------------------------------------------------