Keven Ates edited this page Nov 8, 2022 · 7 revisions

Batch Processing via Command Line or Alternative Clients

OpenRefine allows alternative client access through the OpenRefine API. The API consists of the same HTTP GET and POST calls to the OpenRefine server that the standard client-side interface uses.

Process

RDF Transform extends the API with its own HTTP GET and POST calls. For effective batch processing with RDF Transform, an RDF Transform template should already be established and exported from an existing OpenRefine project. The processing proceeds as follows:

  1. A new project is created with a data file (an OpenRefine process)
  2. The RDF Transform template is applied to the new project via the Save RDF Transform command
  3. An export is performed using an RDF Transform export format creating the RDF file
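
Step 1 is plain OpenRefine and is not shown in the commands further down, so here is a minimal sketch of it. It assumes the core create-project-from-upload command answers with a redirect to a URL of the form ".../project?project=<id>" and that it needs a CSRF token like other POST commands. The function names create_project and project_id_from_redirect are hypothetical helpers, not part of OpenRefine or RDF Transform.

```shell
# Extract the numeric project ID from the redirect URL that OpenRefine
# is assumed to return after project creation,
# e.g. ".../project?project=1234567890123".
project_id_from_redirect() {
  id=$(printf '%s' "$1" | sed -n 's/.*[?&]project=\([0-9][0-9]*\).*/\1/p')
  [ -n "$id" ] || return 1
  printf '%s\n' "$id"
}

# Step 1: create a project from a data file and print its project ID.
# Arguments: <project name> <data file>. Assumes ${endpoint} is already set.
create_project() {
  token=$(curl -fs "${endpoint%/}/command/core/get-csrf-token" | cut -d '"' -f 4)
  redirect=$(curl -fs --output /dev/null --write-out '%{redirect_url}' \
    --form project-file=@"$2" \
    --form project-name="$1" \
    "${endpoint%/}/command/core/create-project-from-upload?csrf_token=${token}") || return 1
  project_id_from_redirect "$redirect"
}
```

The printed ID is the value the later commands pass as the project parameter, so it can be stored under the project's name in the projects array those commands read from.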

Export Formats:

RDF Transform uses the Apache Jena library to export RDF formats and identifies them by their RIOT RDFFormat IDs. See the Apache Jena RDF Output documentation.

The format for the export is determined by a string match between a format string in the client-side command and the server-side registered exporters. For RDF Transform, the format string is equivalent to the "Jena RDFFormat ID" in the server-side registered exporters.

Pretty Formats:

Jena RDFFormat ID   Extension   Note
RDFXML_PRETTY       .rdf        XML Format
TURTLE_PRETTY       .ttl
TRIG_PRETTY         .trig       Turtle Extension
JSONLD_PRETTY       .jsonld
RDFJSON             .rj         Simple JSON

Stream Formats:

Jena RDFFormat ID   Extension   Note
TURTLE_BLOCKS       .ttl
TRIG_BLOCKS         .trig       Turtle Extension
NTRIPLES            .nt         UTF-8 Only
NQUADS              .nq         UTF-8 Only
TRIX                .xml        Simple XML
RDFNULL             .rn         Tests the processor
RDF_PROTO           .rp         Binary Format
RDF_THRIFT          .rt         Binary Format
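
Because the export command needs both a format string and a matching file extension, a script can derive one from the other. The following is a small sketch that encodes the two tables above as a lookup. The function name rdf_ext_for_format is a hypothetical helper, not something RDF Transform provides.

```shell
# Map a Jena RDFFormat ID (as listed in the tables above) to its file extension.
# Prints the extension without the leading dot; returns 1 for an unknown ID.
rdf_ext_for_format() {
  case "$1" in
    RDFXML_PRETTY)               echo "rdf" ;;
    TURTLE_PRETTY|TURTLE_BLOCKS) echo "ttl" ;;
    TRIG_PRETTY|TRIG_BLOCKS)     echo "trig" ;;
    JSONLD_PRETTY)               echo "jsonld" ;;
    RDFJSON)                     echo "rj" ;;
    NTRIPLES)                    echo "nt" ;;
    NQUADS)                      echo "nq" ;;
    TRIX)                        echo "xml" ;;
    RDFNULL)                     echo "rn" ;;
    RDF_PROTO)                   echo "rp" ;;
    RDF_THRIFT)                  echo "rt" ;;
    *) echo "unknown RDF format: $1" >&2; return 1 ;;
  esac
}
```

Rejecting unknown IDs on the client side catches typos early, before a request is sent with a format string that matches no registered exporter.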

Commands

The following commands demonstrate how to process an OpenRefine project and export RDF formats using RDF Transform.

General Variables

endpoint="http://127.0.0.1:3333"   # your local endpoint, without a trailing slash (the commands below append "/command/...")
workdir=<directory where you want to save work>

Save RDF Transform Command

The "save" process applies a supplied JSON template, generated from the RDF Transform client dialog, to the project on the server side.

A sample command to save a transform to the server:

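p="MyProject"   # key into the projects array, which maps project names to project IDs
transform="...the saved transform as a JSON string..."
# Fetch a CSRF token and build the query string appended to the POST command URL
refine_csrf="?csrf_token=$(curl -fs "${endpoint}/command/core/get-csrf-token" | cut -d \" -f 4)"
echo "Save transform to server for project ${p}..."
# --data-urlencode keeps characters such as & and + in the JSON intact
if curl -fs \
  --data project="${projects[$p]}" \
  --data-urlencode rdf-transform="${transform}" \
  "${endpoint}/command/rdf-transform/save-rdf-transform${refine_csrf}"
then
  echo "Saved transform to project ${p} (${projects[$p]})"
else
  echo "Save transform to ${p} (${projects[$p]}) failed!" >&2
fi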
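
The token lookup in the save command can be wrapped in a small helper that also notices a missing or malformed response. This is a sketch only. It assumes the get-csrf-token response has the shape {"token":"..."}, which is what the cut expression above relies on, and the function names csrf_query_from_response and fetch_csrf_query are hypothetical.

```shell
# Turn the JSON body returned by get-csrf-token, e.g. {"token":"abc123"},
# into a "?csrf_token=..." query string. Returns 1 if no token is found.
csrf_query_from_response() {
  # -s drops input that contains no double quote at all (not JSON)
  token=$(printf '%s' "$1" | cut -s -d '"' -f 4)
  if [ -z "$token" ]; then
    echo "no CSRF token found in response" >&2
    return 1
  fi
  printf '?csrf_token=%s\n' "$token"
}

# Fetch a fresh token from the server and print the query string.
# Assumes ${endpoint} is set as in General Variables.
fetch_csrf_query() {
  response=$(curl -fs "${endpoint%/}/command/core/get-csrf-token") || return 1
  csrf_query_from_response "$response"
}
```

Calling the fetch helper immediately before each POST command, rather than once per script, avoids reusing a token that may have expired during a long batch run.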

Export Command

A sample export command:

NOTE: Set the engine mode to "record-based" or "row-based" as needed.

p="MyProject"   # key into the projects array, which maps project names to project IDs
format="TURTLE_PRETTY"   # a Jena RDFFormat ID from the Export Formats tables
ext="ttl"   # the file extension that matches the format
echo "Export project ${p} as ${format} format..."
if curl -fs \
  --data project="${projects[$p]}" \
  --data format="${format}" \
  --data engine='{"facets":[],"mode":"row-based"}' \
  "${endpoint}/command/core/export-rows" \
  > "${workdir}/${p}.${ext}"
then
  echo "Exported ${p} (${projects[$p]}) to ${workdir}/${p}.${ext}"
else
  echo "Export of ${p} (${projects[$p]}) failed!" >&2
fi
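
For batch runs that export the same project in several formats, the export command above can be turned into a reusable function. The sketch below is one possible arrangement, not the extension's own tooling. The function names engine_config and export_rdf are hypothetical, and the request parameters are the same ones used in the sample export command.

```shell
# Build the engine configuration JSON for the given mode.
# Only "row-based" and "record-based" are accepted.
engine_config() {
  case "$1" in
    row-based|record-based) printf '{"facets":[],"mode":"%s"}\n' "$1" ;;
    *) echo "unknown engine mode: $1" >&2; return 1 ;;
  esac
}

# Export one project in one format.
# Arguments: <project id> <output file> <format> [engine mode]
# The engine mode defaults to "row-based". Assumes ${endpoint} is already set.
export_rdf() {
  engine=$(engine_config "${4:-row-based}") || return 1
  curl -fs \
    --data project="$1" \
    --data format="$3" \
    --data-urlencode engine="$engine" \
    "${endpoint%/}/command/core/export-rows" \
    > "$2"
}
```

A caller can then loop over pairs of format ID and extension, for example TURTLE_PRETTY with ttl and NTRIPLES with nt, and invoke export_rdf once per pair, passing "record-based" as the fourth argument when the project is record-oriented.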