Batch

Batch Processing via Command Line or Alternative Clients

OpenRefine allows alternative clients to access the server through the OpenRefine API. The API consists of the ordinary HTTP GET and POST calls that the client-side interface itself sends to the OpenRefine server.

Process

RDF Transform extends the API with its own HTTP GET and POST calls. For effective batch processing with RDF Transform, an RDF Transform template should already have been established and exported from an existing OpenRefine project. Processing then proceeds as follows:

A new project is created with a data file (an OpenRefine process)
The RDF Transform template is applied to the new project via the Save RDF Transform command
An export is performed using an RDF Transform export format creating the RDF file
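
The first of the three steps above is not shown on this page, so here is a hedged sketch of it. It assumes OpenRefine's `create-project-from-upload` command, which is expected to answer with a redirect whose `Location` header ends in `project=<id>`. The function names `project_id_from_headers` and `create_project` are hypothetical, and some file types may need additional import options.

```shell
# Extract the numeric project ID from HTTP response headers read on stdin.
# Assumes a header such as:
#   Location: http://127.0.0.1:3333/project?project=1234567890123
project_id_from_headers() {
  tr -d '\r' \
    | sed -n 's/^[Ll]ocation:.*[?&]project=\([0-9][0-9]*\).*$/\1/p' \
    | head -n 1
}

# Upload a data file as a new project and print the new project ID.
# Usage: create_project <endpoint> <csrf-token> <project-name> <data-file>
# (assumed parameter names: project-file, project-name, csrf_token)
create_project() {
  curl -fs -D - -o /dev/null \
    --form project-file="@$4" \
    --form project-name="$3" \
    "$1/command/core/create-project-from-upload?csrf_token=$2" \
    | project_id_from_headers
}
```

The printed ID is the value that the later commands pass as the `project` parameter.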

Export Formats:

RDF Transform uses the Apache Jena library to export RDF formats and identifies them by their RIOT RDFFormat IDs. See the Apache Jena RDF Output documentation.

The export format is selected by matching the format string in the client-side command against the exporters registered on the server. For RDF Transform, that format string is the "Jena RDFFormat ID" under which the exporter is registered.

Pretty Formats:

Jena RDFFormat ID	Extension	Note
`RDFXML_PRETTY`	.rdf	XML Format
`TURTLE_PRETTY`	.ttl
`TRIG_PRETTY`	.trig	Turtle Extension
`JSONLD_PRETTY`	.jsonld
`RDFJSON`	.rj	Simple JSON

Stream Formats:

Jena RDFFormat ID	Extension	Note
`TURTLE_BLOCKS`	.ttl
`TRIG_BLOCKS`	.trig	Turtle Extension
`NTRIPLES`	.nt	UTF-8 Only
`NQUADS`	.nq	UTF-8 Only
`TRIX`	.xml	Simple XML
`RDFNULL`	.rn	Tests the processor
`RDF_PROTO`	.rp	Binary Format
`RDF_THRIFT`	.rt	Binary Format
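
When scripting exports, it can help to derive the output file extension from the format ID instead of setting both by hand. The sketch below encodes the two tables above in a hypothetical helper named `rdf_extension`. It is not part of RDF Transform.

```shell
# Print the file extension (without the dot) for a Jena RDFFormat ID,
# following the Pretty and Stream format tables above.
rdf_extension() {
  case "$1" in
    RDFXML_PRETTY)               echo "rdf" ;;
    TURTLE_PRETTY|TURTLE_BLOCKS) echo "ttl" ;;
    TRIG_PRETTY|TRIG_BLOCKS)     echo "trig" ;;
    JSONLD_PRETTY)               echo "jsonld" ;;
    RDFJSON)                     echo "rj" ;;
    NTRIPLES)                    echo "nt" ;;
    NQUADS)                      echo "nq" ;;
    TRIX)                        echo "xml" ;;
    RDFNULL)                     echo "rn" ;;
    RDF_PROTO)                   echo "rp" ;;
    RDF_THRIFT)                  echo "rt" ;;
    *) echo "Unknown RDF format ID: $1" >&2; return 1 ;;
  esac
}
```

For example, `ext="$(rdf_extension "${format}")"` sets the extension from a chosen format ID and fails for an ID that is not in the tables.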

Commands

The following commands demonstrate how to process an OpenRefine project and export RDF formats using RDF Transform.

General Variables

# Your local OpenRefine endpoint, without a trailing slash, for example:
endpoint="http://127.0.0.1:3333"
# Directory where you want to save work:
workdir="<directory where you want to save work>"
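
The commands on this page join `${endpoint}` and `/command/...` with a slash, so an endpoint given with a trailing slash produces `//command/...` in the URL. A minimal sketch of guarding against that, using a hypothetical helper named `normalize_endpoint`:

```shell
# Strip one trailing slash, if present, so that "${endpoint}/command/..."
# never contains a double slash.
normalize_endpoint() {
  printf '%s\n' "${1%/}"
}
```

For example, `endpoint="$(normalize_endpoint "${endpoint}")"` leaves an endpoint without a trailing slash unchanged.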

Save RDF Transform Command

The "save" process applies a supplied JSON template, generated from the RDF Transform client dialog, to the project on the server side.

A sample command that saves a transform to the server:

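p="MyProject"
# "projects" is assumed to be an associative array (declare -A) that maps
# project names to OpenRefine project IDs.
transform="...the saved transform as a JSON string..."
# Fetch a CSRF token; the reply has the form {"token":"..."}.
refine_csrf="$(curl -fs "${endpoint}/command/core/get-csrf-token" | cut -d \" -f 4)"
echo "Save transform to server for project ${p}..."
# --data-urlencode keeps characters such as & and + in the JSON intact.
if curl -fs \
  --data project="${projects[$p]}" \
  --data-urlencode rdf-transform="${transform}" \
  "${endpoint}/command/rdf-transform/save-rdf-transform?csrf_token=${refine_csrf}"
then
  echo "Saved transform to project ${p} (${projects[$p]})"
else
  echo "Save transform to ${p} (${projects[$p]}) failed!" >&2
fi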
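
The save command needs a CSRF token, which OpenRefine expects as a `csrf_token` query parameter on POST commands. As a hedged sketch, the hypothetical helper `csrf_query` below turns the `get-csrf-token` reply into a ready-made query string and fails when no token can be found. The template file name in the usage comment is also hypothetical.

```shell
# Turn the JSON reply of get-csrf-token, e.g. {"token":"abc123"}, read on
# stdin, into a query string such as ?csrf_token=abc123.
# Returns non-zero if the input contains no token.
csrf_query() {
  token="$(cut -s -d '"' -f 4)"
  [ -n "${token}" ] || return 1
  printf '?csrf_token=%s\n' "${token}"
}

# Usage (needs a running OpenRefine server):
#   query="$(curl -fs "${endpoint}/command/core/get-csrf-token" | csrf_query)"
#   transform="$(cat "${workdir}/MyTransform.json")"
#   curl -fs ... "${endpoint}/command/rdf-transform/save-rdf-transform${query}"
```

Reading the template from the exported file avoids pasting a long JSON string into the script.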

Export Command

A sample export command:

NOTE: Set the engine "mode" to "row-based" or "record-based" as needed.

p="MyProject"
# "projects" is assumed to be an associative array (declare -A) that maps
# project names to OpenRefine project IDs.
format="TURTLE_PRETTY"
ext="ttl"
echo "Export project ${p} as ${format} format..."
if curl -fs \
  --data project="${projects[$p]}" \
  --data format="${format}" \
  --data engine='{"facets":[],"mode":"row-based"}' \
  "${endpoint}/command/core/export-rows" \
  > "${workdir}/${p}.${ext}"
then
  echo "Exported ${p} (${projects[$p]}) to ${workdir}/${p}.${ext}"
else
  echo "Export of ${p} (${projects[$p]}) failed!" >&2
fi
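
To switch between the two engine modes without editing the JSON by hand, the `engine` value can be generated. This is a minimal sketch with a hypothetical helper named `engine_json`. It produces the same facet-free engine configuration used in the export command and rejects any other mode.

```shell
# Build the "engine" parameter for export-rows with no facets.
# The mode must be "row-based" or "record-based".
engine_json() {
  case "$1" in
    row-based|record-based)
      printf '{"facets":[],"mode":"%s"}\n' "$1" ;;
    *)
      echo "Unknown engine mode: $1" >&2
      return 1 ;;
  esac
}

# Usage in the export command:
#   --data engine="$(engine_json record-based)"
```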
