Batch
OpenRefine allows alternative client access through the OpenRefine API. The API consists of the ordinary HTTP GET and POST calls to the OpenRefine server that the client-side system itself uses.
RDF Transform extends the API with its own HTTP GET and POST calls. For effective batch processing with RDF Transform, an RDF Transform template should already be established and exported from an existing OpenRefine project. The processing proceeds as follows:
- A new project is created with a data file (an OpenRefine process)
- The RDF Transform template is applied to the new project via the Save RDF Transform command
- An export is performed using an RDF Transform export format creating the RDF file
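The first step above is a plain OpenRefine call rather than an RDF Transform one. A minimal sketch of it follows, assuming the standard `create-project-from-upload` command and assuming the server answers with a redirect whose URL carries `project=<id>`. The function names `create_project` and `project_id_from_redirect` and the default endpoint are illustrative, not part of either API.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the default endpoint are illustrative assumptions.
endpoint="${endpoint:-http://127.0.0.1:3333}"

# Extract the numeric project ID from a redirect URL such as
# http://127.0.0.1:3333/project?project=1234567890
project_id_from_redirect() {
  local url="$1"
  local id="${url##*project=}"   # drop everything up to the last "project="
  id="${id%%&*}"                 # drop any further query parameters
  case "$id" in
    ''|*[!0-9]*) echo "No project ID found in: ${url}" >&2; return 1 ;;
  esac
  printf '%s\n' "$id"
}

# Upload a data file as a new project and print the new project ID.
create_project() {
  local name="$1" file="$2" token redirect
  token=$(curl -fs "${endpoint}/command/core/get-csrf-token" | cut -d \" -f 4) || return 1
  redirect=$(curl -fs --output /dev/null --write-out '%{redirect_url}' \
    --form project-file="@${file}" \
    --form project-name="${name}" \
    "${endpoint}/command/core/create-project-from-upload?csrf_token=${token}") || return 1
  project_id_from_redirect "$redirect"
}
```

A call such as `id=$(create_project "MyProject" "${workdir}/data.csv")` would then yield the project ID that the later save and export steps need. Import options (format, separator, encoding) are omitted here and left to OpenRefine's defaults.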
RDF Transform uses the Apache Jena library to export RDF formats and identifies them by their RIOT RDFFormat IDs. See the Apache Jena RDF Output documentation.
The format for the export is determined by a string match between a format
string in the client-side command and the server-side registered exporters. For RDF Transform, the format
string is equivalent to the "Jena RDFFormat ID" in the server-side registered exporters.
Jena RDFFormat ID | Extension | Note |
---|---|---|
RDFXML_PRETTY | .rdf | XML Format |
TURTLE_PRETTY | .ttl | |
TRIG_PRETTY | .trig | Turtle Extension |
JSONLD_PRETTY | .jsonld | |
RDFJSON | .rj | Simple JSON |

Jena RDFFormat ID | Extension | Note |
---|---|---|
TURTLE_BLOCKS | .ttl | |
TRIG_BLOCKS | .trig | Turtle Extension |
NTRIPLES | .nt | UTF-8 Only |
NQUADS | .nq | UTF-8 Only |
TRIX | .xml | Simple XML |
RDFNULL | .rn | Tests the processor |
RDF_PROTO | .rp | Binary Format |
RDF_THRIFT | .rt | Binary Format |
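
In a batch script it can help to derive the output file extension from the format string instead of setting both by hand. The sketch below simply encodes the tables above as a lookup; the function name `format_ext` is a hypothetical helper, not something RDF Transform provides.

```shell
#!/usr/bin/env bash
# Sketch only: format_ext is an illustrative helper that mirrors the tables above.
# Prints the file extension (without the dot) for a Jena RDFFormat ID.
format_ext() {
  case "$1" in
    RDFXML_PRETTY)               echo rdf ;;
    TURTLE_PRETTY|TURTLE_BLOCKS) echo ttl ;;
    TRIG_PRETTY|TRIG_BLOCKS)     echo trig ;;
    JSONLD_PRETTY)               echo jsonld ;;
    RDFJSON)                     echo rj ;;
    NTRIPLES)                    echo nt ;;
    NQUADS)                      echo nq ;;
    TRIX)                        echo xml ;;
    RDFNULL)                     echo rn ;;
    RDF_PROTO)                   echo rp ;;
    RDF_THRIFT)                  echo rt ;;
    *) echo "Unknown RDF Transform export format: $1" >&2; return 1 ;;
  esac
}
```

For example, `ext=$(format_ext "${format}")` fails early on a misspelled format string, before any request is sent to the server.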
The following commands demonstrate the processing for an OpenRefine project to export RDF formats using RDF Transform.
endpoint="http://127.0.0.1:3333"   # your local OpenRefine endpoint, without a trailing slash
workdir="<directory where you want to save work>"
The "save" process applies a supplied JSON template, generated from the RDF Transform client dialog, to the project on the server side.
A sample command to save a transform to the server:
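# Assumes the calling script defines projects (an associative array mapping
# project names to OpenRefine project IDs) and the log and error helper functions.
p="MyProject"
transform="...the saved transform as a JSON string..."
# Fetch a CSRF token: it is the fourth double-quote-delimited field of the response.
refine_csrf=$(curl -fs "${endpoint}/command/core/get-csrf-token" | cut -d \" -f 4)
echo "Save transform to server for project ${p}..."
# --data-urlencode keeps characters such as & and + in the JSON intact.
if curl -fs \
    --data project="${projects[$p]}" \
    --data-urlencode rdf-transform="${transform}" \
    "${endpoint}/command/rdf-transform/save-rdf-transform?csrf_token=${refine_csrf}"
then
    log "Saved transform to project ${p} (${projects[$p]})"
else
    error "Save transform to ${p} (${projects[$p]}) failed!"
fi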
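Since every state-changing command needs a fresh CSRF token, the token handling can be factored into a helper. This is a sketch under the assumption that `get-csrf-token` answers with a body of the form `{"token":"..."}`; the function names `csrf_token_from_json` and `csrf_query` are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the assumed response shape are illustrative.
endpoint="${endpoint:-http://127.0.0.1:3333}"

# Extract the token from a response body like {"token":"abc123"}.
csrf_token_from_json() {
  case "$1" in
    '{"token":"'*) printf '%s\n' "$1" | cut -d \" -f 4 ;;
    *) echo "Unexpected CSRF response: $1" >&2; return 1 ;;
  esac
}

# Request a token from the server and print it as a ready-to-append query string.
csrf_query() {
  local response token
  response=$(curl -fs "${endpoint}/command/core/get-csrf-token") || return 1
  token=$(csrf_token_from_json "$response") || return 1
  printf '?csrf_token=%s\n' "$token"
}
```

A command URL can then be written as `"${endpoint}/command/rdf-transform/save-rdf-transform$(csrf_query)"`. Checking the response shape first avoids silently sending an empty token when the server returns an error page.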
A sample export command:
NOTE: Set the engine mode to "record-based" or "row-based" as needed.
# Assumes the calling script defines projects (an associative array mapping
# project names to OpenRefine project IDs) and the log and error helper functions.
p="MyProject"
format="TURTLE_PRETTY"
ext="ttl"
echo "Export project ${p} as ${format} format..."
if curl -fs \
    --data project="${projects[$p]}" \
    --data format="${format}" \
    --data engine='{"facets":[],"mode":"row-based"}' \
    "${endpoint}/command/core/export-rows" \
    > "${workdir}/${p}.${ext}"
then
    log "Exported ${p} (${projects[$p]}) to ${workdir}/${p}.${ext}"
else
    error "Export of ${p} (${projects[$p]}) failed!"
fi
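To switch between row-based and record-based exports without editing the JSON by hand, the engine setting can be generated from a mode argument. The following is a minimal sketch; `engine_json` and `export_rdf` are hypothetical helper names, and the request itself mirrors the export command above.

```shell
#!/usr/bin/env bash
# Sketch only: engine_json and export_rdf are illustrative helpers.
endpoint="${endpoint:-http://127.0.0.1:3333}"

# Print the engine JSON for a given mode, rejecting anything else.
engine_json() {
  case "$1" in
    row-based|record-based) printf '{"facets":[],"mode":"%s"}\n' "$1" ;;
    *) echo "Unknown engine mode: $1" >&2; return 1 ;;
  esac
}

# Export one project in one RDF format to a file.
export_rdf() {
  local project_id="$1" format="$2" mode="$3" outfile="$4" engine
  engine=$(engine_json "$mode") || return 1
  curl -fs \
    --data project="${project_id}" \
    --data format="${format}" \
    --data engine="${engine}" \
    "${endpoint}/command/core/export-rows" \
    > "${outfile}"
}
```

For example, `export_rdf "${projects[$p]}" TURTLE_PRETTY record-based "${workdir}/${p}.ttl"` performs the same export as above in record mode.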